Friday, 23 August 2013

Troubleshooting SharePoint Search Crawl

Crawl Troubleshooting

Every once in a while the SharePoint crawler behaves unexpectedly on web sites that you want to crawl. Sometimes the error messages are clear enough to help you troubleshoot the problem; other times you get a single error that is no help at all. Either way, I have found that you can troubleshoot the crawl with the Fiddler Web Debugging Proxy (http://www.fiddler2.com). The technique is to configure SharePoint Search to crawl through Fiddler as a proxy so that you can watch the traffic.
  1. Download and install Fiddler on the server running the crawl.
  2. Determine which account is running the crawl. Usually it is the Default content access account listed in Search Administration.
    If you have crawl rules set up for specific content sources, however, you may have alternate credentials specified, so check your rules and be sure you are using the correct account for testing.
  3. Hold down the [Ctrl][Shift] keys and right click Fiddler to choose “Run as different user”. Log in as the Crawl Account. (If this option is not available, you may have to log out and log back in as the crawl account. Either way you need to run Fiddler as the crawl account.)
  4. Once Fiddler is running, choose Tools | Fiddler Options… and click the Connections tab. Note the "Fiddler listens on port" setting; 8888 is the default. Ensure that it does not duplicate a port already in use by SharePoint. Close the dialog after making any necessary adjustments to the port.
  5. Open a browser and go to http://localhost:8888 (or whatever your port number is for Fiddler). You should see a response page served by Fiddler itself, which indicates that you are set up correctly.
  6. To configure SharePoint to use Fiddler, return to Search Administration and choose the link for Proxy Server from the System Status section.
  7. Choose "Use the proxy server specified" and enter the address of the server running Fiddler and the port that Fiddler is listening on.
  8. Click OK to save your settings.
  9. Start the crawl for the content source that you are having issues with by choosing Content Sources. Select the content source and choose Start Full Crawl.
  10. Once the crawl starts you should begin to see activity in Fiddler. In my example I am crawling a small HTML web site. The crawler always looks for a robots.txt file first. In my case I don’t have one, so Fiddler displays the 404 result. Following that I see one result for each request.
  11. Crawling a SharePoint site yields similar results, though you will notice that the crawler uses the SiteData service to gather information about the site from SharePoint.
  12. Once you are done testing, be sure to reset the proxy settings in the Search Application.
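
Before starting a full crawl it can be worth confirming that the proxy is reachable and that a request sent through it shows up in Fiddler. The following is a minimal Python sketch of that check, not part of SharePoint or Fiddler. The function names are my own, the host name in the usage comment is hypothetical, and port 8888 assumes you kept Fiddler's default. It does not reproduce the crawler's user agent or authentication, so treat it only as a quick connectivity test, ideally run as the crawl account.

```python
import socket
import urllib.error
import urllib.request
from typing import Optional


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def fetch_status(url: str, proxy: Optional[str] = None, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, optionally routed through a proxy."""
    # An empty mapping tells urllib to bypass any system-wide proxy settings.
    proxies = {"http": proxy, "https": proxy} if proxy else {}
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is still what we want.
        return err.code


# Example usage (hypothetical host name; 8888 assumes Fiddler's default port):
#   if port_is_listening("localhost", 8888):
#       proxy = "http://localhost:8888"
#       print(fetch_status("http://intranet.example.com/robots.txt", proxy))
#       print(fetch_status("http://intranet.example.com/", proxy))
```

If the port check fails, Fiddler is probably not running or is listening on a different port. If the requests succeed, each one should appear as a session in Fiddler, in the same robots.txt-first order that the crawler uses.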

Armed with the results of the Fiddler trace you can see the conversation that SharePoint is having with the content source that you are troubleshooting. The Fiddler web site has many resources for evaluating the results. Of course you could use this technique with other traffic-inspection tools, such as Wireshark (a packet analyzer rather than a proxy), but I find Fiddler to be the easiest to deploy and use in most scenarios.
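
For a large crawl, scrolling through the session list by hand gets tedious. The sketch below assumes you have exported the captured sessions in HTTP Archive (HAR) format, which Fiddler offers among its export options, and it relies only on the standard HAR fields `log.entries[].request.url` and `log.entries[].response.status`. The function names and the file name in the usage comment are hypothetical. It counts responses by status code and lists the requests that failed.

```python
import json
from collections import Counter


def load_har(path):
    """Load a HAR file; utf-8-sig tolerates a byte-order mark if one is present."""
    with open(path, encoding="utf-8-sig") as handle:
        return json.load(handle)


def summarize_har(har):
    """Return (status code counts, list of (status, url) for failed requests)."""
    entries = har.get("log", {}).get("entries", [])
    counts = Counter()
    failures = []
    for entry in entries:
        status = entry["response"]["status"]
        url = entry["request"]["url"]
        counts[status] += 1
        # Treat 4xx/5xx as failures; 0 usually means no response was recorded.
        if status >= 400 or status == 0:
            failures.append((status, url))
    return counts, failures


# Example usage (hypothetical file name):
#   counts, failures = summarize_har(load_har("crawl-trace.har"))
#   for status, url in failures:
#       print(status, url)
```

A 404 for robots.txt at the top of the failure list is expected when the site has no such file; the entries after it are the ones worth investigating.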
Troubleshooting SharePoint Search can be a challenge. I hope that the techniques that I demonstrated here enable you to be more methodical in your efforts to determine why a crawl is failing and more efficiently find the resolution to the problem.
