Configure search crawl rules in SharePoint Server
We can add crawl rules to include or exclude content in SharePoint search. For example, to keep the content of a subdirectory in a site out of the crawl, we can create a crawl rule that excludes that subdirectory. When we include a path, we also choose the account used to crawl the data from that URL, and this can be an account other than the default content access account.
To create or edit a crawl rule in SharePoint:
Navigate to Application Management in Central Administration and click Manage Service Applications.
On the Manage Service Applications page, in the list of service applications, click the search service application in which we want to create the crawl rule.
On the Search Administration page, in the Crawling section, click the Crawl Rules link.
Click the New Crawl Rule link to create a new crawl rule.
The Add Crawl Rule page opens. In the Path box, enter the path to which the crawl rule applies. To use regular expression syntax instead of wildcards for matching, select Use regular expression syntax for matching this rule.
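To make the difference between wildcard and regular expression paths concrete, here is a minimal Python sketch. It is not SharePoint's matcher; it only assumes that `*` in a wildcard path stands for any run of characters, and it ignores case handling. The function name `path_matches` and the `contoso` URLs are hypothetical.

```python
import re


def path_matches(rule_path: str, url: str, use_regex: bool = False) -> bool:
    """Return True if url matches rule_path (illustrative only)."""
    if use_regex:
        # Treat the rule path as a regular expression as written.
        pattern = rule_path
    else:
        # Assumption: '*' is the only wildcard; everything else is literal.
        pattern = re.escape(rule_path).replace(r"\*", ".*")
    return re.fullmatch(pattern, url) is not None


if __name__ == "__main__":
    # Wildcard path: covers everything under the hypothetical /sites/hr/ site.
    print(path_matches("http://contoso/sites/hr/*",
                       "http://contoso/sites/hr/docs/a.docx"))
    # Regular expression path: covers two hypothetical sites with one rule.
    print(path_matches(r"http://contoso/sites/(hr|legal)/.*",
                       "http://contoso/sites/legal/x.docx", use_regex=True))
```

A regular expression is mainly useful when one rule has to cover several paths that a single wildcard cannot describe.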
In the Crawl Configuration section, select Exclude all items in this path to exclude every item in the specified path from crawls. To exclude complex URLs, that is, URLs that contain a question mark (?), select Exclude complex URLs (URLs that contain question marks (?)).
Select Include all items in this path to crawl every item in the specified path. We can refine the inclusion with the following options. Select Follow links on the URL without crawling the URL itself to crawl the links within the specified URL but not the URL itself. Select Crawl complex URLs (URLs that contain a question mark (?)) to crawl URLs that contain parameters introduced by a question mark (?). Select Crawl SharePoint content as http pages to crawl SharePoint sites as HTTP pages. When content is crawled over the HTTP protocol, item permissions are not stored.
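The notion of a "complex URL" used by these options can be illustrated with a small Python sketch. It is a simplified model of the behaviour described above, not SharePoint code: it assumes the URL already matched an inclusion rule, and the function names and sample URLs are hypothetical.

```python
def is_complex_url(url: str) -> bool:
    """A URL counts as complex when it contains a question mark."""
    return "?" in url


def include_rule_crawls(url: str, crawl_complex_urls: bool) -> bool:
    """Decide whether an inclusion rule that matched `url` would crawl it.

    Simplified model: complex URLs are crawled only when the
    'Crawl complex URLs' option is selected on the rule.
    """
    if is_complex_url(url) and not crawl_complex_urls:
        return False
    return True


if __name__ == "__main__":
    page = "http://contoso/sites/hr/page.aspx?id=7"  # hypothetical URL
    print(include_rule_crawls(page, crawl_complex_urls=False))
    print(include_rule_crawls(page, crawl_complex_urls=True))
```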
In the Specify Authentication section, we specify the account used to crawl the data. Select Use the default content access account to crawl with the default content access account.
To use an account other than the default content access account, select Specify a different content access account. In the Account box, enter the account name. In the Password and Confirm Password boxes, enter the password. By default, the server tries NTLM authentication first. If NTLM authentication fails, the server attempts Basic authentication unless we select the Do not allow Basic Authentication check box.
To use client certificate authentication, select Specify client certificate and choose the certificate.
To use form credentials for authentication, select Specify form credentials, enter the form URL in the Form URL box, and enter the credentials.
To use cookies, select Use cookie for crawling. To get the cookie from a website or server, select Obtain cookie from URL. To import a cookie from the local file system or a file share, select Specify cookie for crawling.
To allow anonymous access for crawling, select Anonymous access. Click OK.
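The fallback order described above can be written down as a tiny Python sketch. It only models the order in which authentication schemes would be attempted; it performs no real authentication, and the function name `authentication_attempts` is hypothetical.

```python
def authentication_attempts(do_not_allow_basic: bool) -> list:
    """Return the schemes tried, in order, for a specified crawl account."""
    attempts = ["NTLM"]  # NTLM is always tried first
    if not do_not_allow_basic:
        # Basic is only a fallback, used when NTLM fails.
        attempts.append("Basic")
    return attempts


if __name__ == "__main__":
    print(authentication_attempts(do_not_allow_basic=False))
    print(authentication_attempts(do_not_allow_basic=True))
```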
Once the crawl rule is created, we can test it. Navigate to the search service application. On the Search Administration page, in the Crawling section, click Crawl Rules. In the Type a URL and click test to find out if it matches a rule box, enter the URL to test, and then click the Test button. The result of the test appears below the box.
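As a rough mental model of what the test reports, the Python sketch below checks a URL against a list of rules. It is not how SharePoint is implemented. It assumes that rules are checked from top to bottom, that the first matching rule decides the outcome, and that `*` is the only wildcard. `CrawlRule`, `find_matching_rule`, and the `contoso` URLs are hypothetical names used for illustration.

```python
import re
from typing import List, NamedTuple, Optional


class CrawlRule(NamedTuple):
    path: str      # wildcard path, e.g. "http://contoso/sites/hr/*"
    include: bool  # True = include rule, False = exclude rule


def path_matches(rule_path: str, url: str) -> bool:
    # Assumption: '*' matches any run of characters; the rest is literal.
    pattern = re.escape(rule_path).replace(r"\*", ".*")
    return re.fullmatch(pattern, url) is not None


def find_matching_rule(rules: List[CrawlRule], url: str) -> Optional[CrawlRule]:
    """Return the first rule whose path matches url, or None if none match."""
    for rule in rules:
        if path_matches(rule.path, url):
            return rule
    return None


if __name__ == "__main__":
    rules = [
        CrawlRule("http://contoso/sites/hr/private/*", include=False),
        CrawlRule("http://contoso/sites/hr/*", include=True),
    ]
    print(find_matching_rule(rules, "http://contoso/sites/hr/private/a.docx"))
    print(find_matching_rule(rules, "http://contoso/sites/hr/docs/b.docx"))
```

Under the first-match assumption, the more specific exclude rule has to sit above the broader include rule, otherwise the include rule would match first.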