Create a RobotsMatcher with the default matching strategy. The default
matching strategy is longest-match, as opposed to the former internet draft,
which specified a first-match strategy. Analysis shows that longest-match,
while more restrictive for crawlers, is what webmasters assume when writing
directives: in the case of conflicting matches (both Allow and Disallow),
the longest match is the one the user wants. For example, given a
robots.txt file with the following rules
Allow: /
Disallow: /cgi-bin
it's pretty obvious what the webmaster wants: they want to allow crawling of
every URI except /cgi-bin. However, according to the expired internet
standard, crawlers would be allowed to crawl everything with such rules,
because under first-match the Allow: / line matches every URI before
Disallow: /cgi-bin is even considered.
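To make the longest-match rule concrete, here is a minimal sketch of the
conflict-resolution logic in C++. It is not the RobotsMatcher code itself:
the names Rule and AllowedLongestMatch are made up for illustration, it uses
plain prefix matching with no wildcard or '$' support, and it breaks ties
between an equal-length Allow and Disallow in favor of Allow, which is the
tie-break the current Robots Exclusion Protocol specifies.

#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

struct Rule {
  bool allow;           // true for Allow, false for Disallow
  std::string pattern;  // path prefix, e.g. "/" or "/cgi-bin"
};

// Returns true if crawling `path` is allowed under longest-match semantics.
bool AllowedLongestMatch(const std::vector<Rule>& rules,
                         const std::string& path) {
  std::size_t best_allow = 0;     // length of the longest matching Allow
  std::size_t best_disallow = 0;  // length of the longest matching Disallow
  bool any_match = false;
  for (const Rule& rule : rules) {
    // Plain prefix match; the real matcher also handles '*' and '$'.
    if (path.compare(0, rule.pattern.size(), rule.pattern) == 0) {
      any_match = true;
      const std::size_t len = rule.pattern.size();
      if (rule.allow) {
        if (len > best_allow) best_allow = len;
      } else {
        if (len > best_disallow) best_disallow = len;
      }
    }
  }
  if (!any_match) return true;         // no rule matches: allowed by default
  return best_allow >= best_disallow;  // longest match wins; Allow wins ties
}

int main() {
  // The example from the text: Allow: /  and  Disallow: /cgi-bin
  const std::vector<Rule> rules = {{true, "/"}, {false, "/cgi-bin"}};

  // Longest match: "/cgi-bin" (8 chars) beats "/" (1 char), so this is denied.
  std::cout << AllowedLongestMatch(rules, "/cgi-bin/script") << "\n";  // 0

  // Only "/" matches here, so crawling is allowed.
  std::cout << AllowedLongestMatch(rules, "/index.html") << "\n";  // 1
}

Under the first-match strategy of the old draft, the same example would
return "allowed" for /cgi-bin/script, since Allow: / appears first in the
file and matches every URI.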