A RobotsMatchStrategy defines a strategy for matching individual lines in a robots.txt file. Each Match* method should return a match priority: a negative value means no match (the line is ignored), and when several rules match, the rule with the highest priority wins.
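As a sketch of this priority convention, a minimal longest-match strategy could look like the following. The function name `match_priority` is hypothetical, and plain prefix matching stands in for full wildcard matching:

```python
def match_priority(path: str, pattern: str) -> int:
    """Longest-match strategy sketch: the priority of a match is the
    length of the matched pattern, so more specific rules win.
    Returns a negative priority when the pattern does not match."""
    # Plain prefix matching stands in for full wildcard matching here.
    return len(pattern) if path.startswith(pattern) else -1
```

With this convention, `/fish/salmon` yields priority 5 against the pattern `/fish` and a negative priority against `/shrimp`, so the `/fish` rule wins.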
Handler for directives found in a robots.txt file. These callbacks are invoked by ParseRobotsTxt() in the order the directives appear in the file.
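A handler of this shape might be sketched as follows; the class and method names are illustrative, not the library's actual interface:

```python
class RecordingHandler:
    """Hypothetical parse handler: records each directive in the
    order the parser reports them."""

    def __init__(self):
        self.events = []

    def handle_user_agent(self, value):
        self.events.append(("user-agent", value))

    def handle_allow(self, value):
        self.events.append(("allow", value))

    def handle_disallow(self, value):
        self.events.append(("disallow", value))

    def handle_sitemap(self, value):
        self.events.append(("sitemap", value))
```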
Extracts the path (with params) and query part from a URL. Removes the scheme, authority, and fragment. The result always starts with "/". Returns "/" if the URL has no path or is not valid.
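Assuming the semantics above, a reimplementation sketch in Python (not the library's code) might be:

```python
def get_path_params_query(url: str) -> str:
    """Sketch: return the path (with params) plus query, dropping the
    scheme, authority, and fragment. The result always starts with "/"."""
    url = url.split("#", 1)[0]          # drop fragment
    scheme_end = url.find("://")
    if scheme_end != -1:                # skip scheme and authority
        slash = url.find("/", scheme_end + 3)
        if slash == -1:
            return "/"                  # authority only, no path
        url = url[slash:]
    return url if url.startswith("/") else "/"
```

For example, `http://example.com/a/b?x=1#frag` becomes `/a/b?x=1`, and a bare authority such as `http://example.com` becomes `/`.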
Implements robots.txt pattern matching.
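robots.txt patterns treat `*` as a wildcard matching any character sequence and a trailing `$` as an end-of-path anchor. A sketch of such matching via regex translation (an implementation choice for illustration, not the library's algorithm):

```python
import re

def pattern_matches(path: str, pattern: str) -> bool:
    """Sketch of robots.txt pattern matching: '*' matches any character
    sequence, a trailing '$' anchors the pattern to the end of the path,
    and all other characters are literal (case-sensitive)."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    if anchored:
        regex += r"\Z"
    return re.match(regex, path) is not None
```

So `/*.php$` matches `/filename.php` but not `/filename.php?x=1`, and an unanchored `/fish` matches any path starting with `/fish`.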
Canonicalizes the allowed/disallowed paths. For example: /SanJoséSellers ==> /SanJos%C3%A9Sellers, and %aa ==> %AA.
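A reimplementation sketch of this canonicalization (not the library's code): percent-encode non-ASCII characters as UTF-8 bytes and uppercase the hex digits of escapes already present.

```python
def maybe_escape_pattern(path: str) -> str:
    """Sketch: percent-encode non-ASCII characters (as UTF-8 bytes) and
    uppercase the hex digits of percent-escapes already in the path."""
    hexdigits = set("0123456789abcdefABCDEF")
    out = []
    i = 0
    while i < len(path):
        ch = path[i]
        if (ch == "%" and i + 2 < len(path)
                and path[i + 1] in hexdigits and path[i + 2] in hexdigits):
            out.append("%" + path[i + 1:i + 3].upper())    # %aa -> %AA
            i += 3
        elif ord(ch) >= 0x80:                              # non-ASCII
            out.append("".join("%%%02X" % b for b in ch.encode("utf-8")))
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```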
Parses the body of a robots.txt file and emits parse callbacks. It accepts typical typos found in robots.txt files, such as 'disalow'.
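A minimal sketch of such a tolerant line parser; the alias table and the callback signature are assumptions for illustration:

```python
# Maps raw keys, including common misspellings, to canonical directives.
KEY_ALIASES = {
    "user-agent": "user-agent", "useragent": "user-agent",
    "allow": "allow",
    "disallow": "disallow", "disalow": "disallow", "dissallow": "disallow",
    "sitemap": "sitemap", "site-map": "sitemap",
}

def parse_robots_txt(body: str, on_directive) -> None:
    """Sketch: call on_directive(key, value) for each recognized line,
    in file order, tolerating common key misspellings."""
    for raw in body.splitlines():
        line = raw.split("#", 1)[0].strip()    # strip comments/whitespace
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        canonical = KEY_ALIASES.get(key.strip().lower())
        if canonical:
            on_directive(canonical, value.strip())
```

Feeding it `"User-Agent: *\nDisalow: /private"` reports a `user-agent` directive followed by a `disallow` directive, despite the typo.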
Extracts the matchable part of a user agent string, essentially stopping at the first invalid character.
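A sketch of this extraction, assuming ASCII letters, '-' and '_' are the valid token characters (an assumption for illustration):

```python
def extract_user_agent(ua: str) -> str:
    """Sketch: keep the leading run of characters valid in a user-agent
    token (ASCII letters, '-' and '_' in this sketch), stopping at the
    first invalid character."""
    i = 0
    while i < len(ua) and (("a" <= ua[i] <= "z") or ("A" <= ua[i] <= "Z")
                           or ua[i] in "-_"):
        i += 1
    return ua[:i]
```

For example, `Googlebot/2.1` reduces to `Googlebot`, since `/` ends the token.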
A robots.txt file consists of lines of key/value pairs. A ParsedRobotsKey represents a key. This class can parse a text representation (including common typos) and represent it as an enumeration, which allows for faster processing afterwards. For unparsable keys, the original string representation is kept.
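The idea can be sketched like this; the enum values and alias table are illustrative, not the library's definitions:

```python
from enum import Enum

class KeyType(Enum):
    USER_AGENT = "user-agent"
    ALLOW = "allow"
    DISALLOW = "disallow"
    SITEMAP = "sitemap"
    UNKNOWN = "unknown"

class ParsedRobotsKey:
    """Sketch of a parsed key: common typos map to the same enum value,
    and unparsable keys keep their original text."""

    _ALIASES = {
        "user-agent": KeyType.USER_AGENT, "useragent": KeyType.USER_AGENT,
        "allow": KeyType.ALLOW,
        "disallow": KeyType.DISALLOW, "disalow": KeyType.DISALLOW,
        "sitemap": KeyType.SITEMAP, "site-map": KeyType.SITEMAP,
    }

    def __init__(self):
        self.type = KeyType.UNKNOWN
        self.unknown_text = ""

    def parse(self, text: str) -> None:
        self.type = self._ALIASES.get(text.strip().lower(), KeyType.UNKNOWN)
        self.unknown_text = text if self.type is KeyType.UNKNOWN else ""
```

Once a key is reduced to an enum value, later matching is a cheap comparison instead of repeated string parsing.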