A RobotsMatchStrategy defines a strategy for matching individual lines in a robots.txt file. Each Match* method should return a match priority: a negative value means no match (the line is ignored), and when several rules match, the rule with the highest priority wins.
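As a sketch of this priority convention, a minimal longest-match strategy could look like the following. The function name `match_priority` is hypothetical, and plain prefix matching stands in for full wildcard matching:

```python
def match_priority(path: str, pattern: str) -> int:
    """Longest-match strategy sketch: the priority of a match is the
    length of the matched pattern, so more specific rules win.
    Returns a negative priority when the pattern does not match."""
    # Plain prefix matching stands in for full wildcard matching here.
    return len(pattern) if path.startswith(pattern) else -1
```

With this convention, `/fish/salmon` yields priority 5 against the pattern `/fish` and a negative priority against `/shrimp`, so the `/fish` rule wins.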
Handler for directives found in a robots.txt file. These callbacks are invoked by ParseRobotsTxt() in the order the directives appear in the file.
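A handler of this shape might be sketched as follows; the class and method names are illustrative, not the library's actual interface:

```python
class RecordingHandler:
    """Hypothetical parse handler: records each directive in the
    order the parser reports them."""

    def __init__(self):
        self.events = []

    def handle_user_agent(self, value):
        self.events.append(("user-agent", value))

    def handle_allow(self, value):
        self.events.append(("allow", value))

    def handle_disallow(self, value):
        self.events.append(("disallow", value))

    def handle_sitemap(self, value):
        self.events.append(("sitemap", value))
```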
Extracts the path (with params) and query part from a URL. Removes the scheme, authority, and fragment. The result always starts with "/". Returns "/" if the URL has no path or is not valid.
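Assuming the semantics above, a reimplementation sketch in Python (not the library's code) might be:

```python
def get_path_params_query(url: str) -> str:
    """Sketch: return the path (with params) plus query, dropping the
    scheme, authority, and fragment. The result always starts with "/"."""
    url = url.split("#", 1)[0]          # drop fragment
    scheme_end = url.find("://")
    if scheme_end != -1:                # skip scheme and authority
        slash = url.find("/", scheme_end + 3)
        if slash == -1:
            return "/"                  # authority only, no path
        url = url[slash:]
    return url if url.startswith("/") else "/"
```

For example, `http://example.com/a/b?x=1#frag` becomes `/a/b?x=1`, and a bare authority such as `http://example.com` becomes `/`.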
Implements robots.txt pattern matching.
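robots.txt patterns treat `*` as a wildcard matching any character sequence and a trailing `$` as an end-of-path anchor. A sketch of such matching via regex translation (an implementation choice for illustration, not the library's algorithm):

```python
import re

def pattern_matches(path: str, pattern: str) -> bool:
    """Sketch of robots.txt pattern matching: '*' matches any character
    sequence, a trailing '$' anchors the pattern to the end of the path,
    and all other characters are literal (case-sensitive)."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    if anchored:
        regex += r"\Z"
    return re.match(regex, path) is not None
```

So `/*.php$` matches `/filename.php` but not `/filename.php?x=1`, and an unanchored `/fish` matches any path starting with `/fish`.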
Canonicalizes the allowed/disallowed paths. For example: /SanJoséSellers ==> /SanJos%C3%A9Sellers, and %aa ==> %AA.
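A reimplementation sketch of this canonicalization (not the library's code): percent-encode non-ASCII characters as UTF-8 bytes and uppercase the hex digits of escapes already present.

```python
def maybe_escape_pattern(path: str) -> str:
    """Sketch: percent-encode non-ASCII characters (as UTF-8 bytes) and
    uppercase the hex digits of percent-escapes already in the path."""
    hexdigits = set("0123456789abcdefABCDEF")
    out = []
    i = 0
    while i < len(path):
        ch = path[i]
        if (ch == "%" and i + 2 < len(path)
                and path[i + 1] in hexdigits and path[i + 2] in hexdigits):
            out.append("%" + path[i + 1:i + 3].upper())    # %aa -> %AA
            i += 3
        elif ord(ch) >= 0x80:                              # non-ASCII
            out.append("".join("%%%02X" % b for b in ch.encode("utf-8")))
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```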
Parses the body of a robots.txt file and emits parse callbacks. It accepts typical typos found in robots.txt files, such as 'disalow'.
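A minimal sketch of such a tolerant line parser; the alias table and the callback signature are assumptions for illustration:

```python
# Maps raw keys, including common misspellings, to canonical directives.
KEY_ALIASES = {
    "user-agent": "user-agent", "useragent": "user-agent",
    "allow": "allow",
    "disallow": "disallow", "disalow": "disallow", "dissallow": "disallow",
    "sitemap": "sitemap", "site-map": "sitemap",
}

def parse_robots_txt(body: str, on_directive) -> None:
    """Sketch: call on_directive(key, value) for each recognized line,
    in file order, tolerating common key misspellings."""
    for raw in body.splitlines():
        line = raw.split("#", 1)[0].strip()    # strip comments/whitespace
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        canonical = KEY_ALIASES.get(key.strip().lower())
        if canonical:
            on_directive(canonical, value.strip())
```

Feeding it `"User-Agent: *\nDisalow: /private"` reports a `user-agent` directive followed by a `disallow` directive, despite the typo.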
Extracts the matchable part of a user agent string, essentially stopping at the first invalid character.
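A sketch of this extraction, assuming ASCII letters, '-' and '_' are the valid token characters (an assumption for illustration):

```python
def extract_user_agent(ua: str) -> str:
    """Sketch: keep the leading run of characters valid in a user-agent
    token (ASCII letters, '-' and '_' in this sketch), stopping at the
    first invalid character."""
    i = 0
    while i < len(ua) and (("a" <= ua[i] <= "z") or ("A" <= ua[i] <= "Z")
                           or ua[i] in "-_"):
        i += 1
    return ua[:i]
```

For example, `Googlebot/2.1` reduces to `Googlebot`, since `/` ends the token.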
A robots.txt file consists of lines of key/value pairs. A ParsedRobotsKey represents a key. This class can parse a text representation (including common typos) and represent it as an enumeration, which allows for faster processing afterwards. For unparsable keys, the original string representation is kept.
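The idea can be sketched like this; the enum values and alias table are illustrative, not the library's definitions:

```python
from enum import Enum

class KeyType(Enum):
    USER_AGENT = "user-agent"
    ALLOW = "allow"
    DISALLOW = "disallow"
    SITEMAP = "sitemap"
    UNKNOWN = "unknown"

class ParsedRobotsKey:
    """Sketch of a parsed key: common typos map to the same enum value,
    and unparsable keys keep their original text."""

    _ALIASES = {
        "user-agent": KeyType.USER_AGENT, "useragent": KeyType.USER_AGENT,
        "allow": KeyType.ALLOW,
        "disallow": KeyType.DISALLOW, "disalow": KeyType.DISALLOW,
        "sitemap": KeyType.SITEMAP, "site-map": KeyType.SITEMAP,
    }

    def __init__(self):
        self.type = KeyType.UNKNOWN
        self.unknown_text = ""

    def parse(self, text: str) -> None:
        self.type = self._ALIASES.get(text.strip().lower(), KeyType.UNKNOWN)
        self.unknown_text = text if self.type is KeyType.UNKNOWN else ""
```

Once a key is reduced to an enum value, later matching is a cheap comparison instead of repeated string parsing.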