unrobotstxt

Undocumented in source.

Modules

test
module unrobotstxt.test
Undocumented in source.
test_extra
module unrobotstxt.test_extra
Undocumented in source.

Members

Classes

LongestMatchRobotsMatchStrategy
class LongestMatchRobotsMatchStrategy
Undocumented in source.
RobotsMatchStrategy
class RobotsMatchStrategy

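A RobotsMatchStrategy defines a strategy for matching individual lines in a robots.txt file. Each Match* method should return a match priority, which is interpreted as:

  match priority < 0: no match.
  match priority == 0: match, but treat it as if an empty pattern had matched.
  match priority > 0: match.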
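The D sketch below illustrates how these priorities could interact. It assumes, as in Google's C++ robotstxt library that this module appears to port, that the longest-match strategy scores a match by the pattern's length and a non-match as -1, and that the higher of the allow/disallow priorities wins, with ties going to allow. prefixMatchPriority and isDisallowed are hypothetical names, and plain prefix matching stands in for full wildcard matching.

```d
import std.algorithm.searching : startsWith;

// Hypothetical priority function in the spirit of LongestMatchRobotsMatchStrategy.
// Assumption: a match scores the pattern length, a non-match scores -1.
// For brevity this only does plain prefix matching (no '*' or '$').
int prefixMatchPriority(string path, string pattern)
{
    return path.startsWith(pattern) ? cast(int) pattern.length : -1;
}

// Hypothetical decision rule for one allow and one disallow pattern.
// Only positive priorities count as real matches; the higher priority wins,
// and a tie goes to allow.
bool isDisallowed(string path, string allowPattern, string disallowPattern)
{
    immutable allowPrio = prefixMatchPriority(path, allowPattern);
    immutable disallowPrio = prefixMatchPriority(path, disallowPattern);
    if (allowPrio > 0 || disallowPrio > 0)
        return disallowPrio > allowPrio;
    return false; // no real match on either side
}
```

With the allow pattern "/fish/" and the disallow pattern "/fish", the path "/fish/salmon" stays allowed because the allow rule is longer, while "/fish" itself is disallowed.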

RobotsMatcher
class RobotsMatcher
Undocumented in source.
RobotsParseHandler
class RobotsParseHandler

Handler for directives found in robots.txt. ParseRobotsTxt() calls these callbacks in the order in which the directives appear in the file.

Functions

GetPathParamsQuery
string GetPathParamsQuery(string url)

Extracts the path (with params) and query part from a URL, removing the scheme, authority, and fragment. The result always starts with "/"; "/" is returned if the URL has no path or is not valid.
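The extraction rules can be sketched in D as follows. This is a hypothetical reimplementation (pathParamsQuerySketch, findFirstOf and findFrom are illustrative names, not this module's API) that assumes the same rules as Google's C++ GetPathParamsQuery: a leading "//" is skipped, "://" counts as the end of a scheme only if no '/', '?' or ';' comes before it, and a result starting with '?' or ';' gets a "/" prepended.

```d
import std.string : indexOf;

// Index of the first character at or after `from` that occurs in `chars`, or -1.
ptrdiff_t findFirstOf(string s, string chars, size_t from)
{
    foreach (i; from .. s.length)
        if (chars.indexOf(s[i]) >= 0)
            return cast(ptrdiff_t) i;
    return -1;
}

// Index of `needle` at or after `from`, or -1.
ptrdiff_t findFrom(string s, string needle, size_t from)
{
    if (from > s.length)
        return -1;
    immutable pos = s[from .. $].indexOf(needle);
    return pos < 0 ? -1 : pos + cast(ptrdiff_t) from;
}

// Hypothetical sketch of path/params/query extraction (assumed rules, see above).
string pathParamsQuerySketch(string url)
{
    // Skip a leading "//" (scheme-relative URL).
    immutable size_t start = (url.length >= 2 && url[0] == '/' && url[1] == '/') ? 2 : 0;

    // "://" only marks the end of a scheme if no '/', '?' or ';' precedes it.
    immutable early = findFirstOf(url, "/?;", start);
    ptrdiff_t proto = findFrom(url, "://", start);
    if (early >= 0 && (proto < 0 || early < proto))
        proto = -1;
    immutable size_t authorityStart = proto < 0 ? start : cast(size_t) proto + 3;

    // The path (or params/query) begins at the first '/', '?' or ';' after the authority.
    immutable pathStart = findFirstOf(url, "/?;", authorityStart);
    if (pathStart < 0)
        return "/";

    // A fragment that starts before the path means there is no path.
    immutable hash = findFrom(url, "#", start);
    if (hash >= 0 && hash < pathStart)
        return "/";
    immutable size_t pathEnd = hash < 0 ? url.length : cast(size_t) hash;

    string result = url[cast(size_t) pathStart .. pathEnd];
    // Make sure the result starts with '/', e.g. "?q" becomes "/?q".
    return result[0] == '/' ? result : "/" ~ result;
}
```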

Matches
bool Matches(string path, string pattern)

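Implements robots.txt pattern matching: returns true if the path matches the pattern. The pattern is anchored at the beginning of the path, '*' matches any sequence of characters, and '$' is special only at the end of the pattern, where it anchors the match to the end of the path.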
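A minimal D sketch of the matching rule, assuming the position-set approach used by Google's C++ implementation: it tracks every index in the path that the pattern prefix seen so far could have reached, which keeps the worst case at O(path length × pattern length) even for patterns full of '*'. matchesSketch is a hypothetical name, not this module's Matches.

```d
// Hypothetical sketch of robots.txt pattern matching: the pattern is anchored
// at the start of the path, '*' matches any run of characters, and a trailing
// '$' anchors the pattern at the end of the path.
bool matchesSketch(string path, string pattern)
{
    // pos holds, in increasing order, every index into `path` at which the
    // pattern prefix consumed so far could end.
    size_t[] pos = [0];

    foreach (i, c; pattern)
    {
        if (c == '$' && i + 1 == pattern.length)
            return pos[$ - 1] == path.length; // the whole path must be consumed

        if (c == '*')
        {
            // '*' lets the smallest candidate extend to every later index.
            size_t[] all;
            foreach (p; pos[0] .. path.length + 1)
                all ~= p;
            pos = all;
        }
        else
        {
            // Literal character (including a '$' that is not at the end).
            size_t[] next;
            foreach (p; pos)
                if (p < path.length && path[p] == c)
                    next ~= p + 1;
            if (next.length == 0)
                return false;
            pos = next;
        }
    }
    return true; // pattern exhausted: it matches a prefix of the path
}
```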

MaybeEscapePattern
string MaybeEscapePattern(string src)

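Canonicalizes the allowed/disallowed paths by percent-encoding bytes outside the ASCII range and upper-casing the hex digits of existing percent-escapes. For example:

  /SanJoséSellers ==> /SanJos%C3%A9Sellers
  %aa ==> %AA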
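The canonicalization can be sketched in D like this. escapePatternSketch and hexDigits are hypothetical stand-ins (the table is assumed to hold upper-case digits, as kHexDigits presumably does); the sketch percent-encodes every byte with the high bit set and upper-cases the hex digits of existing %xx escapes, leaving all other bytes untouched.

```d
import std.ascii : isHexDigit, toUpper;

// Hypothetical stand-in for the module's kHexDigits table (assumed upper-case).
enum hexDigits = "0123456789ABCDEF";

// Hypothetical sketch of pattern canonicalization.
string escapePatternSketch(string src)
{
    char[] dst;
    for (size_t i = 0; i < src.length; i++)
    {
        immutable b = cast(ubyte) src[i];
        if (b == '%' && i + 2 < src.length
            && isHexDigit(src[i + 1]) && isHexDigit(src[i + 2]))
        {
            // Existing escape: keep it, but upper-case its hex digits.
            dst ~= '%';
            dst ~= toUpper(src[i + 1]);
            dst ~= toUpper(src[i + 2]);
            i += 2;
        }
        else if (b & 0x80)
        {
            // Byte outside ASCII (part of a UTF-8 sequence): percent-encode it.
            dst ~= '%';
            dst ~= hexDigits[b >> 4];
            dst ~= hexDigits[b & 0xF];
        }
        else
        {
            dst ~= cast(char) b; // plain ASCII, copied as-is
        }
    }
    return dst.idup;
}
```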

ParseRobotsTxt
void ParseRobotsTxt(string robots_body, RobotsParseHandler parse_callback)

Parses the body of a robots.txt file and emits parse callbacks to the given handler. It accepts typical typos found in robots.txt files, such as 'disalow'.
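The line-level part of parsing can be sketched in D as below. splitKeyValue and parseSketch are hypothetical names, the delegate stands in for a RobotsParseHandler, and classifying keys (including recognizing typos) is not shown. The separator rules, including accepting whitespace instead of a colon when the line has exactly two words, are assumed from Google's C++ parser.

```d
import std.string : indexOf, indexOfAny, lineSplitter, strip;

// Hypothetical helper: splits one robots.txt line into key and value.
// Returns false if the line holds no directive.
bool splitKeyValue(string line, out string key, out string value)
{
    immutable hash = line.indexOf('#');
    if (hash >= 0)
        line = line[0 .. cast(size_t) hash]; // drop the comment
    line = line.strip;

    ptrdiff_t sep = line.indexOf(':');
    if (sep < 0)
    {
        // Tolerate a missing colon ("Disallow /x"), but only when the line
        // consists of exactly two whitespace-separated words.
        sep = line.indexOfAny(" \t");
        if (sep >= 0 && line[cast(size_t) sep .. $].strip.indexOfAny(" \t") >= 0)
            return false;
    }
    if (sep < 0)
        return false; // no separator at all

    key = line[0 .. cast(size_t) sep].strip;
    value = line[cast(size_t) sep + 1 .. $].strip;
    return key.length > 0;
}

// Hypothetical driver: walks the body line by line (std.string.lineSplitter
// also treats "\r" and "\r\n" as line ends) and reports each directive
// with its 1-based line number.
void parseSketch(string robotsBody, void delegate(int line, string key, string value) emit)
{
    int lineNo = 0;
    foreach (rawLine; robotsBody.lineSplitter)
    {
        ++lineNo;
        string key, value;
        if (splitKeyValue(rawLine, key, value))
            emit(lineNo, key, value);
    }
}
```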

emitKeyValueToHandler
void emitKeyValueToHandler(int line, const(ParsedRobotsKey) key, string value, RobotsParseHandler handler)
Undocumented in source. Be warned that the author may not have intended to support it.
equalsIgnoreAsciiCase
bool equalsIgnoreAsciiCase(string s1, string s2)
Undocumented in source. Be warned that the author may not have intended to support it.
getKeyAndValueFrom
bool getKeyAndValueFrom(string key, string value, char[] line)
Undocumented in source. Be warned that the author may not have intended to support it.
startsWithIgnoreCase
bool startsWithIgnoreCase(string target, string prefix)
Undocumented in source. Be warned that the author may not have intended to support it.

Static functions

extractUserAgent
string extractUserAgent(string user_agent)

Extracts the matchable part of a user-agent string, stopping at the first invalid character (anything other than an ASCII letter, '-' or '_'). For example, "Googlebot/2.1" becomes "Googlebot".
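A sketch of this rule in D, assuming (as in Google's C++ version) that the valid characters are exactly ASCII letters, '-' and '_'. extractUserAgentSketch is a hypothetical name.

```d
import std.ascii : isAlpha;

// Hypothetical sketch: keep the longest prefix made of [a-zA-Z_-].
string extractUserAgentSketch(string userAgent)
{
    size_t end = 0;
    while (end < userAgent.length
           && (isAlpha(userAgent[end]) || userAgent[end] == '-' || userAgent[end] == '_'))
        ++end;
    return userAgent[0 .. end];
}
```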

Structs

ParsedRobotsKey
struct ParsedRobotsKey

A robots.txt file consists of lines of key/value pairs, and a ParsedRobotsKey represents a key. This struct parses the text representation of a key (including common typos) and represents it as an enumeration value, which allows faster processing afterwards. For unparsable keys, the original string representation is kept.
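A simplified D sketch of the idea follows. KeyType, ParsedKeySketch and the list of accepted spellings are assumptions modeled on Google's C++ parser, which matches keys by case-insensitive prefix and accepts misspellings such as 'disalow'; the real struct's members and typo list may differ.

```d
import std.algorithm.searching : any, startsWith;
import std.uni : toLower;

// Hypothetical key kinds; the real enum's member names may differ.
enum KeyType { userAgent, sitemap, allow, disallow, unknown }

// Hypothetical sketch of a parsed key.
struct ParsedKeySketch
{
    KeyType type = KeyType.unknown;
    string unknownText; // original spelling, kept only for unrecognized keys

    static ParsedKeySketch parse(string key)
    {
        ParsedKeySketch k;
        immutable lower = key.toLower;

        // Case-insensitive prefix test against a list of accepted spellings.
        bool startsWithAny(string[] prefixes)
        {
            return prefixes.any!(p => lower.startsWith(p));
        }

        if (startsWithAny(["user-agent", "useragent", "user agent"]))
            k.type = KeyType.userAgent;
        else if (startsWithAny(["allow"]))
            k.type = KeyType.allow;
        else if (startsWithAny(["disallow", "dissallow", "dissalow", "disalow", "diasllow", "disallaw"]))
            k.type = KeyType.disallow;
        else if (startsWithAny(["sitemap", "site-map"]))
            k.type = KeyType.sitemap;
        else
            k.unknownText = key; // keep the text so it can be reported later
        return k;
    }
}
```

Storing an enumeration value means later stages can switch on the key kind instead of repeating string comparisons.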

RobotsTxtParser
struct RobotsTxtParser
Undocumented in source.

Variables

kHexDigits
auto kHexDigits;
Undocumented in source.

Meta