The directory from which crawl configurations are fetched from is specified
using the command-line option -crawlconfigdir <path>.
In this directory there should be a file per hidden service that needs specific
configuration options. For example ab23cd45ef67gh76.onion.json; though the name of the file
is not parsed so any naming convention can be used.
Configuring the scan for a service does not automatically cause it to be
scanned. They still need to be specified explicitly, either on the command
line or in a -list file.
{
"onion": "aabbccddeeffgghh.onion",
"base": "/forums",
"exclude": [
"/profile",
"/settings"
],
"relationships":[
{
"name":"User",
"triggeridentifierregex":"index\\.php\\?action=profile;u=([0-9]*)",
"extrarelationships":[
{
"name":"Name",
"regex":"<div class=\"username\"><h4>(.*) <span class=\"position\">"
},
{
"name":"Position",
"regex":"<span class=\"position\">(.*)</span></h4>"
}
]
},
{
"name":"Post",
"triggeridentifierregex":"index\\.php\\?topic=([0-9]*)",
"extrarelationships":[
{
"name":"Topic",
"regex":"Topic: (.*) \\(Read"
}
]
}
]
}
The following configuration parameters can currently be specified:
-
onion: The hostname of the service to configure scanning for. This should be just the hostname, and have nohttp://prefix or path components. -
base: configures the base path, relative to the root of the site (to ignore all other parts of a site and focus on a specific set of URLs e.g./forums) -
exclude: tells the scanner to ignore URLs which contain one or more of the given strings - this allows explicitly ignoring uninteresting URLs (e.g./profileor/settings) and also for avoiding URLs which might mess up the scan (e.g./logout) -
relationships: configures OnionScan with custom relationships. Like many preconfigured relationships, these are specified by a trigger URL regular expression. Thetriggeridentifierregexmust specify 1 group that contains the Identifer of the relationships. This will be stored in OnionScan as a relationship mappingOnion->Identifierand given thefromattribute of the relationship name.For example, in the above structure, two relationships are defined
UserandPost.For
Userthe trigger regex isindex\.php\?action=profile;u=([0-9]*)which when found will store an identifier marking the users profile ID as the identifier.Useralso specifies two sub-relationshipsNameandPositionwhich are specified by different regular expressions that should be found on the same page as the one identified by the trigger URL. These will be stored by OnionScan asOnion->NameandOnion->Postionwith thefromattribute set to the originally capturedID.