Commit

NUTCH-3083 Add RobotRulesParser to bin/nutch
Add command *robotsparser* to bin/nutch, invoking the main method
of org.apache.nutch.protocol.RobotRulesParser
sebastian-nagel committed Dec 4, 2024
1 parent 5263b7c commit b481f91
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions src/bin/nutch
@@ -86,6 +86,7 @@ if [ $# = 0 ]; then
echo " indexchecker check the indexing filters for a given url"
echo " filterchecker check url filters for a given url"
echo " normalizerchecker check url normalizers for a given url"
echo " robotsparser parse a robots.txt file and check whether urls are allowed or not"
echo " domainstats calculate domain statistics from crawldb"
echo " protocolstats calculate protocol status code stats from crawldb"
echo " crawlcomplete calculate crawl completion stats from crawldb"
@@ -268,6 +269,8 @@ elif [ "$COMMAND" = "filterchecker" ] ; then
CLASS=org.apache.nutch.net.URLFilterChecker
elif [ "$COMMAND" = "normalizerchecker" ] ; then
CLASS=org.apache.nutch.net.URLNormalizerChecker
elif [ "$COMMAND" = "robotsparser" ] ; then
CLASS=org.apache.nutch.protocol.RobotRulesParser
elif [ "$COMMAND" = "domainstats" ] ; then
CLASS=org.apache.nutch.util.DomainStatistics
elif [ "$COMMAND" = "protocolstats" ] ; then
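
The second hunk extends the chain that maps a bin/nutch command name to the Java class whose main method is run. As a minimal sketch of that mapping, the function below reproduces only the four branches visible in this diff. The name `resolve_class` is hypothetical and does not exist in bin/nutch, and the error branch for unknown commands is a choice made for this sketch, not a statement about how the real script behaves.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of bin/nutch: maps a command name to the
# Java class, covering only the branches shown in the diff.
resolve_class() {
  case "$1" in
    filterchecker)     echo "org.apache.nutch.net.URLFilterChecker" ;;
    normalizerchecker) echo "org.apache.nutch.net.URLNormalizerChecker" ;;
    robotsparser)      echo "org.apache.nutch.protocol.RobotRulesParser" ;;
    domainstats)       echo "org.apache.nutch.util.DomainStatistics" ;;
    # Assumption for this sketch only: reject anything else.
    *)                 echo "unknown command: $1" >&2; return 1 ;;
  esac
}

# The command added by this commit resolves to RobotRulesParser.
resolve_class robotsparser
```

Run as a script, this prints org.apache.nutch.protocol.RobotRulesParser, the class named in the commit message.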
