## Introduction

`pagodo` automates Google searching for potentially vulnerable web pages and applications on the Internet. It replaces
the manual process of running Google dork searches through a web browser GUI.

There are 2 parts. The first is `ghdb_scraper.py`, which retrieves the latest Google dorks, and the second is
`pagodo.py`, which leverages the information gathered by `ghdb_scraper.py`.

`pagodo` now uses the more flexible [yagooglesearch](https://github.com/opsdisk/yagooglesearch) as its core Google
search library instead of [googlesearch](https://github.com/MarioVilas/googlesearch). Check out the [yagooglesearch
README](https://github.com/opsdisk/yagooglesearch/blob/master/README.md) for a more in-depth explanation of the library
differences and capabilities.

This version of `pagodo` also supports native HTTP(S) and SOCKS5 proxies, so there's no more wrapping it in a tool like
`proxychains4` if you need proxy support. You can specify multiple proxies to use in a round-robin fashion by providing
a comma-separated string of proxies using the `-p` switch.

## What are Google dorks?

Offensive Security maintains the Google Hacking Database (GHDB) found here:
<https://www.exploit-db.com/google-hacking-database>. It is a collection of Google searches, called dorks, that can be
used to find potentially vulnerable boxes or other juicy info that is picked up by Google's search bots.

## Terms and Conditions

The terms and conditions for `pagodo` are the same terms and conditions found in
[yagooglesearch](https://github.com/opsdisk/yagooglesearch#terms-and-conditions).

This code is supplied as-is and you are fully responsible for how it is used. Scraping Google Search results may
violate their [Terms of Service](https://policies.google.com/terms). Another Python Google search library had some
interesting information/discussion on it:

- [Original issue](https://github.com/aviaryan/python-gsearch/issues/1)
- [A response](https://github.com/aviaryan/python-gsearch/issues/1#issuecomment-365581431)
- The author created a separate [Terms and Conditions](https://github.com/aviaryan/python-gsearch/blob/master/T_AND_C.md)
- ...that contained a link to this [blog](https://benbernardblog.com/web-scraping-and-crawling-are-perfectly-legal-right/)

Google's preferred method is to use their [API](https://developers.google.com/custom-search/v1/overview).

## Installation

Scripts are written for Python 3.6+. Clone the git repository and install the requirements.

```bash
git clone https://github.com/opsdisk/pagodo.git
cd pagodo
python3 -m venv .venv  # If using a virtual environment.
source .venv/bin/activate  # If using a virtual environment.
pip install --upgrade pip setuptools
pip install -r requirements.txt
```

## ghdb_scraper.py

To start off, `pagodo.py` needs a list of all the current Google dorks. The repo contains a `dorks/` directory with the
current dorks as of the last time `ghdb_scraper.py` was run. It's advised to run `ghdb_scraper.py` to get the freshest
data before running `pagodo.py`. The `dorks/` directory contains:

- the `all_google_dorks.txt` file, which contains all the Google dorks, one per line
- the `all_google_dorks.json` file, which is the JSON response from GHDB
- individual category dorks

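Because `all_google_dorks.txt` is plain text with one dork per line, it is also easy to consume from your own Python code. Here is a minimal, hypothetical loader. `parse_dorks` and `load_dorks` are illustrative names and are not part of `pagodo`. It assumes the file is UTF-8 and that blank lines should be skipped.

```python
from pathlib import Path
from typing import List


def parse_dorks(text: str) -> List[str]:
    """Turn one-dork-per-line text into a list, skipping blank lines."""
    # splitlines() handles both \n and \r\n line endings.
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_dorks(path: str) -> List[str]:
    """Read a dork file such as dorks/all_google_dorks.txt (assumed to be UTF-8)."""
    return parse_dorks(Path(path).read_text(encoding="utf-8"))
```

For example, `load_dorks("dorks/all_google_dorks.txt")` returns a list you can count, filter, or slice before writing a smaller dork file of your own.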
Dork categories:

@@ -119,12 +120,12 @@ dorks["category_dict"].keys()
dorks["category_dict"][1]["category_name"]
```

## pagodo.py

### Using pagodo.py as a script

```bash
python pagodo.py -d example.com -g dorks.txt
```

### Using pagodo as a module
@@ -195,37 +196,37 @@ site:github.com
### Wait time between Google dork searches

- `-i` - Specify the **minimum** delay between dork searches, in seconds. Don't make this too small, or your IP will
  get HTTP 429'd (Too Many Requests) quickly.
- `-x` - Specify the **maximum** delay between dork searches, in seconds. Don't make this too big, or the searches will
  take a long time.

The values provided by `-i` and `-x` are used to generate a list of 20 random wait times, and one of them is randomly
selected as the pause between consecutive Google dork searches.

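Here is a sketch of the wait-time behavior described above. It is an approximation, not `pagodo`'s actual code. It assumes the 20 delays are drawn uniformly with `random.uniform`, and `build_wait_times` and `next_wait` are hypothetical helper names.

```python
import random
from typing import List


def build_wait_times(min_delay: float, max_delay: float, count: int = 20) -> List[float]:
    """Pre-generate `count` random delays, in seconds, between the -i and -x bounds."""
    if min_delay > max_delay:
        raise ValueError("min_delay must not be greater than max_delay")
    return [random.uniform(min_delay, max_delay) for _ in range(count)]


def next_wait(wait_times: List[float]) -> float:
    """Randomly pick one of the pre-generated delays to pause before the next dork search."""
    return random.choice(wait_times)


if __name__ == "__main__":
    wait_times = build_wait_times(37, 60)  # hypothetical -i 37 -x 60
    for dork in ["dork 1", "dork 2", "dork 3"]:  # placeholder dorks
        delay = next_wait(wait_times)
        print(f"Waiting {delay:.1f} seconds before searching: {dork}")
        # time.sleep(delay) would perform the actual pause
```

Drawing from a small pre-generated pool (with replacement) keeps the pauses irregular while staying inside the configured bounds.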
### Number of results to return

`-m` - The total maximum number of search results to return per Google dork. Each Google search request can pull back
at most 100 results at a time, so if you pick `-m 500`, 5 separate search queries will have to be made for each Google
dork search, which will increase the time it takes to complete.

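The request count is ceiling arithmetic: `ceil(m / 100)` requests per dork. The helper below is a hypothetical sketch, not part of `pagodo`. It treats 100 results per request as a fixed page size and returns an upper bound, because a dork with fewer matching results may need fewer requests.

```python
import math


def requests_per_dork(max_results: int, results_per_request: int = 100) -> int:
    """Upper bound on Google search requests needed to collect max_results for one dork."""
    if max_results <= 0:
        return 0
    return math.ceil(max_results / results_per_request)


def total_requests(num_dorks: int, max_results: int) -> int:
    """Upper bound on requests for a whole dork list that shares the same -m value."""
    return num_dorks * requests_per_dork(max_results)


if __name__ == "__main__":
    print(requests_per_dork(500))  # 5, matching the -m 500 example
    print(requests_per_dork(150))  # 2: one request of 100, then one more for the remaining 50
    print(total_requests(7300, 500))  # 36500
```

The 7300 figure is used purely as an illustration of a large dork list. Multiplying it by the per-dork count shows how quickly `-m` inflates the total number of requests sent to Google.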
### Save Output

`-o [optional/path/to/results.json]` - Save output to a JSON file. If you do not specify a filename, a timestamped one
will be generated.

`-s [optional/path/to/results.txt]` - Save URLs to a text file. If you do not specify a filename, a timestamped one will
be generated.

### Save logs

`--log [optional/path/to/file.log]` - Save logs to the specified file. If you do not specify a filename, the default
file `pagodo.py.log` at the root of the pagodo directory will be used.

## Google is blocking me!

Performing 7300+ search requests to Google as fast as possible will simply not work. Google will rightfully detect it
as a bot and block your IP for a set period of time. One solution is to use a bank of HTTP(S)/SOCKS proxies and pass
them to `pagodo`.

### Native proxy support
@@ -236,7 +237,7 @@ Pass a comma separated string of proxies to `pagodo` using the `-p` switch.
python pagodo.py -g dorks.txt -p http://myproxy:8080,socks5h://127.0.0.1:9050,socks5h://127.0.0.1:9051
```

You could even decrease the `-i` and `-x` values because you will be leveraging different proxy IPs. The proxies passed
to `pagodo` are selected in round-robin order.

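Here is a minimal sketch of round-robin proxy selection. It assumes one proxy is chosen per dork search and then reused for every results page of that search, which matches the `yagooglesearch` behavior quoted in the proxychains4 section. `parse_proxies` and `proxy_rotation` are hypothetical names, and this is not `pagodo`'s actual implementation.

```python
import itertools
from typing import Iterator, List, Optional


def parse_proxies(proxy_string: str) -> List[str]:
    """Split a comma-separated proxy string (as passed to -p), ignoring empty entries."""
    return [proxy.strip() for proxy in proxy_string.split(",") if proxy.strip()]


def proxy_rotation(proxies: List[str]) -> Iterator[Optional[str]]:
    """Return an endless round-robin iterator over proxies, or None forever if there are none."""
    if not proxies:
        return itertools.repeat(None)
    return itertools.cycle(proxies)


if __name__ == "__main__":
    proxies = parse_proxies("http://myproxy:8080,socks5h://127.0.0.1:9050,socks5h://127.0.0.1:9051")
    rotation = proxy_rotation(proxies)
    for dork in ["dork 1", "dork 2", "dork 3", "dork 4"]:  # placeholder dorks
        proxy = next(rotation)  # pick the next proxy once per dork search...
        for page in range(1, 3):  # ...and reuse it for every results page of that search
            print(f"{dork}, page {page}: via {proxy}")
```

With three proxies, the fourth dork search wraps back around to `http://myproxy:8080`.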
### proxychains4 support
@@ -249,7 +250,7 @@ Install `proxychains4`
apt install proxychains4 -y
```

Edit the `/etc/proxychains4.conf` configuration file to round robin the lookups through different proxy servers. In
the example below, 2 different dynamic SOCKS proxies have been set up with different local listening ports (9050 and
9051).

@@ -269,7 +270,7 @@ socks4 127.0.0.1 9050
socks4 127.0.0.1 9051
```

Throw `proxychains4` in front of the `pagodo.py` script and each _request_ lookup will go through a different proxy (and
thus come from a different source IP).

```bash
@@ -278,10 +279,10 @@ proxychains4 python pagodo.py -g dorks/all_google_dorks.txt -o [optional/path/to
Note that this may not appear natural to Google if you:

1. Simulate "browsing" to `google.com` from IP #1
2. Make the first search query from IP #2
3. Simulate clicking "Next" to make the second search query from IP #3
4. Simulate clicking "Next" to make the third search query from IP #1

For that reason, using the built-in `-p` proxy support is preferred because, as stated in the `yagooglesearch`
documentation, the "provided proxy is used for the entire life cycle of the search to make it look more human, instead