
## Introduction

- The goal of this project was to develop a passive Google dork script to collect potentially vulnerable web pages and
- applications on the Internet. There are 2 parts. The first is `ghdb_scraper.py` that retrieves Google Dorks and the
- second portion is `pagodo.py` that leverages the information gathered by `ghdb_scraper.py`.
+ pagodo automates Google searching for potentially vulnerable web pages and applications on the Internet. It replaces
+ manually performing Google dork searches with a web GUI browser.
+
+ There are 2 parts. The first is `ghdb_scraper.py` that retrieves the latest Google dorks and the second portion is
+ `pagodo.py` that leverages the information gathered by `ghdb_scraper.py`.

HakByte created a video tutorial on using pagodo. It starts around 8 minutes in and you can find it here
<https://www.youtube.com/watch?v=lESeJ3EViCo&t=481s>

- ## What are Google Dorks?
+ ## What are Google dorks?

- The awesome folks at Offensive Security maintain the Google Hacking Database (GHDB) found here:
+ Offensive Security maintains the Google Hacking Database (GHDB) found here:
<https://www.exploit-db.com/google-hacking-database>. It is a collection of Google searches, called dorks, that can be
used to find potentially vulnerable boxes or other juicy info that is picked up by Google's search bots.

@@ -27,64 +29,14 @@ source .venv/bin/activate # If using a virtual environment.
pip install -r requirements.txt
```

- ## Google is blocking me!
-
- If you start getting HTTP 429 errors, Google has rightfully detected you as a bot and will block your IP for a set
- period of time. The solution is to use proxychains and a bank of proxies to round robin the lookups.
-
- Install proxychains4
-
- ```bash
- apt install proxychains4 -y
- ```
-
- Edit the `/etc/proxychains4.conf` configuration file to round robin the look ups through different proxy servers. In
- the example below, 2 different dynamic socks proxies have been set up with different local listening ports
- (9050 and 9051). Don't know how to utilize SSH and dynamic socks proxies? Do yourself a favor and pick up a copy of
- [Cyber Plumber's Handbook and interactive lab](https://gumroad.com/l/cph_book_and_lab) to learn all about Secure Shell
- (SSH) tunneling, port redirection, and bending traffic like a boss.
-
- ```bash
- vim /etc/proxychains4.conf
- ```
-
- ```bash
- round_robin
- chain_len = 1
- proxy_dns
- remote_dns_subnet 224
- tcp_read_time_out 15000
- tcp_connect_time_out 8000
- [ProxyList]
- socks4 127.0.0.1 9050
- socks4 127.0.0.1 9051
- ```
-
- Throw `proxychains4` in front of the Python script and each lookup will go through a different proxy (and thus source
- from a different IP). You could even tune down the `-e` delay time because you will be leveraging different proxy boxes.
-
- ```bash
- proxychains4 python3 pagodo.py -g ALL_dorks.txt -s -e 17.0 -l 700 -j 1.1
- ```
-
## ghdb_scraper.py

- To start off, `pagodo.py` needs a list of all the current Google dorks. A datetimestamped file with the Google dorks
- and the indididual dork category dorks are also provided in the repo. Fortunately, the entire database can be pulled
- back with 1 GET request using `ghdb_scraper.py`. You can dump all dorks to a file, the individual dork categories to
- separate dork files, or the entire json blob if you want more contextual data about the dork.
-
- To retrieve all dorks
-
- ```bash
- python3 ghdb_scraper.py -j -s
- ```
+ To start off, `pagodo.py` needs a list of all the current Google dorks. The repo contains a `dorks/` directory with
+ the current dorks when the `ghdb_scraper.py` was last run. It's advised to run `ghdb_scraper.py` to get the freshest
+ data before running `pagodo.py`. The `dorks/` directory contains:

- To retrieve all dorks and write them to individual categories:
-
- ```bash
- python3 ghdb_scraper.py -i
- ```
+ * the `all_google_dorks.txt` file which contains all the Google dorks
+ * Individual dork category dorks

Dork categories:

@@ -107,6 +59,51 @@ categories = {
}
```

+ Fortunately, the entire database can be pulled back with 1 HTTP GET request using `ghdb_scraper.py`. You can dump all
+ dorks to a file, the individual dork categories to separate dork files, or the entire json blob if you want more
+ contextual data about each dork.
+
+ ### Using ghdb_scraper.py as a script
+
+ To retrieve all dorks:
+
+ ```bash
+ python ghdb_scraper.py -j -s
+ ```
+
+ To retrieve all dorks and write them to individual categories:
+
+ ```bash
+ python ghdb_scraper.py -i
+ ```
+
+ ### Using ghdb_scraper as a module
+
+ The `ghdb_scraper.retrieve_google_dorks()` returns a dictionary with the following data structure:
+
+ ```python
+ ghdb_dict = {
+     "total_records": total_records,
+     "extracted_dorks": extracted_dorks,
+     "category_dict": category_dict,
+ }
+ ```
+
+ Using a Python shell (like `python` or `ipython`) to explore the data:
+
+ ```python
+ import ghdb_scraper
+
+ dorks = ghdb_scraper.retrieve_google_dorks(save_all_dorks_to_file=True)
+ dorks.keys()
+ dorks["total_records"]
+
+ dorks["extracted_dorks"]
+
+ dorks["category_dict"].keys()
+
+ dorks["category_dict"][1]["category_name"]
+ ```
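
The dictionary layout described in this hunk can be explored offline with a small helper. The following is only a sketch: `summarize_ghdb` is a hypothetical function that is not part of `ghdb_scraper`, and the sample data is hand-made. It assumes `extracted_dorks` is a list and that `category_dict` is keyed by integer category IDs whose values carry a `category_name`, which is what the shell session above suggests.

```python
def summarize_ghdb(ghdb_dict):
    """Return summary lines for a mapping shaped like the ghdb_dict above.

    Hypothetical helper for illustration; not part of ghdb_scraper.
    """
    lines = [
        f"total records: {ghdb_dict['total_records']}",
        f"extracted dorks: {len(ghdb_dict['extracted_dorks'])}",
    ]
    # Walk the categories in ID order and report each category name.
    for category_id in sorted(ghdb_dict["category_dict"]):
        name = ghdb_dict["category_dict"][category_id]["category_name"]
        lines.append(f"category {category_id}: {name}")
    return lines


# Hand-made sample data (assumed shape, placeholder values), so the
# sketch runs without contacting the GHDB.
sample = {
    "total_records": 2,
    "extracted_dorks": ["inurl:example-one", "intitle:example-two"],
    "category_dict": {
        1: {"category_name": "Example category A"},
        2: {"category_name": "Example category B"},
    },
}

summary = summarize_ghdb(sample)
for line in summary:
    print(line)
```

In a real session, the value returned by `ghdb_scraper.retrieve_google_dorks()` could be passed in place of `sample`, provided its shape matches these assumptions.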

## pagodo.py

@@ -155,6 +152,46 @@ To run it:
python3 pagodo.py -d example.com -g dorks.txt -l 50 -s -e 35.0 -j 1.1
```

+ ## Google is blocking me!
+
+ If you start getting HTTP 429 errors, Google has rightfully detected you as a bot and will block your IP for a set
+ period of time. The solution is to use proxychains and a bank of proxies to round robin the lookups.
+
+ Install proxychains4
+
+ ```bash
+ apt install proxychains4 -y
+ ```
+
+ Edit the `/etc/proxychains4.conf` configuration file to round robin the look ups through different proxy servers. In
+ the example below, 2 different dynamic socks proxies have been set up with different local listening ports
+ (9050 and 9051). Don't know how to utilize SSH and dynamic socks proxies? Do yourself a favor and pick up a copy of
+ [Cyber Plumber's Handbook and interactive lab](https://gumroad.com/l/cph_book_and_lab) to learn all about Secure Shell
+ (SSH) tunneling, port redirection, and bending traffic like a boss.
+
+ ```bash
+ vim /etc/proxychains4.conf
+ ```
+
+ ```bash
+ round_robin
+ chain_len = 1
+ proxy_dns
+ remote_dns_subnet 224
+ tcp_read_time_out 15000
+ tcp_connect_time_out 8000
+ [ProxyList]
+ socks4 127.0.0.1 9050
+ socks4 127.0.0.1 9051
+ ```
+
+ Throw `proxychains4` in front of the Python script and each lookup will go through a different proxy (and thus source
+ from a different IP). You could even tune down the `-e` delay time because you will be leveraging different proxy boxes.
+
+ ```bash
+ proxychains4 python3 pagodo.py -g ALL_dorks.txt -s -e 17.0 -l 700 -j 1.1
+ ```
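
To make the round-robin idea concrete, here is a minimal Python sketch of the rotation that `round_robin` with `chain_len = 1` is meant to achieve: each successive lookup is paired with the next proxy in the list, wrapping around at the end. It illustrates the concept only and is not how proxychains is implemented; `assign_proxies` and the lookup labels are hypothetical names, and the proxy addresses mirror the two local listeners from the configuration above.

```python
from itertools import cycle


def assign_proxies(lookups, proxies):
    """Pair each lookup with the next proxy in strict rotation.

    Hypothetical helper for illustration; proxychains does this
    transparently at the connection level.
    """
    if not proxies:
        raise ValueError("at least one proxy is required")
    rotation = cycle(proxies)  # endlessly repeats the proxy list in order
    return [(lookup, next(rotation)) for lookup in lookups]


# The two local SOCKS listeners from the example configuration.
proxies = ["127.0.0.1:9050", "127.0.0.1:9051"]
lookups = ["dork 1", "dork 2", "dork 3"]

assignments = assign_proxies(lookups, proxies)
for lookup, proxy in assignments:
    print(f"{lookup} -> {proxy}")
```

With two proxies, consecutive lookups alternate between them, so each proxy sees only half of the requests.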
+
## Conclusion

Comments, suggestions, and improvements are always welcome. Be sure to follow [@opsdisk](https://twitter.com/opsdisk)