Commit 1a3115c

Merge pull request #57 from opsdisk/ghdb_scapery.py-updates
ghdb_srcaper.py updates
2 parents 8e0b8ff + 75eefc6 commit 1a3115c

8 files changed: 202 additions & 116 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1 +1,2 @@
 .venv/
+__pycache__/

README.md

Lines changed: 97 additions & 60 deletions
@@ -2,16 +2,18 @@
 
 ## Introduction
 
-The goal of this project was to develop a passive Google dork script to collect potentially vulnerable web pages and
-applications on the Internet. There are 2 parts. The first is `ghdb_scraper.py` that retrieves Google Dorks and the
-second portion is `pagodo.py` that leverages the information gathered by `ghdb_scraper.py`.
+pagodo automates Google searching for potentially vulnerable web pages and applications on the Internet. It replaces
+manually performing Google dork searches with a web GUI browser.
+
+There are 2 parts. The first is `ghdb_scraper.py` that retrieves the latest Google dorks and the second portion is
+`pagodo.py` that leverages the information gathered by `ghdb_scraper.py`.
 
 HakByte created a video tutorial on using pagodo. It starts around 8 minutes in and you can find it here
 <https://www.youtube.com/watch?v=lESeJ3EViCo&t=481s>
 
-## What are Google Dorks?
+## What are Google dorks?
 
-The awesome folks at Offensive Security maintain the Google Hacking Database (GHDB) found here:
+Offensive Security maintains the Google Hacking Database (GHDB) found here:
 <https://www.exploit-db.com/google-hacking-database>. It is a collection of Google searches, called dorks, that can be
 used to find potentially vulnerable boxes or other juicy info that is picked up by Google's search bots.
 
@@ -27,64 +29,14 @@ source .venv/bin/activate # If using a virtual environment.
 pip install -r requirements.txt
 ```
 
-## Google is blocking me!
-
-If you start getting HTTP 429 errors, Google has rightfully detected you as a bot and will block your IP for a set
-period of time. The solution is to use proxychains and a bank of proxies to round robin the lookups.
-
-Install proxychains4
-
-```bash
-apt install proxychains4 -y
-```
-
-Edit the `/etc/proxychains4.conf` configuration file to round robin the look ups through different proxy servers. In
-the example below, 2 different dynamic socks proxies have been set up with different local listening ports
-(9050 and 9051). Don't know how to utilize SSH and dynamic socks proxies? Do yourself a favor and pick up a copy of
-[Cyber Plumber's Handbook and interactive lab](https://gumroad.com/l/cph_book_and_lab) to learn all about Secure Shell
-(SSH) tunneling, port redirection, and bending traffic like a boss.
-
-```bash
-vim /etc/proxychains4.conf
-```
-
-```bash
-round_robin
-chain_len = 1
-proxy_dns
-remote_dns_subnet 224
-tcp_read_time_out 15000
-tcp_connect_time_out 8000
-[ProxyList]
-socks4 127.0.0.1 9050
-socks4 127.0.0.1 9051
-```
-
-Throw `proxychains4` in front of the Python script and each lookup will go through a different proxy (and thus source
-from a different IP). You could even tune down the `-e` delay time because you will be leveraging different proxy boxes.
-
-```bash
-proxychains4 python3 pagodo.py -g ALL_dorks.txt -s -e 17.0 -l 700 -j 1.1
-```
-
 ## ghdb_scraper.py
 
-To start off, `pagodo.py` needs a list of all the current Google dorks. A datetimestamped file with the Google dorks
-and the indididual dork category dorks are also provided in the repo. Fortunately, the entire database can be pulled
-back with 1 GET request using `ghdb_scraper.py`. You can dump all dorks to a file, the individual dork categories to
-separate dork files, or the entire json blob if you want more contextual data about the dork.
-
-To retrieve all dorks
-
-```bash
-python3 ghdb_scraper.py -j -s
-```
+To start off, `pagodo.py` needs a list of all the current Google dorks. The repo contains a `dorks/` directory with
+the current dorks when the `ghdb_scraper.py` was last run. It's advised to run `ghdb_scraper.py` to get the freshest
+data before running `pagodo.py`. The `dorks/` directory contains:
 
-To retrieve all dorks and write them to individual categories:
-
-```bash
-python3 ghdb_scraper.py -i
-```
+* the `all_google_dorks.txt` file which contains all the Google dorks
+* Individual dork category dorks
 
 Dork categories:
 
@@ -107,6 +59,51 @@ categories = {
 }
 ```
 
+Fortunately, the entire database can be pulled back with 1 HTTP GET request using `ghdb_scraper.py`. You can dump all
+dorks to a file, the individual dork categories to separate dork files, or the entire json blob if you want more
+contextual data about each dork.
+
+### Using ghdb_scraper.py as a script
+
+To retrieve all dorks:
+
+```bash
+python ghdb_scraper.py -j -s
+```
+
+To retrieve all dorks and write them to individual categories:
+
+```bash
+python ghdb_scraper.py -i
+```
+
+### Using ghdb_scraper as a module
+
+The `ghdb_scraper.retrieve_google_dorks()` returns a dictionary with the following data structure:
+
+```python
+ghdb_dict = {
+    "total_records": total_records,
+    "extracted_dorks": extracted_dorks,
+    "category_dict": category_dict,
+}
+```
+
+Using a Python shell (like `python` or `ipython`) to explore the data:
+
+```python
+import ghdb_scraper
+
+dorks = ghdb_scraper.retrieve_google_dorks(save_all_dorks_to_file=True)
+dorks.keys()
+dorks["total_records"]
+
+dorks["extracted_dorks"]
+
+dorks["category_dict"].keys()
+
+dorks["category_dict"][1]["category_name"]
+```
 
 ## pagodo.py
 
@@ -155,6 +152,46 @@ To run it:
 python3 pagodo.py -d example.com -g dorks.txt -l 50 -s -e 35.0 -j 1.1
 ```
 
+## Google is blocking me!
+
+If you start getting HTTP 429 errors, Google has rightfully detected you as a bot and will block your IP for a set
+period of time. The solution is to use proxychains and a bank of proxies to round robin the lookups.
+
+Install proxychains4
+
+```bash
+apt install proxychains4 -y
+```
+
+Edit the `/etc/proxychains4.conf` configuration file to round robin the look ups through different proxy servers. In
+the example below, 2 different dynamic socks proxies have been set up with different local listening ports
+(9050 and 9051). Don't know how to utilize SSH and dynamic socks proxies? Do yourself a favor and pick up a copy of
+[Cyber Plumber's Handbook and interactive lab](https://gumroad.com/l/cph_book_and_lab) to learn all about Secure Shell
+(SSH) tunneling, port redirection, and bending traffic like a boss.
+
+```bash
+vim /etc/proxychains4.conf
+```
+
+```bash
+round_robin
+chain_len = 1
+proxy_dns
+remote_dns_subnet 224
+tcp_read_time_out 15000
+tcp_connect_time_out 8000
+[ProxyList]
+socks4 127.0.0.1 9050
+socks4 127.0.0.1 9051
+```
+
+Throw `proxychains4` in front of the Python script and each lookup will go through a different proxy (and thus source
+from a different IP). You could even tune down the `-e` delay time because you will be leveraging different proxy boxes.
+
+```bash
+proxychains4 python3 pagodo.py -g ALL_dorks.txt -s -e 17.0 -l 700 -j 1.1
+```
+
 ## Conclusion
 
 Comments, suggestions, and improvements are always welcome. Be sure to follow [@opsdisk](https://twitter.com/opsdisk)
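
Reviewer note on the README changes above: the dictionary described under "Using ghdb_scraper as a module" can be walked with a small helper. The following is a minimal sketch, not code from this commit. It relies only on the three top-level keys and the `category_name` field that the README shows; the helper name `category_names`, the `sample` dictionary, its placeholder category names, and the assumption that `extracted_dorks` is a list of strings are all hypothetical.

```python
def category_names(ghdb_dict):
    """Map each key of ghdb_dict["category_dict"] to its "category_name"."""
    return {
        key: entry["category_name"]
        for key, entry in ghdb_dict["category_dict"].items()
    }


# Hand-made stand-in for the dictionary returned by
# ghdb_scraper.retrieve_google_dorks(). The category names are placeholders,
# and "extracted_dorks" is assumed (not confirmed) to be a list of strings.
sample = {
    "total_records": 2,
    "extracted_dorks": [
        'intitle:"index of" "contacts.txt"',
        "inurl:/inicis/ ext:log",
    ],
    "category_dict": {
        1: {"category_name": "Example category A"},
        2: {"category_name": "Example category B"},
    },
}

# Print each category key next to its name.
for key, name in sorted(category_names(sample).items()):
    print(key, name)
```

With the real module installed, passing the return value of `ghdb_scraper.retrieve_google_dorks()` in place of `sample` should work as long as the structure matches what the README documents.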

dorks/google_dorks.json renamed to dorks/all_google_dorks.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

dorks/all_google_dorks_20210814_145340.txt renamed to dorks/all_google_dorks.txt

Lines changed: 23 additions & 0 deletions
@@ -6553,3 +6553,26 @@ intitle:"Grandstream Device Configuration" (intext:password & intext:"Grandstrea
 intitle:"index of" "contacts.txt"
 inurl:/inicis/ ext:log
 intext:"-----BEGIN CERTIFICATE-----" ext:txt
+intitle:"3G wireless gateway" "login" intext:"huawei technologies"
+intitle:"lg smart ip device" -.com
+intitle:"7100 login" "lancom"
+intitle:"ADB Broadband" login intext:"ADB Broadband S.p.A" -.com
+intitle:"MediaAccess Gateway - Login" "access your MediaAccess Gateway"
+intitle:"ADMINISTRATOR LOGIN" inurl:adminlogin
+intitle:"geovision inc." inurl:login.htm
+intitle:"KNX-IP-Gateway Login"
+intitle:"DGS-3100 Login"
+allintext:Welcome to the LabTech Web Portal
+intitle:"Vue Element Admin" intext:"Username : admin" OR intext:"Username : editor" OR intext:"Password : any"
+intitle:"web admin login" "Huawei Technologies"
+intitle:"Login - Hitron technologies"
+intitle:"Video web server" "login"
+intitle:"vigor login page"
+inurl:prweb/PRAuth
+inurl:/multi.html intitle:webcam
+intext:"developed and maintained by Netgate" intitle:login
+intitle:"web server login" intext:"site ip"
+intitle:"system login" "Drake Holdings"
+inurl:mailscanner intitle:"mailwatch login page"
+inurl:device_status.html "login"
+inurl:/hp/device/SignIn/

dorks/pages_containing_login_portals.dorks

Lines changed: 18 additions & 0 deletions
@@ -1169,3 +1169,21 @@ inurl:/UserLogin intitle:"::PayTV SMS::" "Aplomb Technology"
 intext:"SGP" inurl:/accounts/login?next=/admin/
 inurl:"/tips/tipsLogin.action"
 intitle:"Grandstream Device Configuration" (intext:password & intext:"Grandstream Device Configuration" & intext:"Grandstream Networks" | inurl:cgi-bin) -.com|org
+intitle:"3G wireless gateway" "login" intext:"huawei technologies"
+intitle:"ADB Broadband" login intext:"ADB Broadband S.p.A" -.com
+intitle:"MediaAccess Gateway - Login" "access your MediaAccess Gateway"
+intitle:"ADMINISTRATOR LOGIN" inurl:adminlogin
+intitle:"geovision inc." inurl:login.htm
+intitle:"KNX-IP-Gateway Login"
+intitle:"DGS-3100 Login"
+allintext:Welcome to the LabTech Web Portal
+intitle:"Vue Element Admin" intext:"Username : admin" OR intext:"Username : editor" OR intext:"Password : any"
+intitle:"web admin login" "Huawei Technologies"
+intitle:"Login - Hitron technologies"
+intitle:"Video web server" "login"
+intitle:"vigor login page"
+inurl:prweb/PRAuth
+intext:"developed and maintained by Netgate" intitle:login
+intitle:"system login" "Drake Holdings"
+inurl:mailscanner intitle:"mailwatch login page"
+inurl:device_status.html "login"

dorks/various_online_devices.dorks

Lines changed: 4 additions & 0 deletions
@@ -658,3 +658,7 @@ inurl:top.cgi intitle:"Motorola ptp"
 intitle:"vood Residential gateway" inurl:vood/cgi-bin/
 intext:"Egardia & WoonVeilig" -site:"linkedin.*" -"data-lead.com" -"getemail.io" -"holaconnect.com" -"kzhead.info"
 intext:"Live View" inurl:ui3.htm
+intitle:"lg smart ip device" -.com
+intitle:"7100 login" "lancom"
+inurl:/multi.html intitle:webcam
+inurl:/hp/device/SignIn/

dorks/web_server_detection.dorks

Lines changed: 1 addition & 0 deletions
@@ -183,3 +183,4 @@ intitle:"Icecast Streaming Media Server"
 intitle:"Welcome to WildFly" intext:"Administration Console"
 intitle:"Index of" site:.gov intext:"Server at"
 intitle:"Welcome" intext:"LiteSpeed Technologies, Inc. All Rights Reserved."
+intitle:"web server login" intext:"site ip"
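
Reviewer note: the 23 dorks added to `dorks/all_google_dorks.txt` match the 18 + 4 + 1 dorks added across the three category files in this commit. A check like the one below could confirm that every category dork also appears in the combined file. It is a minimal sketch, not part of the commit: the function names `read_dorks`, `missing_from_combined`, and `check_repo` are hypothetical, and it assumes each file holds one dork per line, as the hunks above suggest.

```python
from pathlib import Path


def read_dorks(path):
    """Return the non-empty lines of a dork file, stripped of whitespace."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def missing_from_combined(combined, categories):
    """Return {label: [dorks from that category absent from `combined`]}.

    `combined` is a list of dorks; `categories` maps a label (for example a
    file name) to that category's list of dorks. Categories with nothing
    missing are left out of the result.
    """
    combined_set = set(combined)
    report = {}
    for label, dorks in categories.items():
        absent = [dork for dork in dorks if dork not in combined_set]
        if absent:
            report[label] = absent
    return report


def check_repo(dorks_dir="dorks"):
    """Compare every *.dorks file in `dorks_dir` against all_google_dorks.txt."""
    dorks_dir = Path(dorks_dir)
    combined = read_dorks(dorks_dir / "all_google_dorks.txt")
    categories = {
        path.name: read_dorks(path) for path in sorted(dorks_dir.glob("*.dorks"))
    }
    return missing_from_combined(combined, categories)
```

Calling `check_repo()` from the repository root returns an empty dictionary when every category dork is present in the combined file.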
