
Commit 33c4c0f

Merge pull request #117 from opsdisk/20250830-update-python-libs
Bumped Python lib versions
2 parents 47bdd38 + 1d6ab81 commit 33c4c0f


4 files changed, +42 -41 lines changed


README.md

Lines changed: 38 additions & 37 deletions
````diff
@@ -2,64 +2,65 @@
 
 ## Introduction
 
-`pagodo` automates Google searching for potentially vulnerable web pages and applications on the Internet. It replaces
+`pagodo` automates Google searching for potentially vulnerable web pages and applications on the Internet. It replaces
 manually performing Google dork searches with a web GUI browser.
 
-There are 2 parts. The first is `ghdb_scraper.py` that retrieves the latest Google dorks and the second portion is
+There are 2 parts. The first is `ghdb_scraper.py` that retrieves the latest Google dorks and the second portion is
 `pagodo.py` that leverages the information gathered by `ghdb_scraper.py`.
 
 The core Google search library now uses the more flexible [yagooglesearch](https://github.com/opsdisk/yagooglesearch)
-instead of [googlesearch](https://github.com/MarioVilas/googlesearch). Check out the [yagooglesearch
+instead of [googlesearch](https://github.com/MarioVilas/googlesearch). Check out the [yagooglesearch
 README](https://github.com/opsdisk/yagooglesearch/blob/master/README.md) for a more in-depth explanation of the library
 differences and capabilities.
 
 This version of `pagodo` also supports native HTTP(S) and SOCKS5 application support, so no more wrapping it in a tool
-like `proxychains4` if you need proxy support. You can specify multiple proxies to use in a round-robin fashion by
+like `proxychains4` if you need proxy support. You can specify multiple proxies to use in a round-robin fashion by
 providing a comma separated string of proxies using the `-p` switch.
 
 ## What are Google dorks?
 
 Offensive Security maintains the Google Hacking Database (GHDB) found here:
-<https://www.exploit-db.com/google-hacking-database>. It is a collection of Google searches, called dorks, that can be
+<https://www.exploit-db.com/google-hacking-database>. It is a collection of Google searches, called dorks, that can be
 used to find potentially vulnerable boxes or other juicy info that is picked up by Google's search bots.
 
 ## Terms and Conditions
 
 The terms and conditions for `pagodo` are the same terms and conditions found in
 [yagooglesearch](https://github.com/opsdisk/yagooglesearch#terms-and-conditions).
 
-This code is supplied as-is and you are fully responsible for how it is used. Scraping Google Search results may
-violate their [Terms of Service](https://policies.google.com/terms). Another Python Google search library had some
+This code is supplied as-is and you are fully responsible for how it is used. Scraping Google Search results may
+violate their [Terms of Service](https://policies.google.com/terms). Another Python Google search library had some
 interesting information/discussion on it:
 
-* [Original issue](https://github.com/aviaryan/python-gsearch/issues/1)
-* [A response](https://github.com/aviaryan/python-gsearch/issues/1#issuecomment-365581431>)
-* Author created a separate [Terms and Conditions](https://github.com/aviaryan/python-gsearch/blob/master/T_AND_C.md)
-* ...that contained link to this [blog](https://benbernardblog.com/web-scraping-and-crawling-are-perfectly-legal-right/)
+- [Original issue](https://github.com/aviaryan/python-gsearch/issues/1)
+- [A response](https://github.com/aviaryan/python-gsearch/issues/1#issuecomment-365581431>)
+- Author created a separate [Terms and Conditions](https://github.com/aviaryan/python-gsearch/blob/master/T_AND_C.md)
+- ...that contained link to this [blog](https://benbernardblog.com/web-scraping-and-crawling-are-perfectly-legal-right/)
 
 Google's preferred method is to use their [API](https://developers.google.com/custom-search/v1/overview).
 
 ## Installation
 
-Scripts are written for Python 3.6+. Clone the git repository and install the requirements.
+Scripts are written for Python 3.6+. Clone the git repository and install the requirements.
 
 ```bash
 git clone https://github.com/opsdisk/pagodo.git
 cd pagodo
 python3 -m venv .venv # If using a virtual environment.
 source .venv/bin/activate # If using a virtual environment.
+pip install --upgrade pip setuptools
 pip install -r requirements.txt
 ```
 
 ## ghdb_scraper.py
 
-To start off, `pagodo.py` needs a list of all the current Google dorks. The repo contains a `dorks/` directory with the
+To start off, `pagodo.py` needs a list of all the current Google dorks. The repo contains a `dorks/` directory with the
 current dorks when the `ghdb_scraper.py` was last run. It's advised to run `ghdb_scraper.py` to get the freshest data
-before running `pagodo.py`. The `dorks/` directory contains:
+before running `pagodo.py`. The `dorks/` directory contains:
 
-* the `all_google_dorks.txt` file which contains all the Google dorks, one per line
-* the `all_google_dorks.json` file which is the JSON response from GHDB
-* Individual category dorks
+- the `all_google_dorks.txt` file which contains all the Google dorks, one per line
+- the `all_google_dorks.json` file which is the JSON response from GHDB
+- Individual category dorks
 
 Dork categories:
 
````

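The README describes `dorks/all_google_dorks.txt` as holding one Google dork per line. A minimal sketch of loading such a file, assuming UTF-8 text and that blank lines should be skipped; `load_dorks` is a hypothetical helper, not part of `pagodo`'s API:

```python
from pathlib import Path
from typing import List, Union


def load_dorks(path: Union[str, Path]) -> List[str]:
    """Return the non-blank lines of a one-dork-per-line file (hypothetical helper)."""
    text = Path(path).read_text(encoding="utf-8")
    # splitlines() drops the newline characters; strip() removes stray surrounding
    # whitespace so that blank or whitespace-only lines can be filtered out.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

For example, `load_dorks("dorks/all_google_dorks.txt")` would return the dorks as a list, in file order.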
````diff
@@ -119,12 +120,12 @@ dorks["category_dict"].keys()
 dorks["category_dict"][1]["category_name"]
 ```
 
-## <span>pagodo.py</span>
+## pagodo.py
 
-### Using <span>pagodo.py</span> as a script
+### Using pagodo.py as a script
 
 ```bash
-python pagodo.py -d example.com -g dorks.txt
+python pagodo.py -d example.com -g dorks.txt
 ```
 
 ### Using pagodo as a module
````

````diff
@@ -195,37 +196,37 @@ site:github.com
 
 ### Wait time between Google dork searchers
 
-* `-i` - Specify the **minimum** delay between dork searches, in seconds. Don't make this too small, or your IP will
-get HTTP 429'd quickly.
-* `-x` - Specify the **maximum** delay between dork searches, in seconds. Don't make this too big or the searches will
-take a long time.
+- `-i` - Specify the **minimum** delay between dork searches, in seconds. Don't make this too small, or your IP will
+get HTTP 429'd quickly.
+- `-x` - Specify the **maximum** delay between dork searches, in seconds. Don't make this too big or the searches will
+take a long time.
 
 The values provided by `-i` and `-x` are used to generate a list of 20 randomly wait times, that are randomly selected
 between each different Google dork search.
 
 ### Number of results to return
 
-`-m` - The total max search results to return per Google dork. Each Google search request can pull back at most 100
+`-m` - The total max search results to return per Google dork. Each Google search request can pull back at most 100
 results at a time, so if you pick `-m 500`, 5 separate search queries will have to be made for each Google dork search,
 which will increase the amount of time to complete.
 
 ### Save Output
 
-`-o [optional/path/to/results.json]` - Save output to a JSON file. If you do not specify a filename, a datetimestamped
+`-o [optional/path/to/results.json]` - Save output to a JSON file. If you do not specify a filename, a datetimestamped
 one will be generated.
 
-`-s [optional/path/to/results.txt]` - Save URLs to a text file. If you do not specify a filename, a datetimestamped one
+`-s [optional/path/to/results.txt]` - Save URLs to a text file. If you do not specify a filename, a datetimestamped one
 will be generated.
 
 ### Save logs
 
-`--log [optional/path/to/file.log]` - Save logs to the specified file. If you do not specify a filename, the default
+`--log [optional/path/to/file.log]` - Save logs to the specified file. If you do not specify a filename, the default
 file `pagodo.py.log` at the root of pagodo directory will be used.
 
 ## Google is blocking me!
 
-Performing 7300+ search requests to Google as fast as possible will simply not work. Google will rightfully detect it
-as a bot and block your IP for a set period of time. One solution is to use a bank of HTTP(S)/SOCKS proxies and pass
+Performing 7300+ search requests to Google as fast as possible will simply not work. Google will rightfully detect it
+as a bot and block your IP for a set period of time. One solution is to use a bank of HTTP(S)/SOCKS proxies and pass
 them to `pagodo`
 
 ### Native proxy support
````

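The wait-time and `-m` paragraphs in the hunk above describe two pieces of arithmetic: a pool of 20 random delays drawn from the `-i`/`-x` range, and one search request per 100 results. A minimal sketch of both, not `pagodo`'s own code; the helper names, the use of `random.uniform`, and the 30/60 second bounds are illustrative assumptions:

```python
import math
import random
from typing import List

# Per the README, one Google search request returns at most 100 results.
RESULTS_PER_REQUEST = 100


def build_wait_times(min_delay: float, max_delay: float, count: int = 20) -> List[float]:
    """Pre-generate `count` random delays (seconds) between min_delay and max_delay."""
    if min_delay > max_delay:
        raise ValueError("min_delay must not exceed max_delay")
    return [random.uniform(min_delay, max_delay) for _ in range(count)]


def requests_per_dork(max_results: int) -> int:
    """Number of paged search requests needed to collect up to max_results results."""
    return math.ceil(max_results / RESULTS_PER_REQUEST)


# Illustrative values standing in for `-i 30 -x 60 -m 500` (hypothetical choices).
wait_times = build_wait_times(min_delay=30, max_delay=60)
for dork in ['intitle:"index of"', "site:example.com inurl:admin"]:
    delay = random.choice(wait_times)  # random pick from the pool before each dork
    print(f"{dork!r}: {requests_per_dork(500)} requests, then sleep {delay:.1f}s")
```

With `-m 500`, `requests_per_dork(500)` is 5, matching the README's example. Because delays are sampled from a fixed pool with `random.choice`, consecutive dorks can occasionally reuse the same delay.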
````diff
@@ -236,7 +237,7 @@ Pass a comma separated string of proxies to `pagodo` using the `-p` switch.
 python pagodo.py -g dorks.txt -p http://myproxy:8080,socks5h://127.0.0.1:9050,socks5h://127.0.0.1:9051
 ```
 
-You could even decrease the `-i` and `-x` values because you will be leveraging different proxy IPs. The proxies passed
+You could even decrease the `-i` and `-x` values because you will be leveraging different proxy IPs. The proxies passed
 to `pagodo` are selected by round robin.
 
 ### proxychains4 support
````

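The `-p` paragraphs describe a comma-separated proxy string whose entries are used round robin. A minimal sketch of that idea using `itertools.cycle`, assuming one proxy is chosen per dork search; `parse_proxies` and `assign_proxies` are hypothetical helpers, since this commit does not show how `pagodo` implements the rotation:

```python
import itertools
from typing import List, Tuple


def parse_proxies(proxy_string: str) -> List[str]:
    """Split a comma-separated proxy string (the `-p` format) into a list."""
    return [proxy.strip() for proxy in proxy_string.split(",") if proxy.strip()]


def assign_proxies(dorks: List[str], proxies: List[str]) -> List[Tuple[str, str]]:
    """Pair each dork with the next proxy in round-robin order (hypothetical helper)."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    # cycle() repeats the proxy list forever; zip() stops when the dorks run out.
    return list(zip(dorks, itertools.cycle(proxies)))


proxies = parse_proxies("http://myproxy:8080,socks5h://127.0.0.1:9050,socks5h://127.0.0.1:9051")
for dork, proxy in assign_proxies(["dork 1", "dork 2", "dork 3", "dork 4"], proxies):
    print(f"{dork} -> {proxy}")
```

With the three proxies from the README example and four dorks, the fourth dork wraps back to `http://myproxy:8080`.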
````diff
@@ -249,7 +250,7 @@ Install `proxychains4`
 apt install proxychains4 -y
 ```
 
-Edit the `/etc/proxychains4.conf` configuration file to round robin the look ups through different proxy servers. In
+Edit the `/etc/proxychains4.conf` configuration file to round robin the look ups through different proxy servers. In
 the example below, 2 different dynamic socks proxies have been set up with different local listening ports (9050 and
 9051).
 
````

````diff
@@ -269,7 +270,7 @@ socks4 127.0.0.1 9050
 socks4 127.0.0.1 9051
 ```
 
-Throw `proxychains4` in front of the `pagodo.py` script and each *request* lookup will go through a different proxy (and
+Throw `proxychains4` in front of the `pagodo.py` script and each _request_ lookup will go through a different proxy (and
 thus source from a different IP).
 
 ```bash
````

````diff
@@ -278,10 +279,10 @@ proxychains4 python pagodo.py -g dorks/all_google_dorks.txt -o [optional/path/to
 
 Note that this may not appear natural to Google if you:
 
-1) Simulate "browsing" to `google.com` from IP #1
-2) Make the first search query from IP #2
-3) Simulate clicking "Next" to make the second search query from IP #3
-4) Simulate clicking "Next to make the third search query from IP #1
+1. Simulate "browsing" to `google.com` from IP #1
+2. Make the first search query from IP #2
+3. Simulate clicking "Next" to make the second search query from IP #3
+4. Simulate clicking "Next to make the third search query from IP #1
 
 For that reason, using the built in `-p` proxy support is preferred because, as stated in the `yagooglesearch`
 documentation, the "provided proxy is used for the entire life cycle of the search to make it look more human, instead
````
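The numbered list in the README hunk above contrasts `proxychains4`, which rotates proxies on every request, with the built-in `-p` option, whose proxy stays fixed for a whole search. The sketch below replays the README's four-step sequence under both policies; the function names and IP labels are illustrative, and neither function is part of `pagodo` or `yagooglesearch`:

```python
import itertools
from typing import List, Tuple

# The four requests from the README's numbered list, in order.
SEARCH_STEPS = [
    "browse to google.com",
    "first search query",
    'click "Next" (second query)',
    'click "Next" (third query)',
]


def per_request_rotation(steps: List[str], proxies: List[str]) -> List[Tuple[str, str]]:
    """proxychains4-style round robin: every request goes out through the next proxy."""
    return list(zip(steps, itertools.cycle(proxies)))


def per_search_proxy(steps: List[str], proxy: str) -> List[Tuple[str, str]]:
    """One proxy for the whole life cycle of a search, as the -p option is described."""
    return [(step, proxy) for step in steps]


ips = ["IP #1", "IP #2", "IP #3"]
for step, ip in per_request_rotation(SEARCH_STEPS, ips):
    print(f"per-request  {ip}: {step}")
for step, ip in per_search_proxy(SEARCH_STEPS, ips[0]):
    print(f"per-search   {ip}: {step}")
```

With three proxies, per-request rotation yields IP #1, #2, #3, #1 for the four steps, which is exactly the pattern the README warns about, while the per-search policy keeps every step on one IP.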

ghdb_scraper.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -13,7 +13,7 @@
 # Custom Python libraries.
 
 
-__version__ = "1.2.1"
+__version__ = "1.3.0"
 
 
 """
```

pagodo.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -19,7 +19,7 @@
 # Custom Python libraries.
 
 
-__version__ = "2.6.4"
+__version__ = "2.7.0"
 
 
 class Pagodo:
```

requirements.txt

Lines changed: 2 additions & 2 deletions
```diff
@@ -1,3 +1,3 @@
-beautifulsoup4==4.13.4
-requests==2.32.3
+beautifulsoup4==4.13.5
+requests==2.32.5
 yagooglesearch==1.10.0
```
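This commit only changes pinned versions in `requirements.txt`. After running `pip install -r requirements.txt`, one quick way to confirm the environment matches the pins is to compare them with the installed distributions. A minimal sketch, assuming Python 3.8+ for `importlib.metadata` (newer than the README's stated 3.6+) and plain `name==version` lines; it compares version strings literally rather than applying PEP 440 normalization:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict

# Pins as bumped by this commit's requirements.txt.
REQUIREMENTS_TXT = """\
beautifulsoup4==4.13.5
requests==2.32.5
yagooglesearch==1.10.0
"""


def parse_pins(text: str) -> Dict[str, str]:
    """Map package name -> pinned version for simple `name==version` lines."""
    pins: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        name, sep, pinned = line.partition("==")
        if sep:  # skip blank lines and anything that is not an exact pin
            pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(pins: Dict[str, str]) -> Dict[str, str]:
    """Compare installed distributions against the pins and return only the differences."""
    problems: Dict[str, str] = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != pinned:
            problems[name] = f"installed {installed}, pinned {pinned}"
    return problems


for package, problem in find_mismatches(parse_pins(REQUIREMENTS_TXT)).items():
    print(f"{package}: {problem}")
```

Each printed line names a package whose installed version differs from its pin; no output means the environment matches. For requirements beyond exact `==` pins, the `packaging` library's requirement and specifier parsing is a more robust choice.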
