Site Parser

A powerful web crawler and word finder that searches websites for specific words or phrases and provides detailed location information.

Features

  • Single word search
  • Dictionary-based search
  • File-based word list search
  • Multi-threaded crawling
  • Export results to CSV and JSON
  • Detailed scan reports
  • Context-aware word finding
  • Respects robots.txt and limits request rate
  • Smart encoding detection
  • Progress tracking
  • ASCII art interface

Requirements

Required packages:

  • requests
  • beautifulsoup4

Installation

  1. Clone the repository
  2. Install the required packages
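A minimal setup sketch; the GitHub URL is assumed from the repository name, and the package names come from the Requirements section above:

git clone https://github.com/anonymmized/Site-Parser.git
cd Site-Parser
pip install requests beautifulsoup4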

Usage

Run the script:

python main.py

Search Modes

  1. Single Word Search

    • Choose option [1]
    • Enter website URL
    • Enter a single word to search for
    • Set maximum pages to scan
  2. Dictionary Search

    • Choose option [2]
    • Enter website URL
    • Enter words one by one
    • Press Enter twice to finish
    • Set maximum pages to scan
  3. File-based Search

    • Choose option [3]
    • Enter website URL
    • Provide path to word list file
    • Set maximum pages to scan

Word List File Format

Create a text file with words to search (one per line):

word1
word2
word3
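For illustration, a file in this format can be read with a few lines of Python; this is a sketch, not necessarily the loader used in main.py:

def load_word_list(path):
    # Read one search word per line, skipping blank lines
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]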

Export Options

Results can be exported in:

  • CSV format
  • JSON format
  • Both formats simultaneously
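As a rough sketch of how both exports can be produced with the standard library (function and file names here are illustrative; results maps word -> URL -> list of location strings, matching the Output Format section below):

import csv
import json

def export_results(results, csv_path="results.csv", json_path="results.json"):
    # CSV: one row per (word, URL, location) triple
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Word", "URL", "Location"])
        for word, urls in results.items():
            for url, locations in urls.items():
                for location in locations:
                    writer.writerow([word, url, location])
    # JSON: dump the nested mapping as-is
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(results, f, ensure_ascii=False, indent=4)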

Example

# Example word list (words.txt):
Python
programming
development
code
interpreter
variable
function
class
object
module
library
framework

# Example URL:
https://en.wikipedia.org/wiki/Python_(programming_language)

Output Format

CSV Output

Word,URL,Location
Python,https://example.com,In tag <p>: Context of found word...

JSON Output

{
    "word": {
        "url": [
            "In tag <p>: Context of found word..."
        ]
    }
}
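The location strings combine the enclosing HTML tag with a short context window around the match. A sketch of how such entries can be built with BeautifulSoup (names are illustrative, not taken from main.py):

from bs4 import BeautifulSoup

def find_word_locations(html, word, context=40):
    # Scan every text node, note its parent tag, and keep a short
    # snippet of surrounding text for each case-insensitive match
    soup = BeautifulSoup(html, "html.parser")
    locations = []
    for text in soup.find_all(string=True):
        idx = text.lower().find(word.lower())
        if idx != -1:
            tag = text.parent.name if text.parent else "unknown"
            start = max(0, idx - context)
            snippet = text[start:idx + len(word) + context].strip()
            locations.append(f"In tag <{tag}>: {snippet}...")
    return locations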

Limitations

  • Respects website robots.txt
  • 0.5-second delay between requests
  • Maximum 3 concurrent threads
  • Default limit of 50 pages per scan
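The defaults above correspond roughly to the following pattern; this is a sketch of the politeness logic under those assumptions, not the exact code in main.py:

import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser
from concurrent.futures import ThreadPoolExecutor

REQUEST_DELAY = 0.5  # seconds between requests
MAX_THREADS = 3      # concurrent workers
MAX_PAGES = 50       # default pages per scan

def allowed_by_robots(url, user_agent="*"):
    # Consult the site's robots.txt before fetching
    parts = urlparse(url)
    rp = RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()
    return rp.can_fetch(user_agent, url)

def crawl(urls):
    # Visit at most MAX_PAGES pages with MAX_THREADS workers,
    # pausing REQUEST_DELAY seconds after each request
    def fetch(url):
        if allowed_by_robots(url):
            pass  # download and scan the page here
        time.sleep(REQUEST_DELAY)
    with ThreadPoolExecutor(max_workers=MAX_THREADS) as pool:
        list(pool.map(fetch, urls[:MAX_PAGES]))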

Error Handling

  • Network error handling
  • File reading error handling
  • Invalid URL handling
  • Encoding detection
  • Timeout management
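A sketch of how these cases can be handled with requests; the exact behavior in main.py may differ:

import requests

def fetch_page(url, timeout=10):
    # Handle network errors and timeouts gracefully
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
    except requests.RequestException as e:
        print(f"Failed to fetch {url}: {e}")
        return None
    # Fall back to detected encoding when the declared one looks wrong
    if not response.encoding or response.encoding.lower() == "iso-8859-1":
        response.encoding = response.apparent_encoding
    return response.text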

Contributing

  1. Fork the repository
  2. Create your feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a new Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • BeautifulSoup4 for HTML parsing
  • Requests library for HTTP requests
