A web crawler and word finder that searches a website for specific words or phrases and reports where each match occurs: the page URL, the enclosing HTML tag, and the surrounding text.
- Single word search
- Dictionary-based search
- File-based word list search
- Multi-threaded crawling
- Export results to CSV and JSON
- Detailed scan reports
- Context-aware word finding
- Respects robots.txt and limits load on the target site
- Smart encoding detection
- Progress tracking
- ASCII art interface
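As an illustration of context-aware word finding, the sketch below records the enclosing tag and the surrounding text for a match, in the same "In tag <p>: ..." style that appears in the exported results. It is not the project's implementation: it swaps BeautifulSoup for the standard library's `html.parser` so it runs without dependencies, and the `WordLocator` class and its `context_chars` parameter are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class WordLocator(HTMLParser):
    """Hypothetical helper: notes the enclosing tag and nearby text of a match."""

    def __init__(self, word, context_chars=30):
        super().__init__()
        self.word = word.lower()
        self.context_chars = context_chars
        self.stack = []      # tags that are currently open
        self.locations = []  # "In tag <p>: ..." strings

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        # Case-insensitive substring match; only the first hit per text node.
        pos = text.lower().find(self.word)
        if pos == -1 or not self.stack:
            return
        start = max(0, pos - self.context_chars)
        end = pos + len(self.word) + self.context_chars
        self.locations.append(f"In tag <{self.stack[-1]}>: {text[start:end]}")


locator = WordLocator("python")
locator.feed("<html><body><h1>Title</h1><p>Learn Python today</p></body></html>")
print(locator.locations)
```

This is a plain substring match, so a search for "class" would also hit "classes"; a word-boundary regular expression would be one way to tighten that.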
Required packages:
- requests
- beautifulsoup4
- Clone the repository:
Run the script:
python main.py
- Single Word Search
  - Choose option [1]
  - Enter website URL
  - Enter single word to search
  - Set maximum pages to scan
- Dictionary Search
  - Choose option [2]
  - Enter website URL
  - Enter words one by one
  - Press Enter twice to finish
  - Set maximum pages to scan
- File-based Search
  - Choose option [3]
  - Enter website URL
  - Provide path to word list file
  - Set maximum pages to scan
Create a text file with words to search (one per line):
word1
word2
word3
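A word list in this one-word-per-line format could be read along the following lines. This is a minimal sketch, not the code in `main.py`: the function name `load_word_list` is hypothetical, and skipping blank lines, trimming whitespace, dropping duplicates, and reading the file as UTF-8 are all assumptions.

```python
from pathlib import Path


def load_word_list(path):
    """Return the unique, non-empty words from a one-word-per-line file."""
    words = []
    seen = set()
    # Assumption: the file is UTF-8 encoded.
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        word = line.strip()
        if not word or word in seen:
            continue  # skip blank lines and repeated words
        seen.add(word)
        words.append(word)
    return words
```

Keeping the original order means results can be reported in the same sequence as the file.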
Results can be exported in:
- CSV format
- JSON format
- Both formats simultaneously
# Example word list (words.txt):
Python
programming
development
code
interpreter
variable
function
class
object
module
library
framework
# Example URL:
https://en.wikipedia.org/wiki/Python_(programming_language)
Word,URL,Location
Python,https://example.com,In tag <p>: Context of found word...
{
  "word": {
    "url": [
      "In tag <p>: Context of found word..."
    ]
  }
}
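Both sample outputs above can be produced from a single nested mapping of word to URL to list of locations. The sketch below shows one way to do that with the standard `csv` and `json` modules; the function name `export_results` and its `csv_path` and `json_path` parameters are hypothetical, and the project's own export code may differ.

```python
import csv
import json


def export_results(results, csv_path=None, json_path=None):
    """Write results (word -> url -> list of locations) to CSV and/or JSON."""
    if csv_path:
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["Word", "URL", "Location"])
            # One CSV row per individual location.
            for word, pages in results.items():
                for url, locations in pages.items():
                    for location in locations:
                        writer.writerow([word, url, location])
    if json_path:
        with open(json_path, "w", encoding="utf-8") as f:
            # The nested mapping already has the JSON layout shown above.
            json.dump(results, f, indent=2, ensure_ascii=False)
```

Passing both paths writes both formats in one call; passing only one writes just that format.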
- Respects website robots.txt
- 0.5-second delay between requests
- Maximum 3 concurrent threads
- Default limit of 50 pages per scan
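The four limits above could be combined roughly as follows. This is a sketch under stated assumptions rather than the project's crawler: `Throttle`, `crawl`, and the `fetch` callback are hypothetical names, the robots.txt content is passed in as lines instead of being downloaded, and the 0.5-second delay is interpreted as a global gap between request starts across all threads.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from urllib import robotparser

REQUEST_DELAY = 0.5  # seconds between requests
MAX_THREADS = 3      # concurrent worker threads
MAX_PAGES = 50       # default page limit per scan


class Throttle:
    """Spaces out request starts by at least `delay` seconds across threads."""

    def __init__(self, delay):
        self.delay = delay
        self._lock = threading.Lock()
        self._next_time = 0.0

    def wait(self):
        with self._lock:
            now = time.monotonic()
            if self._next_time > now:
                time.sleep(self._next_time - now)
            self._next_time = max(now, self._next_time) + self.delay


def crawl(urls, fetch, robots_lines, user_agent="*", max_pages=MAX_PAGES):
    """Fetch the allowed URLs politely and return {url: fetch(url)}."""
    robots = robotparser.RobotFileParser()
    robots.parse(robots_lines)
    # Drop disallowed URLs first, then apply the page limit.
    allowed = [u for u in urls if robots.can_fetch(user_agent, u)][:max_pages]
    throttle = Throttle(REQUEST_DELAY)

    def polite_fetch(url):
        throttle.wait()
        return url, fetch(url)

    with ThreadPoolExecutor(max_workers=MAX_THREADS) as pool:
        return dict(pool.map(polite_fetch, allowed))
```

In a real run, `fetch` would wrap an HTTP request; any callable that takes a URL and returns the page body works for trying the sketch out.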
- Network error handling
- File reading error handling
- Invalid URL handling
- Encoding detection
- Timeout management
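Two of the cases above, invalid URLs and encoding detection, can be illustrated without any network access. The sketch below is an assumption about how such handling might look, not the project's code: `normalize_url` and `decode_body` are hypothetical helpers, defaulting to HTTPS when no scheme is given is a choice made for this example, and the simple fallback chain of encodings stands in for whatever detection the tool actually performs.

```python
from urllib.parse import urlparse


def normalize_url(raw):
    """Return a cleaned http(s) URL, or raise ValueError if it is unusable."""
    url = raw.strip()
    if "://" not in url:
        url = "https://" + url  # assumption: default to HTTPS
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"Invalid URL: {raw!r}")
    return url


def decode_body(data, declared=None):
    """Decode bytes using the declared encoding, then UTF-8, then Latin-1."""
    for encoding in (declared, "utf-8"):
        if not encoding:
            continue
        try:
            return data.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown encoding: try the next candidate
    # Latin-1 maps every byte to a character, so this step cannot fail.
    return data.decode("latin-1")
```

Raising `ValueError` early lets the caller re-prompt for a URL before any request is attempted.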
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a new Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- BeautifulSoup4 for HTML parsing
- Requests library for HTTP requests