This project is a distributed web crawler developed specifically to crawl data from Quora.com.
The server uses message-queue middleware: either RabbitMQ or Redis.
Using a virtual environment is recommended so the project's dependencies stay isolated from your system Python.
| Dependency | Version |
|---|---|
| Python | 3.11 or higher |
| RabbitMQ | latest |
| Redis | latest |
```shell
git clone https://github.com/LxYxvv/quora_distributed_crawler.git
cd quora_distributed_crawler
pip install -r requirements.txt
```
```shell
cd quora_distributed_crawler/server
python main.py
```
- Set `broker_url` in `config.py` to the address of your message-queue middleware.
- Set `worker_concurrency` to 2, to keep the worker from sending too many crawl requests too frequently.
- Set the `url` in `utils/upload.py`.
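The settings above might look like this in `config.py` (a hedged sketch — the broker address shown is an example, not this project's actual value):

```python
# config.py -- illustrative values only; adjust for your deployment.
# Point Celery at your message middleware (RabbitMQ shown here;
# a Redis broker would look like "redis://localhost:6379/0").
broker_url = "amqp://guest:guest@localhost:5672//"

# Keep concurrency low so the crawler does not hit Quora too hard.
worker_concurrency = 2
```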
How do I submit a task to the queue?
Refer to the Celery documentation. You first need to configure the `broker_url` in the server's `config.py`.
```shell
python main.py
```
Download ZeroTermux from https://github.com/hanxinhao000/ZeroTermux/releases, then install Python inside it:

```shell
pkg update && pkg upgrade
pkg install python3
```
Then follow the steps above: Installation > Configure worker > Start worker.