🕷️ Web Scraper & Crawler in Python

A Python-based web scraper and crawler that extracts structured data from websites, including sites that use basic anti-scraping measures such as request-header checks and CSRF protection.

🌐 Target Websites

🛠️ Features

✅ Custom headers to bypass basic anti-bot detection
✅ Automatic pagination support
✅ CSRF token retrieval and session-based login handling
✅ Output data to JSON
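
The features above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the names `build_headers`, `parse_page`, and `_PageParser`, the form field name `csrf_token`, the `li.next` pagination markup, the header values, and the sample HTML are all assumptions. It uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra installs and without network access.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


def build_headers(user_agent="Mozilla/5.0 (X11; Linux x86_64)"):
    """Return browser-like request headers (values are illustrative)."""
    return {
        "User-Agent": user_agent,
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.9",
    }


class _PageParser(HTMLParser):
    """Collects a hidden CSRF input and the 'next page' link, if present."""

    def __init__(self, csrf_field):
        super().__init__()
        self.csrf_field = csrf_field
        self.csrf_token = None
        self.next_href = None
        self._in_next_item = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("name") == self.csrf_field:
            self.csrf_token = attrs.get("value")
        elif tag == "li" and "next" in (attrs.get("class") or "").split():
            self._in_next_item = True
        elif tag == "a" and self._in_next_item and self.next_href is None:
            self.next_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_next_item = False


def parse_page(html, base_url, csrf_field="csrf_token"):
    """Return (csrf_token, absolute_next_url); either may be None."""
    parser = _PageParser(csrf_field)
    parser.feed(html)
    next_url = urljoin(base_url, parser.next_href) if parser.next_href else None
    return parser.csrf_token, next_url


# Hypothetical page containing a login form and a pagination link.
SAMPLE_HTML = """
<form method="post" action="/login">
  <input type="hidden" name="csrf_token" value="abc123">
</form>
<ul class="pager"><li class="next"><a href="page-2.html">next</a></li></ul>
"""

token, next_url = parse_page(SAMPLE_HTML, "https://example.com/catalogue/page-1.html")
print(json.dumps({"csrf_token": token, "next_url": next_url}, indent=2))
```

In a real run, the headers would typically be set on a `requests.Session`, the token sent back with the login form data, and pages fetched in a loop until `next_url` is `None`.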

📁 Project Structure

Scraper/
├── Sync/                     # Synchronous scraping modules
│   ├── Categories.py         # Gets all book categories
│   ├── NamePrice.py          # Extracts book names and prices
│   └── Total.py              # Extracts book names and prices grouped by category
│
├── async.py                  # Asynchronous scraping module
├── header.py                 # Manages headers/user-agents
├── Login.py                  # Handles login/authentication
└── Scrape.json               # Scraping output for "async.py"
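
For readers unfamiliar with the asynchronous approach, the general pattern can be sketched as below. This is not the contents of `async.py`: `fake_fetch`, `fetch_all`, the `max_concurrency` parameter, and the example URLs are hypothetical, and the fetch step is a stand-in that returns canned HTML so the sketch runs without network access or `aiohttp`.

```python
import asyncio
import json


async def fake_fetch(url):
    """Stand-in for an HTTP GET; a real scraper would use aiohttp here."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"<html>{url}</html>"


async def fetch_all(urls, fetch=fake_fetch, max_concurrency=5):
    """Fetch all URLs concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with semaphore:
            return url, await fetch(url)

    # gather returns results in the same order as the input URLs
    pairs = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(pairs)


urls = [f"https://example.com/page-{n}.html" for n in range(1, 4)]
pages = asyncio.run(fetch_all(urls))

# Serialize a small summary; a scraper could write this to a JSON file instead.
output = json.dumps({url: len(body) for url, body in pages.items()}, indent=2)
print(output)
```

The semaphore caps the number of simultaneous requests, which keeps the load on the target site modest while still overlapping network waits.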

🧰 Tech Stack

Python 3.11.9
requests – HTTP requests
BeautifulSoup – HTML parsing
asyncio, aiohttp – asynchronous HTTP requests
json – JSON output

⚙️ Setup Instructions

Clone the Repository

git clone https://github.com/Argu333/Scraper.git
cd Scraper

Install the required libraries (if not already installed)

pip install requests
pip install beautifulsoup4
pip install aiohttp
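
To confirm the install worked, a quick check along these lines can help. It is only a sketch: `REQUIRED` and `missing_packages` are illustrative names, not part of the repository. Note that `beautifulsoup4` is imported as `bs4`.

```python
from importlib.util import find_spec

# pip package name -> import name (beautifulsoup4 is imported as bs4)
REQUIRED = {"requests": "requests", "beautifulsoup4": "bs4", "aiohttp": "aiohttp"}


def missing_packages(required=REQUIRED):
    """Return pip package names whose import module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


missing = missing_packages()
print("All libraries found." if not missing else f"Missing: {', '.join(missing)}")
```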
