
Argu333/Scraper


🕷️ Web Scraper & Crawler in Python

A Python-based web scraper and crawler designed to extract structured data from websites, including sites that use basic anti-scraping measures such as header checks and CSRF protection.


🌐 Target Websites


🛠️ Features

  • ✅ Custom headers to bypass basic anti-bot detection
  • ✅ Automatic pagination support
  • ✅ CSRF token retrieval and session-based login handling
  • ✅ Output data to JSON
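The header, pagination, and JSON features above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's code. It uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra installs. It also assumes the site marks its next-page link as `<li class="next"><a href="...">`, which is common on practice sites but should be checked against your target. The `USER_AGENTS` values and the `make_headers`, `crawl`, and `save_json` names are hypothetical. `fetch` is passed in so the loop can be tested offline; with requests it could be `lambda url, headers: requests.get(url, headers=headers, timeout=10).text`.

```python
import json
import random
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urljoin

# Hypothetical browser-like User-Agent strings; header.py may use different ones.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]


def make_headers() -> dict:
    """Build request headers with a randomly chosen User-Agent."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.9",
    }


class NextLinkParser(HTMLParser):
    """Find the href of the <a> inside <li class="next"> (assumed pagination markup)."""

    def __init__(self):
        super().__init__()
        self.in_next = False
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and "next" in (attrs.get("class") or "").split():
            self.in_next = True
        elif tag == "a" and self.in_next and self.next_href is None:
            self.next_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_next = False


def crawl(start_url: str,
          fetch: Callable[[str, dict], str],
          extract: Callable[[str], list],
          max_pages: int = 50) -> list:
    """Follow 'next' links from start_url and collect extract(html) records from each page."""
    records, seen, url = [], set(), start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url, make_headers())  # fresh headers for every request
        for item in extract(html):
            records.append({"page": url, **item})
        parser = NextLinkParser()
        parser.feed(html)
        parser.close()
        # Resolve relative links such as "page-2.html" against the current URL.
        url = urljoin(url, parser.next_href) if parser.next_href else None
    return records


def save_json(records: list, path: str) -> None:
    """Write the scraped records to a UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2, ensure_ascii=False)
```

Passing `fetch` and `extract` as parameters keeps network access and page-specific parsing out of the crawl loop. The `seen` set and the `max_pages` cap stop the crawler if a site's pagination links back on itself.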

📁 Project Structure

Scraper/
├── Sync/                     # Synchronous scraping modules
│   ├── Categories.py         # Gets all book categories
│   ├── NamePrice.py          # Extracts book names and prices
│   └── Total.py              # Extracts book names and prices, grouped by category
│
├── async.py                  # Asynchronous scraping module
├── header.py                 # Manages headers/user-agents
├── Login.py                  # Handles login/authentication
└── Scrape.json               # Scraping output for "async.py"
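For the CSRF login handled by `Login.py`, the usual pattern is to GET the login page with a session, read the hidden token from the form, and POST it back with the credentials on the same session so the cookies match. The sketch below covers the token-extraction step using only the standard library. The field name `csrf_token`, the `username`/`password` keys, and the `build_login_payload` helper are assumptions for illustration, not code from the repository.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect the name/value pairs of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_login_payload(login_html: str, username: str, password: str,
                        token_field: str = "csrf_token") -> dict:
    """Return form data for the login POST, including the page's CSRF token.

    token_field and the "username"/"password" keys are assumptions; match
    them to the real form's input names.
    """
    parser = HiddenInputParser()
    parser.feed(login_html)
    parser.close()
    if token_field not in parser.fields:
        raise ValueError(f"no hidden input named {token_field!r} on the login page")
    payload = dict(parser.fields)  # keep every hidden field the form expects
    payload.update({"username": username, "password": password})
    return payload


# Typical flow with requests (not executed here; LOGIN_URL is hypothetical).
# The same Session is used for GET and POST so the cookie tied to the token is sent back.
#
#   with requests.Session() as session:
#       page = session.get(LOGIN_URL, timeout=10)
#       data = build_login_payload(page.text, "user", "pass")
#       response = session.post(LOGIN_URL, data=data, timeout=10)
```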
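`async.py` scrapes asynchronously, and aiohttp is among the listed dependencies. A rough sketch of the core idea, under assumptions: pages are fetched concurrently with `asyncio.gather`, and an `asyncio.Semaphore` caps how many requests are in flight so the target site is not flooded. `fetch_all` and the default limit of 5 are hypothetical. The aiohttp wiring appears only in comments so the block runs without network access.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def fetch_all(urls: Iterable[str],
                    fetch: Callable[[str], Awaitable[str]],
                    limit: int = 5) -> dict:
    """Fetch all URLs concurrently with at most `limit` requests in flight.

    Returns a dict mapping each URL to the text returned by `fetch`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str):
        async with semaphore:  # waits here while `limit` fetches are already running
            return url, await fetch(url)

    results = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(results)


# Possible aiohttp wiring (not executed here; `urls` and `headers` are hypothetical):
#
#   async def main(urls, headers):
#       async with aiohttp.ClientSession(headers=headers) as session:
#           async def fetch(url):
#               async with session.get(url) as resp:
#                   return await resp.text()
#           return await fetch_all(urls, fetch)
#
#   pages = asyncio.run(main(urls, headers))
```

`asyncio.gather` returns results in input order, so the dict follows the order of `urls`. The parsed results could then be written to a JSON file such as `Scrape.json`.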

🧰 Tech Stack


⚙️ Setup Instructions

  1. Clone the Repository
git clone https://github.com/Argu333/Scraper.git
cd Scraper
  2. Install the required libraries (if not already installed)
pip install requests
pip install beautifulsoup4
pip install aiohttp

About

A simple web scraper and crawler made with Python
