A web scraping CLI tool for price comparison across Portuguese bookstores.
Scrapes book prices and availability from Wook, Bertrand, Fnac, and Almedina using ISBN or text search.
Getting Started • Features • Usage • Scrapers
Clone the repository:

```bash
git clone https://github.com/akaTatago/BPFetcher.git
cd BPFetcher
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Install Playwright browsers (required for Fnac scraping):

```bash
playwright install chromium
```

Run the scraper:

```bash
python -m src.main input.csv --output results.csv --stores all
```

- Multiple Search Modes: Search by ISBN-13 or by Title + Author
- Multiple Stores Support: Scrapes Wook, Bertrand, Fnac, and Almedina
- Concurrent Scraping: Fast parallel processing for non-browser scrapers
- Smart Matching: Validates book matches using normalized titles and authors
- Price Tracking: Detects sale prices and availability status
- CSV Export: Clean, structured output with all store data
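The README does not describe how Smart Matching normalizes text. One plausible approach, sketched here as an assumption, is to strip accents (common in Portuguese titles), lowercase, and drop punctuation. A result then counts as a match when one title contains the other and the wanted author's surname appears. `normalize` and `is_match` are hypothetical helpers, not the project's code.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    # NFKD splits accented letters (e.g. "ç", "ã") into base letter + combining mark
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Turn punctuation such as the dots in "J.D." into spaces
    cleaned = re.sub(r"[^\w\s]", " ", without_accents.lower())
    return " ".join(cleaned.split())


def is_match(wanted_title: str, wanted_author: str,
             found_title: str, found_author: str) -> bool:
    """Hypothetical rule: one title contains the other and the author surname appears."""
    wt, ft = normalize(wanted_title), normalize(found_title)
    if not wt or not ft:
        return False
    title_ok = wt in ft or ft in wt
    author_tokens = normalize(wanted_author).split()
    # Compare on the last token, assumed to be the surname
    author_ok = bool(author_tokens) and author_tokens[-1] in normalize(found_author).split()
    return title_ok and author_ok
```

With this rule, a store listing titled "The Catcher in the Rye (Edição de Bolso)" by "J. D. Salinger" matches the input "The Catcher in the Rye" / "J.D. Salinger".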
Basic usage:

```bash
python -m src.main input.csv
```

Search by ISBN:

```bash
python -m src.main books.csv --mode isbn --output results.csv
```

Your CSV should contain an `ISBN13` column:
| ISBN13 | Title | Author |
|---|---|---|
| 9780316769174 | The Catcher in... | J.D. Salinger |
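An ISBN-13 ends in a check digit, so malformed entries in the `ISBN13` column can be caught before any request is made. Below is a minimal sketch of the standard check, which weights the first 12 digits alternately by 1 and 3. `is_valid_isbn13` is a hypothetical helper, not part of BPFetcher. Keep the column as text when editing the CSV: a spreadsheet may otherwise rewrite a 13-digit number in scientific notation.

```python
import re


def is_valid_isbn13(value: str) -> bool:
    """Validate an ISBN-13 check digit; hyphens and spaces are ignored."""
    digits = value.replace("-", "").replace(" ", "").strip()
    # Exactly 13 ASCII digits
    if not re.fullmatch(r"[0-9]{13}", digits):
        return False
    # Weights alternate 1, 3, 1, 3, ... over the first 12 digits
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == int(digits[12])
```

The example row above passes: `is_valid_isbn13("9780316769174")` returns `True`.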
Search by title and author:

```bash
python -m src.main books.csv --mode text --output results.csv
```

Your CSV should contain `Title` and `Author` columns:
| Title | Author |
|---|---|
| The Catcher in the Rye | J.D. Salinger |
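Before scraping, it can help to confirm that the CSV has the columns the chosen mode needs. Below is a minimal pre-flight check. It assumes header names are matched exactly as written above (case-sensitive, surrounding spaces ignored). `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, not part of `csv_helper.py`.

```python
import csv
import io

# Required header names per search mode, as listed in this README
REQUIRED_COLUMNS = {
    "isbn": {"ISBN13"},
    "text": {"Title", "Author"},
}


def missing_columns(csv_text: str, mode: str) -> list[str]:
    """Return the required columns for `mode` that the CSV header lacks."""
    if mode not in REQUIRED_COLUMNS:
        raise ValueError(f"unknown mode: {mode!r}")
    # Read only the header row; an empty file yields an empty header
    header = next(csv.reader(io.StringIO(csv_text)), [])
    present = {name.strip() for name in header}
    return sorted(REQUIRED_COLUMNS[mode] - present)
```

A file with both `Title` and `Author` passes text mode, but the same file would report `["ISBN13"]` as missing in ISBN mode.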
Choose which stores to scrape:

```bash
python -m src.main input.csv --stores wook fnac
```

Available stores: `wook`, `bertrand`, `fnac`, `almedina`, or `all` (the default).
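The command-line options shown in this section map naturally onto `argparse`. Below is a minimal sketch, not BPFetcher's actual `main.py`. This README does not state the default `--mode`, so `isbn` below is an assumption. The default output path `data/results.csv` is taken from the project layout.

```python
import argparse

STORES = ["wook", "bertrand", "fnac", "almedina"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Compare book prices across Portuguese bookstores."
    )
    parser.add_argument("input", help="CSV file listing the books to look up")
    # Assumption: the README does not state the default mode; "isbn" is a guess.
    parser.add_argument("--mode", choices=["isbn", "text"], default="isbn")
    parser.add_argument("--output", default="data/results.csv")
    # One or more store names; "all" expands to every supported store
    parser.add_argument("--stores", nargs="+", choices=STORES + ["all"], default=["all"])
    return parser


def resolve_stores(selected: list[str]) -> list[str]:
    """Expand "all" and return the selected stores in a fixed order."""
    if "all" in selected:
        return list(STORES)
    return [store for store in STORES if store in selected]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.mode, args.output, resolve_stores(args.stores))
```

With `nargs="+"`, `argparse` checks each value against `choices`, so a typo such as `--stores wok` is rejected before any scraping starts.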
Wook, Bertrand, and Almedina use curl-cffi for fast HTTP requests with browser impersonation.
- Randomized delays to avoid rate limiting
- Concurrent execution via ThreadPoolExecutor
- Efficient parsing with BeautifulSoup
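Below is a minimal sketch of this pattern. It assumes a recent curl-cffi release where `impersonate="chrome"` selects a current Chrome fingerprint. `fetch_html`, `polite_fetch`, `fetch_all`, the 1 to 3 second delay range, and the worker count are illustrative choices, not BPFetcher's actual values. BeautifulSoup parsing would then run on the returned HTML strings.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fetch_html(url: str) -> str:
    """Fetch a page with curl-cffi, impersonating a Chrome browser fingerprint."""
    from curl_cffi import requests  # third-party: pip install curl-cffi

    response = requests.get(url, impersonate="chrome", timeout=30)
    response.raise_for_status()
    return response.text


def polite_fetch(url: str, fetch: Callable[[str], str],
                 delay: tuple[float, float]) -> str:
    # Sleep a random interval before each request so traffic is not bursty
    time.sleep(random.uniform(*delay))
    return fetch(url)


def fetch_all(urls: list[str], fetch: Callable[[str], str] = fetch_html,
              delay: tuple[float, float] = (1.0, 3.0),
              max_workers: int = 3) -> dict[str, str]:
    """Fetch several URLs concurrently and return {url: html}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = pool.map(lambda u: polite_fetch(u, fetch, delay), urls)
        return dict(zip(urls, pages))
```

The fetch function is injectable, so the concurrency and delay logic can be exercised with a stub instead of live requests. `pool.map` re-raises a failed request's exception when results are collected, so a production scraper would likely catch errors per URL.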
Fnac uses Playwright due to anti-bot protection:
- Headless Chromium browser
- Automatic CAPTCHA detection (manual solve required)
- Cookie consent handling
- Stealth mode with webdriver detection disabled
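Below is a minimal Playwright sketch of the four points above. The CAPTCHA marker and the cookie-banner selector (`#onetrust-accept-btn-handler`, the id used by OneTrust consent banners) are placeholders. This README does not say what Fnac actually serves. The sketch hides `navigator.webdriver` with an init script and passes the `--disable-blink-features=AutomationControlled` launch flag. That is one common stealth approach, not necessarily the one `fnac.py` uses. A CAPTCHA cannot be solved in a headless window, so the sketch only pauses for a manual solve when run with `headless=False`.

```python
# Hypothetical heuristics: the real markers/selectors for Fnac are not in this README
CAPTCHA_MARKERS = ("captcha",)
CONSENT_SELECTOR = "#onetrust-accept-btn-handler"
# Hide the navigator.webdriver flag that automated Chromium exposes
HIDE_WEBDRIVER_JS = "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});"


def looks_like_captcha(html: str) -> bool:
    """Crude check for a CAPTCHA/challenge page in the returned HTML."""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


def fetch_with_browser(url: str, headless: bool = True) -> str:
    from playwright.sync_api import sync_playwright  # pip install playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=headless,
            args=["--disable-blink-features=AutomationControlled"],
        )
        try:
            context = browser.new_context()
            context.add_init_script(HIDE_WEBDRIVER_JS)
            page = context.new_page()
            page.goto(url, wait_until="domcontentloaded")

            # Accept the cookie banner if one is present
            consent = page.locator(CONSENT_SELECTOR)
            if consent.count() > 0:
                consent.first.click()

            html = page.content()
            if looks_like_captcha(html):
                if headless:
                    raise RuntimeError("CAPTCHA detected; rerun with headless=False to solve it")
                input("CAPTCHA detected: solve it in the browser, then press Enter...")
                html = page.content()
            return html
        finally:
            browser.close()
```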
In ISBN mode, the output has one row per book, with a group of columns for each store:
Title, Author, Wook Status, Wook Price, Wook On Sale, Wook Link, Bertrand Status, ...
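This column layout can be generated from the store list rather than typed out. Below is a minimal sketch using the standard `csv` module. `wide_header` and `write_wide` are hypothetical names. Stores without a result get empty cells through `restval`.

```python
import csv
import io

# Per-store fields, in the order shown above
FIELDS = ("Status", "Price", "On Sale", "Link")


def wide_header(stores: list[str]) -> list[str]:
    """Title, Author, then Status/Price/On Sale/Link for each store in turn."""
    return ["Title", "Author"] + [f"{store} {field}" for store in stores for field in FIELDS]


def write_wide(rows: list[dict], stores: list[str]) -> str:
    """Serialize one dict per book into CSV text; missing cells are left empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=wide_header(stores), restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```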
In text mode, the output has one row per matching result, so a single book can span several rows:
Title, Author, Store, Title Found, Author Found, Status, Price, On Sale, Link
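Producing this long format amounts to flattening per-store result lists. Below is a minimal sketch. It assumes each scraper returns a list of dicts keyed by the column names above. `long_rows` is a hypothetical helper. As written, a book with no matches in any store produces no rows; the actual tool may handle that case differently.

```python
from typing import Any

LONG_COLUMNS = ["Title", "Author", "Store", "Title Found", "Author Found",
                "Status", "Price", "On Sale", "Link"]


def long_rows(title: str, author: str,
              results_by_store: dict[str, list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Flatten {store: [result, ...]} into one row per matching result."""
    rows = []
    for store, results in results_by_store.items():
        for result in results:
            merged = {"Title": title, "Author": author, "Store": store, **result}
            # Emit exactly the documented columns, in order; missing values become ""
            rows.append({column: merged.get(column, "") for column in LONG_COLUMNS})
    return rows
```

The resulting dicts can be passed directly to `csv.DictWriter(..., fieldnames=LONG_COLUMNS)`.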
```text
BPFetcher/
├── main.py                     # CLI entry point
├── requirements.txt
├── src/
│   ├── scrapers/
│   │   ├── base_scraper.py     # Abstract base class
│   │   ├── wook.py             # Wook scraper
│   │   ├── bertrand.py         # Bertrand scraper
│   │   ├── fnac.py             # Fnac scraper (Playwright)
│   │   └── almedina.py         # Almedina scraper
│   └── utils/
│       ├── scraping_helper.py  # Shared scraping utilities
│       └── csv_helper.py       # CSV loading/saving
└── data/
    └── results.csv             # Default output location
```
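This README describes `base_scraper.py` only as an abstract base class. Below is a minimal sketch of what such a contract could look like, with one method per search mode. `BookResult`, `search_isbn`, `search_text`, and `FakeStoreScraper` are hypothetical names, not the project's actual interface. Each store module would subclass the base and implement only the site-specific fetching and parsing.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class BookResult:
    """Hypothetical result record; fields mirror the output CSV columns."""
    title: str
    author: str
    status: str
    price: Optional[float]
    on_sale: bool
    link: str


class BaseScraper(ABC):
    store_name: str = ""

    @abstractmethod
    def search_isbn(self, isbn: str) -> Optional[BookResult]:
        """Return the store's listing for this ISBN, or None if not found."""

    @abstractmethod
    def search_text(self, title: str, author: str) -> list[BookResult]:
        """Return every listing that matches the title and author."""


class FakeStoreScraper(BaseScraper):
    """Toy subclass with canned data, standing in for wook.py, bertrand.py, etc."""
    store_name = "FakeStore"

    def __init__(self, catalogue: dict[str, BookResult]):
        self.catalogue = catalogue

    def search_isbn(self, isbn: str) -> Optional[BookResult]:
        return self.catalogue.get(isbn)

    def search_text(self, title: str, author: str) -> list[BookResult]:
        return [
            book for book in self.catalogue.values()
            if title.lower() in book.title.lower() and author.lower() in book.author.lower()
        ]
```

Because both methods are abstract, instantiating `BaseScraper` directly raises `TypeError`, which catches a store module that forgets to implement one of the modes.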