BPFetcher

A web scraping CLI tool for price comparison across Portuguese bookstores.

Scrapes book prices and availability from Wook, Bertrand, Fnac, and Almedina using ISBN or text search.

Getting Started • Features • Usage • Scrapers

Getting Started

Clone the repository

git clone https://github.com/akaTatago/BPFetcher.git
cd BPFetcher

Install dependencies

pip install -r requirements.txt

Install Playwright browsers (required for Fnac scraping)

playwright install chromium

Run the scraper

python -m src.main input.csv --output results.csv --stores all

Features

Multiple Search Modes: Search by ISBN-13 or by Title + Author
Multiple Stores Support: Scrapes Wook, Bertrand, Fnac, and Almedina
Concurrent Scraping: The request-based scrapers (Wook, Bertrand, Almedina) run in parallel; the browser-based Fnac scraper does not
Smart Matching: Validates book matches using normalized titles and authors
Price Tracking: Detects sale prices and availability status
CSV Export: Clean, structured output with all store data
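
As an illustration of the Smart Matching idea, here is a minimal sketch of title and author normalization. The function names and the containment rule are assumptions made for this example, not necessarily what BPFetcher does internally.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Replace anything that is not a letter, digit, or space with a space.
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def is_match(wanted_title, wanted_author, found_title, found_author) -> bool:
    """Hypothetical rule: one title contains the other, and one author's
    name tokens are a subset of the other's."""
    wt, ft = normalize(wanted_title), normalize(found_title)
    title_ok = bool(wt) and bool(ft) and (wt in ft or ft in wt)
    wa = set(normalize(wanted_author).split())
    fa = set(normalize(found_author).split())
    author_ok = bool(wa) and bool(fa) and (wa <= fa or fa <= wa)
    return title_ok and author_ok
```

With this rule, "J.D. Salinger" and "Salinger, J. D." compare equal because both reduce to the same set of tokens, while a different title by the same author is rejected.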

Usage

Basic Usage

python -m src.main input.csv

Search by ISBN (default)

python -m src.main books.csv --mode isbn --output results.csv

Your CSV should contain an ISBN13 column:

ISBN13	Title	Author
9780316769174	The Catcher in...	J.D. Salinger
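
Before running a long scrape it can help to check the ISBN13 column yourself. The sketch below applies the standard ISBN-13 check-digit rule (weights alternate 1 and 3, and the weighted sum must be divisible by 10). The helper name is hypothetical and is not part of BPFetcher.

```python
def is_valid_isbn13(value: str) -> bool:
    """Return True if value is a 13-digit ISBN with a correct check digit."""
    # Tolerate hyphens and spaces, which often appear in printed ISBNs.
    digits = value.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3, ... across all 13 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0
```

For example, `is_valid_isbn13("9780316769174")` returns `True` for the ISBN shown in the table above.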

Search by Text

python -m src.main books.csv --mode text --output results.csv

Your CSV should contain Title and Author columns:

Title	Author
The Catcher in the Rye	J.D. Salinger
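
The column requirements for both modes can be checked up front. This is a small sketch using only the standard library; `REQUIRED_COLUMNS` and `load_rows` are names invented for the example, and it assumes a comma-delimited file with a header row.

```python
import csv
import io

# Columns each search mode needs, as described in the Usage section.
REQUIRED_COLUMNS = {"isbn": ["ISBN13"], "text": ["Title", "Author"]}


def load_rows(csv_text: str, mode: str) -> list:
    """Parse CSV text and fail early if the mode's columns are missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS[mode] if col not in header]
    if missing:
        raise ValueError(f"mode '{mode}' needs column(s): {', '.join(missing)}")
    return list(reader)
```

Calling `load_rows` on a file that only has Title and Author columns works in text mode and raises `ValueError` in isbn mode.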

Select Specific Stores

python -m src.main input.csv --stores wook fnac

Available stores: wook, bertrand, fnac, almedina, or all (default)
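
For readers curious how such options are typically wired up, the following is a hedged sketch of an argparse parser that accepts the flags shown in this section. It is an assumption about the shape of the CLI, not a copy of src/main.py; the default output path is taken from the Architecture section.

```python
import argparse

STORES = ["wook", "bertrand", "fnac", "almedina"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="bpfetcher")
    parser.add_argument("input", help="path to the input CSV")
    parser.add_argument("--mode", choices=["isbn", "text"], default="isbn")
    parser.add_argument("--output", default="data/results.csv")
    # One or more store names, or the single keyword "all".
    parser.add_argument("--stores", nargs="+", choices=STORES + ["all"],
                        default=["all"])
    return parser


def resolve_stores(selected: list) -> list:
    """Expand "all" to every store; otherwise keep order and drop duplicates."""
    if "all" in selected:
        return list(STORES)
    return list(dict.fromkeys(selected))
```

Parsing `input.csv --stores wook fnac` with this parser and passing `args.stores` to `resolve_stores` yields `["wook", "fnac"]`.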

Scrapers

Request-Based Scrapers

Wook, Bertrand, and Almedina use curl-cffi for fast HTTP requests with browser impersonation.

Randomized delays to avoid rate limiting
Concurrent execution via ThreadPoolExecutor
Efficient parsing with BeautifulSoup
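
The combination of randomized delays and ThreadPoolExecutor can be sketched as follows. The scraper callables here are stand-ins: in the real project each one would presumably issue a curl-cffi request and parse the response with BeautifulSoup. The function names and delay bounds are assumptions for illustration.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def scrape_with_delay(scrape, query, min_delay, max_delay):
    """Sleep for a random interval, then call the store's scraper."""
    time.sleep(random.uniform(min_delay, max_delay))
    return scrape(query)


def scrape_all(scrapers, query, min_delay=0.5, max_delay=2.0):
    """Run every scraper in its own thread.

    scrapers maps a store name to a callable taking the query.
    A failing store is recorded as an error instead of aborting the run.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(scrapers))) as pool:
        futures = {
            name: pool.submit(scrape_with_delay, fn, query, min_delay, max_delay)
            for name, fn in scrapers.items()
        }
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                results[name] = {"Status": "error", "detail": str(exc)}
    return results
```

Catching exceptions per store means one blocked or slow site does not discard the results already collected from the others.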

Browser-Based Scraper

Fnac uses Playwright due to anti-bot protection:

Headless Chromium browser
Automatic CAPTCHA detection (manual solve required)
Cookie consent handling
Stealth mode with webdriver detection disabled

Output Format

ISBN Mode

Creates one row per book with columns for each store:

Title, Author, Wook Status, Wook Price, Wook On Sale, Wook Link, Bertrand Status, ...
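
The wide layout above can be produced by prefixing each per-store field with the store name. A minimal sketch, assuming each store's result is a dict keyed by Status, Price, On Sale, and Link (the helper names are hypothetical):

```python
STORE_FIELDS = ["Status", "Price", "On Sale", "Link"]


def isbn_header(stores):
    """Build the column list: Title, Author, then four columns per store."""
    header = ["Title", "Author"]
    for store in stores:
        header += [f"{store} {field}" for field in STORE_FIELDS]
    return header


def isbn_row(title, author, results):
    """Flatten {store: {field: value}} into one row dict for the book."""
    row = {"Title": title, "Author": author}
    for store, data in results.items():
        for field in STORE_FIELDS:
            # Missing fields become empty cells rather than raising.
            row[f"{store} {field}"] = data.get(field, "")
    return row
```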

Text Mode

Creates multiple rows per book (one per matching result):

Title, Author, Store, Title Found, Author Found, Status, Price, On Sale, Link
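
In text mode each match becomes its own row. The sketch below shows one way to build and serialize that long format with csv.DictWriter; the input shape (a dict of store name to list of match dicts) and the function names are assumptions for the example.

```python
import csv
import io

TEXT_COLUMNS = ["Title", "Author", "Store", "Title Found", "Author Found",
                "Status", "Price", "On Sale", "Link"]


def text_rows(title, author, matches_by_store):
    """Emit one row per matching result, repeating the searched title/author."""
    rows = []
    for store, matches in matches_by_store.items():
        for match in matches:
            row = {"Title": title, "Author": author, "Store": store}
            # Copy the per-match columns, leaving missing ones empty.
            row.update({col: match.get(col, "") for col in TEXT_COLUMNS[3:]})
            rows.append(row)
    return rows


def to_csv(rows) -> str:
    """Serialize rows to CSV text with the text-mode header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=TEXT_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```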

Architecture

BPFetcher/
├── requirements.txt
├── src/
│   ├── main.py                # CLI entry point (run as python -m src.main)
│   ├── scrapers/
│   │   ├── base_scraper.py    # Abstract base class
│   │   ├── wook.py            # Wook scraper
│   │   ├── bertrand.py        # Bertrand scraper
│   │   ├── fnac.py            # Fnac scraper (Playwright)
│   │   └── almedina.py        # Almedina scraper
│   └── utils/
│       ├── scraping_helper.py # Shared scraping utilities
│       └── csv_helper.py      # CSV loading/saving
└── data/
    └── results.csv            # Default output location
