A public, portfolio-friendly Python scraping project for collecting and structuring publicly accessible Egyptian Customs tariff and legislation pages from customs.gov.eg.
This repository is intentionally code-first: it includes scraper source, helper utilities, lightweight checks, and a tiny sanitized sample output. Full scraped datasets, generated JSON exports, logs, archives, browser caches, and local settings are excluded.
- Scrape Egyptian Customs tariff / HS code pages by chapter.
- Scrape legislation and circular listings from public Egyptian Customs pages.
- Extract and enrich tariff details with Playwright-powered browser automation.
- Provide helper scripts for AJAX inspection, pagination checks, ID extraction, and PDF/HTML matching workflows.
- Include lightweight test and debugging scripts for validating scraper structure and page behavior.
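The pagination checks listed above usually come down to deciding when to stop requesting pages. The sketch below is not taken from this repository's scripts. It assumes a hypothetical `fetch_page(page_number)` callable that returns the records parsed from one listing page. It stops when a page comes back empty or identical to the previous one, since some sites keep serving the last page for out-of-range page numbers. `max_pages` is an illustrative safety cap.

```python
from typing import Callable, Iterator, List

# Hypothetical signature: fetch_page(page_number) returns the records parsed from that page.
FetchPage = Callable[[int], List[dict]]


def iter_paginated(fetch_page: FetchPage, start: int = 1, max_pages: int = 500) -> Iterator[dict]:
    """Yield records page by page until a page is empty or repeats the previous page."""
    previous: List[dict] = []
    for page in range(start, start + max_pages):
        records = fetch_page(page)
        # An empty page, or the same page served again, is treated as "no more pages".
        if not records or records == previous:
            return
        yield from records
        previous = records


if __name__ == "__main__":
    # Offline example: a fake two-page listing.
    pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
    print(list(iter_paginated(lambda n: pages.get(n, []))))
    # prints [{'id': 1}, {'id': 2}, {'id': 3}]
```

Injecting the fetch function keeps the stopping logic testable offline. The real fetcher could be an AJAX request or a Playwright page navigation.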
- src/: scraper and helper source files
- tests/: lightweight checks and test scripts
- sample_data/: tiny sanitized example output
- web/: reserved for optional public-safe demos
Create and activate a virtual environment (the commands below use Windows PowerShell syntax):
python -m venv .venv
.\.venv\Scripts\Activate.ps1
Install Python dependencies:
pip install -r requirements.txt
Install Playwright browser binaries:
playwright install
Run the scrapers from the repository root:
python .\src\scrape_all_chapters.py
python .\src\scrape_customs.py
python .\src\scrape_legislations.py
Some scripts may create local output such as JSON files, per-chapter data folders, or logs. These outputs are ignored by Git and are not part of the public repository.
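For per-chapter runs, a helper along these lines can enumerate chapters and write each chapter's records to its own JSON file. This is a hypothetical sketch, not the repository's code. It assumes the site's chapter numbering follows the Harmonized System (chapters 01 to 97, with chapter 77 reserved). The `chapter_<code>.json` naming and the output directory are illustrative only. Keeping all generated files under one directory makes that directory easy to exclude in `.gitignore`.

```python
import json
from pathlib import Path
from typing import Iterable, List

# Assumption: chapter numbering follows the Harmonized System (01-97, chapter 77 reserved).
# Verify against the site before relying on this list.
RESERVED_CHAPTERS = {77}


def hs_chapter_codes() -> List[str]:
    """Return zero-padded HS chapter codes, skipping reserved chapters."""
    return [f"{n:02d}" for n in range(1, 98) if n not in RESERVED_CHAPTERS]


def write_chapter_json(records: Iterable[dict], chapter: str, out_dir: Path) -> Path:
    """Write one chapter's records to <out_dir>/chapter_<code>.json (hypothetical layout)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"chapter_{chapter}.json"
    # ensure_ascii=False keeps any non-ASCII text (e.g. Arabic) readable in the file.
    path.write_text(json.dumps(list(records), ensure_ascii=False, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    codes = hs_chapter_codes()
    print(len(codes), codes[:3], codes[-1])  # prints 96 ['01', '02', '03'] 97
```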
Generated scraper outputs are local-only by default. The repository includes only sample_data/sample_output.json, a tiny sanitized sample that demonstrates the expected shape of output records without publishing the full scraped dataset.
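A lightweight check, in the spirit of the scripts in tests/, could confirm that an output file has the expected shape before it is shared. This sketch assumes the sample is a top-level JSON array of objects. The `required_keys` argument is a hypothetical parameter, because the actual record fields are not documented here.

```python
import json
from pathlib import Path
from typing import Iterable, List


def check_sample_output(path: Path, required_keys: Iterable[str] = ()) -> List[str]:
    """Return a list of problems found in a JSON output file (empty list means OK)."""
    required = tuple(required_keys)  # materialize once so a generator is not exhausted
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, list):
        return [f"expected a JSON array at top level, got {type(data).__name__}"]
    problems: List[str] = []
    for i, record in enumerate(data):
        if not isinstance(record, dict):
            problems.append(f"record {i} is {type(record).__name__}, not an object")
            continue
        missing = [k for k in required if k not in record]
        if missing:
            problems.append(f"record {i} missing keys: {missing}")
    return problems
```

For example, `check_sample_output(Path("sample_data/sample_output.json"))` returns an empty list when every record is a JSON object.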
The scraper references publicly accessible pages from the Egyptian Customs website: https://customs.gov.eg
This is an unofficial educational and research project intended as a data-engineering and web-scraping portfolio showcase. It is not an official data source, government service, or commercial customs platform.
The scraper utilities interact with publicly accessible pages of the Egyptian Customs website: https://customs.gov.eg
This repository intentionally excludes:
- full scraped datasets
- generated exports
- logs
- archives
- cached data
Only source code, helper utilities, lightweight tests, and minimal example outputs are included.
Users are solely responsible for ensuring compliance with all applicable laws, website terms of use, robots.txt policies, rate limits, and data usage requirements. Please use respectful request rates and avoid disrupting public services or infrastructure.
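One way to follow the robots.txt and rate-limit guidance is to route every request through a small gate. The sketch below uses the standard library's `urllib.robotparser.RobotFileParser` to evaluate robots.txt rules. The rules are passed in as lines so the helper can be tested offline. It also enforces a minimum delay between requests and raises that delay to any integer `Crawl-delay` the file declares. The `PoliteGate` name, the 2-second default, and the user-agent string are assumptions, not values used by this project.

```python
import time
import urllib.robotparser
from typing import Callable, Iterable, Optional


class PoliteGate:
    """Hypothetical helper: checks robots.txt rules and spaces out requests.

    It does not perform HTTP requests itself.
    """

    def __init__(self, robots_lines: Iterable[str], user_agent: str, min_delay: float = 2.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._parser = urllib.robotparser.RobotFileParser()
        self._parser.parse(list(robots_lines))
        self._agent = user_agent
        # Honour a Crawl-delay directive if robots.txt sets one larger than our default.
        crawl_delay = self._parser.crawl_delay(user_agent)
        self._delay = max(min_delay, float(crawl_delay or 0))
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    @property
    def delay(self) -> float:
        return self._delay

    def allowed(self, url: str) -> bool:
        """True if robots.txt permits this user agent to fetch the URL."""
        return self._parser.can_fetch(self._agent, url)

    def wait(self) -> None:
        """Sleep just long enough to keep requests at least `delay` seconds apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self._delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call `gate.allowed(url)` and then `gate.wait()` before each navigation, for example before a Playwright `page.goto(url)` call. Injecting `clock` and `sleep` keeps the timing logic testable without real delays.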
This project is not affiliated with, endorsed by, sponsored by, or officially connected to the Egyptian Customs Authority or any government entity.
No government logos, official branding, or complete scraped databases are included in this repository.
Released under the MIT License. See the LICENSE file for details.