vjvelascorios/econscrap

Banxico Reports Scraper

A collection of scripts that automatically download reports from the Banco de México (Banxico) website. The scraper handles multiple report types and supports parallel downloads.

Features

Requirements

requests
beautifulsoup4
pandas
pathlib (part of the Python standard library since Python 3.4; no separate install needed)
tqdm

Installation

pip install requests beautifulsoup4 pandas tqdm

Usage

Each script can be run independently:

# Download quarterly reports
python "scripts/informes trimestrales.py"

# Download regional reports 
python "scripts/informes regionales.py"

# Download library updates
python "scripts/library_updates.py"

Configuration

  • Default save location: reports and files/
  • Threading enabled by default with 10 workers
  • Configurable retry logic for failed downloads
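
As a rough illustration of the settings above, the sketch below runs downloads through a pool of 10 worker threads and retries each failed download a configurable number of times. The names `download_all`, `download_with_retry`, `fetch`, `max_attempts`, and `backoff` are hypothetical and are not taken from the repository's scripts; `fetch` stands in for whatever function performs the HTTP request (for example, one built on `requests.get`).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def download_with_retry(fetch, url, max_attempts=3, backoff=0.0):
    """Call fetch(url), making at most max_attempts attempts before giving up."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception:  # a real script would catch narrower errors
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the last error
            time.sleep(backoff * attempt)  # simple linear backoff


def download_all(fetch, urls, workers=10, max_attempts=3, backoff=0.0):
    """Download urls in parallel; return (results, errors) dicts keyed by URL."""
    results, errors = {}, {}

    def task(url):
        # Capture the error instead of raising so one failure
        # does not abort the remaining downloads.
        try:
            return url, download_with_retry(fetch, url, max_attempts, backoff), None
        except Exception as exc:
            return url, None, exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for url, data, error in pool.map(task, urls):
            if error is None:
                results[url] = data
            else:
                errors[url] = error
    return results, errors
```

Passing `fetch` in as a parameter keeps the threading and retry logic independent of the HTTP library, so it can be exercised with a stub function instead of real network calls.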

Output Structure

reports and files/
├── banxico_quarterly_reports/
├── banxico_regional_reports/
└── banxico_library_updates/

Error Handling

  • Automatic retries for failed downloads
  • Skips existing files to avoid duplicates
  • Detailed error logging and progress tracking
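
The "skip existing files" behaviour can be sketched as a filter that runs before any download starts. This is only a minimal illustration, not the repository's actual code: the function name `pending_downloads` is hypothetical, and it assumes each file is saved under the last path segment of its URL.

```python
from pathlib import Path


def pending_downloads(urls, dest_dir):
    """Return (url, target_path) pairs whose target file does not exist yet."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the save folder if missing
    pending = []
    for url in urls:
        # Assumption: the file name is the last path segment of the URL.
        filename = url.rstrip("/").rsplit("/", 1)[-1]
        target = dest / filename
        if target.exists():
            continue  # already on disk; skip to avoid a duplicate download
        pending.append((url, target))
    return pending
```

Running the filter first means a repeated run only requests the reports that are not yet in the save folder.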

Possible errors:

If you get an error, check the file path where the files are being saved. You can change this path in the scripts.

GitHub Actions Automation

This repository includes GitHub Actions workflows that download reports automatically.

Schedule

  • Library Updates: 3rd of each month at 00:00 Mexico City time (06:00 UTC)
  • Quarterly Reports: 28th-29th of every 3rd month at 00:00 Mexico City time (06:00 UTC)
  • Regional Reports: 11th-15th of every 3rd month at 00:00 Mexico City time (06:00 UTC)
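
Because cron schedules on GitHub Actions are written in UTC, it can help to check which job a given UTC run time corresponds to in Mexico City time. The sketch below is one way to do that, under stated assumptions: Mexico City is modelled as a fixed UTC-6 offset (which is what "00:00 Mexico City time (06:00 UTC)" implies), the names `DAY_WINDOWS` and `jobs_matching_day` are hypothetical, and the "every 3rd month" condition is left out because the schedule above does not say which months apply.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Mexico City treated as a fixed UTC-6 offset.
MEXICO_CITY = timezone(timedelta(hours=-6))

# Day-of-month windows copied from the schedule list above.
DAY_WINDOWS = {
    "library_updates": range(3, 4),      # 3rd
    "quarterly_reports": range(28, 30),  # 28th-29th
    "regional_reports": range(11, 16),   # 11th-15th
}


def jobs_matching_day(utc_now):
    """Return job names whose day window contains the Mexico City local day.

    Only the day of month is checked; the month filter is not modelled.
    """
    local = utc_now.astimezone(MEXICO_CITY)
    return sorted(name for name, days in DAY_WINDOWS.items() if local.day in days)


if __name__ == "__main__":
    run_time = datetime(2024, 3, 3, 6, 0, tzinfo=timezone.utc)
    print(run_time.astimezone(MEXICO_CITY).isoformat())
    print(jobs_matching_day(run_time))
```

Comparing against the local day rather than the UTC day matters near midnight, where the two can differ by one.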

Manual Execution

You can also trigger the workflow manually from the GitHub Actions tab.

Troubleshooting

If the GitHub Actions workflows are not working:

  1. Run Local Tests:

    python test_workflow.py
  2. Check Debug Information:

    python debug_workflow.py
  3. Common Issues:

    • Branch mismatch (ensure workflow uses 'master' not 'main')
    • Schedule condition mismatch with cron expressions
    • Missing dependencies in requirements.txt
    • Incorrect file paths for artifacts

Recent Fixes

  • ✅ Fixed branch name from 'main' to 'master'
  • ✅ Fixed schedule condition times to match cron expressions
  • ✅ Added debug and test scripts for troubleshooting
  • ✅ Reorganized workflow steps for better execution order

TODO

  • Add Monetary Policy Reports, Surveys, Press Reports, Stability Reports, etc.
  • Workflows for automatic downloads and updates
  • Annual compilation reports
  • Error notification system for failed downloads
  • Data validation and quality checks
