GDP
Developed with the software and tools below.
This project performs ETL (Extract, Transform, Load) operations on Country GDP data. It extracts GDP information from a Wikipedia page, transforms the data, and loads it into both a CSV file and an SQLite database.
- Web scraping of GDP data from Wikipedia
- Data transformation from USD millions to USD billions
- CSV file generation
- SQLite database population
- SQL query execution
- Logging of ETL process steps
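The transformation step listed above can be sketched as follows. This is a minimal illustration, not the project's actual code: the column names `GDP_USD_millions` and `GDP_USD_billions` and the sample values are assumptions made up for the example.

```python
import pandas as pd


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with GDP converted from USD millions to USD billions."""
    out = df.copy()
    # 1 billion = 1,000 million, so divide by 1000 and round to two decimals.
    # Column names here are assumed for illustration.
    out["GDP_USD_billions"] = (out["GDP_USD_millions"] / 1000).round(2)
    return out.drop(columns=["GDP_USD_millions"])


# Made-up sample rows, only to show the shape of the data.
sample = pd.DataFrame(
    {
        "Country": ["Country A", "Country B"],
        "GDP_USD_millions": [26854599.0, 95500.0],
    }
)
result = transform(sample)
print(result)
```

Working on a copy keeps the caller's DataFrame unchanged, which makes the step easier to test in isolation.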
- Python 3.x
- pandas
- numpy
- requests
- BeautifulSoup4
- sqlite3
- Clone this repository and change into its directory: `git clone https://github.com/ahmed3ab3az/IBM_Project.git`, then `cd IBM_Project`
- Install the required packages: `pip install pandas numpy requests beautifulsoup4`
Run the main script with `python main.py` to execute the entire ETL process. This will:
- Extract data from the specified Wikipedia URL
- Transform the GDP data from millions to billions USD
- Load the data into a CSV file (`Countries_by_GDP.csv`)
- Load the data into an SQLite database (`World_Economies.db`)
- Execute a sample query to retrieve countries with GDP >= 100 billion USD
- Log all steps in `etl_project_log.txt`
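The sample query step can be illustrated with the standard-library `sqlite3` module. This sketch uses an in-memory database instead of `World_Economies.db`, and the table name, column names, and rows are assumptions chosen for the example rather than the project's real schema or data.

```python
import sqlite3

# In-memory database so the example leaves no file behind.
conn = sqlite3.connect(":memory:")

# Assumed table and column names, for illustration only.
conn.execute(
    "CREATE TABLE Countries_by_GDP (Country TEXT, GDP_USD_billions REAL)"
)
rows = [("Country A", 26854.6), ("Country B", 95.5), ("Country C", 100.0)]
conn.executemany("INSERT INTO Countries_by_GDP VALUES (?, ?)", rows)

# Keep only economies of at least 100 billion USD; ORDER BY makes the
# result order deterministic.
query = (
    "SELECT Country, GDP_USD_billions FROM Countries_by_GDP "
    "WHERE GDP_USD_billions >= 100 ORDER BY Country"
)
large_economies = conn.execute(query).fetchall()
conn.close()

print(large_economies)
```

Because the comparison is `>=`, a country with exactly 100 billion USD is included in the result.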
- `etl.py`: Contains all ETL functions
- `main.py`: Orchestrates the ETL process
- `Countries_by_GDP.csv`: Output CSV file
- `World_Economies.db`: SQLite database
- `etl_project_log.txt`: Log file for the ETL process
- `extract(url, table_attribs)`: Scrapes GDP data from the given URL
- `transform(df)`: Converts GDP from millions to billions USD
- `load_to_csv(df, csv_path)`: Saves data to a CSV file
- `load_to_db(df, sql_connection, table_name)`: Loads data into SQLite
- `run_query(query_statement, sql_connection)`: Executes an SQL query
- `log_progress(message)`: Logs ETL process steps
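As one example of how these functions might look, here is a possible shape for `log_progress`. It is a sketch under assumptions: the timestamp format and the optional `log_file` parameter are not taken from the project, and the real function in `etl.py` may differ.

```python
from datetime import datetime

LOG_FILE = "etl_project_log.txt"


def log_progress(message: str, log_file: str = LOG_FILE) -> None:
    """Append a timestamped message to the log file.

    The log_file parameter is a hypothetical addition that makes the
    function easier to test; the README lists only `message`.
    """
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    # Append mode keeps entries from earlier runs.
    with open(log_file, "a", encoding="utf-8") as f:
        f.write(f"{timestamp} : {message}\n")
```

A call such as `log_progress("Data extraction complete")` after each stage would then produce one line per ETL step in the log file.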
You can modify the following variables in main.py to customize the ETL process:
- `url`: Source URL for GDP data
- `table_attribs`: Columns to extract from the source
- `db_name`: Name of the SQLite database
- `table_name`: Name of the table in the database
- `csv_path`: Path for the output CSV file
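For orientation, the configuration block in `main.py` might look roughly like the following. Only the two output file names come from this README; the URL is a placeholder, and the column and table names are assumptions to be replaced with the values the project actually uses.

```python
# Placeholder URL: substitute the Wikipedia page the project scrapes.
url = "https://example.com/gdp-table"

# Assumed column names for the extracted table.
table_attribs = ["Country", "GDP_USD_millions"]

# Output locations named in this README.
db_name = "World_Economies.db"
csv_path = "./Countries_by_GDP.csv"

# Assumed table name inside the SQLite database.
table_name = "Countries_by_GDP"
```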
This project is open-source and available under the MIT License.
Contributions, issues, and feature requests are welcome. Feel free to check the issues page if you want to contribute.
Ahmed Abdelaziz