All notable changes to this project will be documented here.
- Dependency Management: Replaced Poetry with uv for faster dependency resolution and improved package management.
- Performance Boost: Significantly reduced install and dependency resolution times.
- Prefect Upgrade: Migrated from Prefect 2.20 to Prefect 3.2, leveraging new API improvements and performance enhancements.
- Infrastructure: Moved execution from Heroku to a TerraMaster NAS for better control and cost efficiency.
- Full Rewrite: The entire pipeline was rewritten from scratch, separating data extraction and loading (Python) from transformations (dbt) for improved modularity and maintainability.
- Prefect Upgrade: Updated from Prefect 2.7.1 to 2.20, benefiting from improved scheduling and performance.
- Improved Deployment: Adapted deployment strategy to run on a self-hosted environment.
- Continuous Integration: Set up CI to handle dependency version updates automatically.
- Prefect Upgrade: Migrated from Prefect 1.1 to Prefect 2.7.1, taking advantage of a revamped API and improved task orchestration.
- Infrastructure Change: Moved execution from local setup to Heroku for easier scalability and automation.
- Python Upgrade: Moved from Python 3.9 to Python 3.11, improving performance and compatibility.
- Poetry Upgrade: Upgraded Poetry from 1.1 to 1.2 for better dependency management.
- Prefect Upgrade: Transitioned from Prefect 0.13 to Prefect 1.1.
- Switched from Luigi to Prefect: Adopted Prefect 0.13 for more flexible and Pythonic workflow orchestration.
- Improved Observability: Leveraged the Prefect UI for monitoring and debugging.
- Task Automation: Implemented custom Luigi tasks to avoid manual file handling.
- Static HTML Reports: Introduced Jinja for templating and Highcharts.js for interactive visualizations, enabling automatic report generation.
- Pandas-based Transformations: Performed all data transformations in Python with pandas, simplifying analysis and data manipulation.
- YAML-based Workflow Tracking: Tasks now export results automatically to YAML, simplifying result storage and logging.
- Switched from Airflow to Luigi 2.8: Simplified local execution and reduced infrastructure overhead.
- Local Execution: Fully transitioned from AWS-managed workflows to local execution for faster iteration.
- Orchestrated with Airflow: Scheduled workflows using Apache Airflow.
- AWS-Hosted: The pipeline ran on AWS, leveraging managed infrastructure.