Merged
8 changes: 8 additions & 0 deletions .bumpversion.cfg
@@ -0,0 +1,8 @@
[bumpversion]
current_version = 7.2.2
commit = True
tag = True

[bumpversion:file:pyproject.toml]
[bumpversion:file:prefect.yaml]
[bumpversion:file:uv.lock]
27 changes: 18 additions & 9 deletions README.md
@@ -3,14 +3,14 @@

This repository contains my personal data pipeline, built with [Prefect](https://www.prefect.io/), to automate data extraction, transformation, and loading (ETL). The pipeline runs on my NAS and integrates multiple data sources, processing them and exporting the results to [MotherDuck](https://motherduck.com/) for analysis in [Metabase](https://www.metabase.com/).

## Pipeline Overview
## 📌 Pipeline Overview

The pipeline serves two main purposes:

1. **Automation:** Automates repetitive data tasks by extracting data from APIs, cloud services, and other sources.
2. **Analysis:** Loads transformed data into MotherDuck, where Metabase connects for visualization and insights.

## Prefect Workflow
## ⚙️ Prefect Workflow

The pipeline is orchestrated with Prefect, using flows and tasks to manage dependencies and execution. The scheduling logic is defined in `schedules/hourly.py`, ensuring that jobs run hourly; execution happens on my NAS.
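
The hourly cadence can be illustrated with a small, self-contained sketch (the real logic lives in `schedules/hourly.py`; the helper below is hypothetical, not the project's actual code):

```python
from datetime import datetime, timedelta

def next_hourly_run(now: datetime) -> datetime:
    """Return the next top-of-the-hour slot after `now` (illustrative helper)."""
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)

print(next_hourly_run(datetime(2024, 5, 1, 14, 37)))  # 2024-05-01 15:00:00
```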

@@ -20,7 +20,7 @@
3. **Transform:** Data processing is primarily handled using `dbt`, with additional transformations as needed.
4. **Visualization:** Metabase connects to MotherDuck to generate insights and reports.
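
Conceptually, the steps above chain together as in this minimal, dependency-free sketch (function names are illustrative stand-ins, not the project's actual API):

```python
def extract() -> list[dict]:
    # Stand-in for pulling records from an API or cloud service.
    return [{"item": "a", "value": 1}, {"item": "b", "value": 2}]

def transform(rows: list[dict]) -> list[dict]:
    # In the real pipeline this stage is handled mostly by dbt models.
    return [r for r in rows if r["value"] > 1]

def load(rows: list[dict]) -> int:
    # Stand-in for writing the result to MotherDuck; returns rows written.
    return len(rows)

loaded = load(transform(extract()))
print(loaded)  # 1
```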

## Repository Structure
## 📂 Repository Structure

```plaintext
─ vtasks
Expand All @@ -37,7 +37,7 @@ The pipeline is orchestrated with Prefect, using flows and tasks to manage depen
└── vdbt # dbt project for data modeling
```

## 🚀 How to Run Locally
## 🚀 How to Run the Pipeline Locally

> **Note**
> This project uses **UV** for dependency management and execution.
Expand All @@ -50,7 +50,7 @@ The pipeline is orchestrated with Prefect, using flows and tasks to manage depen
```
2. **Set up the virtual environment:**
```bash
uv venv --python=3.11 .venv
uv venv .venv
```
3. **Install dependencies (in editable mode for local development):**
```bash
Expand All @@ -76,7 +76,7 @@ uv run python -m vtasks.jobs.dropbox.export_tables
```
This ensures that **relative imports** work correctly.
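
Why `-m` matters can be demonstrated with a throwaway package (`demo_pkg` below is hypothetical, not part of this repo): running a file inside a package as a plain script breaks its relative imports, while `python -m` resolves them.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Build a tiny package with one module that uses a relative import.
tmp = Path(tempfile.mkdtemp())
pkg = tmp / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("VALUE = 42\n")
(pkg / "job.py").write_text("from . import VALUE\nprint(VALUE)\n")

# Run as a module: the relative import resolves.
ok = subprocess.run([sys.executable, "-m", "demo_pkg.job"],
                    cwd=tmp, capture_output=True, text=True)
# Run as a plain script: the relative import fails (no parent package).
bad = subprocess.run([sys.executable, str(pkg / "job.py")],
                     cwd=tmp, capture_output=True, text=True)

print(ok.stdout.strip())    # 42
print(bad.returncode != 0)  # True
```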

## Deployment
## 📦 Deployment

> **Note**
> This pipeline is deployed manually to my NAS since I prefer not to grant GitHub Actions access to it. This ensures better control and security over the deployment process.
Expand All @@ -92,10 +92,19 @@ set PREFECT_API_URL=http://tnas:6006/api
prefect --no-prompt deploy --all
```

This approach provides full control over the deployment process while maintaining security and isolation.
### 🔢 Versioning

## Author
To ensure all files reflect the correct version, use `bump2version`:

```bash
bump2version patch  # or: minor, major
```

This automates version updates across `prefect.yaml`, `pyproject.toml`, and `uv.lock`.
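
What `bump2version` does to each configured file can be sketched as a pure string rewrite (a simplified illustration, not the tool's real implementation):

```python
def bump(version: str, part: str) -> str:
    """Bump a "major.minor.patch" version string (simplified sketch)."""
    major, minor, patch = map(int, version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

# Starting from current_version = 7.2.2 in .bumpversion.cfg:
print(bump("7.2.2", "patch"))  # 7.2.3
print(bump("7.2.2", "minor"))  # 7.3.0
```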

## 👤 Author
[Arnau Villoro](https://villoro.com)

## License
## 📜 License
This repository is licensed under [MIT](https://opensource.org/licenses/MIT).