This repository contains my personal data pipeline, built with Prefect to automate data extraction, transformation, and loading (ETL). The pipeline runs on my NAS, integrates multiple data sources, and exports the processed results to MotherDuck for analysis in Metabase.
The pipeline serves two main purposes:
- Automation: Automates repetitive data tasks by extracting data from APIs, cloud services, and other sources.
- Analysis: Loads transformed data into MotherDuck, where Metabase connects for visualization and insights.
The pipeline is orchestrated with Prefect, using flows and tasks to manage dependencies and execution. The scheduling logic is defined in `schedules/hourly.py`, ensuring that jobs run on an hourly basis. The execution happens on my NAS.
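As a rough illustration of that flow/task pattern, here is a minimal sketch with hypothetical task names and placeholder bodies; it is not the real code, which lives in `vtasks/schedules/hourly.py`:

```python
# Minimal sketch of the flow/task pattern, NOT the real code:
# task bodies are placeholders and names are hypothetical.
from prefect import flow, task


@task
def extract_prices() -> dict:
    # Stand-in for an extraction step (e.g. a crypto price API call)
    return {"BTC": 97000.0}


@task
def load_to_motherduck(data: dict) -> None:
    # Stand-in for the MotherDuck export step
    print(f"would load {len(data)} rows")


@flow(log_prints=True)
def hourly() -> None:
    load_to_motherduck(extract_prices())


if __name__ == "__main__":
    # Serve the flow on an hourly cron schedule (Prefect 2.10+ / 3.x)
    hourly.serve(name="hourly", cron="0 * * * *")
```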
- Extract: Data is sourced from various integrations, including Dropbox, Google Sheets, APIs, and local files.
- Load: Processed data is exported to MotherDuck for storage and analysis (see the sketch after this list).
- Transform: Data processing is primarily handled using `dbt`, with additional transformations as needed.
- Visualization: Metabase connects to MotherDuck to generate insights and reports.
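For context on the load step, here is a minimal sketch of pushing a pandas DataFrame to MotherDuck through `duckdb`; the `analytics` database and `crypto_prices` table are assumptions, not the project's actual schema:

```python
# Hedged sketch of the load step: database and table names are
# hypothetical, and authentication relies on the MOTHERDUCK_TOKEN
# environment variable being set.
import duckdb
import pandas as pd


def export_table(df: pd.DataFrame, table: str) -> None:
    # The "md:" prefix tells duckdb to connect to MotherDuck
    con = duckdb.connect("md:analytics")
    con.register("incoming", df)  # expose the DataFrame to SQL
    con.execute(f"CREATE OR REPLACE TABLE {table} AS SELECT * FROM incoming")
    con.close()


export_table(
    pd.DataFrame({"symbol": ["BTC"], "price_usd": [97000.0]}),
    "crypto_prices",
)
```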
```
vtasks
├── common          # Shared utilities and helpers
├── jobs            # Individual data extraction and processing jobs
│   ├── backups     # Backup management
│   ├── crypto      # Crypto price tracking
│   ├── dropbox     # Dropbox integrations
│   ├── gcal        # Google Calendar data processing
│   ├── gsheets     # Google Sheets integrations
│   └── indexa      # Indexa Capital data extraction
├── schedules       # Prefect scheduling logic
│   └── hourly.py   # Main schedule triggering all jobs hourly
└── vdbt            # dbt project for data modeling
```
> [!NOTE]
> This project uses UV for dependency management and execution.
- Install UV (if not installed): `pip install uv`
- Set up the virtual environment: `uv venv .venv`
- Install dependencies (in editable mode for local development): `uv pip install --editable .`
- Ensure pre-commit hooks are installed: `pre-commit install`
To run the main hourly schedule manually:

```
uv run python -m vtasks.schedules.hourly
```
To run a specific job, use module-based execution:

```
uv run python -m vtasks.jobs.dropbox.export_tables
```

This ensures that relative imports work correctly.
> [!NOTE]
> This pipeline is deployed manually to my NAS, since I prefer not to grant GitHub Actions access to it. This gives me better control and security over the deployment process.
- Ensure you are connected to Tailscale.
- Set the Prefect API URL to point to the NAS.
- Deploy all flows manually using Prefect:

```
set PREFECT_API_URL=http://tnas:6006/api
prefect --no-prompt deploy --all
```
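The `prefect --no-prompt deploy --all` command registers every deployment defined in `prefect.yaml`. A hypothetical excerpt of one such entry might look like this (the entrypoint, schedule, and work pool name are illustrative, not the actual file):

```yaml
# Hypothetical prefect.yaml excerpt (illustrative, not the real file)
deployments:
  - name: hourly
    entrypoint: vtasks/schedules/hourly.py:hourly
    schedules:
      - cron: "0 * * * *"
    work_pool:
      name: default-pool
```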
To ensure all files reflect the correct version, use `bump2version`:

```
bump2version patch  # Or major/minor
```

This automates version updates across `prefect.yaml`, `pyproject.toml`, and `uv.lock`.
This repository is licensed under MIT.