
vtasks: Personal Pipeline


This repository contains my personal data pipeline, built with Prefect, to automate extracting, loading, and transforming data (ELT). The pipeline runs on my NAS and integrates multiple data sources, processing them and exporting the results to MotherDuck for analysis in Metabase.

πŸ“Œ Pipeline Overview

The pipeline serves two main purposes:

  1. Automation: Automates repetitive data tasks by extracting data from APIs, cloud services, and other sources.
  2. Analysis: Loads transformed data into MotherDuck, where Metabase connects for visualization and insights.

βš™οΈ Prefect Workflow

The pipeline is orchestrated with Prefect, using flows and tasks to manage dependencies and execution. The scheduling logic is defined in schedules/hourly.py, which triggers all jobs on an hourly basis; execution happens on my NAS.
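
For reference, this is roughly what a Prefect flow built from tasks looks like. The names and bodies below are hypothetical, not the actual contents of schedules/hourly.py:

```python
from prefect import flow, task


@task
def extract_example():
    # Stand-in for a real extraction job (Dropbox, Google Sheets, an API, ...)
    return [{"source": "example", "value": 1}]


@task
def load_example(rows):
    # Stand-in for exporting results to MotherDuck
    print(f"Loaded {len(rows)} row(s)")


@flow(name="hourly-example")
def hourly_example():
    # Passing one task's output into another lets Prefect track the
    # dependency and order the execution accordingly.
    rows = extract_example()
    load_example(rows)


if __name__ == "__main__":
    hourly_example()
```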

Data Flow

  1. Extract: Data is sourced from various integrations, including Dropbox, Google Sheets, APIs, and local files.
  2. Load: Processed data is exported to MotherDuck for storage and analysis (see the sketch after this list).
  3. Transform: Data processing is primarily handled using dbt, with additional transformations as needed.
  4. Visualization: Metabase connects to MotherDuck to generate insights and reports.
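
For the load step, one way to write to MotherDuck from Python is through the duckdb client. The sketch below is illustrative only: the database name, table, and data are made up, and authentication assumes a MOTHERDUCK_TOKEN environment variable:

```python
import duckdb
import pandas as pd

# Connecting to an "md:" path targets MotherDuck; the token is read
# from the MOTHERDUCK_TOKEN environment variable.
con = duckdb.connect("md:example_db")  # hypothetical database name

df = pd.DataFrame({"symbol": ["BTC"], "price_usd": [65000.0]})  # toy data

# DuckDB can scan in-scope DataFrames by name, so this materializes
# df as a table in MotherDuck.
con.execute("CREATE OR REPLACE TABLE crypto_prices AS SELECT * FROM df")
```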

πŸ“‚ Repository Structure

vtasks
β”œβ”€β”€ common          # Shared utilities and helpers
β”œβ”€β”€ jobs            # Individual data extraction and processing jobs
β”‚   β”œβ”€β”€ backups     # Backup management
β”‚   β”œβ”€β”€ crypto      # Crypto price tracking
β”‚   β”œβ”€β”€ dropbox     # Dropbox integrations
β”‚   β”œβ”€β”€ gcal        # Google Calendar data processing
β”‚   β”œβ”€β”€ gsheets     # Google Sheets integrations
β”‚   └── indexa      # Indexa Capital data extraction
β”œβ”€β”€ schedules       # Prefect scheduling logic
β”‚   └── hourly.py   # Main schedule triggering all jobs hourly
└── vdbt            # dbt project for data modeling

πŸš€ How to Run the Pipeline Locally

Note
This project uses UV for dependency management and execution.

Initial Setup

  1. Install UV (if not installed):
    pip install uv
  2. Set up the virtual environment:
    uv venv .venv
  3. Install dependencies (in editable mode for local development):
    uv pip install --editable .
  4. Ensure pre-commit hooks are installed:
    pre-commit install

Running Prefect Schedules

To run the main hourly schedule manually:

uv run python -m vtasks.schedules.hourly

Running Individual Jobs (Subflows)

To run a specific job, use module-based execution:

uv run python -m vtasks.jobs.dropbox.export_tables

This ensures that relative imports work correctly.
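
A hypothetical sketch of such a job module (illustrative names and body, not the real file) shows why module-based execution matters:

```python
# vtasks/jobs/dropbox/export_tables.py -- hypothetical contents for illustration
from prefect import flow

from ...common import utils  # hypothetical helper; a package-relative import
                             # only resolves when run as part of the package


@flow
def export_tables():
    # Stand-in body for the real Dropbox export logic
    print("Exporting tables")


if __name__ == "__main__":
    # `python -m vtasks.jobs.dropbox.export_tables` executes this with the
    # package context intact; running `python export_tables.py` directly
    # would fail with "attempted relative import with no known parent package".
    export_tables()
```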

πŸ“¦ Deployment

Note
This pipeline is deployed manually to my NAS, since I prefer not to grant GitHub Actions access to it. This ensures better control and security over the deployment process.

Steps for Manual Deployment:

  1. Ensure you are connected to Tailscale.
  2. Set the Prefect API URL to point to the NAS.
  3. Deploy all flows manually using Prefect:
    set PREFECT_API_URL=http://tnas:6006/api
    prefect --no-prompt deploy --all
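
Prefect picks up the deployments defined in prefect.yaml. As a rough sketch of what such a definition can look like (the entrypoint, work pool, and schedule below are assumptions, not the real config):

```yaml
# prefect.yaml -- illustrative excerpt; the actual file differs
deployments:
  - name: hourly
    entrypoint: vtasks/schedules/hourly.py:hourly  # assumed flow entrypoint
    work_pool:
      name: nas-pool                               # hypothetical work pool
    schedules:
      - cron: "0 * * * *"                          # top of every hour
```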

πŸ”’ Versioning

To ensure all files reflect the correct version, use bump2version:

bump2version patch  # Or major/minor

This automates version updates across prefect.yaml, pyproject.toml, and uv.lock.
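
bump2version knows which files to touch from its configuration. A minimal sketch of what a .bumpversion.cfg for this setup could look like (the version number and options are placeholders):

```ini
# .bumpversion.cfg -- illustrative sketch, not the real configuration
[bumpversion]
current_version = 1.0.0
commit = True
tag = True

[bumpversion:file:pyproject.toml]

[bumpversion:file:prefect.yaml]

[bumpversion:file:uv.lock]
```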

πŸ‘€ Author

Arnau Villoro

πŸ“œ License

This repository is licensed under MIT.