A production-ready Docker deployment setup for Dagster Core, providing a robust orchestration environment for data pipelines. This repository contains the core infrastructure components needed to run Dagster in a containerized environment.
This repository is part of a distributed Dagster setup that consists of two main components:
- Core Infrastructure (This Repository):
  - Dagster webserver and daemon processes
  - Core infrastructure configuration
  - Base monitoring and scheduling
  - Database and storage management
- Code Locations (Separate Repository):
  - Contains actual pipeline definitions
  - Separate deployment lifecycle
  - Independent versioning
  - Flexible scaling options
  - View Code Location Template Repository
- Separation of Concerns: Core infrastructure changes less frequently than pipeline code
- Independent Scaling: Scale code locations based on workload without affecting core
- Simplified Development: Teams can work on pipelines without touching core infrastructure
- Better Resource Management: Allocate resources specifically to computation needs
- Easier Maintenance: Update core components without affecting running pipelines
- Deploy this core infrastructure first
- Deploy code locations using the companion repository
- Code locations registered in `workspace.yaml` will automatically connect to this core instance
Example distributed setup:
```
Infrastructure Layer (This Repo)
├── Dagster Webserver (Port 3000)
├── Dagster Daemon
└── Shared Storage/Database

Pipeline Layer (Separate Repo)
├── Code Location Server 1 (Pipeline Group A)
├── Code Location Server 2 (Pipeline Group B)
└── Code Location Server N (Pipeline Group X)
```
The deployment consists of two main services:
- Dagster Webserver: Provides the web UI and API endpoints
- Dagster Daemon: Handles background processing, scheduling, and sensor evaluations
```
sira-dagster-core/
├── core/
│   ├── dagster_home/        # Dagster instance configuration
│   │   ├── dagster.yaml     # Main Dagster configuration
│   │   └── workspace.yaml   # Workspace configuration
│   ├── io_manager_storage/  # Persistent storage for IO managers
│   └── storage/             # General Dagster storage
├── .vscode/                 # VS Code configuration
├── docker-compose.yml       # Docker services definition
├── Dockerfile               # Docker image definition
├── requirements.txt         # Python dependencies
└── various config files     # (.env, .flake8, etc.)
```
- Create a Docker network for Dagster:

  ```shell
  docker network create dagster_network
  ```

- Configure your environment:

  ```shell
  cp .env.example .env
  # Edit .env with your specific configuration
  ```

- Start the services:

  ```shell
  docker-compose up -d
  ```

- Access the Dagster UI at http://localhost:3000
The `docker-compose.yml` defines two main services:
- `webserver_dagster`:
  - Runs the Dagster web interface
  - Exposes port 3000
  - Mounts the necessary volumes for persistence
  - Configurable through environment variables
- `daemon_dagster`:
  - Runs the Dagster daemon process
  - Handles background tasks and scheduling
  - Shares configuration with the webserver
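As a rough sketch, the two services can be wired together like this. Service names match this repository; the entrypoints, volume paths, and environment handling below are illustrative assumptions, not the repository's exact `docker-compose.yml`:

```yaml
# Illustrative docker-compose.yml sketch (paths and entrypoints are assumptions)
services:
  webserver_dagster:
    image: ${DAGSTER_CURRENT_IMAGE}
    entrypoint: ["dagster-webserver", "-h", "0.0.0.0", "-p", "3000", "-w", "workspace.yaml"]
    ports:
      - "3000:3000"
    env_file: .env
    volumes:
      - ./core/dagster_home:/opt/dagster/dagster_home
    networks:
      - dagster_network

  daemon_dagster:
    image: ${DAGSTER_CURRENT_IMAGE}
    entrypoint: ["dagster-daemon", "run"]
    env_file: .env
    volumes:
      - ./core/dagster_home:/opt/dagster/dagster_home
    networks:
      - dagster_network

networks:
  dagster_network:
    external: true  # created beforehand with `docker network create dagster_network`
```

Both services share the same image and `dagster_home` mount so that the webserver and daemon read identical instance configuration.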
Key environment variables (defined in `.env`):

- `DAGSTER_POSTGRES_*`: PostgreSQL connection details
- `AWS_*`: AWS credentials for S3 storage
- `DAGSTER_CURRENT_IMAGE`: Docker image reference
The `dagster.yaml` configuration includes:
- Run launcher configuration
- Storage settings (PostgreSQL)
- Compute log management (S3)
- Scheduler settings
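A minimal sketch of what such a `dagster.yaml` can look like, using Dagster's documented PostgreSQL storage and S3 compute log manager schemas. The bucket name and launcher choice here are assumptions; the environment variable names follow the `.env` conventions above:

```yaml
# Illustrative dagster.yaml sketch (bucket and launcher are assumptions)
run_launcher:
  module: dagster.core.launcher
  class: DefaultRunLauncher

storage:
  postgres:
    postgres_db:
      hostname:
        env: DAGSTER_POSTGRES_HOST
      username:
        env: DAGSTER_POSTGRES_USER
      password:
        env: DAGSTER_POSTGRES_PASSWORD
      db_name:
        env: DAGSTER_POSTGRES_DB
      port: 5432

compute_logs:
  module: dagster_aws.s3.compute_log_manager
  class: S3ComputeLogManager
  config:
    bucket: your-log-bucket
    prefix: dagster-compute-logs
```

Using `env:` indirection keeps credentials out of the configuration file and sources them from the container environment.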
The Dockerfile creates a minimal Python 3.12 image with:
- Essential Dagster dependencies
- Custom configuration mounting
- Workspace setup
```shell
# Using docker-bake.hcl for multi-platform builds
docker buildx bake -f docker-bake.hcl
```
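A `docker-bake.hcl` for this kind of multi-platform build might look like the following. The target name and image tag are hypothetical, not the repository's actual bake file:

```hcl
# Illustrative docker-bake.hcl sketch (target name and tags are assumptions)
group "default" {
  targets = ["dagster-core"]
}

target "dagster-core" {
  context    = "."
  dockerfile = "Dockerfile"
  platforms  = ["linux/amd64", "linux/arm64"]
  tags       = ["ghcr.io/your-org/sira-dagster-core:latest"]
}
```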
- Sensitive information is managed through environment variables
- Docker volumes are used for persistent storage
- Network isolation through Docker networking
- Proper permission management for mounted volumes
The repository includes comprehensive development configurations:
- VS Code settings for Python and YAML
- Linting configurations (flake8, yamllint)
- Editor configurations (.editorconfig)
- Git ignore patterns
Update `workspace.yaml` to add new pipeline locations. See our Code Location Template for implementation examples:
```yaml
load_from:
  - grpc_server:
      host: your-host
      port: your-port
      location_name: "your-pipeline"
```
Adjust the `run_coordinator` settings in `dagster.yaml` to control concurrent runs:

```yaml
run_coordinator:
  config:
    max_concurrent_runs: 5
```
We welcome contributions to improve the Dagster Core deployment setup! Please read our Contributing Guidelines for details on:
- Code of conduct
- Development setup
- Submission process
- Coding standards
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request using our PR template
Security is important to us. If you discover any security-related issues, please follow our Security Policy for proper reporting procedures.
- Containerized deployment
- Environment-based configuration
- No hardcoded credentials
- Regular dependency updates via Renovate
- Multi-architecture support
This project is licensed under the MIT License - see the LICENSE file for details.
- Free for commercial and personal use
- No warranty provided
- Attribution required when redistributing
- Currently supports PostgreSQL for storage
- S3 is required for compute logs
- Multi-platform support is limited to amd64 and arm64
Regular maintenance tasks:
- Update Python dependencies in `requirements.txt`
- Check for Dagster version updates
- Monitor Docker image size
- Review security patches
The repository includes automated workflows for building and publishing multi-architecture Docker images:
- File: `.github/workflows/docker-build.yml`
- Triggers:
  - Push to `main` branch
  - Any tag starting with `v` (e.g., `v1.0.0`)
  - Pull requests to `main`
- Actions:
  - Builds multi-architecture images (linux/amd64, linux/arm64)
  - Pushes to GitHub Container Registry (ghcr.io)
  - Tags images based on:
    - Branch name
    - PR number (for pull requests)
    - Semantic version (for version tags)
    - Git SHA
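The tagging scheme above is what `docker/metadata-action` produces with a tag rule per source. The following workflow excerpt is a hedged sketch of how that could be configured; the actual `docker-build.yml` may differ:

```yaml
# Illustrative workflow excerpt (the actual docker-build.yml may differ)
- name: Docker meta
  uses: docker/metadata-action@v5
  with:
    images: ghcr.io/${{ github.repository }}
    tags: |
      type=ref,event=branch
      type=ref,event=pr
      type=semver,pattern={{version}}
      type=sha
```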
- Usage:

  ```shell
  # Images are automatically built and tagged

  # To pull the latest image:
  docker pull ghcr.io/[your-username]/sira-dagster-core:latest

  # To pull a specific version:
  docker pull ghcr.io/[your-username]/sira-dagster-core:v1.0.0
  ```
- File: `renovate.json`
- Purpose: Automated dependency updates
- Features:
  - Automatically updates:
    - Python dependencies in `requirements.txt`
    - Base Docker image in `Dockerfile`
    - GitHub Actions versions
  - Groups Dagster-related updates together
  - Automatically merges minor/patch updates
  - Creates PRs for major updates
- Configuration Highlights:

  ```json
  {
    "packageRules": [
      {
        "matchPackagePatterns": ["^dagster"],
        "groupName": "dagster packages"
      }
    ]
  }
  ```
- Dependencies are checked weekly
- Minor updates are auto-merged
- Major updates require manual review
- Security updates are prioritized
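The update cadence and auto-merge behavior described above map to standard Renovate options. A sketch of how the full `renovate.json` might express them (the actual file may use different presets):

```json
{
  "extends": ["config:recommended"],
  "schedule": ["before 6am on monday"],
  "packageRules": [
    {
      "matchPackagePatterns": ["^dagster"],
      "groupName": "dagster packages"
    },
    {
      "matchUpdateTypes": ["minor", "patch"],
      "automerge": true
    }
  ],
  "vulnerabilityAlerts": {
    "enabled": true
  }
}
```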
- Check the "Pull Requests" tab for pending updates
- Review the Renovate dashboard for upcoming updates
- Check GitHub Actions tab for build status
```shell
# Manually trigger a new Docker build
git tag v1.0.0
git push origin v1.0.0

# This will trigger the workflow to:
# 1. Build new multi-arch images
# 2. Tag them with v1.0.0
# 3. Push to container registry
```