A declarative Infrastructure-as-Code (IaC) tool, built with Go and Protobuf, for managing ClickHouse schemas. Inspired by Terraform and Kubernetes, chschema replaces traditional sequential migrations with state reconciliation: you declare the desired schema, and the tool computes the changes needed to get there.
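In practice, reconciliation is a short loop built from the commands documented below (the `users.yaml` path is just an example):

```bash
# The reconciliation loop: declare desired state, plan, apply.
chschema dump --output-dir ./schema    # capture the database's current state as YAML
$EDITOR ./schema/tables/users.yaml     # edit YAML to declare the desired state
chschema --dry-run                     # plan: diff desired state against the database
chschema --auto-approve                # apply the generated plan
```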
- Declarative Configuration: Define your schema in YAML files as the single source of truth
- Bidirectional Operations: Both apply schemas to databases AND extract schemas from databases
- State Reconciliation: Compares desired vs current state and generates execution plans
- Dry-Run Mode: Preview changes before applying them
- GitOps Ready: Version-controlled schemas with CI/CD integration
- Cluster-Native: Built for ClickHouse's distributed architecture with `ON CLUSTER` support
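For context, `ON CLUSTER` is the ClickHouse clause that propagates a DDL statement to every node of a named cluster. A minimal sketch — the cluster name `production` and the table are placeholders:

```bash
# Runs the CREATE on every node of the "production" cluster (placeholder name).
# Assumes replica path macros are configured for ReplicatedMergeTree.
clickhouse client --query "
CREATE TABLE myapp.events ON CLUSTER production
(
    event_id   UUID,
    created_at DateTime
)
ENGINE = ReplicatedMergeTree
ORDER BY (event_id)
"
```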
```bash
go install github.com/posthog/chschema@latest
```
Start by dumping your current ClickHouse schema:
```bash
# Dump all tables to YAML files
chschema dump --output-dir ./schema

# Or dump only tables (skip clusters/views)
chschema dump --tables-only --output-dir ./schema
```
This creates a directory structure:
```
schema/
├── tables/               # Table definitions
│   ├── users.yaml
│   ├── events.yaml
│   └── ...
├── materialized_views/   # Materialized view definitions
│   ├── users_mv.yaml
│   └── ...
├── views/                # Regular view definitions
│   ├── active_users.yaml
│   └── ...
├── dictionaries/         # Dictionary definitions
│   ├── user_dict.yaml
│   └── ...
└── clusters/             # Cluster configurations (future)
    └── production.yaml
```
The dump shows statistics about what was exported:
```
--- Introspection Statistics ---
Dumped engines:
  ✓ ReplicatedReplacingMergeTree: 38
  ✓ Distributed: 51
  ✓ MaterializedView: 33
  ✓ Dictionary: 9
  ✓ Kafka: 27
Skipped engines:
  (none)
--------------------------------
```
Edit the generated YAML files to match your desired schema:
Example: `schema/tables/users.yaml`:

```yaml
name: users
database: myapp
order_by: [user_id, created_at]
partition_by: toYYYYMM(created_at)
engine:
  merge_tree: {}
columns:
  - name: user_id
    type: UInt64
    comment: "Unique user identifier"
  - name: email
    type: String
  - name: metadata
    type: String
    codec: "CODEC(ZSTD(3))"
  - name: created_at
    type: DateTime
    defaultexpression: now()
```
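For orientation, a definition like the one above corresponds to DDL roughly along these lines (illustrative only; the exact statement chschema generates may differ):

```bash
clickhouse client --query "
CREATE TABLE myapp.users
(
    user_id    UInt64 COMMENT 'Unique user identifier',
    email      String,
    metadata   String CODEC(ZSTD(3)),
    created_at DateTime DEFAULT now()
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_at)
ORDER BY (user_id, created_at)
"
```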
Preview what changes will be applied:
```bash
# Show execution plan
chschema --dry-run

# Save plan to file
chschema --dry-run --output plan.txt
```
Apply the schema changes:
```bash
# Apply changes without an interactive confirmation
chschema --auto-approve
```
```bash
# Show execution plan (default behavior)
chschema [--dry-run]

# Apply changes automatically
chschema --auto-approve

# Export database schema to YAML
chschema dump [options]

# Show version
chschema version
```
Flags:
- `--config`, `-c`: Directory containing schema files (default: `schema`)
- `--connect`: ClickHouse connection string (default: `localhost:9000`)
- `--output`, `-o`: Write execution plan to file
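These flags combine; for example, to plan against a remote server (placeholder hostname) from a non-default schema directory and save the plan:

```bash
chschema --config ./schema --connect clickhouse.example.com:9000 --dry-run --output plan.txt
```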
Extract database schema to YAML files:
```bash
# Basic usage
chschema dump

# Advanced options
chschema dump \
  --output-dir ./my-schema \
  --database myapp \
  --tables-only \
  --overwrite
```
Dump Flags:
- `--output-dir`, `-o`: Target directory (default: `./schema-dump`)
- `--database`, `-d`: Specific database to dump
- `--tables-only`: Only dump tables, skip clusters/views
- `--overwrite`: Overwrite existing files
Specify ClickHouse connection:
```bash
# Default local connection
chschema --connect localhost:9000

# Remote connection
chschema --connect clickhouse.example.com:9000
```
Organize your schema files:
```
schema/
├── tables/               # Table definitions
│   ├── users.yaml
│   └── events.yaml
├── materialized_views/   # Materialized view definitions
│   ├── users_mv.yaml
│   └── events_mv.yaml
├── views/                # Regular view definitions
│   └── active_users.yaml
├── dictionaries/         # Dictionary definitions
│   └── user_dict.yaml
└── clusters/             # Cluster configurations (future)
    └── production.yaml
```

The following engines are supported:
- ✅ MergeTree
- ✅ ReplicatedMergeTree
- ✅ ReplacingMergeTree
- ✅ ReplicatedReplacingMergeTree
- ✅ SummingMergeTree
- ✅ CollapsingMergeTree
- ✅ ReplicatedCollapsingMergeTree
- ✅ AggregatingMergeTree
- ✅ ReplicatedAggregatingMergeTree
- ✅ Distributed
- ✅ Log
- ✅ Kafka
- ✅ MaterializedView
- ✅ View
- ✅ Dictionary
Table definition:

```yaml
name: table_name
database: database_name        # optional
order_by: [col1, col2]         # ORDER BY clause
partition_by: toYYYYMM(date)   # PARTITION BY clause
engine:
  merge_tree: {}               # or replicated_merge_tree, replacing_merge_tree, etc.
columns:
  - name: id
    type: UInt64
    comment: "Unique identifier"   # optional, column comment
  - name: name
    type: String
    defaultexpression: ''          # optional, DEFAULT expression
  - name: data
    type: String
    codec: "CODEC(ZSTD(3))"        # optional, compression codec
  - name: created_at
    type: DateTime
    defaultexpression: now()
```
Materialized view definition:

```yaml
name: users_mv
database: myapp
destinationTable: users_aggregated   # optional, uses .inner if not specified
selectQuery: SELECT user_id, count() as cnt FROM users GROUP BY user_id
```
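A definition like this maps to DDL roughly as follows (illustrative only; the `TO` clause targets the destination table):

```bash
clickhouse client --query "
CREATE MATERIALIZED VIEW myapp.users_mv TO myapp.users_aggregated
AS SELECT user_id, count() AS cnt FROM users GROUP BY user_id
"
```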
View definition:

```yaml
name: active_users
database: myapp
selectQuery: SELECT user_id, email FROM users WHERE active = 1
```
Dictionary definition:

```yaml
name: user_dict
database: myapp
primaryKey:
  - user_id
source:
  sourceType: clickhouse
  sourceConfig: "SELECT user_id, email FROM users"
layout:
  layoutType: flat
lifetime:
  minSeconds: 300
  maxSeconds: 360
attributes:
  - name: user_id
    type: UInt64
    isKey: true
  - name: email
    type: String
    isKey: false
```
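For reference, this corresponds approximately to the following ClickHouse dictionary DDL (illustrative only):

```bash
clickhouse client --query "
CREATE DICTIONARY myapp.user_dict
(
    user_id UInt64,
    email   String
)
PRIMARY KEY user_id
SOURCE(CLICKHOUSE(QUERY 'SELECT user_id, email FROM users'))
LAYOUT(FLAT())
LIFETIME(MIN 300 MAX 360)
"
```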
Kafka table definition:

```yaml
name: kafka_events
database: myapp
columns:
  - name: event_id
    type: UUID
  - name: event_data
    type: String
  - name: timestamp
    type: DateTime
engine:
  kafka:
    broker_list:
      - "localhost:9092"
      - "broker2:9092"
    topic: "events"
    consumer_group: "consumer_group1"
    format: "JSONEachRow"
```
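This maps to a Kafka-engine table roughly like the following (illustrative; in practice a Kafka table is usually paired with a materialized view that moves consumed rows into a MergeTree table):

```bash
clickhouse client --query "
CREATE TABLE myapp.kafka_events
(
    event_id   UUID,
    event_data String,
    timestamp  DateTime
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'localhost:9092,broker2:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'consumer_group1',
         kafka_format = 'JSONEachRow'
"
```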
Cluster definition:

```yaml
name: production
nodes:
  - host: clickhouse-1.example.com
    port: 9000
    shard: 1
    replica: 1
    database: myapp   # optional
  - host: clickhouse-2.example.com
    port: 9000
    shard: 1
    replica: 2
```
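A cluster definition mirrors the topology ClickHouse itself reports; you can compare it against the server's view with:

```bash
clickhouse client --query "
SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
"
```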
The day-to-day workflow:

1. **Extract Current Schema**: `chschema dump --output-dir ./schema`
2. **Make Changes**: Edit the YAML files
3. **Preview Changes**: `chschema --dry-run`
4. **Apply Changes**: `chschema --auto-approve`
A sketch of a GitHub Actions workflow; the Go install steps and the `push` trigger for the apply job are assumptions to adapt to your setup:

```yaml
# .github/workflows/schema.yml
name: Schema Management
on:
  pull_request:
    paths: ['schema/**']
  push:
    branches: [main]
    paths: ['schema/**']

jobs:
  plan:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
      - run: go install github.com/posthog/chschema@latest
      - name: Plan Schema Changes
        run: |
          chschema --dry-run --output plan.txt
          # Post plan as PR comment

  apply:
    if: github.event_name == 'push'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
      - run: go install github.com/posthog/chschema@latest
      - name: Apply Schema Changes
        run: chschema --auto-approve
```
To bring an existing database under chschema management:

1. **Dump Current Schema**: `chschema dump --overwrite`
2. **Review Generated Files**: Check that all tables are captured correctly
3. **Commit to Git**: `git add schema/ && git commit -m "Initial schema dump"`
4. **Test Round-Trip**: Ensure the dumped schema can be applied cleanly (see the sketch below)
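A minimal round-trip check, assuming the dump and the database are in sync:

```bash
# Dump the live schema, then plan against it: an empty plan means a clean round-trip.
chschema dump --output-dir ./schema --overwrite
chschema --config ./schema --dry-run
```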
- One file per table: Keep table definitions in separate files
- Consistent naming: Use clear, descriptive names
- Version control: Always commit schema changes
- Environment separation: Use different directories for different environments (see the example after this list)
- Always dry-run first: Review changes before applying
- Backup before changes: Take database backups before major changes
- Test in staging: Validate changes in non-production environments
- Monitor deployments: Watch for errors during schema application
- Code reviews: Review schema changes like code
- Documentation: Comment complex schema decisions
- Breaking changes: Plan and communicate schema breaking changes
- Rollback plan: Have a rollback strategy for schema changes
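A sketch of the environment-separation practice above; the directory names and hosts are placeholders:

```bash
# One schema directory per environment, one connection per environment.
chschema --config ./schema/staging --connect staging-clickhouse:9000 --dry-run
chschema --config ./schema/production --connect prod-clickhouse:9000 --dry-run
```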
Connection Failed:
```bash
# Check ClickHouse is running
clickhouse client --query "SELECT 1"

# Verify connection string
chschema --connect your-host:9000 version
```
Permission Denied:
- Ensure user has DDL permissions
- Check cluster permissions for distributed operations
Schema Conflicts:
- Run `chschema --dry-run` to see conflicts
- Manually resolve differences in the YAML files
```bash
# Show help
chschema --help

# Show command-specific help
chschema dump --help
```
See PROJECT.md for development information and project structure.
TBD