REST API and workers for the Matyan experiment-tracking stack (fork of Aim). Serves reads and control operations from FoundationDB; consumes ingestion and control events from Kafka; uses S3/GCS/Azure for artifact blobs. The UI talks to this API; training clients send data via the frontier, which publishes to Kafka consumed by these workers.

- `src/matyan_backend/`: Python package containing the FastAPI app (`app.py`), API routes under `api/` (runs, experiments, tags, projects, dashboards, reports, streaming), `storage/` (FDB + S3/GCS/Azure), `workers/` (ingestion + control Kafka consumers), `jobs/` (FDB lock, used by CLI cleanup commands), `backup/` (export/restore), and the CLI in `cli.py`.
- Entrypoints: `matyan-backend start` (API server, default port 53800), `matyan-backend ingest-worker`, `matyan-backend control-worker`; plus one-off CLI commands (reindex, backup, restore, finish-stale, cleanup-orphan-blobs, cleanup-tombstones, convert tensorboard).
- Python 3.12+. The package uses `uv` in the repo: run `uv run matyan-backend`, or install the package and then use the `matyan-backend` CLI.
- Runtime dependencies: FoundationDB (cluster file), Kafka (for workers), and a blob store. For local dev, typically run FDB + Kafka + S3 (RustFS) via docker-compose.
- API server: `uv run matyan-backend start` (or `matyan-backend start`). Options: `--host`, `--port` (defaults: `0.0.0.0`, `53800`). The API is under `/api/v1`; health endpoints are at `/health/ready/` and `/health/live/`, and metrics are at `/metrics/` when enabled.
- Workers: `uv run matyan-backend ingest-worker` and `uv run matyan-backend control-worker`. Both require Kafka and FDB; the ingestion worker also writes to FDB and reads blob storage config for blob references.
- CLI (one-off): `reindex` (rebuild indexes), `backup` / `restore`, `finish-stale`, `cleanup-orphan-blobs`, `cleanup-tombstones`. See the backend CLI help (`matyan-backend cleanup-orphan-blobs --help`, `matyan-backend cleanup-tombstones --help`) and References: CLI for all options. Cleanup commands are intended for CronJobs or cron; use `--dry-run` to preview and `--lock-ttl-seconds` for FDB-based single-run locking. Optional: `convert tensorboard` converts TensorBoard logs to backup format.
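
The health endpoints listed above can be polled from a script, for example before starting workers in local dev. The following is a minimal sketch using only the standard library; it assumes the server is reachable on `localhost` at the default port, and the helper names `health_url` and `is_healthy` are hypothetical, not part of the backend.

```python
import urllib.error
import urllib.request

# Assumption: the API server runs locally on the documented default port.
DEFAULT_BASE_URL = "http://localhost:53800"


def health_url(base_url: str, probe: str) -> str:
    """Build the URL for a health probe ("ready" or "live")."""
    if probe not in ("ready", "live"):
        raise ValueError(f"unknown probe: {probe!r}")
    return f"{base_url.rstrip('/')}/health/{probe}/"


def is_healthy(
    base_url: str = DEFAULT_BASE_URL,
    probe: str = "ready",
    timeout: float = 2.0,
) -> bool:
    """Return True if the probe answers with a 2xx status, False otherwise."""
    try:
        with urllib.request.urlopen(health_url(base_url, probe), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx HTTP error all count as unhealthy.
        return False


if __name__ == "__main__":
    print("ready:", is_healthy(probe="ready"))
    print("live:", is_healthy(probe="live"))
```

The sketch treats any network error as "not healthy" rather than raising, which suits a wait-until-ready loop; what status codes the real endpoints return on failure is not stated here.
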

| Variable | Default | Purpose |
|---|---|---|
| `MATYAN_ENVIRONMENT` / `ENVIRONMENT` | `development` | When `production`, strict checks apply (see Production configuration). |
| `LOG_LEVEL` | `INFO` | Log level (loguru + uvicorn). |
| `FDB_CLUSTER_FILE` | `fdb.cluster` | Path to FoundationDB cluster file. |
| `BLOB_BACKEND_TYPE` | `s3` | Storage backend: `s3`, `gcs`, or `azure`. |
| `S3_ENDPOINT` | `http://localhost:9000` | S3-compatible API URL. |
| `S3_ACCESS_KEY` / `S3_SECRET_KEY` | (dev defaults) | S3 credentials. |
| `S3_BUCKET` | `matyan-artifacts` | Bucket for artifacts (when using `s3`). |
| `S3_REGION` | `us-east-1` | S3 region (default: `us-east-1`). |
| `GCS_BUCKET` | `matyan-artifacts` | Bucket for artifacts (when using `gcs`). |
| `AZURE_CONTAINER` | `matyan-artifacts` | Container for artifacts (when using `azure`). |
| `AZURE_CONN_STR` | `""` | Azure connection string. |
| `AZURE_ACCOUNT_URL` | `""` | Azure account URL (for DefaultAzureCredential). |
| `BLOB_URI_SECRET` | (dev default) | Fernet key for blob URIs; must be set in production. |
| `KAFKA_BOOTSTRAP_SERVERS` | `localhost:9092` | Kafka broker list. |
| `KAFKA_DATA_INGESTION_TOPIC` | `data-ingestion` | Topic for ingestion messages. |
| `KAFKA_CONTROL_EVENTS_TOPIC` | `control-events` | Topic for control events. |
| `KAFKA_SECURITY_PROTOCOL` / `KAFKA_SASL_*` | (empty) | Optional Kafka SASL. |
| `METRICS_ENABLED` | `true` | Expose Prometheus metrics. |
| `METRICS_PORT` | `9090` | Port for metrics HTTP server (workers). |
| `INGEST_MAX_MESSAGES_PER_TXN` | `100` | Max messages per FDB transaction (ingestion worker). |
| `INGEST_MAX_TXN_BYTES` | `8388608` (8 MB) | Target max transaction size; FDB limit is 10 MB. |
| `CORS_ORIGINS` | (localhost list) | Allowed origins for CORS. |

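
To illustrate how the variables in the table map to typed settings, here is a minimal sketch that reads a subset of them with the documented defaults. It is not the backend's `config.py`: the `BackendSettings` class, the `load_settings` function, the boolean parsing rule, and the precedence of `MATYAN_ENVIRONMENT` over `ENVIRONMENT` are all assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_BLOB_BACKENDS = {"s3", "gcs", "azure"}


@dataclass(frozen=True)
class BackendSettings:
    """Hypothetical subset of the settings; defaults copied from the table."""

    environment: str = "development"
    fdb_cluster_file: str = "fdb.cluster"
    blob_backend_type: str = "s3"
    kafka_bootstrap_servers: str = "localhost:9092"
    metrics_enabled: bool = True
    metrics_port: int = 9090
    ingest_max_messages_per_txn: int = 100
    ingest_max_txn_bytes: int = 8 * 1024 * 1024  # 8388608


def _parse_bool(raw: str) -> bool:
    # Assumption: common truthy spellings are accepted; the real parser may differ.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> BackendSettings:
    """Read settings from an environment mapping, falling back to defaults."""
    env = os.environ if env is None else env
    d = BackendSettings()

    backend = env.get("BLOB_BACKEND_TYPE", d.blob_backend_type)
    if backend not in VALID_BLOB_BACKENDS:
        raise ValueError(f"BLOB_BACKEND_TYPE must be one of {sorted(VALID_BLOB_BACKENDS)}")

    metrics_raw = env.get("METRICS_ENABLED")
    return BackendSettings(
        # Assumption: MATYAN_ENVIRONMENT wins when both variables are set.
        environment=env.get("MATYAN_ENVIRONMENT") or env.get("ENVIRONMENT") or d.environment,
        fdb_cluster_file=env.get("FDB_CLUSTER_FILE", d.fdb_cluster_file),
        blob_backend_type=backend,
        kafka_bootstrap_servers=env.get("KAFKA_BOOTSTRAP_SERVERS", d.kafka_bootstrap_servers),
        metrics_enabled=d.metrics_enabled if metrics_raw is None else _parse_bool(metrics_raw),
        metrics_port=int(env.get("METRICS_PORT", d.metrics_port)),
        ingest_max_messages_per_txn=int(
            env.get("INGEST_MAX_MESSAGES_PER_TXN", d.ingest_max_messages_per_txn)
        ),
        ingest_max_txn_bytes=int(env.get("INGEST_MAX_TXN_BYTES", d.ingest_max_txn_bytes)),
    )
```

Passing an explicit mapping, as in `load_settings({"BLOB_BACKEND_TYPE": "gcs"})`, keeps the function easy to test without touching the process environment.
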
Source of truth: config.py.
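
The two ingestion limits, `INGEST_MAX_MESSAGES_PER_TXN` and `INGEST_MAX_TXN_BYTES`, suggest that the ingestion worker groups Kafka messages into bounded FDB transactions. The sketch below shows one way such batching could work; it is an illustration under stated assumptions, not the worker's actual code. The function name `batch_messages` is hypothetical, and raw payload length is used as a rough stand-in for transaction size, which in FDB also depends on the keys and values actually written.

```python
from typing import Iterable, Iterator, List

MAX_MESSAGES_PER_TXN = 100       # documented default of INGEST_MAX_MESSAGES_PER_TXN
MAX_TXN_BYTES = 8 * 1024 * 1024  # documented default of INGEST_MAX_TXN_BYTES


def batch_messages(
    messages: Iterable[bytes],
    max_messages: int = MAX_MESSAGES_PER_TXN,
    max_bytes: int = MAX_TXN_BYTES,
) -> Iterator[List[bytes]]:
    """Yield batches that respect both the message-count and byte limits.

    A single message larger than max_bytes is still emitted, alone in its
    own batch, so the caller can decide how to handle it.
    """
    batch: List[bytes] = []
    batch_bytes = 0
    for msg in messages:
        size = len(msg)
        # Close the current batch if adding this message would break a limit.
        if batch and (len(batch) >= max_messages or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(msg)
        batch_bytes += size
    if batch:
        yield batch


if __name__ == "__main__":
    sizes = [len(b) for b in batch_messages([b"x"] * 250)]
    print(sizes)  # [100, 100, 50]
```

Keeping the byte target at 8 MB leaves headroom below the 10 MB FDB transaction limit mentioned in the table.
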
See docs/PRODUCTION_CONFIG.md for enabling production mode (MATYAN_ENVIRONMENT=production), required overrides, and supplying secrets via env or a secrets backend.
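
As a small illustration of one production requirement from the table (`BLOB_URI_SECRET` must be a Fernet key), the sketch below generates a key and checks its shape using only the standard library. A Fernet key is 32 random bytes encoded as URL-safe base64. The function names are hypothetical, and the full set of strict checks is the one described in `docs/PRODUCTION_CONFIG.md`, not this snippet.

```python
import base64
import binascii
import os
from typing import List, Mapping


def generate_fernet_key() -> str:
    """Generate a key in Fernet format: 32 random bytes, URL-safe base64."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def is_fernet_key(value: str) -> bool:
    """Check that the value decodes to exactly 32 bytes."""
    try:
        raw = base64.urlsafe_b64decode(value.encode("ascii"))
    except (binascii.Error, ValueError):
        return False
    return len(raw) == 32


def production_problems(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems; empty when not in production or all is well.

    Assumption: only the BLOB_URI_SECRET requirement is modeled here.
    """
    environment = env.get("MATYAN_ENVIRONMENT") or env.get("ENVIRONMENT") or "development"
    if environment != "production":
        return []
    problems: List[str] = []
    if not is_fernet_key(env.get("BLOB_URI_SECRET", "")):
        problems.append("BLOB_URI_SECRET must be set to a valid Fernet key")
    return problems


if __name__ == "__main__":
    print(production_problems(dict(os.environ)))
```

A key produced by `generate_fernet_key()` can be stored in a secrets backend and injected as `BLOB_URI_SECRET`.
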
- Docker: Build the backend image (context from repo root); run API and workers as separate processes or containers.
- Kubernetes/Helm: The chart in `deploy/helm/matyan` deploys the backend API, ingestion worker, and control worker as separate Deployments, with optional CronJobs for `cleanup-orphan-blobs` and `cleanup-tombstones`. Configure FDB, blob storage (S3, GCS, Azure), and Kafka via chart values; see the chart README. Set `MATYAN_ENVIRONMENT=production` and the required env vars for production.
- UI: matyan-ui calls this backend REST API.
- Frontier: matyan-frontier publishes to Kafka; backend workers consume.
- API models: matyan-api-models shared types (Kafka messages, run creation, etc.).
- Monorepo: This package lives under `extra/matyan-backend` in the matyan-core repo.