Benthos API is the open infrastructure for data sovereignty in coastal community work. Communities own the data they generate. This codebase carries the schema, the consent grammar and the audit layer that turn that ownership into something a system can enforce.
The service anchors Stage 1 of the Benthos roadmap. It's a FastAPI app that receives events from a WhatsApp bot (Twilio, out of scope for this repo), validates them through Pydantic and Mongo $jsonSchema, then persists them in Mongo with role separation between application data and the consent vault.
The Brazilian territorial deployment of this API runs as Atcheza, a Portuguese-language WhatsApp bot in the marine extractive reserve of Arraial do Cabo, in Rio de Janeiro state.
Stage 1 is in production, covering the baseline data sovereignty schema: catch events, consent records and the external release log. Stage 2 is in development, with traditional knowledge workflows and community voice reporting on the way.
- POST /v1/catch-events: record a fishing trip. The server checks consent scope and community match, blocks unauthorized categories, then computes CPUE before writing the record.
- POST /v1/consent-records: create a consent record (vault-rw).
- POST /v1/consent-records/{id}/revoke: revoke a consent record (vault-rw, no body).
- POST /v1/external-releases: record an authorized external release. The route checks the privacy threshold, confirms the required signatures from the configured roster and reads the live consent state before persisting.
- GET /v1/external-releases: public listing for community audit, no auth.
- POST /v1/traditional-knowledge-events: record a custodian's pseudonymized traditional knowledge entry (Stage 1.5). The consent gate is the same as for catch events.
- POST /v1/community-voice-events: record a civic incident report (oil spill, illegal fishing, etc.) with optional escalation opt-in (Stage 1.5). The app-role GET strips the reporter identity; the vault-role GET includes it.
Also included: GET / for metadata, GET /health for readiness, GET /docs for OpenAPI and GET /openapi.json.
Prerequisites: Docker + Docker Compose, uv (Python package manager).
cp .env.example .env
# Edit .env and replace every "replace_me_*" placeholder with a real value.
uv lock # generate uv.lock (committed for reproducibility)
docker-compose up --build # starts mongo + api
Open http://localhost:8000/docs for the OpenAPI explorer.
Validate readiness: curl http://localhost:8000/health should return {"mongo_app": "ok", "mongo_vault": "ok"} with HTTP 200.
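The readiness check above can also be scripted. The following is a minimal sketch, assuming the response body has exactly the shape shown in the sample; `is_ready` and `EXPECTED_KEYS` are hypothetical names, not part of the codebase.

```python
import json

# Assumption: these two keys are taken from the sample /health response above.
EXPECTED_KEYS = ("mongo_app", "mongo_vault")


def is_ready(status_code: int, body: str) -> bool:
    """Return True only for HTTP 200 with both Mongo clients reporting "ok"."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    return all(payload.get(key) == "ok" for key in EXPECTED_KEYS)
```

Feed it the status code and body from whatever HTTP client the deployment script already uses.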
Tests run against a real Mongo instance. The validators, indexes and role separation are the same as in production.
docker-compose up mongo -d # start Mongo only, in background
uv sync --group dev # install pytest + httpx
uv run pytest
The suite under tests/ covers the sovereignty edge cases (revoked consent blocking a release, insufficient quorum blocking, k<5 blocking, unauthorized category blocking a catch event) along with idempotency, audit emission, the dual-view role separation for community voice and the Stage 2 identity dual-write.
Two Mongo users with distinct roles, both connecting to the same DB:
- benthos_app (used by catch and release endpoints): find + insert on catch_events and external_release_logs (these collections are append-only), and find only on consent_records. The fisher_identity_map is kept outside this role.
- benthos_vault (used by consent endpoints and identity helpers): find, insert and update on consent_records and fisher_identity_map. The catch and release collections stay outside this role.
The API opens two distinct Mongo clients via MONGO_APP_URI / MONGO_VAULT_URI; routes declare which one they need through FastAPI dependencies (benthos_api/deps.py). Because the database denies benthos_app any access to fisher_identity_map, a compromised route that holds only the app client cannot read the pseudo↔real identity mapping.
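To make the role split concrete, here is a small sketch that restates the Stage 1 privileges described above as a lookup table. It is an illustration only: the real enforcement lives in Mongo roles, the table omits the privileges added by later migrations, and `ROLE_PRIVILEGES` and `is_allowed` are hypothetical names.

```python
# Privilege table transcribed from the role description above (Stage 1 only).
ROLE_PRIVILEGES = {
    "benthos_app": {
        "catch_events": {"find", "insert"},
        "external_release_logs": {"find", "insert"},
        "consent_records": {"find"},
    },
    "benthos_vault": {
        "consent_records": {"find", "insert", "update"},
        "fisher_identity_map": {"find", "insert", "update"},
    },
}


def is_allowed(role: str, collection: str, action: str) -> bool:
    """Return True if the role holds the given action on the collection."""
    return action in ROLE_PRIVILEGES.get(role, {}).get(collection, set())


# The app role can append catch events but never touch the identity map.
print(is_allowed("benthos_app", "catch_events", "insert"))
print(is_allowed("benthos_app", "fisher_identity_map", "find"))
```

A table like this can double as a test fixture: iterate over it and assert that each denied combination really fails against a live Mongo instance.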
catch_events and external_release_logs accept insert only. Mongo roles enforce that at the database layer, the API exposes only write-once routes, and the release log stays the public auditable trail of every external data flow with its immutability guaranteed by the architecture itself.
Every model is validated twice on the way in:
- Once by Pydantic at the API boundary, with extra="forbid" and cross-field validators (e.g. privacy_method must match privacy_params).
- Once by Mongo's $jsonSchema on insert.
Enum string values stay identical letter-by-letter between benthos_api/models/common.py and mongo-init.js. Pydantic handles the cross-field rules that $jsonSchema can't express, while structural shape and required fields are enforced at both layers.
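A cross-field rule of the kind Pydantic handles here could look roughly like the sketch below. It uses plain Python rather than Pydantic so it runs without dependencies, and the method names (`k_anonymity`, `differential_privacy`) and parameter keys (`k`, `epsilon`) are assumptions for illustration, not the actual enum values in benthos_api/models/common.py.

```python
# Hypothetical mapping: privacy method -> parameter keys it must carry.
REQUIRED_PARAMS = {
    "k_anonymity": {"k"},
    "differential_privacy": {"epsilon"},
}


def check_privacy_fields(privacy_method: str, privacy_params: dict) -> None:
    """Raise ValueError when the params do not belong to the declared method."""
    expected = REQUIRED_PARAMS.get(privacy_method)
    if expected is None:
        raise ValueError(f"unknown privacy_method: {privacy_method!r}")
    actual = set(privacy_params)
    if actual != expected:
        raise ValueError(
            f"privacy_params for {privacy_method!r} must be exactly "
            f"{sorted(expected)}, got {sorted(actual)}"
        )


check_privacy_fields("k_anonymity", {"k": 5})  # passes silently
```

This is the kind of dependency between two fields that a $jsonSchema validator cannot express compactly, which is why it sits at the API boundary.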
Every document carries schema_version: "1.0.0". Migrations aren't implemented in Stage 1. See Limitations for details.
ConsentRecord requires the 13 fields most relevant to Stage 1 operations (community, scope, categories, custodian, prior consent date, retention, withdrawal mechanism, etc.). The remaining CARE-MANDATORY fields from the source spreadsheet (ethical_framework_source, relationship_custodian_name, conflict_resolution_mechanism, benefit_sharing_mechanism, local_language_primary) are accepted as optional and admit null.
cpue_kg_per_hour = catch_weight_kg / fishing_effort_hours is computed by the server on insert. Clients cannot supply it (extra="forbid" rejects the field).
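The server-side computation can be sketched as follows. The formula is the one stated above; the guards against non-positive effort and negative weight are assumptions about sensible validation, and `compute_cpue` is a hypothetical name.

```python
def compute_cpue(catch_weight_kg: float, fishing_effort_hours: float) -> float:
    """Catch per unit effort in kg per hour, computed server-side on insert."""
    # Assumption: zero or negative effort is rejected rather than stored.
    if fishing_effort_hours <= 0:
        raise ValueError("fishing_effort_hours must be positive")
    # Assumption: a negative catch weight is treated as invalid input.
    if catch_weight_kg < 0:
        raise ValueError("catch_weight_kg must not be negative")
    return catch_weight_kg / fishing_effort_hours


# 12 kg landed over 4 hours of effort.
print(compute_cpue(12.0, 4.0))  # 3.0
```

Computing the value on the server keeps the derived field consistent with its inputs, since a client has no way to submit a conflicting number.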
Catch events are idempotent on (community, fisher_pseudo_id, event_date, catch_species) via the unique index uq_catch_event_natural_key. Duplicate POSTs return HTTP 409.
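The effect of the natural-key index can be sketched with an in-memory stand-in. This is not how the service stores data (Mongo's unique index does the work); `InMemoryCatchStore` and `DuplicateCatchEvent` are hypothetical names, and the exception stands in for the duplicate-key error that the API maps to HTTP 409.

```python
# Fields of the natural key, as listed for uq_catch_event_natural_key.
NATURAL_KEY_FIELDS = ("community", "fisher_pseudo_id", "event_date", "catch_species")


class DuplicateCatchEvent(Exception):
    """Stands in for the duplicate-key error a unique index would raise."""


class InMemoryCatchStore:
    """Toy store that rejects a second event with the same natural key."""

    def __init__(self) -> None:
        self._seen = set()

    def insert(self, event: dict) -> None:
        key = tuple(event[field] for field in NATURAL_KEY_FIELDS)
        if key in self._seen:
            raise DuplicateCatchEvent(key)
        self._seen.add(key)
```

Because event_date is part of the key, two trips on the same date for the same species collide unless the date carries enough temporal precision to tell them apart.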
RELEASE_QUORUM_REQUIRED_ROLES (env var, comma-separated) lists the roles whose signatures are required to authorize an external release. The Stage 1 default is committee_chair,data_steward. Bumping to a third role at committee maturity takes a config change, no code change required. The release endpoint validates that every required role is present in quorum_signatures (Pydantic enforces distinctness).
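A rough sketch of the quorum check, assuming each signature is a dict with a `role` key (the real shape of quorum_signatures is not specified here). `required_roles` and `missing_roles` are hypothetical helper names; the env var name and its default come from the paragraph above.

```python
import os

# Stage 1 default, as stated above.
DEFAULT_REQUIRED_ROLES = "committee_chair,data_steward"


def required_roles(env=None) -> frozenset:
    """Parse the comma-separated role list, ignoring blanks and whitespace."""
    env = os.environ if env is None else env
    raw = env.get("RELEASE_QUORUM_REQUIRED_ROLES", DEFAULT_REQUIRED_ROLES)
    return frozenset(part.strip() for part in raw.split(",") if part.strip())


def missing_roles(signatures: list, required: frozenset) -> list:
    """Return the required roles that have no signature, sorted for stable output."""
    # Assumption: each signature is a dict carrying a "role" key.
    signed = {signature["role"] for signature in signatures}
    return sorted(required - signed)


roles = required_roles({})  # empty env falls back to the Stage 1 default
print(missing_roles([{"role": "committee_chair"}], roles))  # ['data_steward']
```

Adding a third role is then a matter of extending the env var; the check itself does not change.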
mongo-init.js runs once, only on the first container init (when /data/db is empty). It defines the target schema state for a fresh deployment. Editing mongo-init.js after the volume already exists has no effect on the running database.
For schema evolution after first init, the migrations/ directory holds versioned, idempotent migration scripts. Each script captures the delta from one state to the next and stays immutable after approval. For subsequent schema changes, write a new migration script to preserve the audit chain.
Apply migration 001 (the Stage 1.5 migration, which adds traditional_knowledge_events and community_voice_events collections and extends the consent_records.approved_categories and audit_log enums):
docker compose exec -T mongo mongosh \
-u "$MONGO_INITDB_ROOT_USERNAME" -p "$MONGO_INITDB_ROOT_PASSWORD" \
--authenticationDatabase admin \
< migrations/001_stage_1_5_tk_and_community_voice.js
Re-running the same migration is a no-op. The script guards createCollection with getCollectionNames and lets collMod reapply the existing validator, while index creation and privilege grants handle their own deduplication natively.
Rollback (manual, safe only before any Stage 1.5 inserts):
- Drop the 2 new collections using the root admin client.
- Prior consent_records and audit_log validators can be restored from a pre-migration mongodump, or from a clean re-init using a Stage 1 mongo-init.js.
- Finish by revoking the 3 new privileges via revokePrivilegesFromRole.
The migration file header documents the exact rollback commands. For production, snapshot the prior validator before applying so the rollback is mechanical.
These are the Stage 1 limits for now. Each one has a follow-up below.
- Schema migrations aren't implemented yet. Subsequent schema changes need an explicit migration plan, which is on the agenda for Stage 2.
- assert_k_anonymity validates the declared k value. Record-level equivalence class checks land with the Stage 4 aggregation pipeline, since the API is the audit-trail layer and actual aggregation happens upstream in the data steward's pipeline.
- field_set → DataCategory granular mapping isn't enforced yet. Stage 1 only checks consent_scope != internal_only to authorize external release. Stage 4 adds field-to-category enforcement.
- Auth, CORS, rate-limiting and structured observability are out of scope for Stage 1. The first ship was the data integrity layer. Security and operability hardening land in Stage 2.
- fisher_identity_map exists but no REST endpoint populates it yet. The collection has vault-only access. Helpers in benthos_api/privacy.py (make_pseudo_id) mint the pseudonyms. The population flow lives outside this service in Stage 1.
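The gap between checking a declared k and checking record-level equivalence classes can be illustrated with a short sketch. Both functions are hypothetical and do not reproduce assert_k_anonymity; the minimum of 5 is an assumption drawn from the "k<5 blocking" test case, and the quasi-identifier field names are invented for the example.

```python
from collections import Counter

# Assumption: the threshold matches the "k<5 blocking" scenario in the test suite.
K_ANONYMITY_MIN = 5


def check_declared_k(declared_k: int, minimum: int = K_ANONYMITY_MIN) -> None:
    """Stage 1 style check: only the k value the releaser declares is inspected."""
    if declared_k < minimum:
        raise ValueError(f"declared k={declared_k} is below the minimum {minimum}")


def smallest_equivalence_class(records: list, quasi_identifiers: list) -> int:
    """Record-level check: size of the smallest group sharing all quasi-identifiers."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return min(counts.values()) if counts else 0


# Hypothetical records: the declared k passes, yet one group contains a single row.
records = [
    {"port": "A", "gear": "net"},
    {"port": "A", "gear": "net"},
    {"port": "B", "gear": "line"},
]
check_declared_k(5)
print(smallest_equivalence_class(records, ["port", "gear"]))  # 1
```

A release can therefore declare k=5 and pass the first check while the underlying data has a group of one, which is the case the later aggregation-stage check is meant to catch.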
In priority order, for the next phase:
- Add ruff and mypy to the dev dependency group in pyproject.toml.
- Refactor config.py to build Mongo URIs from parts (MONGO_HOST, MONGO_PORT, MONGO_USER, MONGO_PASSWORD) instead of taking whole URIs as env vars. This eliminates password drift between the MONGO_*_PASSWORD field in .env.example and the password embedded in the URI.
- Implement an explicit schema migration plan for transitions beyond schema_version: "1.0.0" (Stage 2+).
- Pass PRIVACY_K_ANONYMITY_MIN and PRIVACY_DP_EPSILON_MAX into mongo-init.js so the $jsonSchema validators enforce the same thresholds at the database layer.
- Validate temporal precision of event_date in routes/catch.py so morning and afternoon catches of the same species don't collide on uq_catch_event_natural_key.
- Address the assert_k_anonymity and field_set → DataCategory mapping limitations documented above (Stage 4).
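The planned config.py refactor might look something like the sketch below. It is one possible shape, not the planned implementation: `build_mongo_uri` is a hypothetical name, and the database argument is an assumption since the list above names only host, port, user and password.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host: str, port: int, user: str, password: str, database: str) -> str:
    """Assemble a mongodb:// URI, percent-encoding the credentials."""
    # Credentials may contain ":", "@" or "/", which must be escaped in a URI.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    return f"mongodb://{credentials}@{host}:{port}/{database}"


# Hypothetical values; the password deliberately contains reserved characters.
print(build_mongo_uri("mongo", 27017, "benthos_app", "p@ss:w/rd", "benthos"))
# mongodb://benthos_app:p%40ss%3Aw%2Frd@mongo:27017/benthos
```

With the password read from a single env var and encoded at build time, the URI has no second copy that can drift out of sync.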
Stage 1 is considered complete when each of the following is verified by hand:
- docker-compose up starts mongo + api with no initialization errors.
- curl http://localhost:8000/docs shows the OpenAPI docs for the endpoints listed here.
- uv run pytest passes green, covering idempotency and the sovereignty scenarios.
- Someone new to the project can follow this README and get the service running locally in under 10 minutes.
- No secrets, real fisher/community/consent data or references to non-mocked external resources appear in the committed code.
The codebase is maintained by a small team at Benthos gGmbH and we answer issues as capacity allows. PRs are welcome but acceptance depends on alignment with our roadmap. Deployment work stays focused on the territories we actively support; for territorial deployments, please contact info@benthos.ai.
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
Copyright (C) 2026 Benthos gGmbH.
AGPL-3.0 is a strong copyleft license for network software. Anyone running a modified version of Benthos as a service has to make that source code available under the same license to people interacting with it. This keeps the consent grammar open, preserves an inspectable audit trail and makes it harder for the privacy gates to disappear into proprietary forks. It's also how we honor § 12 of the Benthos Satzung: GDPR, responsible AI, preservation of local knowledge, fair cooperation.
The full license text is in LICENSE.
Territorial deployments built on top of this infrastructure (community-specific configuration, local glossary, data) stay private. Community data is never open-sourced.