Sypriano/benthos-api

Benthos API

License: AGPL v3

Benthos API is the open infrastructure for data sovereignty in coastal community work. Communities own the data they generate. This codebase carries the schema, the consent grammar and the audit layer that turn that ownership into something a system can enforce.

The service anchors Stage 1 of the Benthos roadmap. It's a FastAPI app that receives events from a WhatsApp bot (Twilio, out of scope for this repo), validates them through Pydantic and Mongo $jsonSchema, then persists them in Mongo with role separation between application data and the consent vault.

The Brazilian territorial deployment of this API runs as Atcheza, a Portuguese-language WhatsApp bot in the marine extractive reserve of Arraial do Cabo, in Rio de Janeiro state.

Status

Stage 1 is in production, covering the baseline data sovereignty schema: catch events, consent records and the external release log. Stage 2 is in development, with traditional knowledge workflows and community voice reporting on the way.

Endpoints

  • POST /v1/catch-events: record a fishing trip. The server checks consent scope and community match, blocks unauthorized categories, then computes CPUE before writing the record.
  • POST /v1/consent-records: create a consent record (vault-rw).
  • POST /v1/consent-records/{id}/revoke: revoke a consent record (vault-rw, no body).
  • POST /v1/external-releases: record an authorized external release. The route checks the privacy threshold, confirms the required signatures from the configured roster and reads the live consent state before persisting.
  • GET /v1/external-releases: public listing for community audit, no auth.
  • POST /v1/traditional-knowledge-events: record a custodian's pseudonymized traditional knowledge entry (Stage 1.5). The consent gate is the same as for catch events.
  • POST /v1/community-voice-events: record a civic incident report (oil spill, illegal fishing, etc.) with optional escalation opt-in (Stage 1.5). The app-role GET strips the reporter identity; the vault-role GET includes it.

Also included: GET / for metadata, GET /health for readiness, GET /docs for OpenAPI and GET /openapi.json.

How to run locally

Prerequisites: Docker + Docker Compose, uv (Python package manager).

cp .env.example .env
# Edit .env and replace every "replace_me_*" placeholder with a real value.

uv lock                       # generate uv.lock (committed for reproducibility)
docker-compose up --build     # starts mongo + api

Open http://localhost:8000/docs for the OpenAPI explorer.

Validate readiness: curl http://localhost:8000/health should return {"mongo_app": "ok", "mongo_vault": "ok"} with HTTP 200.
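
The same readiness check can be scripted, for example in a smoke test. This is a minimal sketch using only the standard library; the helper names is_ready and fetch_health are hypothetical, and the expected payload is the one quoted above.

```python
import json
import urllib.request

# Keys taken from the /health payload quoted in this README.
HEALTH_KEYS = ("mongo_app", "mongo_vault")


def is_ready(status: int, body: str) -> bool:
    """Return True only for HTTP 200 with both Mongo clients reporting "ok"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    return status == 200 and all(payload.get(key) == "ok" for key in HEALTH_KEYS)


def fetch_health(base_url: str = "http://localhost:8000") -> bool:
    """Call GET /health on a running instance (hypothetical helper).

    urlopen raises urllib.error.HTTPError for non-2xx responses, so an
    unhealthy service surfaces as an exception rather than as False.
    """
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return is_ready(resp.status, resp.read().decode("utf-8"))
```

Calling fetch_health() requires the stack started with docker-compose as shown above.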

How to run tests

Tests run against a real Mongo instance. The validators, indexes and role separation are the same as in production.

docker-compose up -d mongo    # start Mongo only, in background
uv sync --group dev           # install pytest + httpx
uv run pytest

The suite under tests/ covers the sovereignty edge cases (revoked consent blocking a release, insufficient quorum blocking, k<5 blocking, unauthorized category blocking a catch event) along with idempotency, audit emission, the dual-view role separation for community voice and the Stage 2 identity dual-write.

Architectural decisions

Role separation at the Mongo level

Two Mongo users with distinct roles, both connecting to the same DB:

  • benthos_app (used by catch and release endpoints): find + insert on catch_events and external_release_logs (these collections are append-only), and find only on consent_records. The fisher_identity_map is kept outside this role.
  • benthos_vault (used by consent endpoints and identity helpers): find, insert and update on consent_records and fisher_identity_map. The catch and release collections stay outside this role.

The API opens two distinct Mongo clients via MONGO_APP_URI / MONGO_VAULT_URI; routes declare which one they need through FastAPI dependencies (benthos_api/deps.py). Because the benthos_app role holds no privileges on fisher_identity_map, a route that only receives the app client cannot read the pseudo↔real identity mapping, even if that route's code is compromised.
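
The privilege split described above can be written down as a small matrix. The sketch below is an illustration only: it transcribes the Stage 1 role descriptions from this section into plain Python, and the names ROLE_PRIVILEGES and is_allowed are hypothetical. The real enforcement happens in Mongo role definitions, not in application code.

```python
# Stage 1 privilege matrix, transcribed from the role descriptions above.
# Anything not listed is denied.
ROLE_PRIVILEGES = {
    "benthos_app": {
        "catch_events": {"find", "insert"},           # append-only
        "external_release_logs": {"find", "insert"},  # append-only
        "consent_records": {"find"},                  # read-only
    },
    "benthos_vault": {
        "consent_records": {"find", "insert", "update"},
        "fisher_identity_map": {"find", "insert", "update"},
    },
}


def is_allowed(role: str, collection: str, action: str) -> bool:
    """Return True if the role may perform the action on the collection."""
    return action in ROLE_PRIVILEGES.get(role, {}).get(collection, set())


if __name__ == "__main__":
    # The app role can never read the identity map.
    print(is_allowed("benthos_app", "fisher_identity_map", "find"))
    # The app role cannot update or delete appended catch events.
    print(is_allowed("benthos_app", "catch_events", "update"))
```

A table like this can double as a test fixture: each denied combination becomes an assertion that the corresponding Mongo operation fails with an authorization error.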

Append-only logs

catch_events and external_release_logs accept insert only. Mongo roles enforce that at the database layer, the API exposes only write-once routes, and the release log stays the public auditable trail of every external data flow with its immutability guaranteed by the architecture itself.

Defense in depth: Pydantic + $jsonSchema

Every model is validated twice on the way in:

  • Once by Pydantic at the API boundary, with extra="forbid" and cross-field validators (e.g. privacy_method must match privacy_params).
  • Once by Mongo's $jsonSchema on insert.

Enum string values stay identical letter-by-letter between benthos_api/models/common.py and mongo-init.js. Pydantic handles the cross-field rules that $jsonSchema can't express, while structural shape and required fields are enforced at both layers.
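
As an illustration of a cross-field rule that $jsonSchema cannot express, the sketch below checks that privacy_params matches privacy_method. The method names ("k_anonymity", "differential_privacy") and parameter keys ("k", "epsilon") are assumptions made for the example, not the values defined in benthos_api/models/common.py. In the service a rule like this would sit in a Pydantic validator; a plain function is used here to stay dependency-free.

```python
# Hypothetical mapping from privacy method to the single parameter it requires.
REQUIRED_PARAM = {
    "k_anonymity": "k",
    "differential_privacy": "epsilon",
}


def check_privacy_fields(privacy_method: str, privacy_params: dict) -> None:
    """Raise ValueError unless privacy_params carries exactly the key
    that the declared privacy_method requires."""
    expected = REQUIRED_PARAM.get(privacy_method)
    if expected is None:
        raise ValueError(f"unknown privacy_method: {privacy_method!r}")
    if set(privacy_params) != {expected}:
        raise ValueError(
            f"privacy_method {privacy_method!r} requires exactly "
            f"the parameter {expected!r}, got {sorted(privacy_params)}"
        )
```

Rejecting extra keys as well as missing ones mirrors the extra="forbid" behaviour at the model level.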

Schema versioning

Every document carries schema_version: "1.0.0". Migrations between schema_version values aren't implemented in Stage 1. See Limitations for details.

CARE Principles, Stage 1 pragmatic subset

ConsentRecord requires the 13 fields most relevant to Stage 1 operations (community, scope, categories, custodian, prior consent date, retention, withdrawal mechanism, etc.). The remaining CARE-MANDATORY fields from the source spreadsheet (ethical_framework_source, relationship_custodian_name, conflict_resolution_mechanism, benefit_sharing_mechanism, local_language_primary) are accepted as optional and admit null.

CPUE server-computed; catch events idempotent

cpue_kg_per_hour = catch_weight_kg / fishing_effort_hours is computed by the server on insert. Clients cannot supply it (extra="forbid" rejects the field).
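
A minimal sketch of that server-side computation follows. The function names are hypothetical, the explicit check for a client-supplied cpue_kg_per_hour stands in for what extra="forbid" does in the real model, and the guards on zero or negative inputs are assumptions about sensible validation rather than documented behaviour.

```python
def compute_cpue(catch_weight_kg: float, fishing_effort_hours: float) -> float:
    """Catch per unit effort in kg per hour."""
    if fishing_effort_hours <= 0:
        # Assumed guard: avoids division by zero and negative effort.
        raise ValueError("fishing_effort_hours must be positive")
    if catch_weight_kg < 0:
        # Assumed guard: a negative catch weight is not meaningful.
        raise ValueError("catch_weight_kg must not be negative")
    return catch_weight_kg / fishing_effort_hours


def build_catch_document(payload: dict) -> dict:
    """Return a copy of the client payload with the server-computed CPUE added."""
    if "cpue_kg_per_hour" in payload:
        # Stand-in for extra="forbid": clients may not supply the field.
        raise ValueError("cpue_kg_per_hour is computed by the server")
    document = dict(payload)
    document["cpue_kg_per_hour"] = compute_cpue(
        payload["catch_weight_kg"], payload["fishing_effort_hours"]
    )
    return document
```

For example, 12 kg landed over 4 hours of effort gives a CPUE of 3.0 kg per hour.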

Catch events are idempotent on (community, fisher_pseudo_id, event_date, catch_species) via the unique index uq_catch_event_natural_key. Duplicate POSTs return HTTP 409.
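
The effect of the natural key can be sketched without a database. The in-memory store below stands in for the unique index; the class and function names are hypothetical, and the 201 status for a successful insert is an assumption (this README only documents the 409).

```python
# Fields of the natural key, as listed for uq_catch_event_natural_key.
NATURAL_KEY_FIELDS = ("community", "fisher_pseudo_id", "event_date", "catch_species")


class DuplicateCatchEvent(Exception):
    """Stand-in for the duplicate-key error a unique index would raise."""


def natural_key(event: dict) -> tuple:
    return tuple(event[field] for field in NATURAL_KEY_FIELDS)


class InMemoryCatchStore:
    """Append-only store that rejects a second event with the same natural key."""

    def __init__(self) -> None:
        self._seen = set()

    def insert(self, event: dict) -> None:
        key = natural_key(event)
        if key in self._seen:
            raise DuplicateCatchEvent(key)
        self._seen.add(key)


def post_catch_event(store: InMemoryCatchStore, event: dict) -> int:
    """Return the HTTP status the route would answer with."""
    try:
        store.insert(event)
    except DuplicateCatchEvent:
        return 409
    return 201  # assumed success status
```

Fields outside the key, such as the catch weight, do not affect the outcome: a retry with a different weight still counts as a duplicate. With a date-only event_date, a morning and an afternoon trip for the same species also share a key, which is the collision listed under Follow-ups.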

Quorum lives in configuration

RELEASE_QUORUM_REQUIRED_ROLES (env var, comma-separated) lists the roles whose signatures are required to authorize an external release. The Stage 1 default is committee_chair,data_steward. Bumping to a third role at committee maturity takes a config change, no code change required. The release endpoint validates that every required role is present in quorum_signatures (Pydantic enforces distinctness).
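
A rough sketch of the quorum check is shown below. The env var name and its default come from this section; the function names and the shape of a signature entry (a dict with a "role" key) are assumptions made for illustration.

```python
import os

DEFAULT_ROLES = "committee_chair,data_steward"  # Stage 1 default from this README


def parse_required_roles(raw: str) -> frozenset:
    """Split a comma-separated role list, ignoring blanks and stray spaces."""
    return frozenset(part.strip() for part in raw.split(",") if part.strip())


def missing_roles(required: frozenset, quorum_signatures: list) -> set:
    """Return the required roles that have not signed.

    Each signature is assumed to be a dict with at least a "role" key.
    """
    signed = {signature["role"] for signature in quorum_signatures}
    return set(required) - signed


def load_required_roles() -> frozenset:
    """Read the roster from the environment, falling back to the default."""
    return parse_required_roles(
        os.environ.get("RELEASE_QUORUM_REQUIRED_ROLES", DEFAULT_ROLES)
    )
```

A release is authorized on this gate only when missing_roles returns an empty set. Adding a third role to the env var tightens the check without touching the code.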

Schema evolution: mongo-init.js vs migrations/

mongo-init.js runs once, only on the first container init (when /data/db is empty). It defines the target schema state for a fresh deployment. Editing mongo-init.js after the volume already exists has no effect on the running database.

For schema evolution after first init, the migrations/ directory holds versioned, idempotent migration scripts. Each script captures the delta from one state to the next and stays immutable after approval. For subsequent schema changes, write a new migration script to preserve the audit chain.

Apply migration 001 (the Stage 1.5 migration, which adds traditional_knowledge_events and community_voice_events collections and extends the consent_records.approved_categories and audit_log enums):

docker compose exec -T mongo mongosh \
  -u "$MONGO_INITDB_ROOT_USERNAME" -p "$MONGO_INITDB_ROOT_PASSWORD" \
  --authenticationDatabase admin \
  < migrations/001_stage_1_5_tk_and_community_voice.js

Re-running the same migration is a no-op. The script guards createCollection with getCollectionNames and lets collMod reapply the existing validator, while index creation and privilege grants handle their own deduplication natively.

Rollback (manual, safe only before any Stage 1.5 inserts):

  • Drop the 2 new collections using the root admin client.
  • Prior consent_records and audit_log validators can be restored from a pre-migration mongodump, or from a clean re-init using a Stage 1 mongo-init.js.
  • Finish by revoking the 3 new privileges via revokePrivilegesFromRole.

The migration file header documents the exact rollback commands. For production, snapshot the prior validator before applying so the rollback is mechanical.

Limitations (Stage 1)

These are the known Stage 1 limits. Several are tracked under Follow-ups.

  • Migrations between schema_version values aren't implemented yet. Moving documents beyond schema_version: "1.0.0" needs an explicit migration plan, which is on the agenda for Stage 2.
  • assert_k_anonymity validates the declared k value. Record-level equivalence class checks land with the Stage 4 aggregation pipeline, since the API is the audit-trail layer and actual aggregation happens upstream in the data steward's pipeline.
  • field_setDataCategory granular mapping isn't enforced yet. Stage 1 only checks consent_scope != internal_only to authorize external release. Stage 4 adds field-to-category enforcement.
  • Auth, CORS, rate-limiting and structured observability are out of scope for Stage 1. The first ship was the data integrity layer. Security and operability hardening land in Stage 2.
  • fisher_identity_map exists but no REST endpoint populates it yet. The collection has vault-only access. Helpers in benthos_api/privacy.py (make_pseudo_id) mint the pseudonyms. The population flow lives outside this service in Stage 1.
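
To make the k-anonymity limitation concrete, the sketch below contrasts the two kinds of check. The first function only compares a declared k against a minimum, which is what the list above describes for Stage 1. The second counts equivalence classes over actual records, which is the part deferred to the aggregation pipeline. Function names, the field names in the usage example and the default minimum of 5 (taken from the k<5 test scenario) are assumptions; this is not the code in benthos_api.

```python
from collections import Counter

K_MIN = 5  # assumed default; the service reads PRIVACY_K_ANONYMITY_MIN


def check_declared_k(declared_k: int, k_min: int = K_MIN) -> None:
    """Declared-value check: trusts the k stated on the release request."""
    if declared_k < k_min:
        raise ValueError(f"declared k={declared_k} is below the minimum {k_min}")


def smallest_equivalence_class(records: list, quasi_identifiers: tuple) -> int:
    """Record-level check: size of the smallest group of records that share
    the same values on every quasi-identifier (0 for an empty dataset)."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return min(counts.values()) if counts else 0


if __name__ == "__main__":
    records = [
        {"community": "A", "month": "2026-01"},
        {"community": "A", "month": "2026-01"},
        {"community": "B", "month": "2026-01"},
    ]
    check_declared_k(5)  # passes: the declared value meets the minimum
    # The data itself only supports k=1, which the declared check cannot see.
    print(smallest_equivalence_class(records, ("community", "month")))
```

The gap between the two is why the declared-k check works as an audit-trail control and not as a privacy guarantee on its own.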

Follow-ups

In priority order, for the next phase:

  1. Add ruff and mypy to the dev dependency group in pyproject.toml.
  2. Refactor config.py to build Mongo URIs from parts (MONGO_HOST, MONGO_PORT, MONGO_USER, MONGO_PASSWORD) instead of taking whole URIs as env vars. This eliminates password drift between the MONGO_*_PASSWORD field in .env.example and the password embedded in the URI.
  3. Implement an explicit schema migration plan for transitions beyond schema_version: "1.0.0" (Stage 2+).
  4. Pass PRIVACY_K_ANONYMITY_MIN and PRIVACY_DP_EPSILON_MAX into mongo-init.js so the $jsonSchema validators enforce the same thresholds at the database layer.
  5. Validate temporal precision of event_date in routes/catch.py so morning and afternoon catches of the same species don't collide on uq_catch_event_natural_key.
  6. Address the assert_k_anonymity and field_setDataCategory mapping limitations documented above (Stage 4).
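
Follow-up 2 could look roughly like the sketch below. It is an assumption about one possible shape for config.py, not the planned implementation: the function name and the database name in the usage note are hypothetical. Credentials are percent-encoded with urllib.parse.quote_plus so that reserved characters such as "@" or "/" in a password do not break the URI.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host: str, port: int, user: str, password: str, database: str) -> str:
    """Assemble a mongodb:// URI from separate settings.

    The password lives in exactly one setting, so it cannot drift
    from a second copy embedded in a hand-written URI.
    """
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )
```

For instance, build_mongo_uri("mongo", 27017, "benthos_app", "p@ss/word", "benthos") yields mongodb://benthos_app:p%40ss%2Fword@mongo:27017/benthos. The app and vault URIs would each be built from their own user and password.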

Acceptance criteria (Stage 1)

Stage 1 is considered complete when each of the following is verified by hand:

  • docker-compose up starts mongo + api with no initialization errors.
  • curl http://localhost:8000/docs shows the OpenAPI docs for the endpoints listed here.
  • uv run pytest passes green, covering idempotency and the sovereignty scenarios.
  • Someone new to the project can follow this README and get the service running locally in under 10 minutes.
  • No secrets, real fisher/community/consent data or references to non-mocked external resources appear in the committed code.

Single-vendor open source

The codebase is maintained by a small team at Benthos gGmbH and we answer issues as capacity allows. PRs are welcome but acceptance depends on alignment with our roadmap. Deployment work stays focused on the territories we actively support; for territorial deployments, please contact info@benthos.ai.

License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).

Copyright (C) 2026 Benthos gGmbH.

AGPL-3.0 is a strong copyleft license for network software. Anyone running a modified version of Benthos as a service has to make that source code available under the same license to people interacting with it. This keeps the consent grammar open, preserves an inspectable audit trail and makes it harder for the privacy gates to disappear into proprietary forks. It's also how we honor § 12 of the Benthos Satzung: GDPR, responsible AI, preservation of local knowledge, fair cooperation.

The full license text is in LICENSE.

Territorial deployments built on top of this infrastructure (community-specific configuration, local glossary, data) stay private. Community data is never open-sourced.

About

FastAPI service for data sovereignty in coastal community work. Consent grammar, audit log and privacy-gated external release.
