Cedarling Telemetry RFC

Cedarling Telemetry

This document defines the telemetry data that a Cedarling instance can collect and send to the Lock Server via the /audit/telemetry/bulk endpoint. Telemetry is organized into three maps within the TelemetryEntry message: policy_stats, error_counters, and operational_stats.

All counters are reset to zero at the start of each telemetry interval. Gauge-type values (marked with gauge) reflect a point-in-time snapshot taken at flush time.
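
The difference between counters (reset every interval) and gauges (sampled at flush time) can be sketched as follows. This is a minimal illustration only, not Cedarling's implementation: the `TelemetryBuffer` class and its method names are hypothetical.

```python
from collections import Counter


class TelemetryBuffer:
    """Hypothetical buffer: counters accumulate, gauges are sampled at flush."""

    def __init__(self):
        self.counters = Counter()
        self.gauge_sources = {}  # key -> zero-argument callable returning the current value

    def incr(self, key, amount=1):
        self.counters[key] += amount

    def register_gauge(self, key, read_fn):
        self.gauge_sources[key] = read_fn

    def flush(self):
        """Return one interval's stats, then reset the counters for the next interval."""
        stats = dict(self.counters)
        for key, read_fn in self.gauge_sources.items():
            stats[key] = int(read_fn())  # point-in-time snapshot
        self.counters.clear()
        return stats


cache = {"a": 1, "b": 2}
buf = TelemetryBuffer()
buf.register_gauge("token_cache.size", lambda: len(cache))
buf.incr("token_cache.hits", 3)
first = buf.flush()   # counter and gauge are both present
second = buf.flush()  # counter was reset, gauge is sampled again
```

In the second flush only the gauge appears, because no cache hits were recorded after the reset.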

Proto definition

message TelemetryEntry {
  google.protobuf.Timestamp creation_date = 1;
  string service = 3;
  string node_name = 4;
  string status = 5;
  // fields 6-12 removed: migrated to operational_stats map for extensibility
  map<string, int64> policy_stats = 13;
  map<string, int64> error_counters = 14;
  map<string, int64> operational_stats = 15;
  int64 interval_secs = 16;  // the collection period this entry covers
}

Field notes

creation_date — Timestamp when the telemetry entry was created and sent. For telemetry this represents the end of the collection window (the start can be derived as creation_date - interval_secs). The previous event_time field has been removed because for telemetry there is no single "event" — the entry covers a time range, not a point in time. The interval_secs field together with creation_date fully defines the collection window.
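
As an illustration of how a receiver might derive the collection window, the sketch below subtracts interval_secs from creation_date. The function name `collection_window` is hypothetical, and the timestamp format is assumed to be the RFC 3339 form used in the REST example in this document.

```python
from datetime import datetime, timedelta, timezone


def collection_window(creation_date, interval_secs):
    """Return (start, end) of the window covered by one telemetry entry."""
    # datetime.fromisoformat on older Python versions rejects a trailing "Z",
    # so normalize it to an explicit UTC offset first.
    end = datetime.fromisoformat(creation_date.replace("Z", "+00:00"))
    start = end - timedelta(seconds=interval_secs)
    return start, end


start, end = collection_window("2026-04-02T10:05:00Z", 60)
```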

Note: The existing LogEntry and HealthEntry messages in audit.proto retain both creation_date and event_time for backwards compatibility. For those message types, both fields are currently set to the same value.

`interval_secs` field

The interval_secs field indicates the duration (in seconds) of the collection window that this TelemetryEntry covers. All counter values in error_counters, operational_stats, and policy_stats represent totals accumulated over this period. The receiver needs this value to compute rates (e.g., requests per second = authz.requests_total / interval_secs).
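
The rate computation mentioned above could look like this on the receiving side. It is a sketch under the assumption that the entry has already been parsed from JSON into a dictionary; `rate_per_second` is a hypothetical helper name.

```python
def rate_per_second(entry, key):
    """Per-second rate of a counter in operational_stats, or None if it cannot be computed."""
    interval = entry.get("interval_secs", 0)
    if interval <= 0:
        return None  # no usable collection window
    total = entry.get("operational_stats", {}).get(key, 0)  # omitted counters count as zero
    return total / interval


entry = {"interval_secs": 60, "operational_stats": {"authz.requests_total": 1240}}
rps = rate_per_second(entry, "authz.requests_total")
```

Dividing by interval_secs rather than by the configured interval keeps the rate correct for partial first or final intervals.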

The value comes from the CEDARLING_LOCK_TELEMETRY_INTERVAL bootstrap property, which controls both the collection period and the send interval. Under normal operation, interval_secs equals the configured interval. It may differ if the instance was just started (first partial interval) or is shutting down (final flush before exit).

Bootstrap configuration

The existing CEDARLING_LOCK_TELEMETRY_INTERVAL property (in seconds, 0 = disabled) currently controls only the send interval. With telemetry implemented, it also defines the collection window — counters are reset after each send.

Property	Type	Default	Description
`CEDARLING_LOCK_TELEMETRY_INTERVAL`	`u64`	`0` (disabled)	How often (in seconds) to collect, reset, and send telemetry to the Lock Server. Value of `0` disables telemetry entirely. Recommended: `60`
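
The "0 disables telemetry" rule can be expressed as a small helper. This sketch assumes the bootstrap properties are available as a plain dictionary; how Cedarling actually loads them is not shown here, and `telemetry_interval` is a hypothetical name.

```python
def telemetry_interval(bootstrap):
    """Return the telemetry interval in seconds, or None when telemetry is disabled."""
    secs = int(bootstrap.get("CEDARLING_LOCK_TELEMETRY_INTERVAL", 0))
    if secs < 0:
        raise ValueError("CEDARLING_LOCK_TELEMETRY_INTERVAL must be a non-negative integer")
    return secs if secs > 0 else None


disabled = telemetry_interval({})
recommended = telemetry_interval({"CEDARLING_LOCK_TELEMETRY_INTERVAL": 60})
```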

Note: Currently there is no separate property for the collection period vs. the send interval — they are always equal. If a use case arises where finer-grained collection is needed (e.g., collect every 10s for more accurate percentiles but send every 60s), a new property CEDARLING_LOCK_TELEMETRY_COLLECTION_INTERVAL could be introduced. Until then, CEDARLING_LOCK_TELEMETRY_INTERVAL serves both purposes.

REST API example

The telemetry REST endpoint follows the same pattern as the existing log endpoint: a POST with a JSON array body to the /bulk sub-path. The JSON field names match the proto field names using snake_case (same convention as LogEntry / HealthEntry).

Request:

POST /jans-auth/v1/audit/telemetry/bulk HTTP/1.1
Host: lock.example.com
Authorization: Bearer <access_token>
Content-Type: application/json

[
  {
    "creation_date": "2026-04-02T10:05:00Z",
    "service": "my_app",
    "node_name": "019615a1-d7c0-7fc0-8000-deadbeef1234",
    "status": "running",
    "interval_secs": 60,
    "policy_stats": {
      "allow_read_documents": 340,
      "deny_admin_access": 12,
      "allow_user_update": 88
    },
    "error_counters": {
      "jwt.validation_failed": 8,
      "jwt.untrusted_issuer": 2,
      "authz.entity_build": 3,
      "data.storage_limit": 1
    },
    "operational_stats": {
      "authz.requests_total": 1240,
      "authz.requests_unsigned": 200,
      "authz.requests_multi_issuer": 1040,
      "authz.decision_allow": 1218,
      "authz.decision_deny": 8,
      "authz.errors_total": 14,
      "authz.last_eval_time_us": 38,
      "authz.eval_time_p50_us": 35,
      "authz.eval_time_p95_us": 120,
      "authz.eval_time_p99_us": 450,
      "authz.eval_time_max_us": 1200,
      "authz.principals_per_request_avg": 2,
      "token_cache.hits": 890,
      "token_cache.misses": 350,
      "token_cache.size": 128,
      "token_cache.evictions": 12,
      "jwt.validations_total": 1240,
      "jwt.validations_success": 1230,
      "jwt.validations_failed": 10,
      "jwt.tokens_skipped_untrusted": 2,
      "data.entries_count": 45,
      "data.total_size_bytes": 12800,
      "data.push_ops": 30,
      "data.get_ops": 1040,
      "data.remove_ops": 5,
      "data.ttl_expirations": 8,
      "data.memory_alert_triggered": 0,
      "lock.batches_sent": 6,
      "lock.entries_sent": 1240,
      "lock.retries": 0,
      "lock.queue_depth": 0,
      "instance.uptime_secs": 3600,
      "instance.memory_usage_bytes": 52428800,
      "instance.policy_count": 15,
      "instance.trusted_issuers_loaded": 3,
      "instance.trusted_issuers_failed": 0
    }
  }
]

Response (success):

{
  "success": true,
  "message": ""
}

Notes:

The endpoint URL is derived from the Lock Server's .well-known/lock-server-configuration response (audit.telemetry_endpoint), with /bulk appended automatically — same as for the log endpoint.
The Authorization header uses the same Bearer token obtained during Dynamic Client Registration (scope: https://jans.io/oauth/lock/telemetry.write).
The body is always a JSON array (even with a single entry) to match the BulkTelemetryRequest proto pattern.
Empty maps may be omitted or sent as {}. Zero-value counters may be omitted to reduce payload size.
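
A sender-side sketch of the last two notes: the body is always an array, and zero-value counters are dropped. Because operational_stats mixes counters with gauges (where 0 can be a meaningful reading), this example only strips zeros from policy_stats and error_counters; that choice, and the `build_bulk_body` name, are assumptions of the sketch.

```python
import json

# Maps that contain only counters, so a missing key can safely be read as zero.
COUNTER_ONLY_MAPS = ("policy_stats", "error_counters")


def build_bulk_body(entries):
    """Serialize telemetry entries as a JSON array, omitting zero-value counters."""
    body = []
    for entry in entries:
        out = dict(entry)
        for field in COUNTER_ONLY_MAPS:
            if field in out:
                out[field] = {k: v for k, v in out[field].items() if v != 0}
        body.append(out)
    return json.dumps(body)


payload = build_bulk_body([
    {
        "service": "my_app",
        "interval_secs": 60,
        "error_counters": {"jwt.validation_failed": 8, "jwt.decode_failed": 0},
        "operational_stats": {"lock.queue_depth": 0},
    }
])
body = json.loads(payload)
```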

Migration from previous proto fields

The following dedicated fields (6-12) have been replaced by keys in the operational_stats map. This makes the protocol extensible — adding a new metric requires only a new map key, not a proto change.

Old field	Old number	New location	Reason
`last_policy_load_size`	6	`operational_stats["instance.policy_count"]`	Gauge metric, fits naturally in the map
`policy_success_load_counter`	7	`operational_stats["instance.trusted_issuers_loaded"]`	Duplicate concept, consolidated
`policy_failed_load_counter`	8	`operational_stats["instance.trusted_issuers_failed"]`	Duplicate concept, consolidated
`last_policy_evaluation_time_ns`	9	`operational_stats["authz.last_eval_time_us"]`	Moved to latency section (units changed to microseconds)
`avg_policy_evaluation_time_ns`	10	Removed — replaced by percentiles	Averages hide outliers. Replaced by `authz.eval_time_p50_us`, `authz.eval_time_p95_us`, `authz.eval_time_p99_us`
`memory_usage`	11	`operational_stats["instance.memory_usage_bytes"]`	Gauge metric, fits naturally in the map
`evaluation_requests_count`	12	`operational_stats["authz.requests_total"]`	Counter metric, consolidated with other authz stats

`policy_stats`

Per-policy evaluation counts. Each key is a policy ID from the policy store, and the value is the number of times that policy was referenced in a Cedar reason (i.e., contributed to an authorization decision) during the interval.

Key pattern	Value	Description
`<policy_id>`	count	Number of times this policy appeared in the Cedar diagnostics `reason` set during the interval

Example:

{
  "allow_read_documents": 340,
  "deny_admin_access": 12,
  "allow_user_update": 88
}
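
One way to accumulate these counts is sketched below. It assumes each authorization decision yields a collection of policy IDs (the Cedar `reason` set) and that a policy is counted once per decision; `count_policy_hits` is a hypothetical helper, not part of Cedarling.

```python
from collections import Counter


def count_policy_hits(reason_sets):
    """Count how many decisions each policy ID contributed to."""
    stats = Counter()
    for reasons in reason_sets:
        stats.update(set(reasons))  # de-duplicate so a policy counts once per decision
    return dict(stats)


policy_stats = count_policy_hits([
    {"allow_read_documents"},
    {"allow_read_documents", "allow_user_update"},
    {"deny_admin_access"},
])
```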

`error_counters`

Classification counters for errors encountered during the telemetry interval. Each key identifies a specific error variant and the value is the number of occurrences.

JWT validation errors

Source: ValidateJwtError, JwtProcessingError

Key	Source	Description
`jwt.decode_failed`	`ValidateJwtError::DecodeJwt`	JWT is malformed: not in `header.payload.signature` format, base64 decode failure, or header/claims JSON deserialization failure
`jwt.missing_key`	`ValidateJwtError::MissingValidationKey`	No decoding key available for this JWT's `kid`. JWKS may not have been fetched or the key was rotated
`jwt.missing_validator`	`ValidateJwtError::MissingValidator`	No validator initialized for this issuer+algorithm combination. Either the issuer is untrusted or the algorithm is not in `signature_algorithms_supported`
`jwt.validation_failed`	`ValidateJwtError::ValidateJwt`	Signature verification or standard claim validation failed (expired, wrong audience, `nbf` in future, etc.)
`jwt.missing_claims`	`ValidateJwtError::MissingClaims`	Token is missing required claims defined per token type in the trusted issuer config (e.g. `sub`, `iss`, `aud`)
`jwt.status_check_failed`	`ValidateJwtError::GetJwtStatus`	Failed to fetch or parse the JWT status list reference
`jwt.status_rejected`	`ValidateJwtError::RejectJwtStatus`	Token was revoked or suspended according to the IETF status list
`jwt.missing_status_list`	`ValidateJwtError::MissingStatusList`	Status validation is enabled but no status list is available for this token
`jwt.untrusted_issuer`	`ValidateJwtError::TrustedIssuerValidation(UntrustedIssuer)`	Token's `iss` claim does not match any trusted issuer in the policy store
`jwt.missing_required_claim`	`ValidateJwtError::TrustedIssuerValidation(MissingRequiredClaim)`	Trusted issuer requires a specific claim that the token does not have
`jwt.signed_authz_unavailable`	`JwtProcessingError::SignedAuthzUnavailable`	Signed authorization requested but no trusted issuers or JWKS were configured

Authorization errors

Source: AuthorizeError

Key	Source	Description
`authz.invalid_action`	`AuthorizeError::Action`	The action string could not be parsed as a valid Cedar `EntityUid`
`authz.identifier_parsing`	`AuthorizeError::IdentifierParsing`	An entity type name or action identifier failed to parse
`authz.invalid_context`	`AuthorizeError::CreateContext`	Context JSON does not conform to the Cedar schema for this action
`authz.invalid_principal`	`AuthorizeError::InvalidPrincipal`	The Cedar request for a principal does not conform to the schema
`authz.request_validation`	`AuthorizeError::RequestValidation`	Cedar request validation failed (schema mismatch between action, principal, resource)
`authz.entity_validation`	`AuthorizeError::ValidateEntities`	Built entities violate the Cedar schema constraints
`authz.entity_build`	`AuthorizeError::BuildEntity`	Failed to construct a Cedar entity: missing entity ID, invalid UID format, attribute evaluation failure, or entity type not in schema
`authz.context_build`	`AuthorizeError::BuildContext`	Context merging failed: key conflict between request context and pushed data, unknown action in schema, or missing entity reference
`authz.rule_execution`	`AuthorizeError::ExecuteRule`	Failed to apply `principal_bool_operator` aggregation rule across principals
`authz.unsigned_role_build`	`AuthorizeError::BuildUnsignedRoleEntity`	Role field in unsigned request is not a string or array of strings

Multi-issuer validation errors

Source: MultiIssuerValidationError, MultiIssuerEntityError

Key	Source	Description
`multi_issuer.empty_token_array`	`MultiIssuerValidationError::EmptyTokenArray`	The `tokens` array in the request is empty
`multi_issuer.token_input_invalid`	`MultiIssuerValidationError::TokenInput`	Token input has empty mapping string or empty payload
`multi_issuer.all_tokens_failed`	`MultiIssuerValidationError::TokenValidationFailed`	All tokens in the request failed validation, no valid tokens to evaluate
`multi_issuer.invalid_context`	`MultiIssuerValidationError::InvalidContextJson`	The optional context field is not valid JSON
`multi_issuer.missing_issuer`	`MultiIssuerValidationError::MissingIssuer`	A JWT is missing the `iss` claim, cannot determine trusted issuer
`multi_issuer.entity_build`	`AuthorizeError::MultiIssuerEntity`	Entity construction failed for multi-issuer flow (missing `exp`, invalid UID, no valid tokens, attribute build failure)

Data store errors

Source: DataError

Key	Source	Description
`data.invalid_key`	`DataError::InvalidKey`	Push or get called with an empty key
`data.key_not_found`	`DataError::KeyNotFound`	Requested key does not exist in the data store
`data.storage_limit`	`DataError::StorageLimitExceeded`	`max_entries` reached, cannot push more data
`data.ttl_exceeded`	`DataError::TTLExceeded`	Requested TTL is larger than `max_ttl` configured
`data.value_too_large`	`DataError::ValueTooLarge`	Entry size exceeds `max_entry_size`
`data.serialization`	`DataError::SerializationError`	JSON serialization of the value failed

Lock service transport errors

Source: LogWorker, LockService, RestTransport

Key	Source	Description
`lock.send_failed`	`LogWorker::flush_logs` retry path	Failed to send log batch to Lock Server (network error, 4xx/5xx response)
`lock.channel_full`	`LockService::log_any` `try_send` error	The mpsc channel to `LogWorker` is full, logs are being produced faster than sent
`lock.malformed_entry`	`RestTransport::send_logs` skip	Log entry could not be deserialized or mapped to Lock Server format
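
Because every key carries a category prefix (`jwt.`, `authz.`, `multi_issuer.`, `data.`, `lock.`), a receiver can roll the counters up per category. The sketch below is one possible way to do that; `errors_by_category` is a hypothetical name.

```python
from collections import defaultdict


def errors_by_category(error_counters):
    """Sum error counts per key prefix (the part before the first dot)."""
    totals = defaultdict(int)
    for key, count in error_counters.items():
        category = key.split(".", 1)[0]
        totals[category] += count
    return dict(totals)


totals = errors_by_category({
    "jwt.validation_failed": 8,
    "jwt.untrusted_issuer": 2,
    "authz.entity_build": 3,
    "data.storage_limit": 1,
})
```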

`operational_stats`

Operational metrics collected during the telemetry interval. Values are either counters (reset each interval) or gauges (point-in-time snapshot at flush time).

Authorization decisions

Key	Type	Description
`authz.requests_total`	counter	Total number of authorization requests (`authorize_unsigned` + `authorize_multi_issuer`)
`authz.requests_unsigned`	counter	Number of `authorize_unsigned` calls
`authz.requests_multi_issuer`	counter	Number of `authorize_multi_issuer` calls
`authz.decision_allow`	counter	Number of requests that resulted in ALLOW
`authz.decision_deny`	counter	Number of requests that resulted in DENY
`authz.errors_total`	counter	Total number of requests that returned an error (did not reach a decision)
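
The counters in this table are related, which allows a receiver to sanity-check an entry. The first identity below comes directly from the description of `authz.requests_total`; the second (every request ends in ALLOW, DENY, or an error) is an inference from the descriptions and should be treated as an assumption. `check_authz_counters` is a hypothetical helper.

```python
def check_authz_counters(stats):
    """Return a list of human-readable inconsistencies found in the authz counters."""
    def get(key):
        return stats.get(key, 0)  # omitted counters are treated as zero

    problems = []
    total = get("authz.requests_total")
    if total != get("authz.requests_unsigned") + get("authz.requests_multi_issuer"):
        problems.append("requests_total != requests_unsigned + requests_multi_issuer")
    # Assumed identity: each request is counted as exactly one of allow, deny, or error.
    if total != get("authz.decision_allow") + get("authz.decision_deny") + get("authz.errors_total"):
        problems.append("requests_total != decision_allow + decision_deny + errors_total")
    return problems


problems = check_authz_counters({
    "authz.requests_total": 1240,
    "authz.requests_unsigned": 200,
    "authz.requests_multi_issuer": 1040,
    "authz.decision_allow": 1218,
    "authz.decision_deny": 8,
    "authz.errors_total": 14,
})
```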

Authorization latency

Latency is reported using percentiles instead of averages, because averages hide outliers: a single slow request can go unnoticed when averaged with thousands of fast ones. Percentiles describe the distribution more faithfully: P50 shows typical latency, P95 shows the degraded experience of the slow tail, and P99 and max expose spikes and worst-case behavior.

Key	Type	Description
`authz.last_eval_time_us`	gauge	Last policy evaluation time in microseconds
`authz.eval_time_p50_us`	gauge	Median (50th percentile) evaluation time in microseconds — typical request latency
`authz.eval_time_p95_us`	gauge	95th percentile evaluation time — latency experienced by the slow tail
`authz.eval_time_p99_us`	gauge	99th percentile evaluation time — worst-case excluding extreme outliers
`authz.eval_time_max_us`	gauge	Maximum evaluation time in the interval — detects spikes
`authz.principals_per_request_avg`	gauge	Average number of principals evaluated per unsigned request
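
This document does not prescribe how the percentiles are computed. The sketch below uses the nearest-rank method over the samples recorded in one interval, which is an assumption, as are the helper names `nearest_rank` and `latency_gauges`.

```python
def nearest_rank(ordered, pct):
    """Nearest-rank percentile of a non-empty ascending list (pct is an integer 1..100)."""
    rank = (pct * len(ordered) + 99) // 100  # integer form of ceil(pct * n / 100)
    return ordered[rank - 1]


def latency_gauges(samples_us):
    """Build the latency keys from evaluation times (microseconds) in arrival order."""
    if not samples_us:
        return {}
    ordered = sorted(samples_us)
    return {
        "authz.last_eval_time_us": samples_us[-1],
        "authz.eval_time_p50_us": nearest_rank(ordered, 50),
        "authz.eval_time_p95_us": nearest_rank(ordered, 95),
        "authz.eval_time_p99_us": nearest_rank(ordered, 99),
        "authz.eval_time_max_us": ordered[-1],
    }


# 98 fast requests followed by two slow ones.
samples = [30] * 98 + [400, 1200]
gauges = latency_gauges(samples)
mean_us = sum(samples) / len(samples)
```

With these samples the mean is 45.4 µs, which gives little hint of the 1200 µs spike that the max and P99 values make visible.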

Token cache

Key	Type	Description
`token_cache.hits`	counter	Number of cache hits (token reused without re-validation)
`token_cache.misses`	counter	Number of cache misses (full validation required)
`token_cache.size`	gauge	Current number of entries in the token cache
`token_cache.evictions`	counter	Number of entries evicted (TTL expiry or capacity)
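
A common derived metric is the cache hit ratio. The sketch below assumes that hits plus misses covers every cache lookup in the interval; `token_cache_hit_ratio` is a hypothetical helper.

```python
def token_cache_hit_ratio(stats):
    """Fraction of token cache lookups served from the cache, or None if there were none."""
    hits = stats.get("token_cache.hits", 0)
    misses = stats.get("token_cache.misses", 0)
    lookups = hits + misses
    return hits / lookups if lookups else None


ratio = token_cache_hit_ratio({"token_cache.hits": 890, "token_cache.misses": 350})
```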

JWT validation

Key	Type	Description
`jwt.validations_total`	counter	Total number of individual JWT validations attempted
`jwt.validations_success`	counter	Number of JWTs that passed validation
`jwt.validations_failed`	counter	Number of JWTs that failed validation; should equal the sum of the `jwt.*` keys in `error_counters`
`jwt.tokens_skipped_untrusted`	counter	Tokens skipped in multi-issuer flow because issuer was not trusted (warning, not error)
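
The relationship between `jwt.validations_failed` and the `jwt.*` error counters can be checked on the receiving side. This is a sketch with a hypothetical helper name, `jwt_failure_mismatch`; it treats omitted keys as zero.

```python
def jwt_failure_mismatch(entry):
    """Difference between reported JWT failures and the classified jwt.* error counters."""
    reported = entry.get("operational_stats", {}).get("jwt.validations_failed", 0)
    classified = sum(
        count
        for key, count in entry.get("error_counters", {}).items()
        if key.startswith("jwt.")
    )
    return reported - classified


mismatch = jwt_failure_mismatch({
    "operational_stats": {"jwt.validations_failed": 10},
    "error_counters": {
        "jwt.validation_failed": 8,
        "jwt.untrusted_issuer": 2,
        "authz.entity_build": 3,
    },
})
```

A non-zero result suggests that some failures were not classified, or that a counter was incremented twice.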

Data store

Key	Type	Description
`data.entries_count`	gauge	Current number of entries in the pushed data store
`data.total_size_bytes`	gauge	Total memory used by data store entries
`data.push_ops`	counter	Number of `push_data_ctx` calls
`data.get_ops`	counter	Number of `get_data_ctx` calls
`data.remove_ops`	counter	Number of `remove_data_ctx` calls
`data.ttl_expirations`	counter	Number of entries that expired due to TTL
`data.memory_alert_triggered`	gauge	1 if memory alert threshold was crossed during the interval, 0 otherwise

Lock service transport

Key	Type	Description
`lock.batches_sent`	counter	Number of log batches successfully sent to Lock Server
`lock.entries_sent`	counter	Total number of log entries sent
`lock.retries`	counter	Number of retry attempts for failed sends
`lock.queue_depth`	gauge	Current number of unsent entries in the log buffer

Instance

Key	Type	Description
`instance.uptime_secs`	gauge	Seconds since Cedarling instance was initialized
`instance.memory_usage_bytes`	gauge	Process memory usage in bytes (RSS)
`instance.policy_count`	gauge	Number of policies in the loaded policy store
`instance.trusted_issuers_loaded`	gauge	Number of successfully loaded trusted issuers
`instance.trusted_issuers_failed`	gauge	Number of trusted issuers that failed to load
