✨ qontract-api: events framework#5420

Merged
chassing merged 22 commits into app-sre:master from chassing:APPSRE-13196/Log-events-to-sd-app-sre-reconcile
Mar 2, 2026

Conversation

@chassing
Member

This PR introduces a generic event system that allows qontract-api to publish events when integrations perform actions (e.g., slack-usergroups changes members). Events are consumed by reconcile integrations that can react to these changes.

The event API lives in qontract-utils with Protocol-based interfaces and a factory pattern, making backends pluggable. The first backend uses Redis Streams, leveraging the existing Redis infrastructure without requiring additional services.
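To illustrate the Protocol-plus-factory idea, here is a minimal sketch. It is not the code from this PR: `Event`, `EventPublisher`, `InMemoryPublisher`, and `create_event_publisher` are hypothetical names, and an in-memory list stands in for the Redis Streams backend.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Protocol


@dataclass(frozen=True)
class Event:
    """Hypothetical event payload; the field names are illustrative."""

    type: str
    source: str
    data: dict[str, Any] = field(default_factory=dict)


class EventPublisher(Protocol):
    """Structural interface: any object with a matching publish() qualifies."""

    def publish(self, event: Event) -> None: ...


class InMemoryPublisher:
    """Stand-in backend; a Redis Streams backend would append to a stream."""

    def __init__(self) -> None:
        self.events: list[Event] = []

    def publish(self, event: Event) -> None:
        self.events.append(event)


# Registry of backend constructors (assumed layout, keyed by backend name).
_BACKENDS: dict[str, Callable[[], EventPublisher]] = {"memory": InMemoryPublisher}


def create_event_publisher(backend: str) -> EventPublisher:
    """Factory: callers depend on the Protocol, never on a concrete backend."""
    try:
        return _BACKENDS[backend]()
    except KeyError:
        raise ValueError(f"unknown event backend: {backend}") from None


publisher = create_event_publisher("memory")
publisher.publish(Event(type="slack-usergroups.update", source="qontract-api"))
```

Adding another backend in this sketch only means registering one more constructor; callers that hold an `EventPublisher` do not change.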

On the producer side, qontract-api gains an EventManager (similar to SecretManager) that Celery tasks use to publish events. On the consumer side, a new event-log-sink reconcile integration reads events from a Redis Stream and logs them to stdout (#sd-app-sre-reconcile).

The consumer follows standard message queue semantics: unacknowledged events are re-delivered on the next read, and the caller controls acknowledgment via a parameter on receive(). This enables safe dry-run behavior where events are displayed but not consumed.
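The acknowledgment behavior can be sketched with an in-memory stand-in. The class name and the `acknowledge` keyword are assumptions for illustration; the PR only states that `receive()` takes a parameter that controls acknowledgment.

```python
class InMemoryConsumer:
    """Hypothetical consumer: unacknowledged events stay pending and are re-delivered."""

    def __init__(self, events: list[str]) -> None:
        self._pending = list(events)

    def receive(self, *, acknowledge: bool = True) -> list[str]:
        # Return everything that is still pending.
        batch = list(self._pending)
        if acknowledge:
            # Acknowledged events are removed and will not be delivered again.
            self._pending.clear()
        return batch


consumer = InMemoryConsumer(["event-1", "event-2"])

# Dry run: events are displayed but not consumed.
dry_run_batch = consumer.receive(acknowledge=False)

# Real run: the same events are re-delivered, then acknowledged.
redelivered = consumer.receive(acknowledge=True)

# Nothing is left after acknowledgment.
after_ack = consumer.receive()
```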

An ADR (ADR-018) documents the architectural decision and patterns.

Ticket: APPSRE-13196

@chassing chassing self-assigned this Feb 11, 2026
Comment thread docs/adr/ADR-018-event-driven-communication.md Outdated
@chassing chassing force-pushed the APPSRE-13196/Log-events-to-sd-app-sre-reconcile branch 4 times, most recently from 5edb9fb to eda4bc8 Compare February 26, 2026 11:07
Copy link
Copy Markdown
Contributor

@hemslo hemslo left a comment

LGTM!

chassing added 22 commits March 2, 2026 08:52
…ication fields

- Add notification_channel field (str | None, default None)
- Add notification_workspace field (str | None, default None)
- Add secret_path field (str, default 'app-sre/slack/bot-token')
- Add 5 comprehensive tests for new fields
- Maintain backwards compatibility with existing deployments
- Add ChatPostMessageResponse model to models.py with ts, channel, thread_ts fields
- Add 7 test functions covering success, thread reply, auto-join, error handling, truncation, and hooks
- All tests fail as expected (RED state) because chat_post_message method does not exist yet
- Add chat_post_message method to SlackApi with @invoke_with_hooks decorator
- Auto-join logic handles not_in_channel error with conversations_join + retry
- channel_not_found logged at ERROR level and re-raised
- Other errors logged at WARNING level and re-raised
- Text truncation at 10,000 characters with "... [truncated]" suffix
- Export ChatPostMessageResponse from __init__.py
- All tests pass (7 new + 23 existing)
- Extract magic value 10000 to MAX_MESSAGE_LENGTH constant
- Use logger.exception instead of logger.error when re-raising
- Add type checking for response.data to satisfy mypy
- Fix exception chaining with 'from None'
- Update test to check logger.exception instead of logger.error
- All tests pass, ruff check/format pass, mypy passes
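The truncation rule from the commits above can be sketched as follows. `truncate_text` and `TRUNCATION_SUFFIX` are hypothetical names, and the sketch assumes the suffix counts toward the 10,000-character limit; the commit messages do not say whether it does.

```python
MAX_MESSAGE_LENGTH = 10_000
TRUNCATION_SUFFIX = "... [truncated]"  # suffix text taken from the commit message


def truncate_text(text: str, limit: int = MAX_MESSAGE_LENGTH) -> str:
    """Return text unchanged if it fits, else cut it and append the suffix.

    Assumption: the result, suffix included, never exceeds `limit`.
    """
    if len(text) <= limit:
        return text
    return text[: limit - len(TRUNCATION_SUFFIX)] + TRUNCATION_SUFFIX
```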
…tory function

- Relocate SlackWorkspaceClient to qontract_api/slack/
- Add chat_post_message method to SlackWorkspaceClient
- Create create_slack_workspace_client factory function following PagerDuty pattern
- Factory resolves Secret via SecretManager
- Convert old modules to re-export shims for backward compatibility
- Update service.py to use create_slack_workspace_client function
- Update tasks.py to pass cache directly to service
- Migrate test imports to new location
- Add new tests for factory function and chat_post_message
- Use lazy imports to avoid circular dependencies
- All 43 tests passing
- Create frozen Pydantic request/response models with Secret, icon, and username fields
- Implement POST /chat router with ValueError (404) and SlackApiError (502) handling
- Add channel name → ID resolution with # prefix stripping in SlackWorkspaceClient
- Rename SlackApi.chat_post_message channel → channel_id for type clarity
- Add icon_emoji, icon_url, username params through the full call chain
- Replace msg_kwargs dict with explicit typed parameters for mypy safety
- Improve general_exception_handler to preserve HTTPException status/detail
- Register slack router in api_v1
- Add 5 endpoint tests + 2 workspace client tests (channel resolution, hash prefix)
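As a rough illustration of the channel name to ID resolution, the sketch below strips a leading `#` and looks the name up in a mapping. `resolve_channel_id` and the `channels_by_name` mapping are hypothetical; the real SlackWorkspaceClient presumably resolves names from its own channel data. Raising `ValueError` for an unknown channel matches the error the router is described as mapping to 404.

```python
def resolve_channel_id(channel: str, channels_by_name: dict[str, str]) -> str:
    """Resolve a channel name (with or without a leading '#') to its ID."""
    name = channel.removeprefix("#")  # requires Python 3.9+
    try:
        return channels_by_name[name]
    except KeyError:
        raise ValueError(f"channel not found: {name}") from None


# Example mapping with a made-up channel ID.
known_channels = {"sd-app-sre-reconcile": "C0000000001"}
channel_id = resolve_channel_id("#sd-app-sre-reconcile", known_channels)
```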
…ettings fields

- Add SubscriberSettings model with required slack_channel, slack_workspace, slack_token_path
- Add optional qontract_api_url (defaults to http://localhost:8000) and qontract_api_token_path
- Add Settings.subscriber field (defaults to None for backward compatibility)
- Remove deprecated notification_channel, notification_workspace, secret_path from SlackSettings
- Add comprehensive tests for SubscriberSettings and Settings.subscriber
…metrics, and tests

- Add EventFormatter protocol for type-safe formatter registration
- Implement GenericEventFormatter with emoji mapping (error, fail, create, update, delete)
- Format events as human-readable Slack messages with emoji, type, source, and JSON data dump
- Create formatter registry with format_event and register_formatter functions
- Add Prometheus metrics: events_received, events_posted, events_failed counters and event_processing_duration histogram
- Add comprehensive tests for formatter registry and generic formatter with complex data
- Fix ClassVar annotation for EMOJI_MAP to satisfy ruff linter
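A keyword-based emoji mapping of the kind described above might look like the sketch below. The class name, the specific emoji, the default emoji, and the message layout are all assumptions; only the keywords (error, fail, create, update, delete) and the `ClassVar` annotation on `EMOJI_MAP` come from the commit messages.

```python
import json
from typing import Any, ClassVar


class GenericFormatterSketch:
    """Hypothetical formatter: picks an emoji by keyword found in the event type."""

    # Keywords come from the commit message; the emoji are placeholders.
    EMOJI_MAP: ClassVar[dict[str, str]] = {
        "error": ":x:",
        "fail": ":x:",
        "create": ":new:",
        "update": ":pencil2:",
        "delete": ":wastebasket:",
    }
    DEFAULT_EMOJI: ClassVar[str] = ":information_source:"

    def format(self, event_type: str, source: str, data: dict[str, Any]) -> str:
        lowered = event_type.lower()
        # The first keyword contained in the event type wins.
        emoji = next(
            (e for keyword, e in self.EMOJI_MAP.items() if keyword in lowered),
            self.DEFAULT_EMOJI,
        )
        payload = json.dumps(data, indent=2, sort_keys=True)
        return f"{emoji} *{event_type}* from {source}\n{payload}"


message = GenericFormatterSketch().format(
    "slack-usergroups.update", "qontract-api", {"users": ["alice"]}
)
```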
- Add qontract_api_token field to SubscriberSettings for direct token auth
- Create _client.py with _get_client() and post_to_slack() functions
- Rewrite _subscriptions.py event_handler with per-event error isolation
- Increment Prometheus metrics for received, posted, failed events
- Record event processing duration in histogram
- Create conftest.py with sample_event and error_event fixtures
- Add test_subscriptions.py with 8 test cases covering:
  * Event processing flow (format + post)
  * Per-event error isolation (SUB-02)
  * Prometheus metrics (received, posted, failed, duration)
  * Exception handling in both format and post stages
- All 16 subscriber tests pass
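Per-event error isolation can be sketched as a loop that catches failures for one event and moves on to the next. This is an illustration under assumptions: `handle_events` is a hypothetical name, and a plain `collections.Counter` stands in for the Prometheus counters so the example has no third-party dependencies.

```python
import logging
from collections import Counter
from typing import Callable, Iterable

logger = logging.getLogger(__name__)

# Stand-in for the received/posted/failed Prometheus counters.
metrics: Counter[str] = Counter()


def handle_events(
    events: Iterable[str],
    format_event: Callable[[str], str],
    post: Callable[[str], None],
) -> None:
    """Process each event independently; one failure must not stop the rest."""
    for event in events:
        metrics["received"] += 1
        try:
            post(format_event(event))
        except Exception:
            # Covers failures in both the format and the post stage.
            logger.exception("failed to process event %r", event)
            metrics["failed"] += 1
        else:
            metrics["posted"] += 1


posted: list[str] = []


def flaky_post(message: str) -> None:
    """Simulated Slack call that fails for one specific event."""
    if "bad" in message:
        raise RuntimeError("simulated Slack failure")
    posted.append(message)


handle_events(["ok-1", "bad-2", "ok-3"], lambda e: f"event: {e}", flaky_post)
```

In this run the second event fails, yet the third is still posted, which is the property the isolation tests are described as covering.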
@chassing chassing force-pushed the APPSRE-13196/Log-events-to-sd-app-sre-reconcile branch from eda4bc8 to 9c57789 Compare March 2, 2026 07:55
@chassing chassing merged commit 9701ef6 into app-sre:master Mar 2, 2026
9 checks passed
@chassing chassing deleted the APPSRE-13196/Log-events-to-sd-app-sre-reconcile branch March 11, 2026 08:44