feat(teams): persist aadGroupId to Mongo for parity with other platforms#211
Merged
Conversation
Other platforms (Slack/Discord/Mattermost) bootstrap channel listings
from a bot token stored in Mongo — identity survives bot restart and
Redis loss without an inbound webhook. Teams was the outlier: there is
no app-only Microsoft Graph endpoint to enumerate "teams this app is
installed in", so the team's AAD group id was observed from Bot
Framework activities and cached ONLY in the chat-adapter's Redis
(`chat-sdk:cache:teams:channelContext:*`). A Redis container restart or
30-day cache TTL erased that identity and the Teams workspace vanished
from the sidebar until the next inbound webhook reseeded it.
This change makes Teams self-bootstrap exactly like the others:
1. Pydantic model: new `teams_known_team_ids: list[str]` on
`PlatformConnection` (default `[]`). No DB migration — legacy docs
decode through the default.
2. Store: `add_teams_known_team_id` upserts via `$addToSet` so
concurrent writes from multiple webhook deliveries can't double-
insert without holding a lock.
3. Internal API: the existing
`GET /api/internal/connections/credentials` now returns
`teams_known_team_ids` so the bot's startup loader sees them. A new
`POST /api/internal/connections/{id}/teams-known-team-ids` accepts
a validated AAD group GUID and persists it. Both behind the
existing `require_bridge` gate.
4. Bot bridge: `recordTeamsConversation` now fires a fire-and-forget
POST whenever a NEW aadGroupId arrives for a connection (dedup
in-memory before queueing). Backend dedups again via `$addToSet`.
5. Bot startup: `syncConnectionsFromBackend` calls a new exported
`seedTeamsKnownTeamIds(connId, ids)` after registering each Teams
adapter, hydrating the in-memory `teamsKnownTeamIds` Map straight
from Mongo. `seedTeamsKnownTeamIds` also flips
`teamsColdStartScanned` for that connection so the Redis scan path
can't race the hydrated state.
Result: a `docker compose down -v && up -d` (or any Redis/bot restart)
returns the Teams workspace + channels to the sidebar on first
listChannels — no webhook required. Existing connections benefit on
their FIRST inbound activity after deploy: the write-through fires,
Mongo gets the team-id, every subsequent restart hydrates cleanly.
Verified:
- 9/9 new bot unit tests (bridge.teams-persistence.test.ts): seed
hydrates / no-op on empty / per-connection scoped / idempotent;
write-through POSTs once / dedups same id / re-fires different id /
rejects malformed GUID / no-op when aadGroupId absent.
- 88/88 bot bridge tests pass (was 79).
- 14/14 platform_store tests pass: default `[]`, round-trip preserves
ids, legacy docs decode, `$addToSet` operator + `updated_at` touch
asserted, returns None on missing connection.
- bot lint: 0 errors. tsc --noEmit clean.
- Python ruff clean on all three changed files.
Constraint: Bot Framework hands the team identity via webhooks only;
no app-only Graph endpoint for "list installed teams" exists.
Constraint: The fire-and-forget POST must not block webhook
processing — 5s timeout + caught errors + dedup.
Rejected: Reading the chat-adapter Redis cache as primary source |
cache TTL (30d) and restart-fragility was the original bug.
Rejected: Migrating PlatformConnection schema | Pydantic default `[]`
covers legacy rows, no down-revision needed.
Confidence: high
Scope-risk: narrow — additive (model field, one store method, one new
endpoint, one bridge export, one startup seed call).
Directive: `seedTeamsKnownTeamIds` MUST also flip
`teamsColdStartScanned[connectionId]` so a stale Redis cache can't
race the hydrated Map state. Do not separate these without re-
testing the restart drill.
Not-tested: Live `docker compose down -v && up -d` drill — requires
rebuilt bot + backend images on a live stack; deferred to merge
validation by the operator. The unit test surface covers the
helpers; the smoke test plan is in the PR body.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cold-start Redis scan in `resolveTeamIds` pre-populates the in-memory `teamsKnownTeamIds` Map when an existing connection has channelContext entries cached from a prior run. After that, every subsequent webhook short-circuits the write-through in `recordTeamsConversation` (because its dedup checks the in-memory state), so an existing connection upgraded to this PR would never seed `teams_known_team_ids` in Mongo and bot-restart-survives-Redis- wipe never engages. Fix: when the cold-start scan adds a NEW id to the Map, also fire the fire-and-forget write-through. Backend `$addToSet` keeps it idempotent. The new-connection path is unchanged — first webhook still seeds Mongo via the existing `recordTeamsConversation` branch. Verified: - 88/88 bridge tests still pass (no behavior change for new connections). - tsc --noEmit clean. Confidence: high Scope-risk: narrow — single block inside the existing scan loop. Directive: keep the in-memory `teamSet.add(tId)` AFTER the `wasNew` check; reversing the order silently never persists. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two CI fixes surfaced by the first full-suite run on this branch (it only got smoke tests while stacked on #206): - ruff format: connections.py had a 3-line wrap ruff collapses to 1 - CodeQL js/tainted-format-string (HIGH): the persist-failure warn passed an interpolated template string as console.warn's first arg alongside a second arg — Node treats arg[0] as a printf format string when more args follow, so a connectionId containing %s/%j would hijack substitution. Switched to the constant-format-string + %s args pattern already used elsewhere in this file (e.g. connection route error logging). Confidence: high Scope-risk: narrow Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
In the screenshot the user reported, every other platform shows up in the sidebar (Discord 4/13, Mattermost 1/11, Slack 1/2, Discord-TRIP 1/3) but the Teams workspace is gone, and the Manage Channels modal for Teams shows "No channels found". Investigation showed it's not a UI filter at all —
listChannelsreturned[]because the bot had noaadGroupIdto query Microsoft Graph with.listChannelssourceconversations.list/users/@me/guildsthen/guilds/{id}/channels/users/me/teams//channelsaadGroupIdobserved from Bot Framework webhook, cached in chat-adapter Redis onlyaadGroupIdobserved from webhook, persisted to Mongo via fire-and-forget POST; bot startup seeds the in-memory Map from MongoPR #207 (Redis AOF + persistent volume) reduces the failure window for dev compose but doesn't fix the asymmetry: prod managed Redis already has AOF, and we'd still fail on a TTL expiry, an adapter rebuild before the next webhook, or an explicit cache flush. Mongo persistence is the actual parity fix.
What
PlatformConnection.teams_known_team_ids: list[str], default[]). No DB migration — legacy docs decode through the default.add_teams_known_team_id(connection_id, aad_group_id)upserts via$addToSetso concurrent webhooks can't double-insert.GET /api/internal/connections/credentialsnow returnsteams_known_team_idsso the bot's startup loader sees it.POST /api/internal/connections/{id}/teams-known-team-idsaccepts a validated AAD group GUID, 404s on unknown connection, 422 when the connection isn't a Teams row, 400 on a malformed GUID. Behind the existingrequire_bridgegate.bot/src/bridge.ts):seedTeamsKnownTeamIds(connId, ids[])for startup hydration. Also flipsteamsColdStartScanned[connId]so a stale Redis cache can't race the hydrated state.recordTeamsConversationfires a fire-and-forget POST whenever a NEWaadGroupIdarrives (dedup in-memory before queueing). Backend dedups again via$addToSet. 5s timeout, swallowed errors so a slow backend can't block the webhook handler.bot/src/index.ts):syncConnectionsFromBackendcallsseedTeamsKnownTeamIdsafter registering each Teams adapter.Test plan
bridge.teams-persistence.test.ts): seed hydrates / no-op on empty / per-connection scoped / idempotent; write-through POSTs once / dedups same id / re-fires different id / rejects malformed GUID / no-op whenaadGroupIdabsent.tests/test_platform_store.pytests pass: default[], round-trip preserves ids, legacy docs decode,$addToSetoperator +updated_attouch asserted, returnsNoneon missing connection.npm run lint: 0 errors.tsc --noEmitclean.ruff checkclean on all three changed files.aadGroupId.platform_connectionsrow has the id inteams_known_team_ids.docker compose down -v && docker compose up -d.GET /api/connections/{teams_id}/channelsreturns the channels with NO further webhook.Behavior change
PlatformConnection; legacy rows decode cleanly via Pydantic default.require_bridge.seeded N known team id(s) for Teams connection ...when ids are hydrated.🤖 Generated with Claude Code