feat(teams): Graph-based ingestion (no @mention required) #206
Merged
Teams was @mention-gated and frequently broken. This rewires the Teams
bridge to be a read-only ingestion adapter on top of Microsoft Graph,
matching Slack/Discord/Mattermost where channels and history are
fetched via the platform API without needing the bot to be @mentioned.
Bridge core
- Eager `await newBot.initialize()` in ChatManager.rebuild — root
unblocker; the SDK was deferring adapter init until the first inbound
webhook, leaving bridge-driven Graph reads broken.
- TeamsBridge.listChannels enumerates via Graph `/teams/{aadGroupId}/channels`
with a Redis SCAN cold-start that self-heals (guard only sets on
success; re-scans when the connection has no known teams).
- Authoritative `{teamId, channelId}` write-back into the adapter's
`teams:channelContext:*` Redis cache after enumeration — fixes a real
channel-id-poisoning bug where two channels returned identical messages.
- fetchPage decoupled from `serviceUrl` for channel reads (Graph uses
team-id/channel-id from getChannelContext, not Bot-Connector serviceUrl).
- getChannel encodes the threadId before fetchChannelInfo (silent
ValidationError was the cause of the UI's "Channel Not Connected" gate).
- decodeChannelSegment applied at every bridge route capturing a channel-id
— Teams ids contain `:`/`@`; safe no-op for other platforms.
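The route-level decoding in the last bullet can be sketched as below. This is a minimal illustration, not the bridge's actual code: only the name `decodeChannelSegment` comes from this PR, and the body (including the fallback on malformed input) is an assumption.

```typescript
// Hypothetical body for decodeChannelSegment; only the name comes from the PR.
function decodeChannelSegment(segment: string): string {
  try {
    // A Teams id captured from a URL, e.g. "19%3Aabc%40thread.tacv2",
    // becomes "19:abc@thread.tacv2". Plain alphanumeric ids pass through unchanged.
    return decodeURIComponent(segment);
  } catch {
    // Malformed percent-escape: return the raw segment instead of throwing
    // out of the route handler.
    return segment;
  }
}
```

The "safe no-op" claim holds in this sketch only as long as other platforms' channel ids contain no percent-escapes.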
Message quality
- HTML-entity decode in normalizeMessage (`&nbsp;` etc.) with correct
order so double-encoded entities stay inert.
- System / deleted / event messages filtered via shared isUserMessage
predicate (messageType !== "message" OR deletedDateTime).
- getMessageCount uses the same predicate so counts stay consistent
with getMessages.
- `since` / `before` timestamp filtering honoured.
- Backward Graph pagination — was paging full history then slicing.
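A rough sketch of two of the items above: the shared predicate and the decode ordering. The field names `messageType` and `deletedDateTime` come from this PR; the `decodeEntities` name, the exact entity list, and mapping `&nbsp;` to a regular space are assumptions made for illustration.

```typescript
// Minimal shape of a Graph chat message, limited to the two fields the PR names.
interface GraphMessageLike {
  messageType?: string;
  deletedDateTime?: string | null;
}

// Shared predicate: keep only live user messages. Using the same function for
// both listing and counting is what keeps the two results consistent.
function isUserMessage(m: GraphMessageLike): boolean {
  return m.messageType === "message" && !m.deletedDateTime;
}

// Hypothetical decoder. "&amp;" is decoded LAST so that a double-encoded
// entity such as "&amp;lt;" ends up as the literal text "&lt;" rather than "<".
function decodeEntities(text: string): string {
  return text
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&");
}
```

Decoding `&amp;` first would turn `&amp;lt;` into `&lt;` and then into `<`, which is the double-decode this ordering avoids.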
Auth + robustness
- Connection must use appType=SingleTenant; MultiTenant breaks MSAL
client_credentials with `missing_tenant_id_error`.
- Token pre-warm at startup eliminates the ~1.5–2.5s first-request cold
MSAL acquisition.
- Resilient thread.post in registerHandlers — a failed reply must not
throw out of the webhook handler (Teams is fetch-only here; outbound
reply is defensive only).
- safeErrMsg(e) helper used at the 6 in-scope console.error/warn sites
so raw error objects (which can carry MSAL secrets or short-lived Bot
Connector tokens) never reach stdout.
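The resilient reply can be pictured as a small wrapper like the one below. It is a sketch under assumptions: `postReplySafely` and its parameters are hypothetical names, and the real handler in `registerHandlers` may be structured differently.

```typescript
// Hypothetical wrapper: run an outbound reply without ever letting its failure
// escape into the webhook handler. Returns whether the post succeeded.
async function postReplySafely(
  post: () => Promise<unknown>,
  warn: (msg: string) => void = (msg) => console.warn(msg),
): Promise<boolean> {
  try {
    await post();
    return true;
  } catch (e) {
    // Log the message text only, never the raw error object.
    const detail = e instanceof Error ? e.message : "non-Error value thrown";
    warn("teams reply failed (ignored): " + detail.slice(0, 200));
    return false;
  }
}
```

Because the call sits inside `try`, both a rejected promise and a synchronous throw from `post` are absorbed.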
Constraint: Microsoft Graph rejects `$select` on /teams/{id}/channels/{id}/messages (HTTP 400 "Query option 'Select' is not allowed") — documented in code so the next engineer doesn't reattempt the optimisation
Constraint: Channel.ReadBasic.All Graph application permission requires Global Administrator consent; Application Administrator role cannot consent Microsoft Graph app permissions
Rejected: $select to trim payload | Graph endpoint does not support it (HTTP 400)
Rejected: cross-connection backfill from shared Redis channelContext keyspace | multi-tenant isolation risk — discovered teamIds stay scoped to the connection that owns them
Directive: Teams here is fetch-only by product design; the bot does not need to reply to @mentions. Do not couple new logic to webhook-driven ingestion — Graph is the source of truth
Confidence: high
Scope-risk: moderate (bridge.ts +345 lines; Slack/Discord/Mattermost paths verified untouched)
Not-tested: live @mention reply round-trip (intentional — Teams is fetch-only; the resilient handler only ensures a failed post doesn't break webhook recording)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Set is mutated via .add() but never reassigned, so eslint's prefer-const rule rightly flagged the `let` declaration as an error under the bot's CI lint gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…gle-adk)
starlette is pulled transitively via fastapi / google-adk / sse-starlette / mcp. google-adk (even latest 2.1.0) pins `starlette<1.0.0`, and PYSEC-2026-161 is fixed in starlette 1.0.1, so the fix is unreachable until google-adk relaxes its pin. Mirrors the existing PYSEC-2025-183 (pyjwt) handling.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CI blocker was a pre-existing flaky web test, unrelated to the Teams
changes: "two MermaidBlocks each produce at most one fallback" asserted
toHaveLength(2) immediately after a waitFor that only waited for >= 1, so
under CI load the second block's async parse→retry→fallback cycle hadn't
settled (got 1, expected 2). Wait for exactly 2 instead, which also encodes
the no-stacking invariant; the dedicated StrictMode test stays the canonical
guard against a single block stacking a second tile.
Also addresses two security-review findings on this PR's new Graph code:
- Validate teamIds read from the shared Redis channelContext cache as AAD
group GUIDs before they reach graph.call(...{ "team-id" }), so a poisoned
cache entry can't inject an arbitrary value into a Graph API path.
- Use safeErrorMessage() (message-only, no stack/raw object) for the
channels.list failure log, consistent with the M6 sanitization elsewhere —
String(err).slice(0,200) could surface an MSAL/Graph error payload.
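The first finding amounts to a shape check on untrusted ids before they are used as a path parameter. A minimal sketch follows; the helper names are hypothetical and the regex is the generic 8-4-4-4-12 hexadecimal GUID shape, assumed to match what the PR validates.

```typescript
// Generic GUID shape (8-4-4-4-12 hex digits), case-insensitive.
const AAD_GUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical type guard for values read back from the shared Redis cache.
function isAadGroupGuid(value: unknown): value is string {
  return typeof value === "string" && AAD_GUID.test(value);
}

// Drop anything that is not GUID-shaped before it can reach a Graph API path.
function filterScannedTeamIds(candidates: unknown[]): string[] {
  return candidates.filter(isAadGroupGuid);
}
```

For example, `filterScannedTeamIds(["3f2504e0-4f89-11d3-9a0c-0305e82c3301", "../users", 42])` keeps only the first entry.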
Constraint: GUID validation applied only to the untrusted Redis-scan path; the
aadGroupId source is an authenticated Bot Framework activity and stays as-is
Rejected: Validate teamIds at the Graph-call site | would also gate the trusted
Bot Framework path for no benefit
Confidence: high
Scope-risk: narrow
Not-tested: CI re-run of the deflaked test under real load (only local x3)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…og sites
Follow-up to the security review on this PR. Two improvements:
- Collapse the three duplicate error-sanitizers (safeErrMsg in index.ts and chat-manager.ts, plus safeErrorMessage in http-utils) down to the single http-utils.safeErrorMessage. It is strictly better than the local copies: it extracts .message only, collapses whitespace/newlines, and caps at 200 chars; the local ones raw-sliced to 500 and could pass multi-line output (and a stack-bearing String(err)) straight through.
- Apply safeErrorMessage at every remaining console.error/warn that logged a raw err/error object across bridge.ts (18 sites), index.ts (6), and chat-manager.ts (3). Previously only the 6 "in-scope" Teams sites were sanitized; the rest could still surface an MSAL/Graph/Bot-Connector error payload (or a stack) into container logs and aggregators.
Structured fields are preserved where they carried signal: the connection-route warn still prefers `(err as any)?.data?.error`, now wrapped in safeErrorMessage so the fallback can't leak. The webhook %s format strings stay static literals (CodeQL js/tainted-format-string) with the sanitized value passed as an arg.
Constraint: keep the static %s format strings (CodeQL tainted-format-string fix)
Rejected: per-file local helper | drifts; the 500 vs 200 split already happened
Confidence: high
Scope-risk: narrow
Not-tested: live MSAL error payload shape (logging path only, no behavior change)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
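The three properties listed for the shared sanitizer (message only, whitespace collapsed, 200-char cap) can be sketched as follows. This is an illustration, not the contents of http-utils: the handling of non-Error values and the `maxLen` parameter are assumptions.

```typescript
// Hypothetical re-implementation of the behaviour described for safeErrorMessage.
function safeErrorMessage(err: unknown, maxLen = 200): string {
  // Take only the message text; never stringify the whole object or its stack.
  const raw =
    err instanceof Error
      ? err.message
      : typeof err === "string"
        ? err
        : "unknown error"; // assumption: opaque objects are replaced by a fixed placeholder
  // Collapse newlines and runs of whitespace so one error stays on one log line.
  return raw.replace(/\s+/g, " ").trim().slice(0, maxLen);
}
```

A call site would then pass the result as an argument to a constant format string, for example `console.warn("channels.list failed: %s", safeErrorMessage(err))`.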
…le token
Three Teams-only latency wins (no other platform's code path is touched):
1. Cache the slow Graph channel enumeration. TeamsBridge.listChannels split so
the live webhook registry is still merged FRESH every call, while the
expensive part — one ~1–1.5s Graph round-trip per installed team, plus the
Redis cold-start scan — is wrapped in a 60s TTL cache with in-flight dedup
and stale-on-error fallback. Mirrors the existing DiscordBridge.channelCache
pattern. Cached only on a non-empty success, so a team discovered later (via
webhook) is never masked by a cached empty result.
2. Parallelize per-team enumeration. The per-team `GET /teams/{id}/channels`
calls now run concurrently (Promise.all) instead of sequentially; each
team's failure is isolated to [] so one bad team can't fail the set.
teamIds is bounded by install count (a handful), well under Graph throttle.
3. Re-warmable MSAL token. Extracted the one-shot startup pre-warm into an
exported warmTeamsGraphToken(adapter) and re-fire it on every adapter
rebuild via onRebuildComplete. A rebuild (connection change or the 6h
recycle) creates fresh adapters with an empty token cache — exactly when the
next fetch would otherwise pay the ~1.5–2.5s cold-acquire. The token is
shared across ALL Graph reads, so this also speeds the first message fetch.
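Items 1 and 2 can be sketched as a small cache plus a fan-out helper. This is a sketch under assumptions rather than the TeamsBridge code: `ListCache`, `enumerateTeams`, and the injectable clock are hypothetical names, and merging the live webhook registry is left out.

```typescript
// Hypothetical TTL cache with in-flight dedup and stale-on-error fallback.
class ListCache<T> {
  private value: T[] | null = null;
  private expiresAt = 0;
  private inFlight: Promise<T[]> | null = null;

  constructor(
    private readonly load: () => Promise<T[]>,
    private readonly ttlMs = 60_000,
    private readonly now: () => number = () => Date.now(),
  ) {}

  async get(): Promise<T[]> {
    if (this.value !== null && this.now() < this.expiresAt) return this.value;
    if (this.inFlight) return this.inFlight; // concurrent callers share one load
    const pending = this.refresh();
    this.inFlight = pending;
    try {
      return await pending;
    } finally {
      if (this.inFlight === pending) this.inFlight = null;
    }
  }

  private async refresh(): Promise<T[]> {
    try {
      const items = await this.load();
      // Cache only a non-empty success so an empty result never masks
      // something discovered later.
      if (items.length > 0) {
        this.value = items;
        this.expiresAt = this.now() + this.ttlMs;
      }
      return items;
    } catch (err) {
      if (this.value !== null) return this.value; // stale-on-error
      throw err;
    }
  }
}

// Run the per-team calls concurrently; a failing team contributes [] instead
// of rejecting the whole set.
async function enumerateTeams<T>(
  teamIds: string[],
  listOne: (teamId: string) => Promise<T[]>,
): Promise<T[]> {
  const perTeam = await Promise.all(
    teamIds.map(async (id): Promise<T[]> => {
      try {
        return await listOne(id);
      } catch {
        return [];
      }
    }),
  );
  return perTeam.reduce<T[]>((all, chunk) => all.concat(chunk), []);
}
```

In this sketch a caller would wrap the Graph enumeration as `new ListCache(() => enumerateTeams(teamIds, listChannelsForTeam))`, where `listChannelsForTeam` stands in for the real per-team Graph call.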
Verified: tsc clean, eslint 0 errors, 187/187 bot tests (4 new for
warmTeamsGraphToken covering the happy path, no-graph no-op, async-reject
swallow, and sync-throw guard). Diff is confined to TeamsBridge + the Teams
token-warm wiring; Slack/Discord/Mattermost/Telegram code is unchanged.
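The four cases covered by those tests suggest a guard of roughly the following shape. The adapter interface, the `warmToken` method, and the `onFailure` callback are all assumptions for illustration; only the function name `warmTeamsGraphToken` and the never-throw requirement come from this PR.

```typescript
// Hypothetical adapter shape: a Teams adapter that may or may not expose Graph.
interface TeamsAdapterLike {
  graph?: { warmToken: () => Promise<unknown> };
}

// Fire-and-forget token warm-up that must never throw, sync or async.
function warmTeamsGraphToken(
  adapter: TeamsAdapterLike,
  onFailure: (reason: string) => void = () => undefined,
): void {
  try {
    const graph = adapter.graph;
    if (!graph) return; // no-graph no-op
    // Async-reject swallow: the rejection is handled here, not by the caller.
    Promise.resolve(graph.warmToken()).catch((e: unknown) => {
      onFailure(e instanceof Error ? e.message : "token warm failed");
    });
  } catch (e) {
    // Sync-throw guard: warmToken itself (or property access) threw.
    onFailure(e instanceof Error ? e.message : "token warm failed");
  }
}
```

Returning `void` rather than the promise keeps callers such as a rebuild hook from accidentally awaiting or propagating the warm-up.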
Constraint: registry must stay live — only the Graph enumeration is cached
Constraint: warmTeamsGraphToken must never throw (guards both async + sync)
Rejected: cache the whole merged listChannels result | would delay webhook-
discovered DMs/channels by up to the TTL
Rejected: bound Promise.all concurrency | install count is tiny; adds no value
Confidence: high
Scope-risk: narrow
Not-tested: live Graph throttling under many installed teams; real MSAL warm timing
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This was referenced May 31, 2026
alan5543 added a commit that referenced this pull request on Jun 1, 2026
…UID checks
The wizard's Validate step previously only constructed the Teams adapter (format check): a typo'd App ID or wrong secret passed "validation" and only failed later when channel enumeration silently returned []. Now:
- bridge.ts handleValidateAdapter actually mints a Graph token via MSAL and classifies AADSTS / unauthorized_client / invalid_client errors as credential failures; non-auth probe failures (403 consent, network) soft-accept with a pointer to Channel.ReadBasic.All admin consent.
- ConnectionWizard validates App ID / Tenant ID against the AAD GUID shape client-side, renders inline errors, and gates the Validate button until they pass.
- Teams is no longer treated as webhook-only: the Channels step renders the real Graph-enumerated channel list (depends on #206).
Verified live: a real Teams connection was created through this exact flow today (channel discovery + message history sync working).
Constraint: validation must not require the messaging endpoint to be live yet
Rejected: server-side-only GUID validation | users deserve inline feedback before a round-trip
Confidence: high
Scope-risk: narrow
Not-tested: MultiTenant validation path against a real multi-org token
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
alan5543 added a commit that referenced this pull request on Jun 1, 2026
The Connect Microsoft Teams wizard's instructions, credential form,
and post-validation panel previously omitted everything that turns a
real-world Teams setup into a working Beever Atlas connection:
• The App Type field placeholder was "MultiTenant", which produces
an MSAL `missing_tenant_id_error` against any tenant-scoped Azure
Bot. SingleTenant is the supported path, but nothing in the UI
said so.
• No mention of the Microsoft Graph `Channel.ReadBasic.All`
application permission required for the Graph-based channel
enumeration introduced in #206 — and no warning that ONLY a
Global Administrator can consent it (Application Administrator
and Cloud Application Administrator both return
`Authorization_RequestDenied`).
• No mention of the messaging endpoint format — users were left
guessing whether it's `/api/messages`, `/api/teams`, or
`/api/webhooks/teams`. The bot listens on `/api/teams` (see
`bot/src/index.ts:422`).
• No mention of ngrok for local dev, despite the bot needing a
publicly reachable HTTPS endpoint to receive Bot Framework
activities.
• No mention of installing the Teams app package
(`bot/teams-app/beever-atlas-teams.zip`) — without that step
the bot never appears in the team.
This change:
1. Rewrites `TEAMS_INSTRUCTIONS` (8 numbered steps with sub-bullets
for non-obvious details, ngrok callout, admin-consent gotcha,
and the .zip install path).
2. Extends `CredentialField` with optional `enum`, `default`, and
`hint` so `app_type` renders as a `<select>` (SingleTenant /
MultiTenant) defaulting to SingleTenant and the
`app_tenant_id` field carries a "why required" hint. No
regression for the other four platforms — none of them use the
new fields.
3. Replaces the generic Teams branch of `StepWebhookMode` with a
dedicated `TeamsWebhookMode` panel covering the three concrete
post-validation steps (endpoint URL, Graph permission, Teams
app package).
4. Fixes `bot/README.md` env table to include
`TEAMS_APP_TENANT_ID` and explain SingleTenant vs MultiTenant.
5. Fixes `docs/content/getting-started/teams-setup.mdx` env var
names (the doc previously referenced `TEAMS_TENANT_ID` /
`TEAMS_CLIENT_ID` / `TEAMS_CLIENT_SECRET`, which are not read
by the bot) and replaces the historical Graph permissions
table with the actually-required `Channel.ReadBasic.All`
(RSC permissions live in the manifest now).
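Item 2 describes an additive schema change, which might look roughly like this. Only the three optional property names (`enum`, `default`, `hint`) and the `app_type` / `app_tenant_id` keys come from this PR; the other fields, the helper functions, and the hint wording are assumptions.

```typescript
// Sketch of the extended field schema. `key` and `label` are assumed to exist
// already; the three optional properties are the additions named in the PR.
interface CredentialField {
  key: string;
  label: string;
  enum?: readonly string[];
  default?: string;
  hint?: string;
}

const appTypeField: CredentialField = {
  key: "app_type",
  label: "App Type",
  enum: ["SingleTenant", "MultiTenant"],
  default: "SingleTenant",
};

const tenantIdField: CredentialField = {
  key: "app_tenant_id",
  label: "Tenant ID",
  hint: "Needed to request the Graph token for this tenant.", // hypothetical wording
};

// A field with enum values renders as a <select>; everything else stays an input,
// so fields that never set the new properties behave exactly as before.
function controlFor(field: CredentialField): "select" | "input" {
  return field.enum && field.enum.length > 0 ? "select" : "input";
}

function initialValue(field: CredentialField): string {
  return field.default ?? "";
}
```

Because all three properties are optional, existing field definitions for the other platforms type-check and render unchanged.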
Verified:
- `npx tsc --noEmit` clean
- `npx eslint` on `ConnectionWizard.tsx` clean
- Verifier agent confirmed: credential field key round-trip
(snake → camel) intact, no regression for slack/discord/
telegram/mattermost CREDENTIAL_FIELDS entries, `<select>` state
persists across step navigation.
Stacked on top of #206 because the wizard text describes the Graph
channel enumeration introduced there (`TeamsBridge.listChannels`'s
call to `teams.channels.list`). Retarget to `main` once #206 lands.
Confidence: high
Scope-risk: narrow — UI text + one optional schema extension; no
backend or bot code paths touched.
Directive: keep the `app_type` enum values in lockstep with the
Chat SDK adapter's `createTeamsAdapter` accepted values. The
adapter treats anything other than literal "MultiTenant" as
SingleTenant — so a typo here doesn't fail loudly, it fails
silently as a missing-tenant-id MSAL error.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Jun 1, 2026
alan5543 added a commit that referenced this pull request on Jun 1, 2026
* docs(teams): accurate setup wizard + env + Graph permissions
The Connect Microsoft Teams wizard's instructions, credential form,
and post-validation panel previously omitted everything that turns a
real-world Teams setup into a working Beever Atlas connection:
• The App Type field placeholder was "MultiTenant", which produces
an MSAL `missing_tenant_id_error` against any tenant-scoped Azure
Bot. SingleTenant is the supported path, but nothing in the UI
said so.
• No mention of the Microsoft Graph `Channel.ReadBasic.All`
application permission required for the Graph-based channel
enumeration introduced in #206 — and no warning that ONLY a
Global Administrator can consent it (Application Administrator
and Cloud Application Administrator both return
`Authorization_RequestDenied`).
• No mention of the messaging endpoint format — users were left
guessing whether it's `/api/messages`, `/api/teams`, or
`/api/webhooks/teams`. The bot listens on `/api/teams` (see
`bot/src/index.ts:422`).
• No mention of ngrok for local dev, despite the bot needing a
publicly reachable HTTPS endpoint to receive Bot Framework
activities.
• No mention of installing the Teams app package
(`bot/teams-app/beever-atlas-teams.zip`) — without that step
the bot never appears in the team.
This change:
1. Rewrites `TEAMS_INSTRUCTIONS` (8 numbered steps with sub-bullets
for non-obvious details, ngrok callout, admin-consent gotcha,
and the .zip install path).
2. Extends `CredentialField` with optional `enum`, `default`, and
`hint` so `app_type` renders as a `<select>` (SingleTenant /
MultiTenant) defaulting to SingleTenant and the
`app_tenant_id` field carries a "why required" hint. No
regression for the other four platforms — none of them use the
new fields.
3. Replaces the generic Teams branch of `StepWebhookMode` with a
dedicated `TeamsWebhookMode` panel covering the three concrete
post-validation steps (endpoint URL, Graph permission, Teams
app package).
4. Fixes `bot/README.md` env table to include
`TEAMS_APP_TENANT_ID` and explain SingleTenant vs MultiTenant.
5. Fixes `docs/content/getting-started/teams-setup.mdx` env var
names (the doc previously referenced `TEAMS_TENANT_ID` /
`TEAMS_CLIENT_ID` / `TEAMS_CLIENT_SECRET`, which are not read
by the bot) and replaces the historical Graph permissions
table with the actually-required `Channel.ReadBasic.All`
(RSC permissions live in the manifest now).
Verified:
- `npx tsc --noEmit` clean
- `npx eslint` on `ConnectionWizard.tsx` clean
- Verifier agent confirmed: credential field key round-trip
(snake → camel) intact, no regression for slack/discord/
telegram/mattermost CREDENTIAL_FIELDS entries, `<select>` state
persists across step navigation.
Stacked on top of #206 because the wizard text describes the Graph
channel enumeration introduced there (`TeamsBridge.listChannels`'s
call to `teams.channels.list`). Retarget to `main` once #206 lands.
Confidence: high
Scope-risk: narrow — UI text + one optional schema extension; no
backend or bot code paths touched.
Directive: keep the `app_type` enum values in lockstep with the
Chat SDK adapter's `createTeamsAdapter` accepted values. The
adapter treats anything other than literal "MultiTenant" as
SingleTenant — so a typo here doesn't fail loudly, it fails
silently as a missing-tenant-id MSAL error.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(teams-wizard): trim setup steps and webhook panel for scannability
First pass turned every step into a paragraph with parenthetical
rationale and warning callouts; the result rendered as a vertical
wall of prose, monospace sub-bullets full of long sentences, and a
3-card post-validation panel that re-stated info from the setup
list. User feedback: "UX is bad."
This trim:
• Reduces setup from 8 long steps to 6 single-line imperatives.
Each step says WHAT to click; rationale ("required so MSAL
client_credentials can mint the Graph token") is removed —
users don't need that to follow the click path.
• Reserves the `details` slot (monospace) for things that should
actually be monospaced: an enum value, a permission name.
Prose details made the layout feel like a code listing.
• Moves the messaging endpoint, ngrok instructions, and Teams
app `.zip` install OUT of setup and INTO the post-validation
`TeamsWebhookMode` panel — they happen AFTER credentials
validate, so putting them in the upfront list both bloated
setup and skipped the natural workflow break.
• Drops the redundant Channel.ReadBasic.All card from
TeamsWebhookMode — it's already in setup step 6, and
repeating it implied "do this again" rather than "review."
• TeamsWebhookMode is now 2 cards instead of 3, with shorter
body copy.
Verified:
- `npx tsc --noEmit` on `web/` clean
- Web image rebuilt and deployed; localhost:3000 returns HTTP 200
- The instruction list now scrolls minimally above the Display
Name input
Confidence: high
Scope-risk: narrow — UI text only.
Directive: the `details` array on a setup-step instruction renders
in monospace. Use it for code-shaped values (paths, enum values,
command snippets), NEVER for prose explanations.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(teams-wizard): add messaging-endpoint step + clarify both app types supported
User-reported gaps after the first trim:
• The setup steps explained creds/Azure but never told users
HOW to get a public webhook URL (ngrok) or WHERE to enter the
bot URL — they were left guessing whether we needed it as a
wizard field. We don't (bot listens on a fixed path), but the
setup list must say so.
• The App Type hint read as "we only support SingleTenant" when
in fact the select offers both and the SDK accepts either.
Changes:
• Setup step 2 (new): "Expose this bridge over HTTPS, then set
the Bot's Messaging endpoint to your URL + /api/teams" with
mono details for `ngrok http 3001` and the URL pattern. This
surfaces what was previously buried in the post-validation
panel.
• Setup step 1 detail updated to "SingleTenant (recommended) or
MultiTenant" so the choice is visible upfront.
• App Type hint rewritten to lead with "Both modes are
supported" — no longer reads as a restriction.
• TeamsWebhookMode collapses from 2 cards to 1: just the .zip
app install. The endpoint card moved to the setup list above;
keeping it here would have been redundant.
Verified:
- `npx tsc --noEmit` on `web/` clean
- Web image rebuilt and deployed
Confidence: high
Scope-risk: narrow — wizard text only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(teams-wizard): mint real Graph token on Validate + client-side GUID checks
The wizard's Validate step previously only constructed the Teams adapter
(format check) — a typo'd App ID or wrong secret passed "validation" and
only failed later when channel enumeration silently returned []. Now:
- bridge.ts handleValidateAdapter actually mints a Graph token via MSAL
and classifies AADSTS / unauthorized_client / invalid_client errors as
credential failures; non-auth probe failures (403 consent, network)
soft-accept with a pointer to Channel.ReadBasic.All admin consent.
- ConnectionWizard validates App ID / Tenant ID against the AAD GUID
shape client-side, renders inline errors, and gates the Validate
button until they pass.
- Teams is no longer treated as webhook-only: the Channels step renders
the real Graph-enumerated channel list (depends on #206).
Verified live: a real Teams connection was created through this exact
flow today (channel discovery + message history sync working).
Constraint: validation must not require the messaging endpoint to be live yet
Rejected: server-side-only GUID validation | users deserve inline feedback before a round-trip
Confidence: high
Scope-risk: narrow
Not-tested: MultiTenant validation path against a real multi-org token
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
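The classification rule in that commit (auth-shaped errors fail validation, everything else soft-accepts) can be sketched as below. The function name, the result type, and the warning text are hypothetical; only the three error markers and the pointer to Channel.ReadBasic.All come from the commit message.

```typescript
// Hypothetical result type for the token probe during validation.
type ProbeVerdict =
  | { ok: false; reason: "credentials" }
  | { ok: true; warning: string };

// Classify a failed probe by its error text.
function classifyProbeFailure(message: string): ProbeVerdict {
  // Markers that indicate the App ID, secret, or tenant is wrong.
  const authFailure = /AADSTS|unauthorized_client|invalid_client/i.test(message);
  if (authFailure) return { ok: false, reason: "credentials" };
  // Anything else (403 consent, network) is accepted with a hint, because
  // validation must not depend on consent or the endpoint being live yet.
  return {
    ok: true,
    warning: "Could not confirm Graph access; check admin consent for Channel.ReadBasic.All.",
  };
}
```

The asymmetry is deliberate in this sketch: a wrong secret should block the wizard, while a missing consent is something the user can fix after the connection exists.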
* fix(teams-app): point manifest at the real bot app id + align package filename
The loose manifest.json referenced botId eefc03cb-… which does not exist
in any Azure tenant (stale id from an early registration attempt), and
was missing webApplicationInfo + RSC permissions — a package built from
it could install but never read channel history. The actual working
package (built via teams CLI) used the correct id but generic
"Developer/example.com" branding.
Merge the two: correct id (fb24e83f-…), manifest schema 1.25, RSC perms
(ChannelMessage.Read.Group, ChatMessage.Read.Chat), personal tabs, and
Beever AI branding. Bump to 1.0.3 (dev portal auto-bumped to the same).
build-package.mjs now writes beever-atlas-teams.zip (the name actually
used/uploaded everywhere) instead of beever-atlas-bot.zip; .gitignore
updated to match so the build artifact stays untracked.
Constraint: dev-portal catalog already at version 1.0.2 — local manifest must be ≥1.0.3
Rejected: tracking the built zip in git | reproducible artifact, build script exists for that
Confidence: high
Scope-risk: narrow
Directive: botId must equal the AAD app id fb24e83f-… — never regenerate it independently
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
alan5543 added a commit that referenced this pull request on Jun 1, 2026
Two CI fixes surfaced by the first full-suite run on this branch (it only got smoke tests while stacked on #206):
- ruff format: connections.py had a 3-line wrap ruff collapses to 1
- CodeQL js/tainted-format-string (HIGH): the persist-failure warn passed an interpolated template string as console.warn's first arg alongside a second arg. Node treats arg[0] as a printf format string when more args follow, so a connectionId containing %s/%j would hijack substitution. Switched to the constant-format-string + %s args pattern already used elsewhere in this file (e.g. connection route error logging).
Confidence: high
Scope-risk: narrow
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
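The constant-format-string pattern can be illustrated with a small helper. The function name, the message text, and the injected `warn` parameter are hypothetical; the point is only that the first argument is a literal and every variable value travels as a later argument.

```typescript
// Hypothetical logging helper. `warn` has the same shape as console.warn so the
// sketch can be exercised without writing to the real console.
function logPersistFailure(
  warn: (format: string, ...args: unknown[]) => void,
  connectionId: string,
  reason: string,
): void {
  // Risky form: warn(`persist failed for ${connectionId}`, reason)
  // Here the interpolated id becomes part of the format string.
  // Safer form: the format is a constant; values are substituted as data.
  warn("teams known-team-id persist failed for %s: %s", connectionId, reason);
}
```

In production the first argument would be `console.warn`, and a `connectionId` containing `%s` or `%j` is then printed verbatim instead of consuming the following arguments.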
alan5543 added a commit that referenced this pull request on Jun 1, 2026
…rms (#211)
* feat(teams): persist aadGroupId to Mongo for parity with other platforms
Other platforms (Slack/Discord/Mattermost) bootstrap channel listings from a bot token stored in Mongo: identity survives bot restart and Redis loss without an inbound webhook. Teams was the outlier: there is no app-only Microsoft Graph endpoint to enumerate "teams this app is installed in", so the team's AAD group id was observed from Bot Framework activities and cached ONLY in the chat-adapter's Redis (`chat-sdk:cache:teams:channelContext:*`). A Redis container restart or 30-day cache TTL erased that identity and the Teams workspace vanished from the sidebar until the next inbound webhook reseeded it.
This change makes Teams self-bootstrap exactly like the others:
1. Pydantic model: new `teams_known_team_ids: list[str]` on `PlatformConnection` (default `[]`). No DB migration; legacy docs decode through the default.
2. Store: `add_teams_known_team_id` upserts via `$addToSet` so concurrent writes from multiple webhook deliveries can't double-insert without holding a lock.
3. Internal API: the existing `GET /api/internal/connections/credentials` now returns `teams_known_team_ids` so the bot's startup loader sees them. A new `POST /api/internal/connections/{id}/teams-known-team-ids` accepts a validated AAD group GUID and persists it. Both behind the existing `require_bridge` gate.
4. Bot bridge: `recordTeamsConversation` now fires a fire-and-forget POST whenever a NEW aadGroupId arrives for a connection (dedup in-memory before queueing). Backend dedups again via `$addToSet`.
5. Bot startup: `syncConnectionsFromBackend` calls a new exported `seedTeamsKnownTeamIds(connId, ids)` after registering each Teams adapter, hydrating the in-memory `teamsKnownTeamIds` Map straight from Mongo. `seedTeamsKnownTeamIds` also flips `teamsColdStartScanned` for that connection so the Redis scan path can't race the hydrated state.
Result: a `docker compose down -v && up -d` (or any Redis/bot restart) returns the Teams workspace + channels to the sidebar on first listChannels, with no webhook required. Existing connections benefit on their FIRST inbound activity after deploy: the write-through fires, Mongo gets the team-id, every subsequent restart hydrates cleanly.
Verified:
- 9/9 new bot unit tests (bridge.teams-persistence.test.ts): seed hydrates / no-op on empty / per-connection scoped / idempotent; write-through POSTs once / dedups same id / re-fires different id / rejects malformed GUID / no-op when aadGroupId absent.
- 88/88 bot bridge tests pass (was 79).
- 14/14 platform_store tests pass: default `[]`, round-trip preserves ids, legacy docs decode, `$addToSet` operator + `updated_at` touch asserted, returns None on missing connection.
- bot lint: 0 errors. tsc --noEmit clean.
- Python ruff clean on all three changed files.
Constraint: Bot Framework hands the team identity via webhooks only; no app-only Graph endpoint for "list installed teams" exists.
Constraint: The fire-and-forget POST must not block webhook processing: 5s timeout + caught errors + dedup.
Rejected: Reading the chat-adapter Redis cache as primary source | cache TTL (30d) and restart-fragility was the original bug.
Rejected: Migrating PlatformConnection schema | Pydantic default `[]` covers legacy rows, no down-revision needed.
Confidence: high
Scope-risk: narrow; additive (model field, one store method, one new endpoint, one bridge export, one startup seed call).
Directive: `seedTeamsKnownTeamIds` MUST also flip `teamsColdStartScanned[connectionId]` so a stale Redis cache can't race the hydrated Map state. Do not separate these without re-testing the restart drill.
Not-tested: Live `docker compose down -v && up -d` drill, which requires rebuilt bot + backend images on a live stack; deferred to merge validation by the operator. The unit test surface covers the helpers; the smoke test plan is in the PR body.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(teams): cold-start scan also writes through to Mongo
The cold-start Redis scan in `resolveTeamIds` pre-populates the in-memory `teamsKnownTeamIds` Map when an existing connection has channelContext entries cached from a prior run. After that, every subsequent webhook short-circuits the write-through in `recordTeamsConversation` (because its dedup checks the in-memory state), so an existing connection upgraded to this PR would never seed `teams_known_team_ids` in Mongo and bot-restart-survives-Redis-wipe never engages.
Fix: when the cold-start scan adds a NEW id to the Map, also fire the fire-and-forget write-through. Backend `$addToSet` keeps it idempotent. The new-connection path is unchanged: first webhook still seeds Mongo via the existing `recordTeamsConversation` branch.
Verified:
- 88/88 bridge tests still pass (no behavior change for new connections).
- tsc --noEmit clean.
Confidence: high
Scope-risk: narrow; single block inside the existing scan loop.
Directive: keep the in-memory `teamSet.add(tId)` AFTER the `wasNew` check; reversing the order silently never persists.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ci): ruff format connections.py + constant format string for CodeQL
Two CI fixes surfaced by the first full-suite run on this branch (it only got smoke tests while stacked on #206):
- ruff format: connections.py had a 3-line wrap ruff collapses to 1
- CodeQL js/tainted-format-string (HIGH): the persist-failure warn passed an interpolated template string as console.warn's first arg alongside a second arg. Node treats arg[0] as a printf format string when more args follow, so a connectionId containing %s/%j would hijack substitution. Switched to the constant-format-string + %s args pattern already used elsewhere in this file (e.g. connection route error logging).
Confidence: high
Scope-risk: narrow
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
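The write-through dedup and the ordering directive (`wasNew` checked before `teamSet.add`) can be sketched as follows. `teamsKnownTeamIds` is the Map named in the commit; `noteTeamId` and the injected `persist` callback are hypothetical, and the GUID validation and 5s timeout mentioned above are left out of the sketch.

```typescript
// In-memory registry: connection id -> known AAD group ids.
const teamsKnownTeamIds = new Map<string, Set<string>>();

// Record a team id for a connection and persist it only the first time it is seen.
// Returns true when the id was new for that connection.
function noteTeamId(
  connectionId: string,
  teamId: string,
  persist: (connectionId: string, teamId: string) => Promise<void>,
): boolean {
  let teamSet = teamsKnownTeamIds.get(connectionId);
  if (!teamSet) {
    teamSet = new Set<string>();
    teamsKnownTeamIds.set(connectionId, teamSet);
  }
  // Check BEFORE adding: after add(), has() is always true and the
  // write-through below would never fire.
  const wasNew = !teamSet.has(teamId);
  teamSet.add(teamId);
  if (wasNew) {
    try {
      // Fire-and-forget: failures are swallowed so webhook processing is never blocked.
      persist(connectionId, teamId).catch(() => undefined);
    } catch {
      // A synchronous throw from persist is ignored as well.
    }
  }
  return wasNew;
}
```

Calling this from both the webhook path and the cold-start scan gives the behaviour described above: whichever path sees an id first triggers exactly one persist attempt, and the backend's `$addToSet` keeps retries idempotent.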
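Summary
- `await newBot.initialize()` in `ChatManager.rebuild`: the Chat SDK was deferring adapter init until the first inbound webhook, so every bridge-driven Graph read returned empty/403 until someone @mentioned the bot. Everything else stacks on top: Graph channel enumeration, channel-id encoding fixes, channelContext write-back, HTML-entity decode, system/deleted message filter, `since`/`before` filtering, backward Graph pagination, MSAL token pre-warm, resilient `thread.post`, sanitized error logging.
- `appType=SingleTenant` (MultiTenant triggers MSAL `missing_tenant_id_error` for client_credentials). The bot's AAD app needs `Channel.ReadBasic.All` Graph application permission with Global Admin consent (Application Administrator role cannot consent Microsoft Graph app permissions; this is a known Microsoft limit).
Test plan
- `tsc --noEmit` clean
- `GET /api/connections/{teams-conn}/channels` returns enumerated channels immediately after `docker compose restart bot`; no @mention required
- `GET /bridge/connections/{conn}/channels/{tech-discussion}/messages` returns 4 distinct user messages, clean text (no `&nbsp;`)
- `GET /bridge/connections/{conn}/channels/{beever-atlas-test}/messages` returns 0 (system/install events filtered)
- Channel ids containing special characters (`:`/`@`) work on legacy, platform-prefixed, and connection-scoped bridge routes
- `since=<future ISO>` → 0 messages, `since=<past ISO>` → all, `before=<past>` → 0
- `GET …/count` agrees with `len(messages)` for both Teams channels (shared `isUserMessage` predicate)
- `console.error`/`warn` in the 6 in-scope call sites use `safeErrMsg(e)`; no raw error objects (which could carry MSAL secrets or Bot-Connector tokens) reach stdout
- `@mention` reply round-trip: intentionally not tested. Teams is fetch-only by product design; the resilient handler ensures a failed reply doesn't break webhook recording
🤖 Generated with Claude Code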
tsc --noEmitcleanGET /api/connections/{teams-conn}/channelsreturns enumerated channels immediately afterdocker compose restart bot— no @mention requiredGET /bridge/connections/{conn}/channels/{tech-discussion}/messagesreturns 4 distinct user messages, clean text (no )GET /bridge/connections/{conn}/channels/{beever-atlas-test}/messagesreturns 0 (system/install events filtered):/@) work on legacy, platform-prefixed, and connection-scoped bridge routessince=<future ISO>→ 0 messages,since=<past ISO>→ all,before=<past>→ 0GET …/countagrees withlen(messages)for both Teams channels (sharedisUserMessagepredicate)console.error/warnin the 6 in-scope call sites usesafeErrMsg(e)— no raw error objects (which could carry MSAL secrets or Bot-Connector tokens) reach stdout@mentionreply round-trip — intentionally not tested: Teams is fetch-only by product design; the resilient handler ensures a failed reply doesn't break webhook recording🤖 Generated with Claude Code