Skip to content

feat(teams): Graph-based ingestion (no @mention required)#206

Open
alan5543 wants to merge 6 commits into
mainfrom
feat/teams-graph-ingestion
Open

feat(teams): Graph-based ingestion (no @mention required)#206
alan5543 wants to merge 6 commits into
mainfrom
feat/teams-graph-ingestion

Conversation

@alan5543
Copy link
Copy Markdown
Member

Summary

  • Rewires the Teams bridge to be a read-only ingestion adapter on top of Microsoft Graph, matching Slack/Discord/Mattermost — channels and message history are fetched via the platform API without needing the bot to be @mentioned.
  • The root unblocker was a missing await newBot.initialize() in ChatManager.rebuild — the Chat SDK was deferring adapter init until the first inbound webhook, so every bridge-driven Graph read returned empty/403 until someone @mentioned the bot. Everything else stacks on top: Graph channel enumeration, channel-id encoding fixes, channelContext write-back, HTML-entity decode, system/deleted message filter, since/before filtering, backward Graph pagination, MSAL token pre-warm, resilient thread.post, sanitized error logging.
  • Connection must use appType=SingleTenant (MultiTenant triggers MSAL missing_tenant_id_error for client_credentials). The bot's AAD app needs Channel.ReadBasic.All Graph application permission with Global Admin consent (Application Administrator role cannot consent Microsoft Graph app permissions — known Microsoft limit).

Test plan

  • tsc --noEmit clean
  • Bridge unit tests 7/7 pass
  • GET /api/connections/{teams-conn}/channels returns enumerated channels immediately after docker compose restart bot — no @mention required
  • GET /bridge/connections/{conn}/channels/{tech-discussion}/messages returns 4 distinct user messages, clean text (no  )
  • GET /bridge/connections/{conn}/channels/{beever-atlas-test}/messages returns 0 (system/install events filtered)
  • Channel ids passed in percent-encoded form (Teams' :/@) work on legacy, platform-prefixed, and connection-scoped bridge routes
  • since=<future ISO> → 0 messages, since=<past ISO> → all, before=<past> → 0
  • GET …/count agrees with len(messages) for both Teams channels (shared isUserMessage predicate)
  • Per-fetch latency at the Microsoft Graph floor (~1.5s warm, no extra Graph calls in the bridge)
  • Slack/Discord/Mattermost regression: connection counts, channel counts, sample fetches unchanged
  • console.error/warn in the 6 in-scope call sites use safeErrMsg(e) — no raw error objects (which could carry MSAL secrets or Bot-Connector tokens) reach stdout
  • Live @mention reply round-trip — intentionally not tested: Teams is fetch-only by product design; the resilient handler ensures a failed reply doesn't break webhook recording

🤖 Generated with Claude Code

alan5543 and others added 6 commits May 30, 2026 00:33
Teams was @mention-gated and frequently broken. This rewires the Teams
bridge to be a read-only ingestion adapter on top of Microsoft Graph,
matching Slack/Discord/Mattermost where channels and history are
fetched via the platform API without needing the bot to be @mentioned.

Bridge core
- Eager `await newBot.initialize()` in ChatManager.rebuild — root
  unblocker; the SDK was deferring adapter init until the first inbound
  webhook, leaving bridge-driven Graph reads broken.
- TeamsBridge.listChannels enumerates via Graph `/teams/{aadGroupId}/channels`
  with a Redis SCAN cold-start that self-heals (guard only sets on
  success; re-scans when the connection has no known teams).
- Authoritative `{teamId, channelId}` write-back into the adapter's
  `teams:channelContext:*` Redis cache after enumeration — fixes a real
  channel-id-poisoning bug where two channels returned identical messages.
- fetchPage decoupled from `serviceUrl` for channel reads (Graph uses
  team-id/channel-id from getChannelContext, not Bot-Connector serviceUrl).
- getChannel encodes the threadId before fetchChannelInfo (silent
  ValidationError was the cause of the UI's "Channel Not Connected" gate).
- decodeChannelSegment applied at every bridge route capturing a channel-id
  — Teams ids contain `:`/`@`; safe no-op for other platforms.

Message quality
- HTML-entity decode in normalizeMessage (`&nbsp;` etc.) with correct
  order so double-encoded entities stay inert.
- System / deleted / event messages filtered via shared isUserMessage
  predicate (messageType !== "message" OR deletedDateTime).
- getMessageCount uses the same predicate so counts stay consistent
  with getMessages.
- `since` / `before` timestamp filtering honoured.
- Backward Graph pagination — was paging full history then slicing.

Auth + robustness
- Connection must use appType=SingleTenant; MultiTenant breaks MSAL
  client_credentials with `missing_tenant_id_error`.
- Token pre-warm at startup eliminates the ~1.5–2.5s first-request cold
  MSAL acquisition.
- Resilient thread.post in registerHandlers — a failed reply must not
  throw out of the webhook handler (Teams is fetch-only here; outbound
  reply is defensive only).
- safeErrMsg(e) helper used at the 6 in-scope console.error/warn sites
  so raw error objects (which can carry MSAL secrets or short-lived Bot
  Connector tokens) never reach stdout.

Constraint: Microsoft Graph rejects `$select` on /teams/{id}/channels/{id}/messages (HTTP 400 "Query option 'Select' is not allowed") — documented in code so the next engineer doesn't reattempt the optimisation
Constraint: Channel.ReadBasic.All Graph application permission requires Global Administrator consent; Application Administrator role cannot consent Microsoft Graph app permissions
Rejected: $select to trim payload | Graph endpoint does not support it (HTTP 400)
Rejected: cross-connection backfill from shared Redis channelContext keyspace | multi-tenant isolation risk — discovered teamIds stay scoped to the connection that owns them
Directive: Teams here is fetch-only by product design; the bot does not need to reply to @mentions. Do not couple new logic to webhook-driven ingestion — Graph is the source of truth
Confidence: high
Scope-risk: moderate (bridge.ts +345 lines; Slack/Discord/Mattermost paths verified untouched)
Not-tested: live @mention reply round-trip (intentional — Teams is fetch-only; the resilient handler only ensures a failed post doesn't break webhook recording)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Set is mutated via .add() but never reassigned, so eslint's
prefer-const rule rightly flagged the `let` declaration as an error
under the bot's CI lint gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…gle-adk)

starlette is pulled transitively via fastapi / google-adk / sse-starlette / mcp.
google-adk (even latest 2.1.0) pins `starlette<1.0.0`, and PYSEC-2026-161 is
fixed in starlette 1.0.1 — so the fix is unreachable until google-adk relaxes
its pin. Mirrors the existing PYSEC-2025-183 (pyjwt) handling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CI blocker was a pre-existing flaky web test, unrelated to the Teams
changes: "two MermaidBlocks each produce at most one fallback" asserted
toHaveLength(2) immediately after a waitFor that only waited for >= 1, so
under CI load the second block's async parse→retry→fallback cycle hadn't
settled (got 1, expected 2). Wait for exactly 2 instead, which also encodes
the no-stacking invariant; the dedicated StrictMode test stays the canonical
guard against a single block stacking a second tile.

Also addresses two security-review findings on this PR's new Graph code:
- Validate teamIds read from the shared Redis channelContext cache as AAD
  group GUIDs before they reach graph.call(...{ "team-id" }), so a poisoned
  cache entry can't inject an arbitrary value into a Graph API path.
- Use safeErrorMessage() (message-only, no stack/raw object) for the
  channels.list failure log, consistent with the M6 sanitization elsewhere —
  String(err).slice(0,200) could surface an MSAL/Graph error payload.

Constraint: GUID validation applied only to the untrusted Redis-scan path; the
  aadGroupId source is an authenticated Bot Framework activity and stays as-is
Rejected: Validate teamIds at the Graph-call site | would also gate the trusted
  Bot Framework path for no benefit
Confidence: high
Scope-risk: narrow
Not-tested: CI re-run of the deflaked test under real load (only local x3)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…og sites

Follow-up to the security review on this PR. Two improvements:

- Collapse the three duplicate error-sanitizers (safeErrMsg in index.ts and
  chat-manager.ts, plus safeErrorMessage in http-utils) down to the single
  http-utils.safeErrorMessage. It is strictly better than the local copies:
  it extracts .message only, collapses whitespace/newlines, and caps at 200
  chars — the local ones raw-sliced to 500 and could pass multi-line output
  (and a stack-bearing String(err)) straight through.

- Apply safeErrorMessage at every remaining console.error/warn that logged a
  raw err/error object across bridge.ts (18 sites), index.ts (6), and
  chat-manager.ts (3). Previously only the 6 "in-scope" Teams sites were
  sanitized; the rest could still surface an MSAL/Graph/Bot-Connector error
  payload (or a stack) into container logs and aggregators.

Structured fields are preserved where they carried signal: the connection-route
warn still prefers `(err as any)?.data?.error`, now wrapped in safeErrorMessage
so the fallback can't leak. The webhook %s format strings stay static literals
(CodeQL js/tainted-format-string) with the sanitized value passed as an arg.

Constraint: keep the static %s format strings (CodeQL tainted-format-string fix)
Rejected: per-file local helper | drifts — the 500 vs 200 split already happened
Confidence: high
Scope-risk: narrow
Not-tested: live MSAL error payload shape (logging path only, no behavior change)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…le token

Three Teams-only latency wins (no other platform's code path is touched):

1. Cache the slow Graph channel enumeration. TeamsBridge.listChannels split so
   the live webhook registry is still merged FRESH every call, while the
   expensive part — one ~1–1.5s Graph round-trip per installed team, plus the
   Redis cold-start scan — is wrapped in a 60s TTL cache with in-flight dedup
   and stale-on-error fallback. Mirrors the existing DiscordBridge.channelCache
   pattern. Cached only on a non-empty success, so a team discovered later (via
   webhook) is never masked by a cached empty result.

2. Parallelize per-team enumeration. The per-team `GET /teams/{id}/channels`
   calls now run concurrently (Promise.all) instead of sequentially; each
   team's failure is isolated to [] so one bad team can't fail the set.
   teamIds is bounded by install count (a handful), well under Graph throttle.

3. Re-warmable MSAL token. Extracted the one-shot startup pre-warm into an
   exported warmTeamsGraphToken(adapter) and re-fire it on every adapter
   rebuild via onRebuildComplete. A rebuild (connection change or the 6h
   recycle) creates fresh adapters with an empty token cache — exactly when the
   next fetch would otherwise pay the ~1.5–2.5s cold-acquire. The token is
   shared across ALL Graph reads, so this also speeds the first message fetch.

Verified: tsc clean, eslint 0 errors, 187/187 bot tests (4 new for
warmTeamsGraphToken covering the happy path, no-graph no-op, async-reject
swallow, and sync-throw guard). Diff is confined to TeamsBridge + the Teams
token-warm wiring; Slack/Discord/Mattermost/Telegram code is unchanged.

Constraint: registry must stay live — only the Graph enumeration is cached
Constraint: warmTeamsGraphToken must never throw (guards both async + sync)
Rejected: cache the whole merged listChannels result | would delay webhook-
  discovered DMs/channels by up to the TTL
Rejected: bound Promise.all concurrency | install count is tiny; adds no value
Confidence: high
Scope-risk: narrow
Not-tested: live Graph throttling under many installed teams; real MSAL warm timing

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant