Skip to content

feat(channels): Bitrix24 channel core + UI + per-user MCP (split 3/3 from #1057)#1061

Open
tech-synity wants to merge 54 commits into
nextlevelbuilder:devfrom
tech-synity:feat/bitrix24-channel-core
Open

feat(channels): Bitrix24 channel core + UI + per-user MCP (split 3/3 from #1057)#1061
tech-synity wants to merge 54 commits into
nextlevelbuilder:devfrom
tech-synity:feat/bitrix24-channel-core

Conversation

@tech-synity
Copy link
Copy Markdown

Summary

PR 3/3 split from #1057 — final part. Bitrix24 channel implementation, MCP integration (Path B per-user OAuth), UI fields, agent layer support for per-user credentials in group chats.

Stacked on PR 2 (#1060) which is stacked on PR 1 (#1059). Diff vs `dev` will show all 3 PRs' content until 1 + 2 merge.

Commits (27)

Channel core (Phase 03 channel-side) + 26 follow-ups grouped by theme:

Setup / install flow

  • `feat(bitrix24): support Local Application install flow`
  • `feat(bitrix24): bootstrap app_token from first event`
  • `fix(bitrix24): call BX24.installFinish() to unblock event delivery`
  • `feat(bitrix24): log install + event entry for observability`

MCP integration (Path B — Bitrix access_token as MCP auth)

  • `refactor(bitrix24): phase A — remove parallel MCP mapping layer`
  • `refactor(bitrix24): phase B — wire channel into partner's ContactCollector`
  • `feat(bitrix24): phase C — lazy MCP provisioner with Open Channel skip`
  • `refactor(bitrix24): per-server MCP admin token + user degradation notice`
  • `refactor(bitrix24): drop admin token, use Bitrix access_token as MCP auth (Path B)`
  • `docs(bitrix24): add Rev5 MCP integration plan — Path B shipped`

Group chat semantics

  • `fix(bitrix24): detect group @mention via MENTIONED_LIST + MESSAGE_ORIGINAL`
  • `fix(bitrix24): classify CHAT_TYPE=X (Tasks/entity chat) as group`
  • `feat(bitrix24): forward CHAT_ENTITY_TYPE + CHAT_ENTITY_ID to bus metadata`
  • `feat(gateway): inject Bitrix24 entity binding hint into agent system prompt`
  • `feat(bitrix24): @mention asker in group reply for multi-user clarity`
  • `fix(agent): use SenderID for per-user MCP credential lookup in group chats`
  • `fix(bitrix24): scope MCP server lookup by channel tenant_id`

Operability

  • `feat(bitrix24): BITRIX24_FORCE_REREGISTER env to refresh public_url`
  • `feat(bitrix24): convert Markdown to BBCode before sending chat chunks`
  • `feat(bitrix24): configurable bot_type (B internal / O open channel)`
  • `feat(bitrix24): resolve contact names via user.get`

UI

  • `feat(ui/channels): add Bitrix24 to channel picker + schemas`
  • `feat(ui): expose bot_type + MCP config fields for bitrix24 channel`

MEDIUM hardening (responding to #1057 review)

  • `fix(bitrix24): sanitize webhook entity metadata before system-prompt interpolation` — caps length + rejects control chars on `bitrix_chat_entity_type/id` before system-prompt injection. 14 unit tests.
  • `fix(bitrix24): explicit migration-missing warning in BootstrapPortals` — distinct warn naming migration 000058 + remediation command, so operator finds it without grepping.

Architecture highlights

Per-user MCP credentials in group chats — `resolveActorUserID` (in `internal/agent/loop_mcp_user.go`, already channel-agnostic — note for #1057 review: the helper does not live in `bitrix24/`) routes credential lookups by `SenderID` instead of the group-composite `UserID`. This also benefits future Telegram/Slack per-user OAuth integrations without further changes.

Tenant isolation — MCP server lookups scoped by `store.WithTenantID(ctx, c.TenantID())`, matching the Discord/Zalo/Feishu pattern.

Webhook signature verification — `application_token` verified before any side effect; constant-time compare.

Idempotent register — persisted bot id; `BITRIX24_FORCE_REREGISTER=1` env bypasses cache when `public_url` rotates.

Test plan

  • `go build .` (PG) passes
  • `go build -tags sqliteonly ./...` passes (assuming PR 2 merged or stacked)
  • `go vet ./internal/channels/bitrix24/...` clean
  • `go test ./internal/channels/bitrix24/...` passes
  • `go test ./cmd -run TestIsSafeBitrixEntityToken` — 14 cases pass
  • Live verification on `tamgiac.bitrix24.com` over multiple weeks of dogfooding: DM, group chat, CRM Deal-bound chat, Tasks-bound chat, multi-user group with per-user MCP, public URL rotation
  • PG integration test on maintainer CI
  • Maintainer to approve CI workflow run (first-time contributor PR)

Documentation

  • Setup runbook: `plans/bitrix24-mcp-refactor/reports/setup-runbook.md`
  • Retrospective: `plans/bitrix24-mcp-refactor/reports/retrospective.md`
  • Rev5 MCP plan: `plan/goclaw-mcp-integration.md`

Coordination

tech-synity and others added 30 commits April 28, 2026 17:58
The bare pattern `pkg-helper` on line 6 matched any file or directory
named pkg-helper anywhere in the tree — including the tracked source
directory `cmd/pkg-helper/`. Git tolerated this because already-tracked
files override ignore rules, but uploaders that apply .gitignore patterns
naively to the filesystem (notably `railway up`) stripped the source
directory from the build context, breaking `go build ./cmd/pkg-helper`
in Docker with `stat /src/cmd/pkg-helper: directory not found`.

Anchor both binary-artifact patterns (`/openclaw-go`, `/pkg-helper`) to
the repo root so they only match the compiled output, not the source.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Working tree kept surfacing `goclaw.exe` / `goclaw-local.exe` (Windows
dev builds) and ad-hoc `tmp-reset-bot/` probe scripts as untracked.
A single `git add .` would commit them — both are useless outside the
author's machine and the debug probes import `internal/` packages from
outside `cmd/` (ugly but legal), so they should never ride into Git.

Add to .gitignore (anchored like the existing `/openclaw-go` entry, so
matching doesn't accidentally touch source dirs — same concern that
broke Docker builds in 2de99e26):

  /goclaw.exe
  /goclaw-local.exe
  /tmp-reset-bot/
  /tmp-probe*/

Also add .railwayignore file — new; Railway copies the working tree
verbatim and already ignored tests/node_modules/etc. Add `tmp-*` glob
so ad-hoc probe dirs don't bloat image layers on `railway up`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Routine dependency refresh. v2.12 pulls in a new indirect dep
(git.sr.ht/~jackmordaunt/go-toast/v2 v2.0.3 — used by Wails'
notification path) and otherwise contains only maintenance fixes
from upstream.

Verified: `go build -tags sqliteonly ./internal/... ./cmd/` green —
desktop edition's Wails bindings still compile. No goclaw code calls
the notification API directly, so the new indirect is latent unless
Wails decides to pull it into the main path later.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… in strict mode

OpenAI strict mode rejected `use_skill` calls with:
  invalid_function_parameters: 'additionalProperties' is required to be
  supplied and to be false
  at path ('properties', 'params', 'type', '0')

Root cause: applyStrictMode early-returned when a node was
`{"type":"object"}` with no nested `properties` key, leaving
additionalProperties unset. Then makeNullable widened the type to
`["object","null"]`, and the strict validator checks each branch
independently — the "object" variant was missing additionalProperties
so the whole schema failed validation.

Fix: when typ=="object", set additionalProperties:false even if the
node has no inner properties. This covers the common "bag of
free-form params" shape tool authors write as:

  params: { type: "object", description: "..." }

Trade-off worth documenting: strict mode + bare object ⇒ OpenAI will
reject any call that actually populates those params. If a tool truly
needs a free-form dict of arbitrary keys, strict mode is the wrong
fit — either declare the known keys under `properties`, or opt that
tool out of strict. Left a comment on the branch spelling this out.

Regression test added: TestApplyStrictMode_BareObjectProperty asserts
both `additionalProperties:false` AND the `["object","null"]` type
union survive the transform.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previously, builds without the `embedui` tag left "/" unhandled,
so http.ServeMux returned a bare 404 when an operator opened the
deployed URL in a browser. That's needlessly opaque — it looks
like the service is broken even though the gateway is healthy.

Register a small JSON index handler for "/" in the no-UI branch.
It returns service status + protocol version + a list of real
endpoints for an exact "/" match, and a JSON 404 for any other
unmatched path (http.ServeMux routes "/" as a catch-all).

Embedui builds are unchanged — the webui.Handler() still owns "/"
and takes precedence.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Lets admins attach server-specific quirks (e.g. "no trailing semicolons
in code args") to MCP tool descriptions without modifying the upstream
MCP server or redeploying goclaw. Hints are stored in the existing
mcp_servers.settings JSONB (no migration) following the same pattern as
require_user_credentials.

Backend:
- Add mcp.ToolHints type + ParseToolHints() helper alongside requireUserCreds
- Add BridgeTool.WithHints(global, toolHint) setter that appends a suffix
  to Description() so the LLM sees hints as part of the tool schema
- Thread hints through connectServer / connectViaPool / loop_mcp_user
- 12 new unit tests for ParseToolHints + WithHints (+ edge cases)

Frontend:
- New "AI Agent Hints (Optional)" section in MCP form dialog: a global
  textarea + a key-value list for per-tool hints
- Extend KeyValueEditor with valueAs="textarea" prop (reusable)
- Zod schema + TS types + i18n (en/vi/zh) updated

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…cake parity)

Creating a Bitrix24 channel_instance failed with "invalid channel_type"
even though Phase 03 wired the factory and the UI added Bitrix24 to the
picker. The write path hits two parallel whitelist switches — one for
WS RPC in internal/gateway/methods and one for HTTP in internal/http —
and both missed bitrix24. The WS switch also silently missed facebook
and pancake, so anything driven through the WS API was rejecting
channels the HTTP API already accepted.

- Add bitrix24 to both switches.
- Add facebook and pancake to the WS switch so the two functions stop
  drifting. CHANNEL_TYPES in the UI constants file was already offering
  both.
- Annotate each switch with a keep-in-sync comment pointing at its twin
  and at ui/web/src/constants/channels.ts. The previous drift was silent
  (untyped string list across three separate files); the comment at
  least makes the next channel addition noisy enough to catch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the persistence layer for Bitrix24 portal OAuth state, decoupled
from the channel implementation so the channel package (PR 3) can land
incrementally:

- Migration 000058_bitrix_portals (PostgreSQL) — tenant-scoped portal
  rows with AES-GCM encrypted credentials and refresh state.
- internal/store/bitrix_portal_store.go — store interface (BitrixPortalStore).
- internal/store/pg/bitrix_portals.go — PostgreSQL implementation.
- internal/store/sqlitestore/bitrix_portals.go + schema patch — SQLite
  implementation for the desktop edition (lite). Schema v26 → v27.
- internal/upgrade/version.go — RequiredSchemaVersion 57 → 58.
- factory + stores.go wiring so consumers can resolve BitrixPortals from
  the standard StoreFactory.

Storage uses internal/crypto/aes.go (AES-256-GCM) for both credentials
(client_id + client_secret) and state (access/refresh tokens, member_id,
app_token, registered_bots, media folders). Empty encryption key falls
back to plaintext with a warn log per existing crypto contract.

Indexes:
  - UNIQUE (tenant_id, name) — one portal name per tenant.
  - (domain) — lookup by incoming webhook domain.

This is PR 2/3 split from nextlevelbuilder#1057. Stacked on PR 1 (nextlevelbuilder#1059).
Phase 03 wired the Bitrix24 OAuth install handler at `/bitrix24/install`
but shipped no RPC/UI path for creating the `bitrix_portals` row that
handler looks up. Operators currently have to shell into Postgres to
register a portal, which blocks the UI-driven setup flow that the
channel picker now offers.

This adds a direct-DB CLI command mirroring the `cmd/migrate.go`
pattern (resolveDSN → sql.Open(pgx, dsn)):

- `bitrix-portal create --tenant-id --name --domain --client-id
  --client-secret` writes one row via `PGBitrixPortalStore.Create`, so
  credentials are AES-256-GCM encrypted under `GOCLAW_ENCRYPTION_KEY`
  just like runtime writes. When the key is unset we log a visible
  WARNING — the pg store falls back to plaintext and we do not want
  operators to discover that silently.

- `bitrix-portal list [--tenant-id]` prints id/tenant/name/domain.
  Credentials are deliberately never printed; rows whose creds failed
  to decrypt surface as `(creds:empty)` so a corrupt row is obvious.

- `normalizeBitrixDomain` strips scheme + trailing slash so
  `https://tamgiac.bitrix24.com/` and `tamgiac.bitrix24.com` both land
  as the bare host Bitrix24's OAuth callback compares against.

After `create`, the printed hint guides the operator to the install
URL: `https://<public_url>/bitrix24/install?state=<tenant>:<name>`,
which populates the row's `state` column with the OAuth token.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…-side)

Channel-side of the Bitrix24 integration. Consumes the BitrixPortalStore
shipped in PR 2/3 (nextlevelbuilder#1060) and the channel_type whitelist update from
PR 1/3 (nextlevelbuilder#1059).

Adds internal/channels/bitrix24/ package:
- channel.go / channel_test.go — Channel implementation (Start, Stop,
  Send, Reload), tenant-scoped MCP server lookup
- client.go — Bitrix REST client (im.message.add, im.bot.update,
  user.get, etc.) with token refresh
- dedup.go — Webhook event de-duplication by event_id
- events.go / handle.go — Event parser + handler routing (DM/group,
  CRM entity chats, Tasks chats)
- factory.go — channel registration + instance loader
- portal.go — Per-portal state cache + token rotation
- register.go — imbot.register lifecycle (idempotent + verify)
- router.go — Webhook router by portal domain
- send.go — Outbound send with chunking + BBCode rendering
- webhook.go — HTTP handler
- bootstrap.go — Warm portal cache from store on gateway boot

Wires the channel into the partner factory + cmd/gateway.go boot path.

This is PR 3/3 split from nextlevelbuilder#1057. Stacked on PR 2 (nextlevelbuilder#1060) which is
stacked on PR 1 (nextlevelbuilder#1059).
- ONIMBOTDELETE path: document that the second MarkStopped call
  intentionally overrides the generic Stopped summary set by Stop
  with an operator-visible reason. Previous comment read as though
  the line were redundant.
- stripMention: remove redundant TrimSpace of the regex replacement
  result. Trim responsibility now lives with the caller
  (handleMessage), which already TrimSpaces uniformly for both DM
  and group paths. No behavioural change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The admin UI hid Bitrix24 from the "Add Channel" dropdown even though
the backend factory (`internal/channels/bitrix24`) was wired into
gateway startup. Operators had no way to create a Bitrix24 channel
instance from the UI, so Phase 03 was effectively dark.

- CHANNEL_TYPES: prepend `{ value: "bitrix24", label: "Bitrix24" }`.
- credentialsSchema["bitrix24"] = []. Bitrix24 has no per-bot secrets
  — portal-level OAuth lives on `bitrix_portals` and is authorized
  once via `/bitrix24/install`.
- configSchema["bitrix24"]: portal / bot_code / bot_name (required),
  public_url (required — imbot.register demands absolute callback
  URLs), plus dm_policy, group_policy, require_mention,
  reaction_level, text_chunk_limit, media_max_mb, allow_from,
  group_allow_from, history_limit, streaming, block_reply. Defaults
  match `applyConfigDefaults` in the Go factory so first-time saves
  don't drift from the backend's zero-config behavior.

i18n: no new fieldConfig keys added — ChannelFields falls back to the
inline `label`/`help` via `t(..., { defaultValue })` so English
strings render without catalog entries. Translations can land later
without blocking the feature.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 03 only handled the OAuth2 marketplace app install flow (where
Bitrix24 redirects to /bitrix24/install with ?code=...&state=... and the
app calls oauth.bitrix.info/oauth/token to trade the code for tokens).
A Local Application registered via Developer resources > Local
application uses a different handoff: Bitrix24 POSTs AUTH_ID,
REFRESH_ID, AUTH_EXPIRES, DOMAIN, member_id, and application_token
directly into the handler URL, with no code to exchange — the tokens
are already minted.

Without this, admins using the Local App path (the easiest way to
test without publishing to Bitrix24.Market) hit `400 missing code or
state` and could never finish install, leaving the portal row with
an empty state column and imbot.register failing on first channel
start.

- webhook.go handleInstall now sniffs the form body: AUTH_ID +
  REFRESH_ID present -> Local App branch; code + state present ->
  OAuth branch. Either missing gets a 400 with a message naming both
  field sets so the operator sees which flow the payload matched.
- handleInstallLocalApp resolves the portal by DOMAIN (Local App has
  no state param to carry tenant:name). Builds a TokenResponse from
  form fields and hands it to Portal.InstallFromTokens.
- portal.go InstallFromTokens mirrors Exchange but skips the
  ExchangeAuthCode HTTP call. Rejects a half-install (either token
  missing) up front so we never persist a wedged row. Detaches the
  context for persistence for the same reason Exchange does — once
  Bitrix has handed us tokens, a canceled handler context must not
  prevent writing them to disk.
- webhook.go strconv import for AUTH_EXPIRES parsing (falls through to
  Portal.applyTokenResponse's defaultTokenTTL clamp on unparseable).

Handler path config in the Local App UI still needs to be split:
`Your handler path` -> /bitrix24/events (event callbacks after install)
`Initial installation path` -> /bitrix24/install (install handshake)
Before this commit, pointing both at /bitrix24/install would have
failed on the install POST anyway; now the install succeeds but
events need the /events route to reach handleEvent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Both happy paths were silent, so when operators asked "did the reinstall
POST arrive?" or "did Bitrix POST anything to /bitrix24/events when I
chatted?" the logs could not answer. This hurt diagnostics badly: a
mis-configured handler URL on the Bitrix side is indistinguishable from
a working one until somebody checks the Bitrix app admin.

- handleInstallLocalApp: emit INFO after persistState succeeds, with
  tenant/portal/domain/member_id/expires_in. This confirms the Local App
  install flow ran end-to-end (first-install and reinstall both).
- handleEvent: emit INFO at entry with remote addr, Content-Length, and
  User-Agent — even before ParseEvent, so malformed bodies still show.
  Tagged with a note that this can drop to Debug once wire-up is stable.

No behavior change. Both lines are slog.Info, no PII beyond member_id
(already persisted in portal state) and the remote IP (Bitrix24's).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Root cause of "handler URL bound but no events ever POST": after a
fresh imbot.register the EVENT_MESSAGE_ADD / ONIMBOTJOINCHAT / ONIMBOTDELETE
URLs showed up correctly in event.get, yet chatting the bot produced no
bitrix24 event: inbound log line. Proved via probe:

  app.info -> { "INSTALLED": false, ... }

Bitrix24 silently suppresses imbot events while INSTALLED=false, even
though the handler URLs are otherwise set up. The signal to flip INSTALLED
to true is `BX24.installFinish()` called from the install handler's HTML
response — without it the app is stuck in "install in progress" forever.

Our installSuccessHTML was pure static HTML: a title, a paragraph, and a
window.close() timeout. No BX24 SDK, no installFinish. Both install flows
served it (OAuth and Local App), so both flows left INSTALLED=false.

- Load //api.bitrix24.com/api/v1/ so the BX24 JS SDK is available in the
  install popup iframe.
- In the body script, BX24.init(() => BX24.installFinish()). Wrap in
  try/catch so a direct-URL smoke test (outside the Bitrix iframe) still
  falls through to window.close() instead of leaving the tab orphaned.
- Long comment on the constant explaining why this matters — the next
  engineer to edit this HTML must not strip the installFinish call.

Scheme-relative src is intentional — the install popup runs inside the
Bitrix24 iframe which is always https, so //api.bitrix24.com inherits the
correct protocol without hard-coding it.

After this ships, a reinstall (click REINSTALL in Local App admin) will
mark INSTALLED=true and Bitrix starts delivering events.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bitrix24 chat (imbot.message.add) renders a restricted BBCode subset,
not Markdown. LLM replies emitted `**bold**`, `*italic*`, `[link](url)`,
triple-backtick fences, etc. — all of which showed up as literal
characters in the Bitrix bubble, making bot replies look unpolished
and hard to scan.

- format.go: new markdownToBitrixBBCode + helpers covering:
  * bold (**x** __x__ <b> <strong> → [b]x[/b])
  * italic (*x* _x_ <i> <em> → [i]x[/i]) with intra-word underscore guard
  * strikethrough (~~x~~ <s> <del> → [s]x[/s])
  * inline code (`x` <code> → [code]x[/code])
  * fenced code (```lang\n…``` → [code]\n…\n[/code], lang hint dropped)
  * links ([t](url) <a href="url">t</a> → [url=url]t[/url])
  * headers (#, ##, … → [b]…[/b], Bitrix has no header tag)
  * lists (-, *, + → • ; ordered 1. passes through)
  * blockquote (> x → [quote]x[/quote]) grouping consecutive lines
  * tables (GitHub-style | h | sep | body → [code] monospace block)
  * horizontal rule (---, ***, ___ → unicode ──── divider)

  Placeholder scheme uses \x00…\x00 framing so nested extraction
  (fenced > table > inline) can't collide. Input NULs are stripped on
  entry (would otherwise corrupt restoration).

- format_test.go: 25 table-driven cases covering each transform plus
  3 regression cases (italic adjacent pairs needing stability loop,
  single-line fenced losing `code` as phantom lang hint, NUL-input
  collision with placeholder framing), plus a mixed-feature Vietnamese
  smoke test (#TESTS).

- send.go: call markdownToBitrixBBCode on `msg.Text` before chunking.
  Conversion BEFORE chunking so chunker operates on final wire shape —
  splitting half-converted Markdown would leak a lone `**` to the
  client. Caveat: chunker is still tag-agnostic, so a BBCode pair
  straddling the 4000-rune boundary can still split across chunks.
  Documented in the callsite; fix when it becomes visible in practice.

- send.go: move client/botID re-check INSIDE sendChunk instead of
  holding the result of Send's liveness check across the chunk loop.
  Shutdown during a multi-chunk reply now surfaces as a clear
  "channel lost during send" error on the chunk that raced, rather
  than silently calling a nil client.

Idempotent on already-BBCode input — the conversion regexes key off
Markdown markers that don't appear in [b]/[i]/[code]/[url=…] syntax.
Locked by TestMarkdownToBitrixBBCode_IdempotentOnBBCode.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bitrix24 Local Application install sends AUTH_ID / REFRESH_ID /
member_id / DOMAIN but OMITS application_token. That token first
appears in the event stream (ONAPPINSTALL / ONIMBOTMESSAGEADD / …).
Before this change, every event was rejected with "portal not
installed" because `want := portal.AppToken()` was empty, and the
only recovery was a manual DB patch.

- portal.BootstrapAppToken: accept event's app_token ONLY when all of
  (a) no stored app_token yet (never overwrite a good value),
  (b) stored MemberID is non-empty AND matches the event's member_id.
  Idempotent on re-call. Refuses to seed MemberID from the event body
  — if install didn't populate it, something upstream is wrong and
  we must not let the first (potentially spoofed) event pin the
  portal's identity. Legacy rows without MemberID require reinstall.

- handleEvent: when portal.AppToken() is empty and the event carries
  a non-empty app_token, call BootstrapAppToken; on success fall
  through to the normal secureEqual() check. Failure is logged as
  security.bitrix24_apptoken_bootstrap_failed and returns 401.
  Existing classic path (no stored + no event → 401) preserved.

- router_test.go: 3 new tests covering the TOFU trust boundary:
  * BootstrapAppToken_200 — happy path, second event auths normally.
  * BootstrapAppToken_MemberIDMismatch_401 — spoofed event with wrong
    member_id must not poison the stored token.
  * BootstrapAppToken_NoStoredMemberID_401 — MemberID-less legacy row
    refuses auto-heal; reinstall is the only safe recovery.
  Existing PortalNotInstalled_401 updated to pass empty event token
  so it keeps testing the no-seed-material path.

Opt-in raw event dump:
- webhook.go: BITRIX24_LOG_RAW_EVENT=1 env at process start arms a
  sorted, credential-redacted dump of every parsed form body. Off by
  default — the dump leaks user message text and is noisy — but
  indispensable for one-shot captures when debugging new Bitrix24
  payload shapes.
- isRedactedEventKey matches on the LEAF key name (stripping bracket
  path), so the SAME secret duplicated under `auth[access_token]`,
  `data[BOT][id][access_token]`, and `data[BOT][id][AUTH][access_token]`
  is caught at every nesting. Covers access_token / refresh_token /
  application_token / client_id / client_secret / AUTH_ID / REFRESH_ID.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reverts the custom bitrix_mcp_user_mapping store + ensureMCPCredentials
gatekeeper added in c675dd6a/e5f2fe34. Audit revealed the integration
duplicates partner-owned MCP infra: MCPServerStore.GetUserCredentials
already covers per-user credentials, GrantChecker.IsAllowed already gates
access with cache + event-bus invalidation, and ContactCollector already
handles channel-to-canonical identity resolution.

Phase A scope — strip everything that competes with existing infra:
  - Drop bitrix_mcp_user_mapping table + PG/SQLite migrations
  - Drop store.BitrixMappingStore interface + impls
  - Drop ensureMCPCredentials + 4-arg FactoryWithPortalStore signature
  - Drop mcp/mapping store fields on Channel

Keep two pieces that stand on their own:
  - isGroupMessageType() fixes MESSAGE_TYPE parsing (Bitrix webhooks use
    short codes P/C/O, original code only matched legacy "chat")
  - mcp_client.go (HTTP auto-onboard client) — unwired for now, reused
    in Phase C as the bitrix24-local lazy provisioner once Phase B
    wires the channel into GrantChecker + MCPServerStore properly

See plans/bitrix24-mcp-refactor/README.md for full phased plan.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ector

Call ContactCollector.EnsureContact inside handleMessage after policy
gating, matching the Telegram/Zalo pattern. This plugs Bitrix24 into
the partner's existing contact table so downstream layers (agent loop,
MCPUserCredentials lookup, admin UI) resolve senderID → userID through
the same path every other channel uses. No parallel mapping layer.

- DM: 1 upsert (sender, peer=direct, ctype=user)
- Group: 2 upserts (sender as peer=group/ctype=user, chat as
  peer=group/ctype=group)
- System messages & empty content are filtered BEFORE the call, so
  blocked senders are never recorded.

Tests in handle_test.go use a fakeContactStore mock to assert call
count and argument shape for DM / group / blocked paths.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Before this change imbot.register was called with TYPE="B" hardcoded,
so goclaw could only register standard internal chatbots. Bitrix24
supports a second type "O" (Open Channel bot) for customer-facing
widgets, which we need to support for external-customer deployments.

Changes:
- bitrixInstanceConfig: new BotType field with full doc block covering
  the operational differences (audience, policy recommendations, MCP
  implications).
- applyConfigDefaults: empty BotType → "B" to preserve existing
  deployments' behavior (matches Bitrix24's imbot.register default).
- Factory: validate BotType ∈ {"B", "O"} AFTER defaults. Unknown values
  (including lowercase / whitespace-only) are rejected at load, not at
  runtime — prevents silent no-op bots when admin typos the config.
- registerParams: forward cfg.BotType to TYPE verbatim.
- Channel.IsOpenChannelBot(): helper for downstream code (Phase C
  provisioner will use this to skip per-user MCP credential minting
  for transient Open Channel customers).

No policy auto-switching based on BotType — admin is responsible for
setting dm_policy=open when deploying an Open Channel bot, matching
the "1 channel = 1 fixed default" convention other channels follow
(Bitrix24 = pairing, Zalo personal = allowlist).

Tests cover: default = "B", explicit "B" / "O", rejection of unknown /
lowercase / whitespace-only values, IsOpenChannelBot() matching the
type, and registerParams forwarding the type to the API call body.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds per-user MCP credential minting on first message. Opt-in via instance
config (mcp_server_name + mcp_base_url) AND env GOCLAW_BITRIX_MCP_ADMIN_TOKEN.
Off by default — channels without the wiring operate identically to before.

Behavior:
- On first message from user U, channel checks mcp_user_credentials(server,U).
  Present → skip. Absent → POST {base_url}/api/auto-onboard with admin token
  + U's OAuth tokens, store returned api_key plus OAuth bundle in the Env
  map so MCP server can re-authenticate on subsequent tool calls.
- Open Channel bots (bot_type=O) are skipped entirely: transient customers
  don't map to tenant_users and shared-credential support lands in Phase E.
- Every failure mode is best-effort: logged via slog.Warn/Debug and swallowed.
  Agent loop downstream sees no creds → skips that MCP server's tools, which
  is strictly better UX than denying the user's message. Typed sentinel
  errors (ErrProvisionSkippedOpenChannel / ErrProvisionDisabled /
  ErrProvisionDebounced) let tests assert behavior without string matching.
- 60s per-(serverID,userID) debounce absorbs Bitrix24 webhook retry bursts
  (3–5 events/sec on transient 5xx) without hammering the MCP server.

Wiring:
- New FactoryWithPortalStoreAndMCP factory variant. Old FactoryWithPortalStore
  becomes a thin shim (nil mcpStore) so other call sites stay on the simpler
  2-arg signature. Gateway switches to the MCP-aware variant.
- Half-config (only one of mcp_server_name / mcp_base_url set) fails fast
  at factory load, preventing a silent "provisioning disabled but I meant
  to enable it" misconfiguration.
- Admin token stays in env instead of config — channel_instances.config is
  plaintext JSONB, and Phase D RFC will propose a per-tenant secret store.

Tests: 9 new tests in provisioner_test.go covering Open Channel skip,
disabled modes, existing-creds fast path, mint + persist happy path (including
EXPIRES_AT format), debounce, HTTP failure surfacing, missing-auth validation,
init disabled modes, and half-config factory rejection.

Build/vet/test all green on PG and SQLite.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase C follow-up addressing two pieces of technical debt flagged during
the multi-tenant review:

1. Admin token resolution:
   - Primary source: mcp_servers.env["BITRIX_MCP_ADMIN_TOKEN"] (per-tenant,
     encrypted at rest via MCPServerStore). Same key the MCP server process
     receives at spawn, so admin sets the secret in one place.
   - Fallback: GOCLAW_BITRIX_MCP_ADMIN_TOKEN env var (legacy single-tenant
     deployments). Kept to avoid forcing a DB migration on upgrade.
   - Logged token_source on startup so operators can see when a deployment
     is still riding the legacy global env var (multi-tenant unsafe).
   - resolveMCPAdminToken extracted for unit testing.

2. User degradation notice:
   - When provisionIfMissing hits an unexpected failure, notifyUserOfMCPIssueOnce
     sends a one-shot Vietnamese DM via sendChunk so the user knows to
     contact admin instead of silently getting tool-less replies.
   - Per-user 5min debounce (separate map / mutex from the provisioning
     debounce) absorbs webhook retry storms and sustained MCP outages
     without duplicating the notice.
   - Channel health stays Green — degradation is surfaced to the user, not
     the health endpoint, matching operator preference for silent fallback.

Test coverage:
- resolveMCPAdminToken: 6 subcases covering both-set, per-server only,
  env only, both empty, whitespace guards, malformed JSON fallback.
- initMCPProvisioner: end-to-end per-server token flows through to the
  Authorization: Bearer header on auto-onboard.
- notifyUserOfMCPIssueOnce: first-call marks debounce, second call within
  TTL keeps original stamp (no refresh), expired TTL refreshes, empty/
  whitespace inputs are no-ops, multi-user isolation holds.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bitrix24 webhooks don't carry display_name / username, only a FromUserID
string, so the Contacts page rendered bitrix24 rows with empty Name and
Username columns. Add a lazy per-channel enrichment path: on first-sight
of each sender, call user.get once and cache the (name, username) pair
for 1h (5min for failures / unresolvable IDs so scope fixes recover
without a channel reload). Any RPC failure is logged at Debug and the
contact is still created with empty fields — the pre-enrichment path.

UI: add "bitrix24" to PERM_CHANNELS so the "Channel permissions for
contact names" panel documents the `user` scope requirement, and to
CHANNEL_TYPES so the filter dropdown surfaces the channel. Ship i18n
copy for en / vi / zh.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Backend has supported `bot_type` (B / O), `mcp_server_name`, and
`mcp_base_url` since the Phase C work landed (commits 655d24b6,
90e225b0), but the Create/Edit Channel Instance form didn't render
inputs for any of them — operators had to hand-edit the raw config
JSON. Add three schema entries:

- bot_type: select B (standard, default) / O (Open Channel). Backend
  validates the values at factory load; select keeps the options
  self-documenting.
- mcp_server_name + mcp_base_url: both marked `advanced` so they
  tuck under the Advanced panel. Help text calls out the "both or
  neither" coupling so users don't half-configure and hit the
  factory-level validation error.

Also trim `.railwayignore` to skip docs / changelog / readme assets
(_statics, _readmes, examples, CHANGELOG, CONTRIBUTING,
api-reference, websocket-protocol) — upload payload drops from
9.4 MB to 4.25 MB, which matters on slow uplinks where the full
tarball was timing out before reaching the Railway backboard.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…auth (Path B)

The MCP server now authenticates /api/auto-onboard by verifying the
caller-supplied Bitrix access_token against `profile` — no shared admin
secret required. Removes:
- BITRIX_MCP_ADMIN_TOKEN env key lookup in mcp_servers.env
- GOCLAW_BITRIX_MCP_ADMIN_TOKEN process env var
- resolveMCPAdminToken two-source fallback
- Authorization: Bearer header on auto-onboard POST

Also removes dependent test cases:
- TestResolveMCPAdminToken (6 subtests)
- TestInitMCPProvisioner_UsesPerServerToken
- TestInitMCPProvisioner_DisabledModes/missing_admin_token subtest

This unblocks marketplace distribution where each portal installs the
app and runs its own GoClaw — no per-portal shared secret provisioning.
Multi-tenant isolation holds naturally because each user authenticates
with their own per-portal OAuth tokens.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Documents the finalized integration between goclaw's custom Bitrix24 channel
and mcp-bx-syn, reflecting the shipped Path B design (access_token as auth
anchor, no shared ADMIN_TOKEN). Supersedes the earlier Rev4 draft: drops the
mapping-table proposal in favor of reusing the partner MCPServerStore, folds
in rate limiting + audit log + hourly re-verify, and aligns the contract
shape with the commits `ea09c1ba` (phase C provisioner) and `07b48ef0`
(admin-token removal).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…GINAL

Group ONIMBOTMESSAGEADD events strip the @mention from MESSAGE before
sending the webhook, so isMentioned(MESSAGE) never matched and group
messages with require_mention=true silently dropped — typing indicator
fired (Bitrix sees the bot's IM channel is up) but no agent turn ran.

Bitrix24 ships two higher-authority mention sources on group payloads:

  - data[PARAMS][MENTIONED_LIST][<bot_id>]=<bot_id> — structured map,
    no regex / Unicode edge cases.
  - data[PARAMS][MESSAGE_ORIGINAL] — raw BBCode with the bot tag intact
    ([USER=<bot_id>]Name[/USER]).

Both absent on DMs, where MESSAGE retains the @mention naturally.

EventParams now carries both fields. parseFormEvent + parseJSONEvent
populate them. New isMentionedParams() walks the three sources in
authority order (list → original → stripped) and replaces the
single-source check at handleMessage:111. Existing isMentioned() and
stripMention() are unchanged so DMs and the BBCode strip step keep
behaving exactly as before.

Verified end-to-end against tamgiac.bitrix24.com:
  inbound (raw dump shows MENTIONED_LIST[940]=940)
    → scheduling message (peer_kind=group chat_id=chat4838)
    → agent turn → outbound published → bot reply visible in Bitrix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ONIMBOTMESSAGEADD events from Bitrix24 task chats arrive with
MESSAGE_TYPE=X (CHAT_ENTITY_TYPE=TASKS_TASK), but isGroupMessageType
only matched C/CHAT/O/OPEN. X fell through to direct, with two
consequences:

  1. The require-mention gate was bypassed (DM path doesn't enforce
     it), so the bot replied to every message in the chat instead of
     only when @mentioned.
  2. Routing built a session key of `direct:chatNN` and a user_id of
     the raw sender, mixing per-task conversation history into the
     user's DM context. peer_kind=group + user_id=group:<channel>:<chatID>
     is what isolates entity-bound chat memory.

Add "X" to the group switch (uppercase-normalised, like the other
codes) and extend TestIsGroupMessageType with `X`/`x`/` X ` cases.
Existing C/CHAT/O/OPEN/P/private behaviour unchanged.

Verified live against tamgiac.bitrix24.com task chat:
  raw dump → MESSAGE_TYPE=X CHAT_ENTITY_TYPE=TASKS_TASK
    → scheduling: peer_kind=group session=group:chat4644 ✓

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…data

Bitrix24 webhooks for CRM-bound, Tasks-bound, Workgroup, and Open
Channel chats ship two fields that uniquely identify the entity the
chat belongs to:

  data[PARAMS][CHAT_ENTITY_TYPE]  e.g. CRM, TASKS_TASK, SONET_GROUP, LINES
  data[PARAMS][CHAT_ENTITY_ID]    e.g. DEAL|2064, LEAD|7, 2704

The parser silently dropped both. As a result, when a user chatting
inside a deal-bound chat asked "get me the customer for this deal",
the bot had no way to deterministically resolve the bound deal — it
either had to guess from CHAT_TITLE strings or ask the user for the id.
For MCP-equipped agents this defeats the value of binding the bot to
the entity in the first place.

Add ChatEntityType + ChatEntityID to EventParams. Populate from both
parseFormEvent (form-urlencoded) and parseJSONEvent (JSON variant).
Surface them on bus.InboundMessage.Metadata as
`bitrix_chat_entity_type` and `bitrix_chat_entity_id` so agent
pipelines can:

  - inject "you are chatting about <entity>" into the system prompt, or
  - pre-fetch `crm.deal.get(2064)` and put the result in context.

Plain user-created chats omit both metadata keys — downstream readers
that do `_, ok := metadata[...]` see no key, not an empty string.

Tests cover:

  - parser populates fields from CRM Deal, CRM Lead, Tasks chats and
    leaves them empty for plain groups (form + JSON paths)
  - handler propagates them to bus.InboundMessage.Metadata with the
    expected `bitrix_*` key prefix and absent-key semantics for plain
    chats

Verified live: gateway accepts the new payload shape and continues
scheduling messages cleanly with peer_kind=group for the existing
chat 4644 (Tasks/X) test case from commit 26d7d76.

NOTE: this is the channel-side half of "bot understands the entity it's
chatting in". The pipeline still needs to consume these metadata keys
to actually inject context — separate change in internal/pipeline.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…prompt

Closes the loop on the channel-side metadata forwarding from f8e000f.
The bus.InboundMessage already carries `bitrix_chat_entity_type` +
`bitrix_chat_entity_id` whenever the inbound message comes from a chat
bound to a Bitrix24 module entity (CRM Deal/Lead/Contact, Tasks task,
Workgroup, …). Until now the consumer dropped the metadata on the floor
because the agent loop has no Bitrix-aware code path.

Add a fourth `extraPrompt` block in processNormalMessage, right after
the existing channel-self-identity hint. When both metadata keys are
present it appends a `<extra_context>`-bound hint telling the LLM:

  - which entity type/id this chat is scoped to,
  - to treat deictic phrases ("this deal", "this task") as referring
    to that id rather than asking the user,
  - how to map CRM pipe-format ids (e.g. `DEAL|2064`) to MCP tools
    (crm.deal.get / crm.lead.get / crm.contact.get / crm.company.get),
  - that Tasks ids pass directly to tasks.task.get.

The block follows the exact pattern of the three sibling blocks:
group-chat tone, MetaTopicSystemPrompt, MetaChannelSelfIdentity. Single
inline metadata-key check, fmt.Sprintf into extraPrompt, then read by
agent.RunRequest.ExtraSystemPrompt -> BuildSystemPrompt -> rendered into
the standard <extra_context> system-prompt section. No new pipeline
stage, no new agent fields, no new RPC.

Plain user-created chats omit both metadata keys -> no hint added -> no
behaviour change for non-Bitrix or non-entity-bound chats.

Verified live against tamgiac.bitrix24.com Task chat 4644 (TASKS_TASK
id=2704) using gemini-native:

  user:  "@bot Synity task này có gì"
  bot:   "Chúng ta đang trò chuyện trong một nhóm chat Bitrix24
          được liên kết với một nhiệm vụ (TASKS_TASK) có ID là 2704"

The LLM picked up the hint and articulated the binding without being
asked. (The bot doesn't yet call tasks.task.get to fetch task fields —
that's an MCP-wiring concern for a separate phase, not a hint problem.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
tech-synity and others added 6 commits April 28, 2026 18:03
In group chats with N users asking the bot in parallel, replies without an
explicit @mention force users to guess "is this answer for me?". Adding the
sender's @mention as a prefix ([USER=<id>][/USER]) addresses two UX issues:

  1. Multi-user concurrent Q&A — Bitrix renders the BBCode as a clickable
     "@Đặng Văn Tình" prefix on the reply, so the asker (and bystanders)
     immediately see who the answer belongs to. Without this prefix, when
     two users ask within a few seconds of each other, the bot's two
     replies are visually indistinguishable.
  2. Notification routing — Bitrix sends a push to the @mentioned user, so
     mobile users get notified about their own answer even when they
     stepped away from the active dialog. reply_to_message_id alone (the
     existing thread linkage) does not trigger push.

Implementation flows the addressee through the existing metadata channel
that already carries reply_to_message_id, so no signature changes:

  - cmd/gateway_consumer_normal.go: when isGroup, set
    `outMeta["bitrix_address_user_id"] = msg.SenderID`. Skip synthetic
    senders (ticker, notification, system) via bus.IsInternalSender so
    those flows do not emit a stray @mention to a non-real user.
  - internal/channels/bitrix24/send.go: read the metadata key after the
    Markdown→BBCode conversion and prepend `[USER=<id>][/USER] ` to the
    text BEFORE chunkText runs. Prepending pre-chunk guarantees the
    mention only appears on the first chunk, not every chunk of a long
    reply. Empty inner content sidesteps escaping for names with BBCode
    metacharacters and reflects renames between turns — Bitrix renders
    the user's current display name from id at delivery time.
  - Self-mention guard: if the addressee id equals the bot's own id (a
    synthetic relay loop or a future code path injecting the bot id by
    mistake), suppress the mention to avoid the bot @-mentioning itself.

DM behaviour unchanged — gateway_consumer only sets the key when
isGroup=true. Other channels (Telegram, Slack, Discord, …) ignore the
`bitrix_*` prefix as expected.

Tests:
  - TestBuildAddressMention: 7 cases covering empty/whitespace metadata,
    real ids, whitespace trim, self-mention guard, and unknown-bot-id
    fallback (trust consumer gating when bot is not yet started).
  - Send() integration coverage left to existing httptest tests; the
    helper is the entire logic.

Verified live against tamgiac.bitrix24.com group chat 4686:
  user @bot Synity test mention prepend
  bot:  @Đặng Văn Tình …answer body…
  (Bitrix renders the BBCode as a clickable @-mention; sender receives
   a push notification.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…chats

In group chats the gateway consumer rewrites RunInput.UserID to a group-
scope composite key (e.g. "group:bitrix-synity:chat4838" for Bitrix24, or
"guild:<id>:user:<sender>" for Discord) so multiple users in the same
chat share conversation memory and session state. That composite is
correct for memory but wrong for resources scoped per-actor:

  - MCPUserCredentials rows are minted per-user by the lazy provisioner
    (e.g. Phase C in internal/channels/bitrix24/provisioner.go) and keyed
    by the real external user id (= SenderID). Looking them up with the
    group-composite key always missed the row, so MCP tools silently
    disappeared in group chats — visible in production as the bot
    knowing the deal/task it's bound to (via the entity-binding hint in
    cmd/gateway_consumer_normal.go) but unable to actually call
    crm.deal.get / tasks.task.get because no credentials loaded.
  - RBAC grants and audit attribution must reflect the real actor, not
    the shared group container.

Add resolveActorUserID(userID, senderID, peerKind) to the agent package.
For peerKind=="group" with a non-empty SenderID it returns SenderID;
otherwise it returns UserID unchanged (DM behaviour preserved, synthetic
ticker/notification senders fall back safely). Update the single call
site in loop_pipeline_callbacks.go (makeBuildFilteredTools) so per-user
MCP tool loading queries with the actor key.

This is a generic agent-layer fix — not Bitrix24-specific. Bitrix24 is
the only channel today that ships per-user MCP credentials, so it was
the only channel exhibiting the bug, but the same routing already flows
for any future channel adding per-user OAuth-backed integrations
(Telegram + Salesforce, Slack workspace + Odoo, internal SSO + ERP, …).

Tests: TestResolveActorUserID covers DM, Bitrix-style group composite,
Discord guild composite, empty SenderID fallback, empty PeerKind, and
unknown PeerKind (must not auto-treat as group). Existing agent tests
still pass; the three pre-existing TestExpandWorkspace failures on
Windows are unrelated (path separator issue, present on main).

Discovery context + alternative fix designs are documented in
plans/bitrix24-mcp-refactor/reports/retrospective.md (will be updated
in a follow-up commit on the bitrix24 branch).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PGMCPServerStore.GetServerByName scopes lookups by tenant_id from context.
Channel.Start receives ctx from instance loader without that scope set —
wrap explicitly with the channel's own tenant id so the lookup matches
the row a tenant admin created via 'bitrix-portal create'.
When public_url changes (tunnel rotated, new host), persisted bot cache
short-circuits imbot.register and Bitrix-side event handler URLs go stale.
Setting BITRIX24_FORCE_REREGISTER=1 bypasses the cache so the next Start()
pushes current public_url + bot config back through imbot.register without
recreating the bot row.

Also expose BITRIX24_LOG_RAW_EVENT for raw-event debugging.
…interpolation

bitrix_chat_entity_type / bitrix_chat_entity_id flow from Bitrix webhook
into the agent system prompt unsanitized. A malicious or compromised
portal could craft CHAT_ENTITY_ID with embedded newlines + injected
SYSTEM directives to steer the agent.

Add isSafeBitrixEntityToken — caps length (et: 64, eid: 128) and rejects
control characters (< 0x20, 0x7f). Permissive on the printable range to
preserve legitimate ids (DEAL|2064, TASKS_X, ĐƠN_HÀNG). Rejected
payloads emit security.bitrix24.entity_metadata_rejected with lengths,
no payload contents.

Addresses MEDIUM finding from PR nextlevelbuilder#1057 review.
When goclaw is upgraded but `migrate up` hasn't run, BootstrapPortals
fails with "relation bitrix_portals does not exist" buried in a generic
"bootstrap failed" log. Operators miss it and Bitrix24 channels
silently no-op.

Detect the missing-table case (PG + SQLite text variants) and emit a
distinct warning naming migration 000058 + the remediation command, so
the next `grep migrate` from a confused operator finds it immediately.

Addresses MEDIUM finding from PR nextlevelbuilder#1057 review.
…r channels [B24:2802]

Three coordinated fixes to make agent loop honor per-user MCP credentials
correctly when the gateway consumer rewrites UserID (group composites + DM
merged contact UUID):

- C1: resolveActorUserID accepts channelType. Bitrix24 channel always
  prefers SenderID over the (possibly rewritten) UserID because the lazy
  provisioner stores MCP credentials keyed by SenderID. Without this, DM
  with merged contact and group chat both miss the credential row and MCP
  tools silently disappear.

- C2: BridgeTool no longer registers per-user instances into the shared
  tool registry. First-user-wins behavior leaked the first user's
  BridgeTool (and api_key) to every subsequent user. Per-user tools now
  live only in mcpUserTools sync.Map; pipeline tool callbacks resolve via
  new executeToolForActor helper which checks the actor's cache first
  before falling back to the shared registry (non-MCP tools).

- A-G1: buildMCPToolDescs accepts actorUserID and only surfaces MCP tool
  descriptions the actor actually owns. Previous fallback scanned all
  users in mcpUserTools, which let the LLM see tools it couldn't invoke
  via executeToolForActor — producing 'unknown tool' loops for cron /
  synthetic events with empty SenderID and for users whose creds were
  just purged after 401.

Includes 11 test cases covering DM/group/Bitrix24/Telegram/synthetic event
permutations.
…ed outbound log [B24:2802]

- C3: provisioner refreshes MCP credentials proactively when stored
  BITRIX_EXPIRES_AT is within mcpCredsRefreshWindow (5min). Prevents the
  upcoming tool call from racing a freshly-stale token. Legacy rows
  without expiry meta refresh once to write the meta column then follow
  the warm-skip path on subsequent events.

- C4: BridgeTool detects 401 Unauthorized from MCP transport and flips
  connected=false. Next user event triggers getUserMCPTools to clear the
  cache and re-acquire — which in turn drives DeleteUserCredentials when
  the pool 401s again, and provisioner re-onboards via fresh tokens from
  the next event. Without this, BridgeTool kept hammering the revoked
  api_key until pool idle-evict (15min).

- C5: structured outbound MCP tool call logging — mcp.tool.call.outbound
  (Debug), mcp.tool.call.done (Debug), mcp.tool.call.error (Warn),
  mcp.tool.call.auth_expired (Warn). Each line has server, tool, user_id,
  latency_ms, plus args_keys (key names only, never values) for
  operator-side incident debugging without leaking PII.

Includes 3 new provisioner tests covering near-expiry refresh, warm-expiry
skip, and legacy-no-expiry refresh.
… toggle [B24:2802]

- C6: inject 'CRM Data Freshness Policy' section into Bitrix24 agent system
  prompt. Tells the LLM to always re-fetch CRM record fields (deal,
  contact, task, lead, calendar) via MCP tool rather than recalling from
  conversation history — protects against permission changes mid-session
  where the LLM might surface fields the user no longer has access to.
  Scoped to channelType == bitrix24; other channels unaffected.

- Phase 0 debug toggle: webhook handler emits an UNMASKED single-line dump
  of auth.access_token / refresh_token / scope / sender_id when
  BITRIX24_DEBUG_UNREDACTED_TOKEN=1. Used by verify-token-identity.sh in
  plans/260512-1640-bitrix24-mcp-permission-fix to confirm whether
  Bitrix-issued event tokens are sender-bound or installer-bound.
  Gated behind explicit env var; off by default.
Bitrix24 strips ALL [USER=...] mentions from MESSAGE on group webhooks,
including mentions of users other than the bot. Agent received empty or
context-stripped text (e.g. "[USER=982]A[/USER] [USER=62]B[/USER] hi"
became just "hi"), losing who was addressed.

Switch group path to read MESSAGE_ORIGINAL (raw BBCode), strip THIS bot's
own tag, then convert remaining user/bot mentions to "@name (ID:<id>)" for
the LLM. Falls back to MESSAGE on legacy portals without MESSAGE_ORIGINAL.
…m prompt [B24:2794]

LLM was hallucinating placeholder domains (e.g. "bitrix24.example.com")
when asked to share Bitrix24 record links, even though the portal domain
is known from channel config, OAuth event, and the portal DB row.

Add buildBitrix24EntityLinkSection(domain) that emits a per-tenant URL
reference for tasks, deals, leads, contacts, companies, orders, payments,
shipments, calendar events, and chats. Section is injected only when the
channel type is bitrix24 and the domain is non-empty.

Propagate the portal domain through: msg.Metadata[bitrix_portal] →
RunRequest.BitrixPortalDomain → pipeline RunInput → buildMessages →
SystemPromptConfig. Helper normalizes accidental scheme/path prefixes.
…L [B24:2794]

LLM was falling back to the placeholder /tasks/task/view/{task_id}/ path
(which 404s) because the section showed {user_id} without telling it what
to fill in. Bitrix24's actual task view URL needs a viewer user_id:
  /company/personal/user/{viewer_user_id}/tasks/task/view/{task_id}/

Resolve the current sender's numeric Bitrix user id (FROM_USER_ID, already
on RunContext) and bake it into the Task URL pattern before the prompt is
built. Section instructs the LLM that the id is the viewer and can be
replaced with another user's id when sharing a link from their view.

Numeric-only gate prevents synthetic senders (cron, ticker:system, telegram
username form "12345|alice") from leaking into the URL path.
…URL [B24:2794]

Backend half of Bitrix24 portal self-service onboarding.

Portal-state public URL capture (replaces per-channel config.public_url):
- /bitrix24/install captures the gateway URL Bitrix24 used to reach us into
  bitrix_portals.state.public_url; closed-loop because the URL is proven
  reachable by the request that arrived
- Channel.eventHandlerURL prefers portal.PublicURL() with legacy
  cfg.PublicURL fallback + deprecation warning

Gateway WS RPC (4 methods, tenant-scoped, credentials masked):
- bitrix.portals.list (any tenant member)
- bitrix.portals.create (RoleAdmin; fails fast with FAILED_PRECONDITION when
  gateway URL not yet observed — no orphan rows)
- bitrix.portals.get_install_url (any tenant member; resume-authorize flow)
- bitrix.portals.delete (RoleAdmin; blocked when channel_instance refs portal)

Snapshot trust model:
- Gateway public URL captured at WS upgrade into client.upgradeURL, only
  promoted to server-wide PublicURLSnapshot after client.authenticated=true
  in sendConnectResponse — unauthenticated probes can't poison the URL
- SetIfPublic additionally rejects loopback/RFC1918/link-local/.localhost
  hosts so a developer's SSH tunnel doesn't pollute the shared snapshot

Includes app_token rotation/bootstrap hardening (RotateAppTokenIfTrusted,
SetOnTokenRefresh, onRefresh callback) already referenced by webhook.go but
uncommitted — bundled to keep the branch buildable.

Tests: 33 new sub-cases — tenant isolation, RBAC, URL derivation,
private-host rejection, install handler capture, eventHandlerURL preference,
duplicate-name handling, in-use delete guard, credential masking.
…4:2794]

Frontend half of Bitrix24 portal self-service onboarding. Pairs with backend
commit adding bitrix.portals.{list,create,get_install_url,delete} RPC.

Portal field UX (channel-fields.tsx, channel-instance-form-*.tsx):
- Replace free-text Portal input with dynamic <Select> populated via
  bitrix.portals.list (special-cased in FieldRenderer for bitrix24.portal —
  not a generic FieldDef extension; deferred until 2nd use case per YAGNI)
- Pending portals are clickable (RESUME_PREFIX sentinel) → open modal at
  authorize step via bitrix.portals.get_install_url; badge says "click to
  resume"
- Submit blocks channel creation if selected portal is not yet installed

Create modal (2-step, owns its own state machine):
- Step 1 (form-step): name auto-fills from domain on blur, regex validation
  mirroring server, ALREADY_EXISTS / UNAUTHORIZED / FAILED_PRECONDITION
  mapped to inline errors
- Step 2 (authorize-step): opens install URL in new tab, polls portal list
  every 3s, fires onAuthorized when installed=true, soft 5-min timeout
- Help section with copy-handler-URL button and localhost warning

Cleanup:
- Removed public_url field from bitrix24 config schema (auto-captured server-side)
- handleSubmit strips legacy public_url from cleanConfig to avoid re-saving
  dead data from older channel rows

i18n: 50 lines × 3 locales (EN/VI/ZH) covering portalSelect, create modal,
help section, errors.

Plan: plans/260513-1648-bitrix24-portal-self-service-ux/
…403) [B24:2794]

The new bitrix.portals.{list,create,get_install_url,delete} RPC methods
introduced in feat(bitrix24,gateway) had no entry in MethodRole's allowlists.
Policy fail-closes unclassified methods to RoleNone -> security.permission_denied
for every caller including owner role, breaking the new Bitrix24 portal UI.

Classification:
- list, get_install_url -> RoleViewer (any tenant member; needed to populate
  the channel-form dropdown and resume an interrupted authorize)
- create, delete -> RoleAdmin (gates portal credential entry + irreversible
  delete; matches handler-level RBAC check)

Test locks in the matrix to catch future regressions; drift-coverage test
would have caught this had we run it post-implementation.
…UX [B24:2794]

Stale help text referenced the old workflow ("bitrix_portals.name for this
tenant. Authorize the portal at /bitrix24/install first.") which assumed the
admin would CLI-create the row and then visit a URL. Replace with copy that
matches the new in-UI dropdown + Create-new-portal flow.
…4:2794]

When admin deletes a bitrix24 channel_instance (HTTP DELETE or WS RPC), the
Bitrix24 portal-side bot is now imbot.unregister'd before the DB row is
removed. Previously the bot lived on as a zombie — accumulating rubbish on
the portal, breaking re-create flows when bot_code was reused, and causing
silent webhook failures when the gateway public URL had rotated.

Architecture:
- internal/channels/channel.go: ChannelDestroyer optional interface
- internal/channels/bitrix24/register.go: unregisterBot() + isBotNotFoundError()
  (idempotent: treat bot-not-found as success — admin may have manually
  deleted via Bitrix UI)
- internal/channels/bitrix24/portal.go: ForgetRegisteredBot() clears the
  bot_code → bot_id mapping atomically + persists
- internal/channels/bitrix24/channel.go: Destroy() orchestrates unregister
  → forget → Stop. Best-effort: Bitrix-side failures log warn but do not
  block DB delete (a permanently dead portal would otherwise leave rows
  stuck forever).

Handler wiring:
- internal/http/channel_instances.go: handleDelete looks up channel via
  Manager.GetChannel(name); when impl satisfies ChannelDestroyer, calls
  Destroy(ctx) before store.Delete. Setter-pattern for Manager because
  wireHTTP runs before channelMgr is constructed in cmd/gateway.go.
- internal/gateway/methods/channel_instances.go: mirror of HTTP path; uses
  constructor param (channelMgr already in scope at wireChannelRPCMethods).
- cmd/gateway.go: SetChannelManager call after channelMgr creation.
- cmd/gateway_channels_setup.go: pass channelMgr to NewChannelInstancesMethods.

Tests: 14 new sub-cases — unregisterBot happy/bot-not-found/transport/zero-id;
Destroy full-flow/zero-id/unregister-failure-proceeds; ForgetRegisteredBot
success/idempotent/empty-code; isBotNotFoundError 8 variants;
compile-time guard that bitrix24.Channel satisfies ChannelDestroyer.

Plan: plans/260512-1146-bitrix24-bot-unregister-on-delete/
…4:2794]

Addresses Gap C from code review of 2707fbe. When admin disabled a Bitrix24
channel, InstanceLoader.Reload removes it from channels.Manager. A subsequent
delete then hits handleDelete with channelMgr.GetChannel returning false →
the standard ChannelDestroyer block was silently skipped → bot lingered on
the portal as a zombie.

This is common in the new self-service UI flow: admins typically disable a
channel before deleting it. Pre-fix: every disable+delete cycle accumulates
one zombie bot.

Fix: lazy-load Portal directly from the store, look up bot_id via the
persisted RegisteredBots map, call imbot.unregister, then ForgetRegisteredBot.
No Channel object required — bypasses the factory.

Architecture:
- internal/channels/bitrix24/orphan_destroy.go: DestroyOrphanBot exported
  helper; each step idempotent + best-effort.
- internal/http/channel_instances.go + internal/gateway/methods/channel_instances.go:
  registry map keyed by channel_type (OrphanChannelCleaner func), fires when
  GetChannel returns false. Handlers stay agnostic of bitrix24 internals.
- cmd/gateway.go: registers the cleaner closure (captures portalStore +
  encryption key) on both HTTP handler and WS methods after they exist.
- cmd/gateway_channels_setup.go: wireChannelRPCMethods now returns the
  ChannelInstancesMethods instance so the caller can attach the cleaner.

Tests: 7 sub-cases covering no-op + error branches (nil config, nil store,
missing portal/bot_code fields, missing portal row, no bot registered, bad
JSON). Full-flow exercise lives in TestDestroy_FullFlow which shares the
underlying unregisterBot helper.

Plan: plans/260512-1146-bitrix24-bot-unregister-on-delete/ (Gap C)
…24:2794]

For portals installed before Phase 01's auto-capture, state.public_url is
empty. New channels created against such portals fail registerBot with
"public_url not set" because eventHandlerURL has nothing to return. This
one-shot CLI command writes the URL directly.

Usage:
  goclaw bitrix-portal set-public-url \
    --tenant-id <uuid> --name synity-goclaw \
    --url https://goclaw.tamgiac.com

After backfill, subsequent reinstalls overwrite the value automatically
(Phase 01 capture). Command remains useful for ops post-recovery scenarios.
…24:2794]

- Add /bitrix24/handler route for partners.bitrix24.com app registration ping
  and in-portal iframe load
- Add BITRIX24_DEBUG_UNREDACTED_TOKEN env in docker-compose for debug builds
- Log imbot.message.add failures (non-rate-limit) for diagnostics
Some models emit MCP calls as exec with {action:"mcp_xxx", code|command:"..."}.
normalizeToolCall recovers the intended MCP tool name from action and remaps
payload to MCP schema (code) before registry lookup.
Previously tool_choice was hardcoded to "auto" whenever tools were present.
Honor caller-supplied OptToolChoice via ChatRequest.Options for finer control
(e.g. "required", "none", or specific tool selection).
…24:2794]

Bitrix24 Vietnam cloud uses .bitrix24.vn but regex only allowed
.com|eu|ru|de|fr|jp|in|kz|ua|by + .bitrix.info. Vietnamese portals
(e.g. demo-synity.bitrix24.vn) were rejected at form validation
before backend was even hit. Add 'vn' to both frontend and backend
regex + test cases.
@nguyennguyenit
Copy link
Copy Markdown
Contributor

nguyennguyenit commented May 16, 2026

Thanks @tech-synity for the careful 3-way split and dogfooding on tamgiac.bitrix24.com.

Maintainer decision after validating the PR stack: Bitrix24 must land as a normal GoClaw channel, not as a new portal subsystem.

Required Direction

Rules for this refactor:

  • Treat Bitrix24 as one channel like Telegram / Discord / Slack / Zalo / Feishu / WhatsApp.
  • When removing previously added migration/schema work, also roll back the corresponding dev/test DB schema version so future migrations are not skipped or blocked by version drift.
  • Do not add or modify SQL schema for Bitrix24.
  • Do not add Bitrix24-specific store interfaces, PG stores, SQLite stores, schema patches, or migration files.
  • Do not keep the bitrix_portals abstraction.
  • Use channel_instances as the source of truth for Bitrix24 setup and runtime state.
  • Keep secrets/tokens out of plaintext channel_instances.config. In current code, config is JSONB/plain JSON; use encrypted channel_instances.credentials for client_id, client_secret, access_token, refresh_token, expires_at, app_token, etc.
  • Keep non-secret channel options in channel_instances.config: domain, bot_code, bot_name, bot_type, policies, rendering options, public URL / handler URL metadata.
  • Reuse existing mcp_user_credentials for Bitrix24 per-user MCP credentials.
  • Avoid generic “Path B” framing. Describe it concretely as: Bitrix24 OAuth → existing mcp_user_credentials bridge.

PR-by-PR Action

#1059

Mostly OK as infra. One issue to fix before merge:

  • Restore Discord group mention gating for pairing replies. Current diff sends pairing replies in group chats before checking whether the bot was mentioned. With RequireMention=true default, unmentioned group messages should be recorded/silent, not trigger pairing replies.

#1060

Do not merge in current shape.

Please remove this PR’s Bitrix-specific persistence layer entirely:

  • Remove bitrix_portals table and migration 000058.
  • Remove SQLite schema bump / schema patch.
  • Remove BitrixPortalStore interface.
  • Remove PG / SQLite Bitrix portal store implementations.
  • Remove Stores.BitrixPortals wiring.
  • Remove goclaw bitrix-portal ... CLI.

Migration/version hygiene while fixing:

  • If any dev/test Postgres DB has already applied 000058, roll it back before removing the migration file. Do not leave schema_migrations.version = 58 after the codebase drops 000058, or the next real migration/version can be skipped or blocked.
  • If any local SQLite/Lite DB has already moved to schema v27 from this branch, reset/recreate that local DB or bring schema_version back to the current mainline value after removing the bitrix_portals table. Do not leave local fixtures at v27 while SchemaVersion returns to v26.
  • The final branch should restore the mainline schema versions: PG RequiredSchemaVersion back to the current value, SQLite SchemaVersion back to the current value, and no bitrix_portals schema state.

After this change, #1060 may become empty. That is fine. The correct target state is zero new tables, zero migrations.

#1061

Refactor the channel implementation to use existing channel-instance storage:

  • Bitrix24 channel factory should read from channel_instances.credentials and channel_instances.config, like the other channel factories.
  • OAuth install / refresh state should update the Bitrix24 channel instance credentials, not a portal row.
  • UI should configure a Bitrix24 channel instance directly. No portal dropdown, no portal create modal, no bitrix.portals.* RPC.
  • If token refresh needs serialization, keep it within the channel runtime for this phase. Do not introduce advisory-lock SQL while the rule is “no SQL changes for Bitrix24”.
  • Keep resolveActorUserID in internal/agent if needed; that helper is channel-agnostic. But Bitrix-specific OAuth-to-MCP plumbing should stay in internal/channels/bitrix24/.
  • Update PR descriptions and docs to remove portal/store/migration wording and state explicitly: Bitrix24 is implemented as a regular channel using existing channel instance storage.

Why

Current codebase pattern is that channels store their config/state through existing channel tables. Adding the first channel-specific table (bitrix_portals) creates a precedent that would scale badly for future OAuth-based channels.

The channel logic, group semantics, formatting, and webhook verification can still be reused. The requested change is the persistence and architecture shape, not a rejection of the Bitrix24 channel work.

@nguyennguyenit
Copy link
Copy Markdown
Contributor

nguyennguyenit commented May 16, 2026

Bổ sung thêm rule để tránh hiểu lệch scope refactor:

  • Bitrix24 chỉ nên được xem là một channel bình thường của GoClaw, giống Telegram/Discord/Slack/Zalo/Feishu/WhatsApp.
  • Implementation không thêm hoặc sửa SQL schema: không table riêng, không migration PG/SQLite, không store interface riêng cho Bitrix24.
  • Không dùng bitrix_portals abstraction. Channel instance phải là đơn vị cấu hình/vận hành chính.
  • Secret/token không nên lưu trong channel_instances.config nếu field đó không được encrypt. Với code hiện tại, OAuth secret/token nên nằm trong channel_instances.credentials; config chỉ giữ non-secret như domain, bot_code, bot_name, policy, rendering settings.
  • Có thể tham khảo cách các channel khác implement factory/config/credentials từ channel_instances, đặc biệt Telegram/Discord/Slack/Zalo.
  • MCP integration nên được mô tả cụ thể là “Bitrix24 OAuth → existing mcp_user_credentials bridge”, không framing thành architecture generic như “Path B”.
  • resolveActorUserID có thể giữ ở internal/agent vì là helper channel-agnostic, nhưng Bitrix-specific OAuth-to-MCP plumbing nên nằm trong internal/channels/bitrix24.
  • Khi gỡ migration/schema đã thêm, phải rollback version trong DB dev/test tương ứng để tránh lệch version làm các migration sau không chạy được.

Mục tiêu cuối: PR này land như một channel implementation thuần, không mở precedent channel-specific schema.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants