feat(security): security audit trail, API-key alerts, and workflow version history #1463

Draft
joelorzet wants to merge 16 commits into staging from feat/keep-671-api-token-email-alerts

@joelorzet

Overview

Hardening work for workflow safety and security. Ships an out-of-band email alert and a durable, queryable audit trail for sensitive actions, attribution on executions, and end-to-end workflow versioning with an in-editor history UI. Kept as a draft until the remaining ticket item (org limits) and the open questions below are decided.

What's included

Security audit registry

  • New security_audit_log table + recordAuditEvent() (actor, org, action, resource, deep-diff of before/after, ip/UA), with composite indexes for org / actor / resource / action timelines.
  • GET /api/security/audit — org-scoped, owner/admin-gated, filterable, cursor-paginated, actor name/email enriched.

Out-of-band email alerts

  • API key create/revoke (user wfb_ and org kh_) send an owner notice and write an audit event.

Sensitive actions audited

  • Auth/credentials: password change, password reset, email change, account deactivate, session revoke.
  • MFA: TOTP enroll/disable, backup-code regeneration.
  • Workflows: create / update / delete, marketplace list / unlist / listing update.
  • Billing: subscription.plan_changed / subscription.canceled recorded authoritatively in the Stripe webhook handler; checkout/cancel routes record the user intent.
  • Wallets: org-wallet creation, agentic-wallet HMAC rotation.

Execution attribution

  • workflow_executions now records a durable credential descriptor (type + non-secret label) that survives key revocation, plus executed_workflow_hash tying a run to the exact definition that produced it.

Workflow versioning + history (admin/owner)

  • workflow_history table: one row per version (full snapshot + semantic change + content hash + actor), written on create/update.
  • GET /api/workflows/[id]/history and GET /api/workflows/[id]?version=N.
  • In-editor "Version history" overlay: timeline with author/relative time, a readable semantic change list (added/removed/changed nodes with type and before/after; connections), "View on canvas" read-only preview (autosave suppressed during preview), and Restore.
  • API-key section shows a section-level create/revoke activity log.

Incidental fixes (pre-existing autosave 422 surfaced during testing)

  • Accept legacy args key and editor-persisted abiFunctionKey for web3 read/write-contract validation (non-breaking, validation-only).

Migrations

  • 0098 security_audit_log, 0099 execution credential + version columns, 0100 workflow_history. Hand-authored (drizzle-kit generate is blocked by a pre-existing snapshot-chain collision); each verified against a local Postgres in a rolled-back transaction.

Verification

  • pnpm type-check clean; lint clean on changed files; unit/integration tests for the audit helper, content hash, version diff, attribution, notifications, and billing cancel route pass.

Open / deliberately deferred

  • Org limits (the other ticket item) not started.
  • Failed/blocked-attempt logging (only success paths audited today).
  • Retention/pruning for security_audit_log and workflow_history.
  • Route-level tests for the audit wiring are not yet written; there is no automated UI test for the history overlay.
  • Autosave currently blocks on incomplete action configs; decoupling draft-save from strict validation (enforce on enable/run/list) is pending a decision.

joelorzet added 16 commits June 4, 2026 07:24
deep-diff backs the security audit log's before/after change records. Its
npm "latest" tag is broken and the only release (1.0.2, 2018) trips the
repo minimum-release-age gate, so it is added to the .npmrc exclude list
alongside the existing pinned-legacy packages.

Introduce a durable, queryable record of sensitive account actions and
wire API-key create/revoke into it, alongside an out-of-band email alert.

- security_audit_log table storing actor, org, action, resource, a
  deep-diff of before/after state, and request metadata (ip, country,
  user agent); composite indexes for org, actor, resource, and action
  timelines, each trailing created_at for filter-then-newest queries
- recordAuditEvent() helper that computes the diff and writes the row
  best-effort, so a logging failure never breaks the user action
- sendApiKeyChangeEmail() out-of-band notice on key create/revoke
- POST/DELETE api-keys routes emit both the email and an audit event
- migration 0098 (hand-authored; drizzle-kit generate is blocked by a
  pre-existing snapshot-chain collision)
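
To illustrate the best-effort behaviour described above, here is a minimal sketch of how a helper like recordAuditEvent() might compute a diff and swallow write failures. The type shapes, the insertRow callback, and the shallowDiff stand-in (used here instead of the deep-diff package so the example stays self-contained) are all assumptions for illustration, not the PR's actual code.

```typescript
// Hypothetical shapes; the real table also stores ip, country, and user agent.
type AuditEvent = {
  actorId: string | null;
  orgId: string | null;
  action: string;
  resource: string;
  before?: Record<string, unknown>;
  after?: Record<string, unknown>;
};

type FieldChange = { path: string; before: unknown; after: unknown };

type StoredAuditEvent = Omit<AuditEvent, "before" | "after"> & {
  changes: FieldChange[];
  createdAt: Date;
};

// Stand-in for the deep-diff call: compares top-level keys only.
function shallowDiff(
  before: Record<string, unknown> = {},
  after: Record<string, unknown> = {},
): FieldChange[] {
  const keys = Array.from(
    new Set([...Object.keys(before), ...Object.keys(after)]),
  ).sort();
  const changes: FieldChange[] = [];
  for (const key of keys) {
    if (JSON.stringify(before[key]) !== JSON.stringify(after[key])) {
      changes.push({ path: key, before: before[key], after: after[key] });
    }
  }
  return changes;
}

// Best-effort write: a failure is logged and reported as `false`,
// so the calling route can finish the user action regardless.
async function recordAuditEvent(
  event: AuditEvent,
  insertRow: (row: StoredAuditEvent) => Promise<void>,
): Promise<boolean> {
  try {
    const { before, after, ...rest } = event;
    await insertRow({
      ...rest,
      changes: shallowDiff(before, after),
      createdAt: new Date(),
    });
    return true;
  } catch (error) {
    console.error("audit write failed", error);
    return false;
  }
}
```

A route would call this at its success point and ignore the returned boolean, which is what keeps a logging outage from turning into a failed key creation or revocation.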
Make the execution audit trail durable and reconstructable.

- triggered_by_credential_type / triggered_by_credential_label on
  workflow_executions capture which credential triggered a run
  (webhook_key | org_api_key | oauth | session | internal, plus a
  non-secret handle). These survive key revocation, unlike the existing
  triggered_by_*_api_key_id FKs which are nulled when a key is deleted
- executed_workflow_hash stamps the sha256 of the nodes+edges that ran,
  tying a run to the exact definition that produced it and joining to
  workflow_history.content_hash to resolve the stored snapshot
- hashWorkflowDefinition() shared content-hash helper
- buildAttribution() extended; execute and webhook routes populate the
  new fields
- migration 0099 (hand-authored)
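
As a rough sketch of the content-hash idea above: a helper along the lines of hashWorkflowDefinition() could serialize nodes and edges deterministically and take the sha256. The key-sorted serialization (stableStringify) is an assumption added here so that key order does not affect the hash; the PR does not say how the real helper canonicalizes its input.

```typescript
import { createHash } from "node:crypto";

type WorkflowDefinition = { nodes: unknown[]; edges: unknown[] };

// Assumed canonical form: object keys sorted, undefined values dropped.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value) ?? "null";
  }
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  const record = value as Record<string, unknown>;
  const entries = Object.keys(record)
    .filter((key) => record[key] !== undefined)
    .sort()
    .map((key) => `${JSON.stringify(key)}:${stableStringify(record[key])}`);
  return `{${entries.join(",")}}`;
}

// sha256 over nodes + edges only, returned as a 64-character hex string.
function hashWorkflowDefinition(def: WorkflowDefinition): string {
  return createHash("sha256")
    .update(stableStringify({ nodes: def.nodes, edges: def.edges }))
    .digest("hex");
}
```

Storing this value on each execution row is what would let a run be joined back to the workflow_history row with the same content_hash.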
Bring org-scoped (kh_) API keys to parity with user webhook keys: both
the create and revoke paths now send the out-of-band email alert and
write a security audit event with org context, so every long-lived
credential mint/revoke is recorded the same way regardless of scope.

GET /api/security/audit returns the active org's audit trail. This is sensitive
forensic data, so it is session-gated and restricted to org owners and
admins, and always scoped to the caller's organization. Filterable by
action, resource, and actor with a created_at cursor for pagination, all
served by the composite indexes on security_audit_log.
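
The filter-then-newest query with a created_at cursor can be sketched in memory as follows. The AuditEntry and AuditQuery shapes, the default limit of 50, and the use of ISO 8601 UTC strings as cursors are assumptions for illustration; the real endpoint runs the equivalent query against security_audit_log.

```typescript
// Hypothetical row and query shapes; createdAt is an ISO 8601 UTC string,
// so plain string comparison matches chronological order.
type AuditEntry = {
  id: string;
  orgId: string;
  action: string;
  actorId: string;
  createdAt: string;
};

type AuditQuery = {
  orgId: string;
  action?: string;
  actorId?: string;
  cursor?: string; // createdAt of the last row on the previous page
  limit?: number;
};

function pageAuditEntries(
  rows: AuditEntry[],
  query: AuditQuery,
): { items: AuditEntry[]; nextCursor: string | null } {
  const limit = query.limit ?? 50;
  const matching = rows
    .filter((row) => row.orgId === query.orgId) // always org-scoped
    .filter((row) => !query.action || row.action === query.action)
    .filter((row) => !query.actorId || row.actorId === query.actorId)
    .filter((row) => !query.cursor || row.createdAt < query.cursor)
    .sort((a, b) =>
      a.createdAt < b.createdAt ? 1 : a.createdAt > b.createdAt ? -1 : 0,
    );
  const items = matching.slice(0, limit);
  const last = items[items.length - 1];
  // Only hand out a cursor when more rows remain beyond this page.
  const nextCursor = matching.length > limit && last ? last.createdAt : null;
  return { items, nextCursor };
}
```

One caveat of this simplified form: rows sharing an identical createdAt at a page boundary could be skipped, so a production query would likely add a tie-breaker such as the row id.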
Wire the account-takeover-relevant actions into the security audit log so
the trail covers more than API keys:

- password change and password reset (reset records the requesting IP)
- email change (captures the before/after address)
- account deactivation
- session revocation
- TOTP enroll/disable and backup-code regeneration

Each writes a best-effort audit event with actor, resource, and request
metadata at its success point.

Wire workflow lifecycle and Marketplace listing mutations into the
security audit log:

- workflow.created / workflow.updated / workflow.deleted
- workflow.listed / workflow.unlisted / workflow.listing_updated

workflow.updated records scalar fields plus a content hash of the
definition rather than the full nodes/edges, keeping the audit row small;
the full snapshot and structural diff remain the job of the workflow
change-history table.

Audit billing in two layers, matching where the state actually lives:

- Authoritative transitions are recorded in the Stripe webhook handler
  (handle-billing-event.ts), the source of truth: subscription.plan_changed
  on a price change and subscription.canceled on deletion. Actor is the
  provider webhook (system).
- The checkout and cancel routes record the user-initiated intent
  (subscription.change_requested / subscription.cancel_requested) so the
  trail keeps which user triggered it, which the webhook does not carry.
- org_wallet.created on Turnkey org-wallet provisioning (actor = the
  creating user)
- agentic_wallet.hmac_rotated on HMAC secret rotation, recording the
  key-version bump; actor is the wallet sub-org (HMAC-authenticated)

Add the workflow_history store powering change history, version load, and
restore.

- workflow_history table: one row per version with the full snapshot
  (incl. edges, which are structural), a deep-diff vs the previous version,
  a content hash, and the same actor capture (who/when) as the audit log.
  Per-workflow version counter (unique with workflow_id).
- recordWorkflowSnapshot() helper, best-effort like recordAuditEvent, hooked
  into the workflow create and update chokepoints.
- content-hash + diff now normalize the definition: node identity/type/data
  and edge connectivity are tracked, cosmetic ReactFlow state (position,
  selection, size, edge styling) is stripped, so dragging a node does not
  create a version but a connection or config change does.
- migration 0100 (hand-authored).

Listing-only metadata changes stay audit-log-only (they don't alter the
definition).
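
The normalization rule above (track identity, type, data, and connectivity; strip cosmetic canvas state) might look roughly like this. The CanvasNode and CanvasEdge field lists are a simplified assumption loosely modelled on ReactFlow objects, and normalizeDefinition / isSameDefinition are hypothetical names, not the PR's helpers.

```typescript
// Simplified, assumed shapes for canvas objects.
type CanvasNode = {
  id: string;
  type?: string;
  data?: unknown;
  position?: { x: number; y: number };
  selected?: boolean;
  width?: number;
  height?: number;
};

type CanvasEdge = {
  id: string;
  source: string;
  target: string;
  sourceHandle?: string | null;
  targetHandle?: string | null;
  selected?: boolean;
  animated?: boolean;
  style?: unknown;
};

const compareStrings = (a: string, b: string): number =>
  a < b ? -1 : a > b ? 1 : 0;

// Keep only what defines behaviour; drop position, selection, size, styling.
function normalizeDefinition(nodes: CanvasNode[], edges: CanvasEdge[]) {
  return {
    nodes: nodes
      .map((node) => ({
        id: node.id,
        type: node.type ?? null,
        data: node.data ?? null,
      }))
      .sort((a, b) => compareStrings(a.id, b.id)),
    edges: edges
      .map((edge) => ({
        source: edge.source,
        sourceHandle: edge.sourceHandle ?? null,
        target: edge.target,
        targetHandle: edge.targetHandle ?? null,
      }))
      .sort((a, b) => compareStrings(JSON.stringify(a), JSON.stringify(b))),
  };
}

// Two definitions are "the same version" when their normalized forms match.
// Note: this comparison is sensitive to key order inside `data`.
function isSameDefinition(
  a: { nodes: CanvasNode[]; edges: CanvasEdge[] },
  b: { nodes: CanvasNode[]; edges: CanvasEdge[] },
): boolean {
  return (
    JSON.stringify(normalizeDefinition(a.nodes, a.edges)) ===
    JSON.stringify(normalizeDefinition(b.nodes, b.edges))
  );
}
```

Under this sketch, a snapshot writer would skip creating a new row whenever isSameDefinition returns true for the previous and incoming definitions.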
- GET /api/workflows/[id]/history: admin/owner-gated version timeline with
  per-version diff and actor name/email enrichment, cursor-paginated.
- GET /api/workflows/[id]?version=N: returns a historical snapshot in the
  same shape as the live row, so the editor can load a past version.
- Enrich the security audit read endpoint with actor name/email too.
- Shared lib/security/org-role.ts (getOrgRole / isOrgAdmin) gating these
  reads to organization owners and admins.

Restore is performed client-side by loading a version and saving it back
through the normal update path, which reuses all existing validation,
schedule sync, history, and audit wiring rather than duplicating it.
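
A minimal sketch of that client-side restore, assuming a client with getById(id, { version }) as named in this PR and a hypothetical update(id, patch) method standing in for the normal save path. The WorkflowSnapshot shape is also an assumption.

```typescript
// Assumed snapshot shape: the historical row has the same shape as the live one.
type WorkflowSnapshot = { name: string; nodes: unknown[]; edges: unknown[] };

// Hypothetical client interface; `update` stands in for the normal save path.
interface WorkflowClient {
  getById(id: string, options?: { version?: number }): Promise<WorkflowSnapshot>;
  update(id: string, patch: WorkflowSnapshot): Promise<WorkflowSnapshot>;
}

// Restore = load the old version, then save it back like any other edit.
// Validation, history, and audit wiring all run inside `update`.
async function restoreVersion(
  client: WorkflowClient,
  workflowId: string,
  version: number,
): Promise<WorkflowSnapshot> {
  const snapshot = await client.getById(workflowId, { version });
  return client.update(workflowId, {
    name: snapshot.name,
    nodes: snapshot.nodes,
    edges: snapshot.edges,
  });
}
```

Because the restore goes through the ordinary update, it produces a new version rather than rewinding the counter, so the timeline stays append-only.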
Surface workflow versions in context, in the editor:

- A History button in the workflow toolbar, shown only to org admins/owners
  (useActiveMember), opens a version-history overlay.
- The overlay lists versions (who/when/source) and shows a read-only Monaco
  side-by-side JSON diff of the selected version against its predecessor.
- Restore writes the chosen snapshot back through the normal save path
  (creating a new version + audit event) and syncs the live canvas.
- api-client: getById(id, { version }) and getHistory(id); CodeDiffEditor
  wraps Monaco's DiffEditor with the shared theme.

On-canvas read-only preview of a past version is deferred; the diff view
already shows what changed.

On the Organisation tab, each org API key gets a History toggle (admin/owner
only) that lists its create/revoke events with actor and timestamp, read
from the org-scoped security audit trail via api.security.getAudit. Reuses
the existing audit endpoint and actor-name enrichment -- no new backend.

User (wfb_) webhook keys are personal and their audit events carry no org,
so the org-scoped reader does not surface them; personal-key history is a
separate follow-up.

Replace the raw Monaco JSON diff (which surfaced noisy node-position changes
and was hard to read) with a human semantic diff and a clearer UX.

- computeVersionDiff(): compares snapshots by node id and edge connectivity,
  ignoring cosmetic canvas state (position, selection, edge styling). Reports
  added/removed/changed nodes (with node type and field-level before/after,
  e.g. renamed "A" to "B", config keys changed) and added/removed
  connections. Drops the Monaco diff editor entirely.
- Version-history overlay redesigned: timeline with author + relative time +
  Current badge; change list uses Plus/Minus/Pencil and ArrowRight icons
  (no glyph arrows) instead of JSON.
- "View on canvas": load a version read-only via a new previewVersionAtom
  that suppresses autosave (so previewing can't clobber the live workflow);
  a banner offers Restore / Exit preview. The atom is reset on editor
  mount/unmount.
- API-key history moved from a per-item expander (a key is only ever
  created/revoked) to a section-level activity log on the Organisation tab,
  capturing create + revoke across all keys, including revoked ones.

Adds version-diff unit tests.
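
A simplified sketch of a semantic diff in the spirit of computeVersionDiff(): nodes are matched by id, edges by connectivity, and position is ignored. The types and the returned shape are assumptions, and only top-level data keys are compared here; the actual helper reports richer before/after detail.

```typescript
// Assumed, simplified shapes. `position` is present only to show it is ignored.
type DiffNode = {
  id: string;
  type?: string;
  data?: Record<string, unknown>;
  position?: unknown;
};
type DiffEdge = { source: string; target: string };
type VersionSnapshot = { nodes: DiffNode[]; edges: DiffEdge[] };

type VersionDiff = {
  addedNodes: string[];
  removedNodes: string[];
  changedNodes: { id: string; fields: string[] }[];
  addedConnections: string[];
  removedConnections: string[];
};

function computeVersionDiff(prev: VersionSnapshot, next: VersionSnapshot): VersionDiff {
  const prevNodes = new Map(prev.nodes.map((node) => [node.id, node] as const));
  const nextIds = new Set(next.nodes.map((node) => node.id));

  const changedNodes: VersionDiff["changedNodes"] = [];
  for (const node of next.nodes) {
    const before = prevNodes.get(node.id);
    if (!before) continue; // new node, reported under addedNodes
    const fields: string[] = [];
    if (before.type !== node.type) fields.push("type");
    const beforeData = before.data ?? {};
    const afterData = node.data ?? {};
    const keys = Array.from(
      new Set([...Object.keys(beforeData), ...Object.keys(afterData)]),
    ).sort();
    for (const key of keys) {
      if (JSON.stringify(beforeData[key]) !== JSON.stringify(afterData[key])) {
        fields.push(key);
      }
    }
    if (fields.length > 0) changedNodes.push({ id: node.id, fields });
  }

  // Edges are identified by what they connect, not by id or styling.
  const edgeKey = (edge: DiffEdge): string => `${edge.source}->${edge.target}`;
  const prevEdges = new Set(prev.edges.map(edgeKey));
  const nextEdges = new Set(next.edges.map(edgeKey));

  return {
    addedNodes: next.nodes.filter((n) => !prevNodes.has(n.id)).map((n) => n.id),
    removedNodes: prev.nodes.filter((n) => !nextIds.has(n.id)).map((n) => n.id),
    changedNodes,
    addedConnections: Array.from(nextEdges).filter((k) => !prevEdges.has(k)),
    removedConnections: Array.from(prevEdges).filter((k) => !nextEdges.has(k)),
  };
}
```

The overlay can then render each entry as a line item (added, removed, or changed) instead of exposing raw JSON.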
Workflows that persisted the read/write-contract function arguments under
`args` (the canonical field is `functionArgs`) failed action-config
validation with INVALID_ACTION_CONFIG, so autosave rejected every save --
even a fully configured node or a layout-only change. Add `args ->
functionArgs` to LEGACY_FIELD_ALIASES, matching the existing
`functionName -> abiFunction` alias. Validation-only and non-breaking; the
runtime already reads functionArgs.

read/write-contract nodes persist abiFunctionKey (the resolved function
signature used for overloaded-function disambiguation), but it is not a
declared config field, so strict action-config validation rejected the save
with INVALID_ACTION_CONFIG -- autosave failed on every read-contract node.
The runtime recomputes the key and never reads it from config, so add it to
LEGACY_IGNORED_FIELDS for read/write-contract. Validation-only, non-breaking.
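
The two validation fixes above can be pictured with a small sketch of a pre-validation normalization step. The constant names LEGACY_FIELD_ALIASES and LEGACY_IGNORED_FIELDS come from the commit messages; their shapes here (a flat record and a flat set, not keyed by action type), the normalizeForValidation function, and the rule that the canonical field wins when both are present are assumptions.

```typescript
// Assumed shapes; the real tables may be scoped per action type.
const LEGACY_FIELD_ALIASES: Record<string, string> = {
  functionName: "abiFunction",
  args: "functionArgs",
};
const LEGACY_IGNORED_FIELDS = new Set<string>(["abiFunctionKey"]);

// Hypothetical helper: rewrite legacy keys and drop ignored ones
// before the strict action-config validator sees the config.
function normalizeForValidation(
  config: Record<string, unknown>,
): Record<string, unknown> {
  const normalized: Record<string, unknown> = {};
  for (const key of Object.keys(config)) {
    if (LEGACY_IGNORED_FIELDS.has(key)) continue;
    const canonical = LEGACY_FIELD_ALIASES[key] ?? key;
    // Assumption: if both legacy and canonical keys exist, canonical wins.
    if (canonical !== key && canonical in config) continue;
    normalized[canonical] = config[key];
  }
  return normalized;
}
```

Since this only reshapes what the validator sees, stored configs and runtime behaviour stay untouched, which matches the "validation-only, non-breaking" framing of both commits.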