Skip to content

fix(kernel): lenient binding parsing — one typo no longer drops the entire bindings table#1146

Open
benhoverter wants to merge 1 commit intoRightNow-AI:mainfrom
benhoverter:lenient-binding-parse
Open

fix(kernel): lenient binding parsing — one typo no longer drops the entire bindings table#1146
benhoverter wants to merge 1 commit intoRightNow-AI:mainfrom
benhoverter:lenient-binding-parse

Conversation

@benhoverter
Copy link
Copy Markdown
Contributor

Summary

Fixes #1145.

A single misspelled field anywhere in [[bindings]] currently causes the entire bindings table to fail parsing at config load. Because KernelConfig deliberately does not use deny_unknown_fields at the top level (forward-compat with new sections), the failure surfaces as a silent default: every agent ends up unbound, the daemon boots clean, and routing simply stops working — with no log line pointing at the typo.

This PR makes binding parsing partial-success: each entry is parsed independently, malformed entries are dropped with a clear ERROR naming the agent and the bad field, and survivors load normally. Per-entry deny_unknown_fields is preserved, so silent typos still fail loudly per entry — just no longer catastrophically across the whole table.

Changes

  • New lenient_extract_bindings(&toml::Value) in openfang-kernel/src/config.rs runs after include-merge / [api] migration, before the main try_into::<KernelConfig>() call. Each [[bindings]] entry is try_into::<AgentBinding>()'d on its own; failures log an ERROR with index, best-effort-extracted agent name, and the underlying serde error message (which already lists valid field names). A single aggregate WARN reports dropped vs. surviving counts. Survivors are reinjected into the config value before the normal deserialize proceeds.
  • No changes to AgentBinding or BindingMatchRule schemasdeny_unknown_fields is preserved on both. Behavior change is scoped strictly to the failure mode of the bindings array.
  • TODO marker added on the remaining warn!-on-load fallback in load_config for the non-binding silent-default path (follow-up work).

Out of scope: the broader silent-default path for non-binding config sections; surfacing the dropped-binding count in any non-log channel.

Testing

  • cargo clippy --workspace --all-targets -- -D warnings passes
  • cargo test --workspace passes (+7 new tests, all green)
  • Live integration tested

Live smoke test on a daemon built from this branch stacked on a hardened base: typo'd a binding field (cccchannel) in config.toml and bounced the daemon. Log output:

ERROR Skipping malformed binding #3 (agent='assistant-teo'):
      unknown field `cccchannel`, expected one of
      `channel`, `account_id`, `peer_id`, `guild_id`, `channel_id`, `roles`
WARN  Dropped 1 malformed binding(s); 11 binding(s) will load.
INFO  Loaded agent bindings into router count=11

The error message names the agent, the typo'd field, and the valid alternatives. Surviving bindings loaded and routed correctly.

7 new unit tests cover:

  • test_lenient_bindings_drops_typo_keeps_rest — the issue reproducer.
  • test_lenient_bindings_all_valid_unchanged — happy path: no behavioral change for clean configs.
  • test_lenient_bindings_all_malformed_yields_empty_but_keeps_rest_of_config — every binding malformed → empty bindings, but the rest of the config still loads.
  • test_lenient_bindings_no_bindings_section_is_noop — configs without [[bindings]] are untouched.
  • test_lenient_bindings_missing_agent_field_dropped — entry without an agent field is dropped (cannot be routed).
  • test_lenient_bindings_preserves_survivor_order — locks in that first-match-wins routing semantics cannot silently regress when a middle entry is dropped.
  • test_lenient_bindings_top_level_field_typo_dropped — locks in that deny_unknown_fields catches operator typos at the binding top level (e.g. agnt = ...), not just inside match_rule.

CI note: the same pre-existing cargo fmt --all -- --check failure on main (tracked by #1121, inherited from da6b567a and earlier) applies here. None of the failing fmt lines are in crates/openfang-kernel/src/config.rs — the only file this PR modifies. Happy to address as a separate fmt-only PR if maintainers prefer.

Security

  • No new unsafe code
  • No secrets or API keys in diff
  • User input validated at boundaries
  • No personal agent names in diff, description, tests, or fixtures

Note on relationship to #1144

This PR pairs naturally with #1144 (channel_id binding hardening). Together they make binding misconfig caught early (#1144) and survivable (this PR) instead of silent-and-catastrophic. The two PRs touch disjoint code paths and can land in either order.

A typo in any binding's match_rule no longer drops the entire bindings
table. Each entry is parsed independently; malformed entries log an
ERROR with index, agent name, and the underlying serde error, then are
skipped. A single WARN summarizes total dropped vs. surviving bindings.
Per-entry deny_unknown_fields is preserved so silent typos still fail
loudly — just no longer catastrophically.

Before this change, a single misspelled field anywhere in [[bindings]]
caused the whole table to fail parsing, silently unbinding every
agent — the worst possible failure mode for a routing config.

- New `lenient_extract_bindings` runs after include-merge / [api]
  migration, before `try_into::<KernelConfig>()`.
- 7 new config tests cover the reproducer, happy path, all-malformed,
  no-bindings, missing-agent, survivor-order preservation, and
  top-level field typos:
    * test_lenient_bindings_drops_typo_keeps_rest
    * test_lenient_bindings_all_valid_unchanged
    * test_lenient_bindings_all_malformed_yields_empty_but_keeps_rest_of_config
    * test_lenient_bindings_no_bindings_section_is_noop
    * test_lenient_bindings_missing_agent_field_dropped
    * test_lenient_bindings_preserves_survivor_order — locks in that
      first-match-wins routing semantics cannot silently regress when
      a middle entry is dropped
    * test_lenient_bindings_top_level_field_typo_dropped — locks in
      that deny_unknown_fields catches operator typos at the binding
      top level (e.g. \`agnt = ...\`), not just inside match_rule
- TODO marker added on the remaining \`warn!\` fallback in \`load_config\`
  for the non-binding silent-default path (follow-up work).

Tested live: typo'd \`hannel\` field on binding #2 logged as expected;
remaining 5 bindings loaded and routed correctly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

kernel: lenient binding parsing — one typo no longer drops the entire bindings table

1 participant