discovers: replace invalid collection keys with discovered fallback keys #2808

Open
jshearer wants to merge 1 commit into master from agent/apply_fallback_key_when_collection_loses_its_own_key

@jshearer jshearer commented Mar 24, 2026

Context

When a user removes a primary key column from a captured table, the connector's discover response returns the updated schema (without that column) and, optionally, a set of fields that together uniquely identify a row, marked as a fallback key. The discover merge logic is designed to skip key updates for fallback keys, preserving any user-customized key, but it still updates the collection's schema. The result is a collection whose key references one or more fields that no longer exist in the schema, which causes newly captured documents to fail validation with:

location /key is unknown in schema schema://bundle

The connector behavior is correct: a secondary unique index is not the primary key, so marking it as a fallback is accurate. The merge logic simply didn't account for the case where the existing key becomes invalid after a schema update while a viable alternative, the fallback key, is available.

What changed

merge_collections now validates the existing key against the discovered schema. When a fallback key is discovered and the existing key no longer resolves (its fields aren't in the schema), the fallback replaces it. Two guards prevent over-application:

  1. The existing key must actually be invalid in the discovered schema. If the old column still exists but just isn't the primary key anymore, the existing key is preserved.
  2. The discovered fallback key must itself resolve. This prevents replacing one broken key with another.
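
The two guards can be sketched as a small decision function. This is only an illustration and not the code in merge_collections: KeyDecision, decide_key, all_resolve, and the resolves callback are hypothetical names, and treating an empty key as unresolved is an assumption of the sketch.

```rust
/// Hypothetical outcome type for this sketch (not from the PR).
#[derive(Debug, PartialEq)]
enum KeyDecision {
    KeepExisting,
    ReplaceWithFallback(Vec<String>),
}

/// True when every pointer of `key` resolves in the discovered schema.
/// Assumption: an empty key is treated as not resolving.
fn all_resolve<F: Fn(&str) -> bool>(key: &[String], resolves: &F) -> bool {
    !key.is_empty() && key.iter().all(|ptr| resolves(ptr.as_str()))
}

/// Decide whether a discovered fallback key should replace the existing key.
/// `resolves` is a stand-in for the real schema lookup.
fn decide_key<F: Fn(&str) -> bool>(
    existing: &[String],
    fallback: Option<&[String]>,
    resolves: F,
) -> KeyDecision {
    let fallback = match fallback {
        Some(fallback) => fallback,
        // No fallback key was discovered: nothing to apply.
        None => return KeyDecision::KeepExisting,
    };
    // Guard 1: the existing key must actually be invalid.
    if all_resolve(existing, &resolves) {
        return KeyDecision::KeepExisting;
    }
    // Guard 2: the fallback key must itself resolve.
    if !all_resolve(fallback, &resolves) {
        return KeyDecision::KeepExisting;
    }
    KeyDecision::ReplaceWithFallback(fallback.to_vec())
}
```

With a schema containing /foo_id and /bar_id but not /id, an existing key of ["/id"] and a fallback of ["/foo_id", "/bar_id"] would yield ReplaceWithFallback; if /id were still present, or if a fallback field were missing, the result would be KeepExisting.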

The key validity check uses doc::Shape::infer and shape.locate() to determine whether each pointer resolves to an explicit schema location (Exists::Must or Exists::May) rather than an implicit or impossible one.
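
To make the explicit-versus-implicit distinction concrete, here is a self-contained sketch that does not use the doc crate. ToyShape and key_resolves are hypothetical stand-ins, the local Exists enum only mirrors the Must and May variants named above, and the Implicit and Cannot variant names are assumptions standing for the "implicit" and "impossible" cases.

```rust
use std::collections::HashMap;

/// Local stand-in for the existence classification described above.
/// `Implicit` and `Cannot` are assumed names for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Exists {
    Must,
    May,
    Implicit,
    Cannot,
}

/// Toy replacement for an inferred shape: a map from JSON pointer to `Exists`.
struct ToyShape {
    locations: HashMap<String, Exists>,
}

impl ToyShape {
    /// Pointers the schema never mentions are reported as `Implicit`.
    fn locate(&self, ptr: &str) -> Exists {
        self.locations.get(ptr).copied().unwrap_or(Exists::Implicit)
    }
}

/// A key resolves only if every pointer lands on an explicit location.
fn key_resolves(shape: &ToyShape, key: &[&str]) -> bool {
    key.iter()
        .all(|ptr| matches!(shape.locate(*ptr), Exists::Must | Exists::May))
}
```

In this toy model a key such as ["/foo_id", "/bar_id"] resolves when both pointers are Must or May, while a key containing a pointer the schema never mentions does not.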

When a fallback key replacement occurs, a reason string is attached to the Changed struct and included in the publication detail, producing output like:

auto-discover changes (0 added, 1 modified, 0 removed)
- acmeCo/some_table: replaced collection key ["/id"] with fallback ["/foo_id", "/bar_id"] because the existing key no longer exists in the discovered schema
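
A rough sketch of how an optional reason could produce a detail line like the one above. The field names on Changed, the fallback_reason and detail_line helpers, and the rendering of entries without a reason are all assumptions for illustration; the real struct likely carries more than this.

```rust
/// Hypothetical, trimmed-down version of a change record with an optional reason.
struct Changed {
    collection: String,
    reason: Option<String>,
}

/// Build the reason string for a fallback key replacement.
/// Debug formatting of a string slice prints it as ["/a", "/b"].
fn fallback_reason(old_key: &[String], fallback: &[String]) -> String {
    format!(
        "replaced collection key {:?} with fallback {:?} because the existing key no longer exists in the discovered schema",
        old_key, fallback
    )
}

/// Render one bullet of the publication detail.
/// Assumption: entries without a reason show only the collection name.
fn detail_line(changed: &Changed) -> String {
    match &changed.reason {
        Some(reason) => format!("- {}: {}", changed.collection, reason),
        None => format!("- {}", changed.collection),
    }
}
```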

Question: Should this be further constrained to Exists::Must? I imagine that we don't currently have any constraint that fields included in a fallback key be required fields, so many of them likely are Exists::May. What... happens when a key is optional? Would that even work?

If a collection's existing key no longer exists in the discovered schema, and the discovered fallback key does, replace the collection's key to use the discovered fallback key. If both are invalid, or the existing key still resolves, do nothing.

Also update `Changed` to carry an optional `reason` field that surfaces in the publication detail when a fallback key replacement occurs.
@jshearer jshearer force-pushed the agent/apply_fallback_key_when_collection_loses_its_own_key branch from 7790f7d to 9664bbf Compare March 24, 2026 21:53
@jshearer jshearer marked this pull request as ready for review March 24, 2026 21:53
@jshearer jshearer requested a review from a team March 24, 2026 21:54