
feat(codecs): fetch Avro schema dynamically from Confluent Schema Registry #25093

Open
omwbennett wants to merge 2 commits into vectordotdev:master from omwbennett:get-avro-schema-from-registry


@omwbennett

Summary

Adds support for fetching Avro schemas dynamically from a Confluent Schema Registry, closing #19546.

Previously, the Avro decoder required an inline schema in the Vector config. Schemas are often stored in a registry, so users had to fetch them when generating the config, either manually or with custom config generation tooling. It also meant Vector could not handle compatible schema evolution, where a topic may contain messages serialized with different schema versions.

With this change, a schema_registry block can be specified instead of an inline schema. It tells the decoder to fetch the correct schema using the schema ID embedded in each message (according to the Confluent wire format).
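
For readers unfamiliar with the Confluent wire format: each message starts with a magic byte of 0, followed by a 4-byte big-endian schema ID, followed by the Avro-encoded payload. The following is a minimal sketch of extracting that ID. The function name `split_confluent_prefix` is hypothetical and is not taken from this PR's diff.

```rust
/// Splits a Confluent wire-format message into (schema ID, Avro payload).
///
/// Layout: byte 0 is the magic byte (0), bytes 1..5 are the schema ID as a
/// big-endian u32, and the remainder is the Avro binary payload.
/// NOTE: hypothetical helper for illustration, not the PR's implementation.
fn split_confluent_prefix(message: &[u8]) -> Option<(u32, &[u8])> {
    const MAGIC_BYTE: u8 = 0;
    const PREFIX_LEN: usize = 5;

    // Reject messages that are too short or do not start with the magic byte.
    if message.len() < PREFIX_LEN || message[0] != MAGIC_BYTE {
        return None;
    }

    let schema_id = u32::from_be_bytes([message[1], message[2], message[3], message[4]]);
    Some((schema_id, &message[PREFIX_LEN..]))
}
```

Returning `None` rather than panicking on a malformed prefix lets the caller turn the failure into a decode error.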

Schemas are fetched on first use and cached locally by schema ID, so subsequent messages with the same schema ID incur no network overhead.
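
The fetch-once-then-cache behaviour can be sketched roughly as below. This is an illustration under stated assumptions, not the code in this PR: `SchemaCache` and `schema_url` are hypothetical names, the fetch step is abstracted as a closure so the sketch does not depend on an HTTP client, and the schema is kept as a raw string rather than a parsed Avro schema. The URL shape follows the Schema Registry REST endpoint `GET /schemas/ids/{id}`.

```rust
use std::collections::HashMap;

/// Hypothetical cache keyed by schema ID. `fetch` stands in for the HTTP
/// request to the registry and is only called on a cache miss.
struct SchemaCache<F: FnMut(u32) -> Result<String, String>> {
    fetch: F,
    schemas: HashMap<u32, String>,
}

impl<F: FnMut(u32) -> Result<String, String>> SchemaCache<F> {
    fn new(fetch: F) -> Self {
        SchemaCache { fetch, schemas: HashMap::new() }
    }

    /// Returns the cached schema, fetching it first if this ID is new.
    /// Failed fetches are not cached, so a later message can retry.
    fn get(&mut self, id: u32) -> Result<&str, String> {
        if !self.schemas.contains_key(&id) {
            let schema = (self.fetch)(id)?;
            self.schemas.insert(id, schema);
        }
        Ok(self.schemas[&id].as_str())
    }
}

/// Builds the registry URL for a schema ID (assumed endpoint shape:
/// `<base>/schemas/ids/<id>`).
fn schema_url(base: &str, id: u32) -> String {
    format!("{}/schemas/ids/{}", base.trim_end_matches('/'), id)
}
```

With this shape, only the first message for a given schema ID pays the cost of the fetch; every later lookup is a hash map read.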

Vector configuration

An example config:

  sources:
    kafka_in:
      type: kafka
      bootstrap_servers: localhost:9092
      topics:
        - my-topic
      group_id: vector
      decoding:
        codec: avro
        avro:
          schema_registry:
            url: http://schema-registry:8081

How did you test this PR?

  • Unit tests that use wiremock to mock the schema registry, verifying that schemas are fetched on the initial message and served from the cache for subsequent messages.
  • Manual testing with a local Confluent Schema Registry container, with 50 topics each registered with a unique schema.

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Closes: #19546

Notes

The schema and schema_registry fields are mutually exclusive. Specifying both returns an error at startup. When schema_registry is configured, the strip_schema_id_prefix option is ignored, because the wire-format prefix is always consumed to extract the schema ID.
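
As a rough sketch of the mutual-exclusivity check described above: the types `AvroOptions` and `SchemaSource`, the function `resolve_schema_source`, and the error strings are all hypothetical and do not reflect the actual config structs in this PR. The handling of the case where neither field is set is an assumption, since the description does not cover it.

```rust
/// Hypothetical resolved schema source, for illustration only.
#[derive(Debug, PartialEq)]
enum SchemaSource {
    Inline(String),
    Registry { url: String },
}

/// Hypothetical, simplified view of the decoder options.
#[derive(Debug, Default)]
struct AvroOptions {
    schema: Option<String>,
    schema_registry_url: Option<String>,
}

/// Validates the options at startup and picks exactly one schema source.
fn resolve_schema_source(opts: AvroOptions) -> Result<SchemaSource, String> {
    match (opts.schema, opts.schema_registry_url) {
        // Both set: reject, the fields are mutually exclusive.
        (Some(_), Some(_)) => {
            Err("`schema` and `schema_registry` are mutually exclusive".to_string())
        }
        (Some(schema), None) => Ok(SchemaSource::Inline(schema)),
        (None, Some(url)) => Ok(SchemaSource::Registry { url }),
        // Assumption: with neither field set there is no schema to decode with.
        (None, None) => Err("one of `schema` or `schema_registry` must be set".to_string()),
    }
}
```

Matching on the tuple of both options makes every combination explicit, so a new case cannot be silently skipped.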

Schema fetches are performed synchronously inside Deserializer::parse() using tokio::task::block_in_place. This isn't ideal: if the schema registry is unavailable, many threads could each block for up to the request timeout. Fixing this is trickier because parse() is not async, so it would require a larger refactor.
The impact seems minimal: schemas are fetched at most once per unique schema ID and cached indefinitely, so blocking only occurs on the first message for each schema. max_blocking_threads is also set quite high (20k), so we shouldn't end up blocking all threads.

@omwbennett omwbennett requested review from a team as code owners April 1, 2026 14:04
@github-actions github-actions bot added the domain: external docs Anything related to Vector's external, public documentation label Apr 1, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 05bb92c5b1


Comment on lines +195 to +197
  tokio::task::block_in_place(|| {
      tokio::runtime::Handle::current().block_on(async {
          let resp = http_client

P2: Avoid panic on single-thread Tokio runtimes

Calling tokio::task::block_in_place here will panic when the decoder runs on a current_thread runtime, and the nested Handle::current() call also panics if parsing happens outside Tokio altogether. This means schema-registry decoding can crash the process (instead of returning an error) on the first uncached schema ID in contexts that do use current-thread runtimes in this repo (for example component-validation runners), so schema fetch should avoid block_in_place/Handle::current assumptions and use a non-panicking path.


@jeff-morgan-dd jeff-morgan-dd self-assigned this Apr 1, 2026



Development

Successfully merging this pull request may close these issues.

Get avro schema from registry

2 participants