feat(codecs): fetch Avro schema dynamically from Confluent Schema Registry by omwbennett · Pull Request #25093 · vectordotdev/vector

omwbennett · 2026-04-01T14:04:42Z

Summary

Adds support for fetching Avro schemas dynamically from a Confluent Schema Registry, closing #19546.

Previously, the Avro decoder required an inline schema in the Vector config. Schemas are often stored in a registry, so users have to fetch them when generating the config, either manually or with custom config-generation tooling. It also means Vector cannot handle compatible schema evolution, where a topic may contain messages serialized with different schema versions.

With this change, a schema_registry block can be specified instead of an inline schema. It tells the decoder to fetch the correct schema using the schema ID embedded in each message, as defined by the Confluent wire format.
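For reference, the Confluent wire format prepends a 5-byte header to each message. The header is a magic byte (0x00) followed by the schema ID as a 4-byte big-endian integer, and the Avro binary payload comes after it. A minimal std-only sketch of splitting that header off might look like this. `split_confluent_prefix` and `WireFormatError` are hypothetical names for illustration, not the decoder's actual code.

```rust
/// Errors from splitting a Confluent wire-format message (hypothetical type).
#[derive(Debug, PartialEq)]
enum WireFormatError {
    TooShort(usize),
    BadMagicByte(u8),
}

/// Splits a message into (schema_id, avro_payload).
/// Layout: [0x00 magic byte][4-byte big-endian schema ID][Avro binary payload].
fn split_confluent_prefix(msg: &[u8]) -> Result<(u32, &[u8]), WireFormatError> {
    // The header is 1 magic byte + 4 schema ID bytes = 5 bytes.
    if msg.len() < 5 {
        return Err(WireFormatError::TooShort(msg.len()));
    }
    if msg[0] != 0x00 {
        return Err(WireFormatError::BadMagicByte(msg[0]));
    }
    // Bytes 1..5 hold the schema ID in network (big-endian) byte order.
    let id = u32::from_be_bytes([msg[1], msg[2], msg[3], msg[4]]);
    Ok((id, &msg[5..]))
}

fn main() {
    // Schema ID 42 followed by a two-byte Avro payload.
    let msg: [u8; 7] = [0x00, 0x00, 0x00, 0x00, 0x2A, 0x02, 0x61];
    let (id, payload) = split_confluent_prefix(&msg).unwrap();
    println!("schema id = {id}, payload = {payload:?}");
}
```

The schema ID obtained this way is what the decoder would use as its cache key and registry lookup key.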

Schemas are fetched on first use and cached locally by schema ID, so subsequent messages with the same schema ID incur no network overhead.
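The fetch-on-miss caching described above can be sketched with a HashMap keyed by schema ID. This is a simplified, single-threaded illustration under stated assumptions. `SchemaCache` is a hypothetical type, and the `fetch` closure stands in for the HTTP request to the registry (for example `GET {url}/schemas/ids/{id}`). The real decoder's types, error handling, and concurrency behaviour may differ.

```rust
use std::collections::HashMap;

/// Hypothetical cache of schema JSON keyed by Confluent schema ID.
/// `F` stands in for the registry request (e.g. GET {url}/schemas/ids/{id}).
struct SchemaCache<F> {
    fetch: F,
    schemas: HashMap<u32, String>,
}

impl<F> SchemaCache<F>
where
    F: FnMut(u32) -> Result<String, String>,
{
    fn new(fetch: F) -> Self {
        Self { fetch, schemas: HashMap::new() }
    }

    /// Returns the schema for `id`, calling the fetcher only on a cache miss.
    /// Entries are never evicted; failed fetches are not cached, so a later
    /// message with the same ID retries the fetch.
    fn get(&mut self, id: u32) -> Result<&str, String> {
        if !self.schemas.contains_key(&id) {
            let schema = (self.fetch)(id)?;
            self.schemas.insert(id, schema);
        }
        Ok(self.schemas[&id].as_str())
    }
}

fn main() {
    let mut registry_calls = 0;
    let mut cache = SchemaCache::new(|id| {
        registry_calls += 1;
        Ok(format!(r#"{{"type":"record","name":"T{id}","fields":[]}}"#))
    });
    // Four messages, but only two distinct schema IDs.
    for id in [1, 1, 2, 1] {
        cache.get(id).unwrap();
    }
    drop(cache); // release the closure's borrow of `registry_calls`
    println!("registry calls: {registry_calls}");
}
```

Counting calls in the fetcher is a cheap way to express the "fetched once, then cached" property. It is similar in spirit to what the wiremock-based unit tests in this PR check.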

Vector configuration

An example config:

  sources:
    kafka_in:
      type: kafka
      bootstrap_servers: localhost:9092
      topics:
        - my-topic
      group_id: vector
      decoding:
        codec: avro
        avro:
          schema_registry:
            url: http://schema-registry:8081

How did you test this PR?

Unit tests that use wiremock to mock the schema registry and verify that a schema is fetched on the first message and cached for subsequent messages.
Manual testing with a local Confluent Schema Registry container and 50 topics, each registered with a unique schema.

Change Type

Is this a breaking change?

Yes
No

Does this PR include user facing changes?

Yes. Please add a changelog fragment based on our guidelines.
No. A maintainer will apply the no-changelog label to this PR.

References

Closes: #19546

Notes

The schema and schema_registry fields are mutually exclusive; specifying both returns an error at startup. When schema_registry is configured, the strip_schema_id_prefix option is ignored, because the wire-format prefix is always consumed to extract the schema ID.
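The startup validation described above could be modelled as resolving the options into a single schema source. This is a minimal sketch built on a simplified options struct. `AvroOptions`, `SchemaSource`, and `resolve` are hypothetical names. Rejecting a config with neither field set is also an assumption; this PR does not state it.

```rust
/// Hypothetical, simplified view of the Avro decoder options.
#[derive(Debug, Default)]
struct AvroOptions {
    schema: Option<String>,
    schema_registry_url: Option<String>,
    strip_schema_id_prefix: bool,
}

/// Where the decoder gets its schema from after validation.
#[derive(Debug, PartialEq)]
enum SchemaSource {
    Inline { schema: String, strip_schema_id_prefix: bool },
    /// The wire-format prefix is always consumed here to read the schema ID,
    /// so `strip_schema_id_prefix` is intentionally dropped.
    Registry { url: String },
}

/// Validates the options once, at startup.
fn resolve(opts: AvroOptions) -> Result<SchemaSource, String> {
    match (opts.schema, opts.schema_registry_url) {
        (Some(_), Some(_)) => {
            Err("`schema` and `schema_registry` are mutually exclusive".to_string())
        }
        (Some(schema), None) => Ok(SchemaSource::Inline {
            schema,
            strip_schema_id_prefix: opts.strip_schema_id_prefix,
        }),
        (None, Some(url)) => Ok(SchemaSource::Registry { url }),
        // Assumption: a config with neither field set is also rejected.
        (None, None) => Err("one of `schema` or `schema_registry` is required".to_string()),
    }
}

fn main() {
    let both = AvroOptions {
        schema: Some(r#"{"type":"string"}"#.to_string()),
        schema_registry_url: Some("http://schema-registry:8081".to_string()),
        ..Default::default()
    };
    println!("{:?}", resolve(both));
}
```

Encoding the choice as an enum means later code cannot accidentally consult strip_schema_id_prefix on the registry path.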

Schema fetches are performed synchronously inside Deserializer::parse() using tokio::task::block_in_place. This isn't ideal: if the schema registry is unavailable, many threads could block for up to the request timeout. Fixing this is trickier because parse() is not async, so it would require a larger refactor.
The impact seems minimal, since schemas are fetched at most once per unique schema ID and cached indefinitely, so blocking only occurs on the first message for each schema. max_blocking_threads is also set quite high (20k), so we shouldn't end up blocking all threads.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 05bb92c5b1


chatgpt-codex-connector · 2026-04-01T14:12:04Z

lib/codecs/src/decoding/format/avro.rs

+        tokio::task::block_in_place(|| {
+            tokio::runtime::Handle::current().block_on(async {
+                let resp = http_client


Avoid panic on single-thread Tokio runtimes

Calling tokio::task::block_in_place here will panic when the decoder runs on a current_thread runtime. The nested Handle::current() call also panics if parsing happens outside a Tokio runtime altogether. This means schema-registry decoding can crash the process (instead of returning an error) on the first uncached schema ID. That applies wherever this repo uses current-thread runtimes (for example, the component-validation runners). The schema fetch should therefore avoid block_in_place/Handle::current assumptions and use a non-panicking path.


feat(codecs): fetch Avro schema dynamically from Confluent Schema Registry (7c450e1)

omwbennett requested review from a team as code owners April 1, 2026 14:04

github-actions bot added the "domain: external docs" label (Anything related to Vector's external, public documentation) Apr 1, 2026

Merge branch 'master' into get-avro-schema-from-registry (05bb92c)

chatgpt-codex-connector bot reviewed Apr 1, 2026


jeff-morgan-dd self-assigned this Apr 1, 2026

jeff-morgan-dd approved these changes Apr 1, 2026



omwbennett wants to merge 2 commits into vectordotdev:master from omwbennett:get-avro-schema-from-registry


2 participants
