
fix(file source): high CPU usage after async file server migration#25064

Open
fcfangcc wants to merge 2 commits into vectordotdev:master from fcfangcc:master-fcfangcc

Conversation

@fcfangcc

@fcfangcc fcfangcc commented Mar 30, 2026

Summary

This PR addresses a CPU regression introduced by the async file source changes, #24058 .

The async migration converted FileWatcher's reader from synchronous std::io::BufRead to tokio::io::AsyncBufRead. The critical behavioral difference is that tokio::io::BufReader::fill_buf().await returns immediately with an empty buffer when the underlying file is at EOF — it is a non-blocking, zero-cost poll that completes in the same tick.
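
A minimal sync sketch of that EOF contract (std's BufRead documents the same fill_buf behavior; tokio's async version simply completes the await in the same poll):

```rust
use std::io::BufRead;

fn main() -> std::io::Result<()> {
    // A reader whose underlying source is already exhausted.
    let mut reader = std::io::BufReader::new(std::io::Cursor::new(Vec::<u8>::new()));
    // fill_buf() at EOF returns an empty slice immediately; there is no
    // wait, which is why a loop polling many idle files spins at full speed.
    let buf = reader.fill_buf()?;
    assert!(buf.is_empty());
    Ok(())
}
```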

Previously, the file server's main loop ran inside spawn_blocking and relied on a global backoff (sleep up to 2048ms when no bytes were read globally). While the per-file fill_buf() was also non-blocking at EOF in the sync version, the overall loop cadence was naturally throttled by the blocking thread context and the global sleep.

After the async conversion, the main loop runs as a normal async task. On each iteration it calls should_read() → read_line() → fill_buf().await for every watched file. For idle files at EOF, each read_line() call completes almost instantly but still incurs the overhead of the async state machine, buffer checks, and the function call chain (read_line → read_until_with_max_size → fill_buf). With hundreds of idle files, this tight loop burns significant CPU doing no useful work.

The global backoff (backoff_cap, max 2048ms) only kicks in after iterating through all watchers, so it cannot prevent the per-file polling overhead within each loop iteration.

What Changed

1. Add per-watcher EOF backoff

FileWatcher now backs off after repeated EOF reads instead of polling at the same rate while the file remains idle.

  • The backoff grows for repeated EOF probes.
  • The backoff resets immediately after a successful read.
  • Active files keep their previous responsiveness.

This reduces unnecessary wakeups and polling work when a small number of files remain active and many others have already reached EOF.
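
The grow-and-reset logic can be sketched as follows. This is an illustrative EofBackoff type, not the actual FileWatcher field; the min/max constants mirror the values quoted later in the review thread:

```rust
use std::time::Duration;

// Bounds matching the constants discussed in the review below.
const EOF_READ_BACKOFF_MIN: Duration = Duration::from_millis(1);
const EOF_READ_BACKOFF_MAX: Duration = Duration::from_millis(250);

/// Tracks how long a single watcher should sleep after consecutive EOF reads.
struct EofBackoff {
    current: Option<Duration>,
}

impl EofBackoff {
    fn new() -> Self {
        Self { current: None }
    }

    /// A read hit EOF: double the delay, capped at the maximum.
    fn on_eof(&mut self) -> Duration {
        let next = match self.current {
            None => EOF_READ_BACKOFF_MIN,
            Some(d) => (d * 2).min(EOF_READ_BACKOFF_MAX),
        };
        self.current = Some(next);
        next
    }

    /// A read returned data: reset so active files stay responsive.
    fn on_read(&mut self) {
        self.current = None;
    }
}

fn main() {
    let mut b = EofBackoff::new();
    assert_eq!(b.on_eof(), Duration::from_millis(1));
    assert_eq!(b.on_eof(), Duration::from_millis(2));
    for _ in 0..10 {
        b.on_eof();
    }
    assert_eq!(b.on_eof(), Duration::from_millis(250)); // capped at the max
    b.on_read();
    assert_eq!(b.on_eof(), Duration::from_millis(1)); // reset after a successful read
}
```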

2. Remove per-read boxing from the shared async buffer read path

The shared read_until_with_max_size helper now takes a borrowed reader directly instead of wrapping the reader for each call.

This keeps the outer reader abstraction intact, but removes extra work from the line-read hot path.
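
A sync analogue of the signature change, with hypothetical function names (the real helper is async and takes an AsyncBufRead):

```rust
use std::io::BufRead;

// Before: the helper took a freshly boxed reader, so every call through
// this hot path paid a heap allocation for the wrapper.
fn read_line_boxed(mut reader: Box<dyn BufRead + '_>, out: &mut String) -> std::io::Result<usize> {
    reader.read_line(out)
}

// After: borrow the caller's reader directly; no per-call allocation.
fn read_line_borrowed<R: BufRead>(reader: &mut R, out: &mut String) -> std::io::Result<usize> {
    reader.read_line(out)
}

fn main() -> std::io::Result<()> {
    let mut reader = std::io::Cursor::new(b"hello\nworld\n".to_vec());
    let mut line = String::new();
    // The borrowed form can be called repeatedly without re-wrapping.
    read_line_borrowed(&mut reader, &mut line)?;
    assert_eq!(line, "hello\n");
    line.clear();
    read_line_boxed(Box::new(&mut reader), &mut line)?;
    assert_eq!(line, "world\n");
    Ok(())
}
```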

3. Add a benchmark for the regression scenario

A new benchmark was added for one active file together with many idle watched files. This is the workload shape that exposed the regression.

The benchmark lives in benches/files.rs under the files/idle_watchers group and measures:

  • 0 idle watched files
  • 128 idle watched files
  • 512 idle watched files

Vector configuration

How did you test this PR?

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
  • We recommend adding a pre-push hook; please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details on the dd-rust-license-tool.

@fcfangcc fcfangcc requested a review from a team as a code owner March 30, 2026 08:56
@github-actions
Contributor

github-actions bot commented Mar 30, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@fcfangcc fcfangcc changed the title fix(file source): back off idle EOF polling and remove per-read boxing fix(file source): high CPU usage after async file server migration Mar 31, 2026
@fcfangcc

This comment was marked as spam.

@github-actions github-actions bot added the domain: ci Anything related to Vector's CI environment label Mar 31, 2026
@github-actions github-actions bot removed the domain: ci Anything related to Vector's CI environment label Mar 31, 2026
@fcfangcc
Author

I have read the CLA Document and I hereby sign the CLA

@bruceg
Member

bruceg commented Mar 31, 2026

With hundreds of idle files, this tight loop burns significant CPU doing no useful work.

The file regression tests only manage active files. I wonder if there would be a way to inject some static files into the test directory to replicate this. It would be nice to be able to demonstrate this is resolved in a test that is run regularly, as the regression tests are, rather than in the benchmarks, which aren't.

@fcfangcc
Author

@bruceg It’s somewhat challenging because it doesn’t actually slow down reading (assuming sufficient resources); it merely consumes more resources. I don’t currently have a good idea for this.

Regression tests aren’t designed to measure performance—they can, at best, verify whether the backoff mechanism is triggered.

@pront
Member

pront commented Mar 31, 2026

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Swish!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Contributor

@thomasqueirozb thomasqueirozb left a comment


Very nice fix, thanks!

Note that this is technically considered user facing since it affects Vector behavior, so we add a changelog for these types of fixes. Will merge once the changelog is added.

Comment on lines +22 to +23
const EOF_READ_BACKOFF_MIN: Duration = Duration::from_millis(1);
const EOF_READ_BACKOFF_MAX: Duration = Duration::from_millis(250);
Contributor


These look like sensible defaults, we can tweak them later if needed

@fcfangcc
Author

fcfangcc commented Apr 1, 2026

@thomasqueirozb changelog added.

