
fix(observability): fix panic in allocation tracing dealloc path #25136

Open
pront wants to merge 4 commits into master from pavlos/fix-allocation-tracing-dealloc-panic

pront (Member) commented Apr 7, 2026

Summary

Fixes a panic (abort) in the vector-alloc-processor thread when --allocation-tracing is enabled:

thread 'vector-alloc-processor' panicked at src/internal_telemetry/allocations/allocator/token.rs:30:23:
unsafe precondition(s) violated: NonZero::new_unchecked requires the argument to be non-zero

Root cause: The custom allocator wraps every allocation with an extra byte to store the allocation group ID. Previously, allocations made before tracking was enabled used the original (unwrapped) layout. When those were later freed, dealloc read an out-of-bounds byte as the group ID, hitting NonZeroU8::new_unchecked(0) -- undefined behavior that recent Rust toolchains (>= ~1.78) turn into an abort in debug builds.
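The layout mismatch described above can be illustrated with a minimal sketch. It assumes the group ID byte is appended after the object via `Layout::extend`; the helper names `wrapped_layout` and `header_read_is_out_of_bounds` are hypothetical and are not taken from the PR.

```rust
use std::alloc::Layout;

/// Extends `layout` with one extra `u8` for the group ID (assumed to trail
/// the object). Returns the wrapped layout and the header's byte offset.
fn wrapped_layout(layout: Layout) -> (Layout, usize) {
    let (wrapped, header_offset) = layout
        .extend(Layout::new::<u8>())
        .expect("layout size overflow");
    (wrapped.pad_to_align(), header_offset)
}

/// Returns true when reading the header byte for `requested` would fall
/// outside a block that was actually allocated with `actual`.
fn header_read_is_out_of_bounds(requested: Layout, actual: Layout) -> bool {
    let (_, header_offset) = wrapped_layout(requested);
    header_offset >= actual.size()
}
```

Under these assumptions, a block allocated with the unwrapped layout has no room for the header, so a `dealloc` that looks for it reads past the end of the block.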

Additionally, reentrant allocations (wrapped layout but tracing closure skipped due to RefCell borrow) left the group ID header uninitialized, causing misattributed deallocations and skewed per-group memory accounting.
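The reentrancy case can be sketched roughly as follows, assuming the current group ID lives in a thread-local `RefCell` and that a failed borrow means the tracing code is already running on this thread. `CURRENT_GROUP` and both functions are hypothetical stand-ins, and the `UNTRACED` fallback reflects the fix described in this PR rather than the old behavior.

```rust
use std::cell::RefCell;

/// Sentinel for "tracking was on but the tracing closure was skipped".
const UNTRACED: u8 = u8::MAX;

thread_local! {
    /// Hypothetical per-thread current allocation group ID.
    static CURRENT_GROUP: RefCell<u8> = RefCell::new(1);
}

/// Picks the header byte for a new allocation: the current group ID if the
/// thread-local can be borrowed, otherwise the UNTRACED sentinel.
fn header_for_new_allocation() -> u8 {
    CURRENT_GROUP.with(|group| match group.try_borrow_mut() {
        Ok(group) => *group,
        Err(_) => UNTRACED,
    })
}

/// Simulates an allocation made while the tracing code already holds the
/// borrow, which is the reentrant path.
fn header_during_reentrancy() -> u8 {
    CURRENT_GROUP.with(|group| {
        let _guard = group.borrow_mut();
        header_for_new_allocation()
    })
}
```

Writing a defined sentinel on the skipped path means the header is never left uninitialized, which is the property the fix relies on.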

Fix: Always allocate with the wrapped layout, regardless of whether tracking is currently enabled. The group ID header byte is set to:

  • UNTRACKED (0): tracking was off at allocation time
  • UNTRACED (u8::MAX): tracking was on but the tracing closure was skipped due to reentrancy
  • A real group ID (1..=254): normal traced allocation
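The three header states listed above could be decoded along these lines. This is only a sketch: the `Header` enum and `classify` function are illustrative names, not the PR's actual types. It uses the checked `NonZeroU8::new` so that a zero byte can never reach an unchecked constructor.

```rust
use std::num::NonZeroU8;

const UNTRACKED: u8 = 0;
const UNTRACED: u8 = u8::MAX;

/// Hypothetical decoded form of the group ID header byte.
#[derive(Debug, PartialEq, Eq)]
enum Header {
    /// Tracking was off at allocation time.
    Untracked,
    /// Tracking was on, but the tracing closure was skipped.
    Untraced,
    /// A real allocation group ID in 1..=254.
    Group(NonZeroU8),
}

/// Classifies a raw header byte without any unchecked conversion.
fn classify(byte: u8) -> Header {
    match byte {
        UNTRACKED => Header::Untracked,
        UNTRACED => Header::Untraced,
        id => Header::Group(NonZeroU8::new(id).expect("0 is matched above")),
    }
}
```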

On deallocation, all paths free with the wrapped layout (always correct now). UNTRACKED and UNTRACED skip trace_deallocation to keep per-group accounting balanced.
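A simplified model of the allocation and deallocation paths described above is sketched here. `Tagger`, its fields, and `wrap` are hypothetical; the real allocator implements `GlobalAlloc` and keeps per-group statistics, whereas this sketch keeps a single counter and calls `System` directly.

```rust
use std::alloc::{GlobalAlloc, Layout, System};

const UNTRACKED: u8 = 0;
const UNTRACED: u8 = u8::MAX;

/// Hypothetical stand-in for the tracing allocator's state.
struct Tagger {
    tracking_enabled: bool,
    /// Net bytes currently attributed to real groups.
    tracked_bytes: isize,
}

/// Appends one header byte to `layout` (assumed to trail the object).
fn wrap(layout: Layout) -> (Layout, usize) {
    let (wrapped, offset) = layout
        .extend(Layout::new::<u8>())
        .expect("layout size overflow");
    (wrapped.pad_to_align(), offset)
}

impl Tagger {
    /// Always allocates with the wrapped layout and always writes the header.
    fn alloc(&mut self, layout: Layout, group: u8) -> *mut u8 {
        let (wrapped, offset) = wrap(layout);
        // SAFETY: `wrapped` includes the header byte, so its size is non-zero.
        let ptr = unsafe { System.alloc(wrapped) };
        if ptr.is_null() {
            return ptr;
        }
        let tag = if self.tracking_enabled { group } else { UNTRACKED };
        // SAFETY: `offset` lies inside the block allocated with `wrapped`.
        unsafe { ptr.add(offset).write(tag) };
        if tag != UNTRACKED && tag != UNTRACED {
            self.tracked_bytes += layout.size() as isize;
        }
        ptr
    }

    /// Frees with the wrapped layout; sentinel headers skip the accounting.
    ///
    /// # Safety
    /// `ptr` must come from `Tagger::alloc` called with the same `layout`.
    unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout) {
        let (wrapped, offset) = wrap(layout);
        // SAFETY: the header was initialized by `alloc` at this offset.
        let tag = unsafe { ptr.add(offset).read() };
        if tag != UNTRACKED && tag != UNTRACED {
            self.tracked_bytes -= layout.size() as isize;
        }
        // SAFETY: the block was allocated by `System` with `wrapped`.
        unsafe { System.dealloc(ptr, wrapped) };
    }
}
```

In this model, a block allocated before tracking is switched on and freed afterwards leaves the counter untouched, so the accounting stays balanced across the runtime toggle.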

This eliminates:

  • The original panic/UB from NonZero::new_unchecked(0)
  • Layout mismatches for pre-tracking allocations (including realloc)
  • Accounting skew from uninitialized headers on reentrant allocations

History: This bug has been latent since #15221 (Nov 2022), which introduced the runtime toggle. It went unnoticed because release builds do not enable UB precondition checks, and debug builds on older Rust toolchains silently allowed the invalid NonZeroU8(0).

Vector configuration

Any configuration run with the --allocation-tracing flag or the ALLOCATION_TRACING=true environment variable.

How did you test this PR?

  • 4 new unit tests covering: sentinel invariants, untracked alloc/dealloc, tracked alloc/dealloc, and sentinel dealloc path
  • cargo run -- -c <config.yml> --allocation-tracing no longer panics
  • make check-clippy and make check-fmt pass

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

🤖 Generated with Claude Code

@pront pront requested a review from a team as a code owner April 7, 2026 14:50

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c7205b2360

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@pront pront force-pushed the pavlos/fix-allocation-tracing-dealloc-panic branch 2 times, most recently from 565e816 to 37e58e6 Compare April 7, 2026 15:41
@pront pront force-pushed the pavlos/fix-allocation-tracing-dealloc-panic branch from 37e58e6 to 5360442 Compare April 7, 2026 15:47

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5360442631


… pre-tracking memory

When `--allocation-tracing` is enabled at runtime, the custom allocator
wraps every allocation with an extra byte to store the allocation group
ID. Previously, allocations made before tracking was enabled used the
original (unwrapped) layout. When those were later freed, `dealloc`
read an out-of-bounds byte as the group ID, hitting
`NonZeroU8::new_unchecked(0)` -- undefined behavior that recent Rust
toolchains (>= ~1.78) turn into an abort in debug builds.

Additionally, reentrant allocations (wrapped layout but tracing closure
skipped) left the group ID header uninitialized, causing misattributed
deallocations and skewed per-group memory accounting.

Fix: always allocate with the wrapped layout, regardless of whether
tracking is currently enabled. The group ID header byte is set to:
- UNTRACKED (0): tracking was off at allocation time
- UNTRACED (u8::MAX): tracking was on but the tracing closure was
  skipped due to reentrancy
- A real group ID (1..=254): normal traced allocation

On deallocation, all paths free with the wrapped layout (which is
always correct now). UNTRACKED and UNTRACED skip `trace_deallocation`
to keep per-group accounting balanced.

This eliminates:
- The original panic/UB from `NonZero::new_unchecked(0)`
- Layout mismatches for pre-tracking allocations (including realloc)
- Accounting skew from uninitialized headers on reentrant allocations

This bug has been latent since #15221 (Nov 2022) which introduced the
runtime toggle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pront and others added 3 commits April 7, 2026 15:22
Harden test coverage based on code review feedback:
- Assert exact ROOT group ID and trace counters in tracked alloc test
- Add end-to-end reentrant allocation test that exercises the real
  reentrancy path via thread-local borrow
- Remove redundant manual-sentinel test now covered by reentrant test
- Rename sentinel collision test to reflect what it actually checks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vectordotdev vectordotdev deleted a comment from chatgpt-codex-connector bot Apr 7, 2026
pront (Member, Author) commented Apr 7, 2026

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Breezy!

