[BUG]: KV-Router indexer receives invalid self-referential blocks #4394

@vladnosiv

Description

Describe the Bug

During startup, even before any client traffic arrives, the KV Router indexer replays stored events and receives Stored payloads where parent_hash == block_hash, meaning a block names itself as its own parent. When such a self-referential block arrives, the indexer panics with "RefCell already borrowed", and from then on the frontend reports IndexerOffline for every request.
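To see why a self-referential block surfaces as a RefCell panic rather than a clean error, consider this minimal sketch. It assumes, hypothetically, that the indexer stores nodes as Rc<RefCell<...>> in a map keyed by block hash and holds a mutable borrow of the parent while it updates the block being stored. Node and can_borrow_both are illustrative names, not Dynamo's code. When parent_hash equals block_hash, both lookups return the same RefCell, so a second borrow_mut() would panic with "already borrowed"; try_borrow_mut() reports the conflict instead.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical, simplified radix-tree node; not Dynamo's actual block type.
#[derive(Default)]
struct Node {
    // Children keyed by tokens hash (kept only to give the node a tree-like shape).
    children: HashMap<u64, Rc<RefCell<Node>>>,
}

/// Mutably borrows the parent node, then tries to mutably borrow the node being
/// stored, mimicking an update that touches both at once. Returns false when the
/// second borrow conflicts, which is exactly where borrow_mut() would panic.
fn can_borrow_both(
    lookup: &HashMap<u64, Rc<RefCell<Node>>>,
    parent_hash: u64,
    block_hash: u64,
) -> bool {
    let parent = lookup.get(&parent_hash).expect("parent must exist");
    let block = lookup.get(&block_hash).expect("block must exist");
    // Held until the function returns, like a parent borrow during an update.
    let _parent_guard = parent.borrow_mut();
    // If parent_hash == block_hash, `block` is the very same RefCell as `parent`.
    block.try_borrow_mut().is_ok()
}
```

With the hashes from the last log line (parent and block both 18111100450886515351), can_borrow_both returns false, whereas a normal parent/child pair returns true.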

Steps to Reproduce

  1. Wipe all state, then launch the router + backend stack (in my case, the SGLang agg + router example).
  2. Let the frontend replay discovery/JetStream logs; no user requests are needed.
  3. Watch the router logs:

The output looks like this:

2025-11-17T16:25:34.092833Z TRACE dynamo_llm::kv_router::indexer: RadixTree::apply_event: Store operation: Stored(KvCacheStoreData { parent_hash: Some(ExternalSequenceBlockHash(1038227862389840939)), blocks: [KvCacheStoredBlockData { block_hash: ExternalSequenceBlockHash(14681692605194770420), tokens_hash: LocalBlockHash(10137791622231551288) }] }) id=20
2025-11-17T16:25:34.092837Z TRACE dynamo_llm::kv_router::indexer: RadixTree::apply_event: Store operation: Stored(KvCacheStoreData { parent_hash: Some(ExternalSequenceBlockHash(14681692605194770420)), blocks: [KvCacheStoredBlockData { block_hash: ExternalSequenceBlockHash(4876245085811370259), tokens_hash: LocalBlockHash(5370710679942703327) }] }) id=20
2025-11-17T16:25:34.092863Z TRACE dynamo_llm::kv_router::indexer: RadixTree::apply_event: Store operation: Stored(KvCacheStoreData { parent_hash: Some(ExternalSequenceBlockHash(4876245085811370259)), blocks: [KvCacheStoredBlockData { block_hash: ExternalSequenceBlockHash(301883270755155684), tokens_hash: LocalBlockHash(16694144929205133692) }] }) id=20
2025-11-17T16:25:34.092876Z TRACE dynamo_llm::kv_router::indexer: RadixTree::apply_event: Store operation: Stored(KvCacheStoreData { parent_hash: Some(ExternalSequenceBlockHash(301883270755155684)), blocks: [KvCacheStoredBlockData { block_hash: ExternalSequenceBlockHash(18111100450886515351), tokens_hash: LocalBlockHash(5311879336024216665) }] }) id=20
2025-11-17T16:25:34.092883Z TRACE dynamo_llm::kv_router::indexer: RadixTree::apply_event: Store operation: Stored(KvCacheStoreData { parent_hash: Some(ExternalSequenceBlockHash(18111100450886515351)), blocks: [KvCacheStoredBlockData { block_hash: ExternalSequenceBlockHash(18111100450886515351), tokens_hash: LocalBlockHash(5311879336024216665) }] }) id=20

thread '<unnamed>' panicked at /opt/dynamo/lib/llm/src/kv_router/indexer.rs:408:26:
RefCell already borrowed
stack backtrace:
   0: __rustc::rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::cell::panic_already_borrowed::do_panic::runtime
   3: core::cell::panic_already_borrowed
   4: dynamo_llm::kv_router::indexer::RadixTree::apply_event
   5: dynamo_llm::kv_router::indexer::KvIndexer::new_with_frequency::{{closure}}::{{closure}}
   6: tokio::runtime::runtime::Runtime::block_on
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Expected Behavior

  • KV indexer validates block sequences and never panics on malformed events.
  • Frontend continues serving requests even if KV routing is temporarily unavailable.
  • Upstream components never emit self-referential Stored events.
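A minimal sketch of the validation the first expectation asks for, assuming that within one Stored event the first block's parent is parent_hash and each later block's parent is the block before it. BlockHash, StoredBlock, StoreEvent, InvalidEvent, and validate are hypothetical simplifications of the types seen in the log, not Dynamo's API; only block hashes matter for this check, so tokens_hash is omitted.

```rust
use std::collections::HashSet;

// Hypothetical, simplified mirrors of the event fields in the log; not Dynamo's types.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct BlockHash(u64);

struct StoredBlock {
    block_hash: BlockHash,
}

struct StoreEvent {
    parent_hash: Option<BlockHash>,
    blocks: Vec<StoredBlock>,
}

#[derive(Debug, PartialEq, Eq)]
enum InvalidEvent {
    /// A block names itself as its own parent.
    SelfReference(BlockHash),
    /// A block hash repeats within the chain (parent included), forming a cycle.
    Cycle(BlockHash),
}

/// Checks the parent chain of one Stored event before it touches the tree.
/// Assumes blocks[0]'s parent is `parent_hash` and blocks[i]'s parent is blocks[i - 1].
fn validate(event: &StoreEvent) -> Result<(), InvalidEvent> {
    // Every hash already on the chain, starting with the declared parent (if any).
    let mut seen: HashSet<BlockHash> = event.parent_hash.into_iter().collect();
    let mut parent = event.parent_hash;
    for block in &event.blocks {
        if parent == Some(block.block_hash) {
            return Err(InvalidEvent::SelfReference(block.block_hash));
        }
        if !seen.insert(block.block_hash) {
            return Err(InvalidEvent::Cycle(block.block_hash));
        }
        parent = Some(block.block_hash);
    }
    Ok(())
}
```

An indexer could run a check like this before applying each Stored event, then log and drop anything rejected instead of panicking. It inspects a single event only; a block hash that already exists elsewhere in the tree under a different parent would need a separate lookup.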

Actual Behavior

  • The KV indexer thread panics on the malformed event.
  • The frontend responds with HTTP 500 to every request after that.
  • Upstream components emit self-referential Stored events.
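Rather than answering every request with HTTP 500 once the indexer is gone, the frontend could degrade to a KV-unaware policy, as the Expected Behavior section asks. A minimal sketch, assuming a hypothetical KvRouteError::IndexerOffline result from the KV-aware scorer and plain round-robin as the fallback; FallbackRouter and pick are illustrative names, and Dynamo's actual router types and fallback policy may differ.

```rust
/// Hypothetical error a KV-aware scorer might return; not Dynamo's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KvRouteError {
    IndexerOffline,
}

/// Prefers the KV-aware choice and falls back to round-robin when it is unavailable.
struct FallbackRouter {
    num_workers: usize,
    next_rr: usize,
}

impl FallbackRouter {
    fn new(num_workers: usize) -> Self {
        assert!(num_workers > 0, "need at least one worker");
        Self { num_workers, next_rr: 0 }
    }

    /// `kv_choice` is whatever the KV-aware scorer produced for this request.
    fn pick(&mut self, kv_choice: Result<usize, KvRouteError>) -> usize {
        match kv_choice {
            Ok(worker) => worker,
            // Degrade to round-robin instead of failing the request.
            Err(KvRouteError::IndexerOffline) => {
                let worker = self.next_rr;
                self.next_rr = (self.next_rr + 1) % self.num_workers;
                worker
            }
        }
    }
}
```

Round-robin is only a placeholder here; any KV-unaware policy (random, least-loaded) would serve the same purpose of keeping requests flowing while the indexer recovers.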

Environment

  • Dynamo from main (commit dce20d0)
  • SGLang aggregated workers


Labels

  • bug: Something isn't working
  • router: Relates to routing, KV-aware routing, etc.
