Labels
bug (Something isn't working), router (Relates to routing, KV-aware routing, etc.)
Description
Describe the Bug
During startup (even before any client traffic), the KV Router indexer replays stored events and receives Stored payloads where parent_hash == block_hash. When such a self-referential block arrives, the indexer panics (RefCell already borrowed) and the frontend reports IndexerOffline for every request.
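For illustration, here is a minimal sketch of how a self-referential block can trip a `RefCell` borrow check. It assumes the tree stores nodes as `Rc<RefCell<...>>` and that the parent and child handles are resolved from the same hash; `Node`, `SharedNode`, and `link` are hypothetical names, not the actual code at indexer.rs:408.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical, simplified node type; NOT the real Dynamo RadixTree node.
#[derive(Default)]
struct Node {
    children: HashMap<u64, Rc<RefCell<Node>>>,
}

type SharedNode = Rc<RefCell<Node>>;

/// Attach `child` under `parent`. Returns an error instead of panicking
/// when both handles point at the same node.
fn link(parent: &SharedNode, child_hash: u64, child: &SharedNode) -> Result<(), String> {
    // Shared borrow of the child, kept alive while the parent is updated.
    let _child_ref = child.borrow();

    // If `parent` and `child` are the same Rc, this asks for a mutable borrow
    // while a shared borrow is still live. A plain `borrow_mut()` would panic
    // here; `try_borrow_mut()` reports the conflict as an Err instead.
    let mut parent_ref = parent
        .try_borrow_mut()
        .map_err(|_| format!("block {} is its own parent", child_hash))?;

    parent_ref.children.insert(child_hash, Rc::clone(child));
    Ok(())
}
```

Linking two distinct nodes succeeds, while `link(&b, 2, &b)` returns `Err` rather than aborting the thread.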
Steps to Reproduce
- Launch the router + backend stack (in my case, the SGLang agg+router example) after wiping state.
- Let the frontend replay discovery/JetStream logs; no user requests are needed.
- Watch the router logs:
Something like this:
2025-11-17T16:25:34.092833Z TRACE dynamo_llm::kv_router::indexer: RadixTree::apply_event: Store operation: Stored(KvCacheStoreData { parent_hash: Some(ExternalSequenceBlockHash(1038227862389840939)), blocks: [KvCacheStoredBlockData { block_hash: ExternalSequenceBlockHash(14681692605194770420), tokens_hash: LocalBlockHash(10137791622231551288) }] }) id=20
2025-11-17T16:25:34.092837Z TRACE dynamo_llm::kv_router::indexer: RadixTree::apply_event: Store operation: Stored(KvCacheStoreData { parent_hash: Some(ExternalSequenceBlockHash(14681692605194770420)), blocks: [KvCacheStoredBlockData { block_hash: ExternalSequenceBlockHash(4876245085811370259), tokens_hash: LocalBlockHash(5370710679942703327) }] }) id=20
2025-11-17T16:25:34.092863Z TRACE dynamo_llm::kv_router::indexer: RadixTree::apply_event: Store operation: Stored(KvCacheStoreData { parent_hash: Some(ExternalSequenceBlockHash(4876245085811370259)), blocks: [KvCacheStoredBlockData { block_hash: ExternalSequenceBlockHash(301883270755155684), tokens_hash: LocalBlockHash(16694144929205133692) }] }) id=20
2025-11-17T16:25:34.092876Z TRACE dynamo_llm::kv_router::indexer: RadixTree::apply_event: Store operation: Stored(KvCacheStoreData { parent_hash: Some(ExternalSequenceBlockHash(301883270755155684)), blocks: [KvCacheStoredBlockData { block_hash: ExternalSequenceBlockHash(18111100450886515351), tokens_hash: LocalBlockHash(5311879336024216665) }] }) id=20
2025-11-17T16:25:34.092883Z TRACE dynamo_llm::kv_router::indexer: RadixTree::apply_event: Store operation: Stored(KvCacheStoreData { parent_hash: Some(ExternalSequenceBlockHash(18111100450886515351)), blocks: [KvCacheStoredBlockData { block_hash: ExternalSequenceBlockHash(18111100450886515351), tokens_hash: LocalBlockHash(5311879336024216665) }] }) id=20
thread '<unnamed>' panicked at /opt/dynamo/lib/llm/src/kv_router/indexer.rs:408:26:
RefCell already borrowed
stack backtrace:
0: __rustc::rust_begin_unwind
1: core::panicking::panic_fmt
2: core::cell::panic_already_borrowed::do_panic::runtime
3: core::cell::panic_already_borrowed
4: dynamo_llm::kv_router::indexer::RadixTree::apply_event
5: dynamo_llm::kv_router::indexer::KvIndexer::new_with_frequency::{{closure}}::{{closure}}
6: tokio::runtime::runtime::Runtime::block_on
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Expected Behavior
- KV indexer validates block sequences and never panics on malformed events.
- Frontend continues serving requests even if KV routing is temporarily unavailable.
- Upstream components never emit self-referential Stored events.
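The validation asked for above could look roughly like the following. This is a sketch under assumptions: the struct names mirror the trace output, but they are redefined locally as stand-ins, and the check assumes the blocks in one Stored event form a chain in which each block's parent is the block before it (the first block's parent being `parent_hash`). `StoreError` and `validate_store` are hypothetical names.

```rust
// Local stand-ins for the types seen in the trace logs; the real
// definitions in dynamo_llm may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ExternalSequenceBlockHash(u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LocalBlockHash(u64);

#[derive(Debug)]
struct KvCacheStoredBlockData {
    block_hash: ExternalSequenceBlockHash,
    tokens_hash: LocalBlockHash,
}

#[derive(Debug)]
struct KvCacheStoreData {
    parent_hash: Option<ExternalSequenceBlockHash>,
    blocks: Vec<KvCacheStoredBlockData>,
}

// Hypothetical error type for rejected events.
#[derive(Debug, PartialEq, Eq)]
enum StoreError {
    SelfReferential(ExternalSequenceBlockHash),
}

/// Reject an event in which any block has the same hash as its parent.
/// Assumes blocks are chained: each block's parent is the previous block.
fn validate_store(data: &KvCacheStoreData) -> Result<(), StoreError> {
    let mut prev = data.parent_hash;
    for block in &data.blocks {
        if prev == Some(block.block_hash) {
            return Err(StoreError::SelfReferential(block.block_hash));
        }
        prev = Some(block.block_hash);
    }
    Ok(())
}
```

Running such a check before the event is applied to the tree would let the indexer log and skip the last event from the trace above (parent and block both 18111100450886515351) instead of panicking.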
Actual Behavior
- The KV indexer thread panics on the malformed event.
- The frontend responds with 500 to every request after that.
- Upstream components emit self-referential Stored events.
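To avoid a 500 on every request once the indexer is gone, the frontend could fall back to a KV-unaware policy. The sketch below is only an illustration of that idea: `IndexerError`, `Router`, and `pick_worker` are hypothetical names, and round-robin is an assumed fallback, not something the router is known to implement this way.

```rust
// Hypothetical error returned when the KV indexer cannot answer a lookup.
#[derive(Debug, PartialEq, Eq)]
enum IndexerError {
    Offline,
}

// Hypothetical router state: known worker ids plus a round-robin cursor.
struct Router {
    workers: Vec<u64>,
    next: usize,
}

impl Router {
    /// Use the KV-aware choice when the indexer answered; otherwise fall
    /// back to round-robin over the known workers instead of failing.
    fn pick_worker(&mut self, kv_lookup: Result<u64, IndexerError>) -> Option<u64> {
        match kv_lookup {
            Ok(worker) => Some(worker),
            Err(IndexerError::Offline) => {
                if self.workers.is_empty() {
                    return None; // nothing to route to
                }
                let worker = self.workers[self.next % self.workers.len()];
                self.next = (self.next + 1) % self.workers.len();
                Some(worker)
            }
        }
    }
}
```

With this shape, an offline indexer degrades cache locality but requests still reach a worker.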
Environment
- Dynamo from main (commit dce20d0)
- SGLang aggregated workers
Additional Context
No response
Screenshots
No response
PeaBrane