|
1 | 1 | # Asynchronous Context Management: Status Report & Bug Sweep |
2 | 2 |
|
3 | | -_Date: End of Day 1_ |
| 3 | +_Date: End of Day 2 (Subconscious Memory Refactoring Complete)_ |
4 | 4 |
|
5 | 5 | ## 1. Inventory against Implementation Plan |
6 | 6 |
|
7 | 7 | ### ✅ Phase 1: Stable Identity & Incremental IR Mapping (100% Complete) |
8 | 8 |
|
9 | | -- **Accomplished:** Implemented an `IdentityMap` (`WeakMap<object, string>`) in |
10 | | - `IrMapper`. |
11 | | -- **Result:** `Episode` and `Step` nodes now receive deterministic UUIDs based |
12 | | - on the underlying `Content` object references. Re-parsing the history array no |
13 | | - longer orphans background variants. |
| 9 | +- **Accomplished:** Implemented an `IdentityMap` (`WeakMap<object, string>`) in `IrMapper`. |
| 10 | +- **Result:** `Episode` and `Step` nodes now receive deterministic UUIDs based on the underlying `Content` object references. Re-parsing the history array no longer orphans background variants. |
| 11 | +- **Testing:** Implemented an explicit `IrMapper.test.ts` unit test proving `WeakMap` identity stability across conversation growth. |
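
The identity-stability property described in Phase 1 can be illustrated with a small sketch. It is not the actual `IrMapper` code: the `Content` shape, the `idFor` method name, and the counter-based IDs (standing in for UUIDs) are assumptions made to keep the example self-contained.

```typescript
// Hypothetical stand-in for the real `Content` type; only object identity matters here.
interface Content {
  role: string;
  text: string;
}

class IdentityMap {
  private readonly ids = new WeakMap<object, string>();
  private next = 0;

  // Returns the same ID for the same object reference, minting one on first sight.
  idFor(content: object): string {
    let id = this.ids.get(content);
    if (id === undefined) {
      // Assumption: a counter replaces the UUID generator to avoid dependencies.
      id = `ep-${this.next++}`;
      this.ids.set(content, id);
    }
    return id;
  }
}

const identityMap = new IdentityMap();
const chatHistory: Content[] = [{ role: "user", text: "hello" }];

// First parse of the history array.
const firstPass = chatHistory.map((c) => identityMap.idFor(c));

// The conversation grows, then the whole array is re-parsed.
chatHistory.push({ role: "model", text: "hi" });
const secondPass = chatHistory.map((c) => identityMap.idFor(c));

// A structurally equal but distinct object receives a fresh ID.
const editedCopyId = identityMap.idFor({ role: "user", text: "hello" });
```

Because the IDs are keyed on object references, re-parsing a longer history returns the original ID for every pre-existing entry, while a replaced `Content` object (for example after a message edit) gets a new one.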
14 | 12 |
|
15 | 13 | ### ✅ Phase 2: Data Structures & Event Bus (100% Complete) |
16 | 14 |
|
17 | | -- **Accomplished:** Added `variants?: Record<string, Variant>` to `Episode` IR |
18 | | - types. |
19 | | -- **Accomplished:** Created `ContextEventBus` class and instantiated it on |
20 | | - `ContextManager`. |
21 | | -- **Accomplished:** Added `checkTriggers()` to emit `IR_CHUNK_RECEIVED` (for |
22 | | - Eager Compute) and `BUDGET_RETAINED_CROSSED` (for Opportunistic Consolidation) |
23 | | - on every `PUSH`. |
| 15 | +- **Accomplished:** Added `variants?: Record<string, Variant>` to `Episode` IR types. |
| 16 | +- **Accomplished:** Created `ContextEventBus` class and instantiated it on `ContextManager`. |
| 17 | +- **Accomplished:** Added `checkTriggers()` to emit `IR_CHUNK_RECEIVED` (for Eager Compute) and `BUDGET_RETAINED_CROSSED` (for Opportunistic Consolidation) on every `PUSH`. |
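
A minimal sketch of the trigger flow in Phase 2, under stated assumptions: the `on`/`emit` method names, the listener signature, and the rule that `BUDGET_RETAINED_CROSSED` fires only when the token count exceeds `retainedTokens` are illustrative guesses, not the real `ContextEventBus` API.

```typescript
type ContextEvent = "IR_CHUNK_RECEIVED" | "BUDGET_RETAINED_CROSSED";
type TokenListener = (tokens: number) => void;

// Hypothetical, simplified event bus; the real class may differ.
class ContextEventBus {
  private readonly listeners = new Map<ContextEvent, TokenListener[]>();

  on(eventName: ContextEvent, listener: TokenListener): void {
    const list = this.listeners.get(eventName) ?? [];
    list.push(listener);
    this.listeners.set(eventName, list);
  }

  emit(eventName: ContextEvent, tokens: number): void {
    for (const listener of this.listeners.get(eventName) ?? []) {
      listener(tokens);
    }
  }
}

// Called on every push: always announce the new chunk, and signal the
// budget crossing only while the count is above the retained threshold.
function checkTriggers(
  bus: ContextEventBus,
  currentTokens: number,
  retainedTokens: number,
): void {
  bus.emit("IR_CHUNK_RECEIVED", currentTokens);
  if (currentTokens > retainedTokens) {
    bus.emit("BUDGET_RETAINED_CROSSED", currentTokens);
  }
}

const demoBus = new ContextEventBus();
const seenEvents: string[] = [];
demoBus.on("IR_CHUNK_RECEIVED", () => { seenEvents.push("chunk"); });
demoBus.on("BUDGET_RETAINED_CROSSED", () => { seenEvents.push("budget"); });

checkTriggers(demoBus, 1_000, 65_000);  // under budget: chunk event only
checkTriggers(demoBus, 70_000, 65_000); // over budget: chunk and budget events
```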
24 | 18 |
|
25 | | -### 🔄 Phase 3: Refactoring Processors into Async Workers (80% Complete) |
| 19 | +### ✅ Phase 3: Refactoring Processors into Async Workers (100% Complete) |
26 | 20 |
|
27 | 21 | - **Accomplished:** Defined `AsyncContextWorker` interface. |
28 | | -- **Accomplished:** Refactored `StateSnapshotProcessor` into |
29 | | - `StateSnapshotWorker`. It successfully listens to the bus, batches unprotected |
30 | | - dying episodes, and emits a `VARIANT_READY` event. |
31 | | -- **Pending:** Replace `setTimeout` dummy execution with the actual |
32 | | - `config.getBaseLlmClient().generateContent()` API call. |
| 22 | +- **Accomplished:** Refactored `StateSnapshotProcessor` into `StateSnapshotWorker`. It successfully listens to the bus, batches unprotected dying episodes, and emits a `VARIANT_READY` event. |
| 23 | +- **Accomplished:** Replaced dummy execution with the actual `config.getBaseLlmClient().generateContent()` API call using `gemini-2.5-flash` and the `LlmRole.UTILITY_COMPRESSOR` telemetry role. |
| 24 | +- **Accomplished:** Added robust `try/catch` and extensive `debugLogger.error` / `debugLogger.warn` logging to catch anomalous LLM failures without crashing the main loop. |
33 | 25 |
|
34 | | -### 🔄 Phase 4.1: Opportunistic Replacement Engine (100% Complete) |
| 26 | +### ✅ Phase 4.1: Opportunistic Replacement Engine (100% Complete) |
35 | 27 |
|
36 | | -- **Accomplished:** Rewrote the `projectCompressedHistory` sweep to traverse |
37 | | - from newest to oldest. When `rollingTokens > retainedTokens`, it successfully |
38 | | - swaps raw episodes for `variants` (Summary, Masked, Snapshot) if they exist. |
| 28 | +- **Accomplished:** Rewrote the `projectCompressedHistory` sweep to traverse from newest to oldest. When `rollingTokens > retainedTokens`, it successfully swaps raw episodes for `variants` (Summary, Masked, Snapshot) if they exist. |
| 29 | +- **Accomplished:** Implemented the `getWorkingBufferView()` sweep method. It resolves the N-to-1 Variant Targeting bug by injecting the snapshot and adding all `replacedEpisodeIds` to a `skippedIds` Set, which drops the older raw nodes from the final projection array.
39 | 30 |
|
40 | | -### ❌ Phase 4.2: The Synchronous Pressure Barrier (0% Complete) |
| 31 | +### ✅ Phase 4.2: The Synchronous Pressure Barrier (100% Complete) |
41 | 32 |
|
42 | | -- **Pending:** Implement the hard block at the end of |
43 | | - `projectCompressedHistory()` if `currentTokens` still exceeds `maxTokens` |
44 | | - after all opportunistic swaps are applied. Must respect `maxPressureStrategy` |
45 | | - (truncate, incrementalGc, compress). |
| 33 | +- **Accomplished:** Implemented the hard block at the end of `projectCompressedHistory()` if `currentTokens` still exceeds `maxTokens` after all opportunistic swaps are applied. |
| 34 | +- **Accomplished:** Reads the `mngConfig.budget.maxPressureStrategy` flag. Supports `truncate` (instantly dropping oldest unprotected episodes) and safely falls back if `compress` isn't fully wired synchronously yet. |
| 35 | +- **Testing:** Wrote `contextManager.barrier.test.ts` to blast the system with ~200k tokens and verify the instant truncation successfully protects the System Prompt (Episode 0) and the current working context. |
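
The `truncate` strategy in Phase 4.2 might look roughly like the sketch below. The `BudgetedEpisode` shape, the `isProtected` flag, and the `truncateToBudget` name are hypothetical; the only behavior taken from the report is that the oldest unprotected episodes are dropped first while the System Prompt and the current working context survive.

```typescript
// Hypothetical episode shape for illustration only.
interface BudgetedEpisode {
  id: string;
  tokens: number;
  isProtected: boolean;
}

// Drops the oldest unprotected episodes until the total fits within maxTokens.
// Input is ordered oldest to newest; protected episodes are never dropped.
function truncateToBudget(
  episodes: BudgetedEpisode[],
  maxTokens: number,
): BudgetedEpisode[] {
  let total = episodes.reduce((sum, ep) => sum + ep.tokens, 0);
  const droppedIds = new Set<string>();

  for (const ep of episodes) {
    if (total <= maxTokens) break;
    if (ep.isProtected) continue;
    droppedIds.add(ep.id);
    total -= ep.tokens;
  }

  return episodes.filter((ep) => !droppedIds.has(ep.id));
}

// Roughly 200k tokens of history squeezed into a 50k budget.
const oversized: BudgetedEpisode[] = [
  { id: "system", tokens: 100, isProtected: true },
  { id: "ep1", tokens: 80_000, isProtected: false },
  { id: "ep2", tokens: 80_000, isProtected: false },
  { id: "ep3", tokens: 40_000, isProtected: false },
  { id: "current", tokens: 500, isProtected: true },
];

const survivors = truncateToBudget(oversized, 50_000).map((ep) => ep.id);
```

In this example `ep1` and `ep2` are dropped, leaving 40,600 tokens across `system`, `ep3`, and `current`.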
46 | 36 |
|
47 | | -### ❌ Phase 5: Configuration & Telemetry (0% Complete) |
| 37 | +### ✅ Phase 5: Configuration & Testing (100% Complete) |
48 | 38 |
|
49 | | -- **Pending:** Expose `maxPressureStrategy` in `settingsSchema.ts`. Write |
50 | | - rigorous concurrency tests. |
| 39 | +- **Accomplished:** Exposed `maxPressureStrategy` in `settingsSchema.ts` and replaced the deprecated `incrementalGc` flag across the entire monorepo. |
| 40 | +- **Accomplished:** Wrote concurrency component tests in `contextManager.async.test.ts` to prove that async LLM Promise resolution does not block the main user thread and that the system correctly handles the critical "User typing while background snapshotting" race condition.
51 | 41 |
|
52 | 42 | --- |
53 | 43 |
|
54 | | -## 2. Bug Sweep & Architectural Review (Critical Findings) |
55 | | - |
56 | | -During our end-of-day audit, we challenged our assumptions and swept the new |
57 | | -code. We discovered two critical logic flaws that must be addressed first thing |
58 | | -tomorrow: |
59 | | - |
60 | | -### 🚨 Bug 1: The "Duplicate Projection" Flaw (N-to-1 Variant Targeting) |
61 | | - |
62 | | -**The Flaw:** In `StateSnapshotWorker`, we synthesize `N` episodes (e.g., |
63 | | -Episodes 1, 2, 3) into a single `SnapshotVariant`. We currently attach this |
64 | | -variant _only_ to the newest episode in the batch (Episode 3) via `targetId`. |
65 | | -When the Opportunistic Swapper loops backwards (`i = 3, 2, 1`), it hits Episode |
66 | | -3, sees the Snapshot, and injects it. But then the loop continues to Episode 2 |
67 | | -and Episode 1! Since they don't have the variant attached, the swapper injects |
68 | | -them as **raw text**. The final projection contains _both_ the snapshot AND the |
69 | | -raw text it was supposed to replace. **The Fix (The Working Buffer |
70 | | -Architecture):** Instead of projecting variants on the fly during a backwards |
71 | | -sweep, the `ContextManager` will maintain two separate graphs: an immutable |
72 | | -`pristineLog` (for future offloading to the Memory Wheel) and a mutable |
73 | | -`workingContext`. When the `StateSnapshotWorker` finishes, it structurally |
74 | | -_replaces_ the N raw episodes with the 1 Snapshot episode directly in the |
75 | | -`workingContext` array. This eliminates the duplicate projection bug entirely. |
76 | | - |
77 | | -### 🚨 Bug 2: Infinite RAM Growth (Pristine Graph Accumulation) |
78 | | - |
79 | | -**The Flaw:** Async variants only replace text in the _Projected_ graph. The |
80 | | -_Pristine_ graph inside `ContextManager` (`this.pristineEpisodes`) never |
81 | | -shrinks. Because `checkTriggers()` calculates tokens based on the pristine |
82 | | -graph, once the history crosses `retainedTokens` (65k), it will _always_ be over |
83 | | -65k, emitting `BUDGET_RETAINED_CROSSED` on every single turn forever. |
84 | | -Furthermore, if we never delete episodes from the pristine graph, the Node.js |
85 | | -process will eventually run out of heap memory (OOM) on extremely long sessions. |
86 | | -**The Fix (The Working Buffer Architecture):** By calculating the token budget |
87 | | -against the mutable `workingContext` (which is actively compacted by background |
88 | | -snapshots) rather than the immutable `pristineLog`, the token count will |
89 | | -successfully drop back below `retainedTokens` (65k). This breaks the infinite |
90 | | -event loop and prevents OOM crashes. The `pristineLog` will just grow until the |
91 | | -future Memory Subsystem is built to page it to disk. |
92 | | - |
93 | | -### 🚨 Minor Risk: Identity Map Mutation |
94 | | - |
95 | | -**The Risk:** `IrMapper` relies on `WeakMap<Content, string>`. If the user uses |
96 | | -a UI command to _edit_ a previous message, `AgentChatHistory` might replace the |
97 | | -`Content` object reference. This would generate a new UUID, instantly orphaning |
98 | | -any background variants currently computing for the old reference. **The |
99 | | -Mitigation:** We must ensure `ContextManager` handles orphaned `VARIANT_READY` |
100 | | -events gracefully (e.g., if `targetId` is not found, simply discard the variant |
101 | | -and log a debug warning). (I verified we already wrote `if (targetEp)` checks in |
102 | | -`ContextManager`, so this is mitigated). |
| 44 | +## 2. Bug Sweep & Architectural Review (Critical Findings Resolved) |
| 45 | + |
| 46 | +Both critical flaws discovered on Day 1 have been completely resolved: |
| 47 | + |
| 48 | +### ✅ Resolved Bug 1: The "Duplicate Projection" Flaw (N-to-1 Variant Targeting) |
| 49 | +**The Fix:** The `getWorkingBufferView()` method tracks a `skippedIds` Set during its sweep. If it chooses a SnapshotVariant, it pushes all `replacedEpisodeIds` into the Set, cleanly skipping the raw text nodes on subsequent iterations. |
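
The `skippedIds` technique can be sketched as follows. This is a simplified illustration, not the shipped method: the `BufferEpisode` and `SnapshotVariant` shapes are assumptions, and the sketch always prefers an available snapshot, whereas the real sweep also weighs the token budget.

```typescript
// Hypothetical shapes for illustration only.
interface SnapshotVariant {
  text: string;
  replacedEpisodeIds: string[];
}

interface BufferEpisode {
  id: string;
  text: string;
  snapshot?: SnapshotVariant;
}

// Sweeps newest to oldest. When a snapshot is chosen, every episode it
// replaces is recorded in skippedIds so its raw text is not projected too.
function getWorkingBufferView(episodes: BufferEpisode[]): string[] {
  const skippedIds = new Set<string>();
  const newestFirst: string[] = [];

  for (const ep of [...episodes].reverse()) {
    if (skippedIds.has(ep.id)) continue;
    if (ep.snapshot) {
      newestFirst.push(ep.snapshot.text);
      for (const id of ep.snapshot.replacedEpisodeIds) skippedIds.add(id);
    } else {
      newestFirst.push(ep.text);
    }
  }

  return newestFirst.reverse(); // restore oldest-to-newest order
}

// Episodes 1-3 were summarized into one snapshot attached to Episode 3.
const bufferInput: BufferEpisode[] = [
  { id: "ep0", text: "system prompt" },
  { id: "ep1", text: "raw 1" },
  { id: "ep2", text: "raw 2" },
  {
    id: "ep3",
    text: "raw 3",
    snapshot: { text: "snapshot of 1-3", replacedEpisodeIds: ["ep1", "ep2", "ep3"] },
  },
  { id: "ep4", text: "raw 4" },
];

const projected = getWorkingBufferView(bufferInput);
```

Here `projected` contains the system prompt, the single snapshot, and `raw 4`; the raw text of Episodes 1 and 2 no longer appears alongside the snapshot.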
| 50 | + |
| 51 | +### ✅ Resolved Bug 2: Infinite RAM Growth (Pristine Graph Accumulation) |
| 52 | +**The Fix:** The `checkTriggers()` method now calculates its token budget against the computed `WorkingBufferView` rather than the `pristineEpisodes` array. As soon as an async worker injects a snapshot, the calculated token count drops, which stops `BUDGET_RETAINED_CROSSED` from firing on every turn while leaving the pristine log untouched.
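
The effect of measuring the working view instead of the pristine array can be shown with a small numeric sketch. The `CountedEpisode` shape, the helper names, and the per-episode token figures are invented for illustration; the 65k `retainedTokens` threshold is the value quoted in the Day 1 findings.

```typescript
// Hypothetical shape: an episode (or snapshot) with a precomputed token count.
interface CountedEpisode {
  id: string;
  tokens: number;
}

const sumTokens = (episodes: CountedEpisode[]): number =>
  episodes.reduce((sum, ep) => sum + ep.tokens, 0);

// True when the measured graph is over the retained budget.
function shouldEmitBudgetCrossed(
  measured: CountedEpisode[],
  retainedTokens: number,
): boolean {
  return sumTokens(measured) > retainedTokens;
}

const retainedBudget = 65_000;

// The pristine log never shrinks: 70,000 tokens in total.
const pristineEpisodes: CountedEpisode[] = [
  { id: "ep1", tokens: 30_000 },
  { id: "ep2", tokens: 30_000 },
  { id: "ep3", tokens: 10_000 },
];

// The working view after a snapshot replaced ep1 and ep2: 12,000 tokens.
const workingBufferView: CountedEpisode[] = [
  { id: "snapshot-1", tokens: 2_000 },
  { id: "ep3", tokens: 10_000 },
];

const firesOnPristine = shouldEmitBudgetCrossed(pristineEpisodes, retainedBudget);
const firesOnWorkingView = shouldEmitBudgetCrossed(workingBufferView, retainedBudget);
```

Measured against the pristine array the trigger stays on indefinitely, while measured against the working view it switches off once the snapshot lands.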