Add opt-in union-find chat history summarizer#4940
Draft

kimjune01 wants to merge 9 commits into Aider-AI:main from
Conversation
…ry-summarizer)

Port 4 modules from the standalone implementation (145 existing tests):

- context_window.py: Forest (union-find clusters) + ContextWindow (hot/cold zones)
- embedding_service.py: pure Python TF-IDF embedder (no new dependencies)
- cluster_summarizer.py: per-cluster summarization via model cascade
- chat_summary_uf.py: ChatSummaryUF(ChatSummary) drop-in subclass

Integration: --chat-history-summarizer union-find flag in args.py, conditional construction in main.py. Default unchanged (recursive).

Backend fixes applied during port:

- Stable root ordering via _root_order list (deterministic render output)
- Weighted centroid averaging in union() (prevents cluster identity distortion)

Safety: mandatory fallback to recursive if output exceeds budget, stale-safety preserved via _fed_count mechanism, same tokenizer and model cascade.

49 new tests covering 12 areas. 523 existing tests unaffected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
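The union-find mechanics the commit describes (messages merged into cluster trees, summaries addressed by their roots, deterministic root ordering) can be sketched roughly like this. Class and method names beyond `find()`, `union()`, `roots()`, and `_root_order` are illustrative, not the PR's actual API:

```python
class Forest:
    """Toy union-find cluster store with stable root ordering (illustrative)."""

    def __init__(self):
        self._parent = {}      # member id -> parent id
        self._root_order = []  # roots in insertion order, for deterministic output

    def add(self, node):
        self._parent[node] = node
        self._root_order.append(node)

    def find(self, node):
        # Walk to the root, then compress the path so later lookups are O(1)-ish.
        root = node
        while self._parent[root] != root:
            root = self._parent[root]
        while self._parent[node] != root:
            self._parent[node], node = root, self._parent[node]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        self._parent[rb] = ra        # rb's cluster is absorbed into ra's
        self._root_order.remove(rb)  # rb is no longer a root
        return ra

    def roots(self):
        return list(self._root_order)


f = Forest()
for m in ["msg0", "msg1", "msg2"]:
    f.add(m)
f.union("msg0", "msg1")
print(f.find("msg1"))  # msg0 -- provenance traces back to the cluster root
print(f.roots())       # ['msg0', 'msg2'] -- stable insertion order
```

This is what makes topics addressable: any message can be mapped to its cluster root via `find()`, and `roots()` enumerates clusters in a reproducible order.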
8 tests with a realistic 3-topic conversation (path traversal, Windows drive letters, file descriptor leak): cluster formation, output format, cold+hot render, distinct topic clustering, recursive fallback, stale rebuild, system message filtering, deterministic rendering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix hot tail mismatch: track fed message indices so hot_count maps back to the correct original messages even with system/tool messages interspersed
- Fix unbounded _hot growth: trim graduated entries after each append
- Fix empty-embedding drop in union(): preserve the non-empty side
- Remove unused Forest.is_dirty() and Forest.dirty_inputs()
- Remove list-vector branches from _cosine_similarity and union() (all embeddings are sparse dicts from TFIDFEmbedder)
- Add tests: mixed-role preservation (2), memory bounds (2)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
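With every embedding stored as a sparse term-to-weight dict, the cosine-similarity routine reduces to a dict intersection with no list-vector branch. A minimal sketch of that shape (the function name mirrors the commit's `_cosine_similarity`, but the body is illustrative):

```python
import math

def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse term->weight vectors."""
    if not a or not b:
        return 0.0
    # Iterate the smaller dict; only shared terms contribute to the dot product.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Two clusters sharing one of two terms each: similarity 0.5
print(cosine_similarity({"path": 1.0, "traversal": 1.0},
                        {"path": 1.0, "windows": 1.0}))
```

Dropping the dense-list branch keeps the hot path to one representation, which is what makes the deterministic-render and centroid-merge fixes above tractable.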
- resolve_dirty() failure now falls back to recursive instead of crashing
- Remove _maybe_evict() and the evict_at parameter; dead code since _maybe_graduate() already keeps the hot zone at <= graduate_at
- Update all tests to remove evict_at references

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Non-root members' _content and _embedding are only needed before a merge (for dirty-input collection and centroid computation). After union(), only the root's centroid and summary matter. Without cleanup, _content grows unbounded in long sessions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
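The size-weighted centroid merge referenced in these commits might look roughly like this as a pure function; the name `merge_centroids` and the calling convention are guesses from the commit messages, not the PR's actual code:

```python
def merge_centroids(emb_a: dict, size_a: int, emb_b: dict, size_b: int) -> dict:
    """Size-weighted average of two sparse centroids.

    Weighting by cluster size keeps a 1-message cluster from dragging a
    9-message cluster's centroid toward it on merge. If one side has an
    empty embedding, the non-empty side is preserved unchanged.
    """
    if not emb_a:
        return dict(emb_b)
    if not emb_b:
        return dict(emb_a)
    total = size_a + size_b
    merged = {}
    for term in emb_a.keys() | emb_b.keys():
        merged[term] = (emb_a.get(term, 0.0) * size_a
                        + emb_b.get(term, 0.0) * size_b) / total
    return merged


big = {"flask": 0.9}      # centroid of a 9-message cluster
small = {"windows": 0.9}  # centroid of a 1-message cluster
print(merge_centroids(big, 9, small, 1))  # flask ~0.81, windows ~0.09
```

After a merge like this, only the surviving root needs its centroid and summary, which is why the absorbed member's _content and _embedding can be dropped without losing anything the forest still uses.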
Co-authored-by: aider (claude-sonnet-4-5) <aider@aider.chat>
…istencies" This reverts commit 7ddc0a2.
Control-loop bug: summarization triggers on tokens (too_big) but graduation triggers on message count (graduate_at=26). The token budget fires first, no cold clusters exist, and the summarizer falls back to recursive every time; the union-find path was unreachable in real usage.

Fix: when summarize() runs with no cold clusters and more than 4 hot messages, force_graduate() moves the oldest half to the cold forest. This breaks the deadlock and lets clusters form before rendering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
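The trigger condition from this fix, isolated as a pure function for illustration (a sketch under the commit's stated thresholds; the real code operates on the ContextWindow directly rather than on bare counts):

```python
def plan_graduation(hot_count: int, cold_cluster_count: int) -> int:
    """How many hot messages to force-graduate before rendering.

    Summarization triggers on token budget while graduation triggers on
    message count (graduate_at=26), so the token budget can fire while
    the cold forest is still empty. Forcing the oldest half of the hot
    zone into the forest breaks that deadlock so clusters can form.
    """
    if cold_cluster_count == 0 and hot_count > 4:
        return hot_count // 2  # oldest half graduates to the cold forest
    return 0                   # normal path: graduation keeps up on its own

print(plan_graduation(hot_count=20, cold_cluster_count=0))  # 10
print(plan_graduation(hot_count=20, cold_cluster_count=3))  # 0
print(plan_graduation(hot_count=4, cold_cluster_count=0))   # 0
```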
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Part 1 of 2. This PR adds the backend. Part 2 adds `/topics` and `/drop-topic` commands on top of it. Split for reviewability; together they give users selective control over chat history.

Summary

- `--chat-history-summarizer union-find` flag, an opt-in alternative to the default recursive summarizer

Test plan

- 49 unit tests + 8 smoke tests (`tests/basic/test_chat_summary_uf.py` + `tests/basic/test_smoke_uf.py`)

Why
The current recursive summarizer compresses chat history into a single opaque text blob. Original messages are discarded. There's no provenance, and no way to selectively remove one stale topic without re-summarizing everything.
Union-find compaction groups messages into topic-coherent clusters and summarizes each one independently. Every summary traces back to its source messages through `find()`. Topics become addressable units you can inspect and drop.

This is the backend. User-facing commands (`/topics`, `/drop-topic`) come in a follow-up PR once the foundation is reviewed.

Background
The algorithm was prototyped against gemini-cli, where it showed a +8–18pp recall advantage over flat summarization across 7 trials (1 significant at p=0.039, rest directional). It was then ported to aider for validation against aider's stronger recursive baseline. Full experiment methodology, preregistration, and data are in the research repo.
What changes
New files (549 lines of production code, 689 lines of tests):
- `aider/context_window.py`: `Forest` (union-find cluster store with stable ordering and weighted centroids) + `ContextWindow` (hot/cold zones with graduation and eviction)
- `aider/embedding_service.py`: `TFIDFEmbedder`, pure Python, incremental vocabulary, no external dependencies
- `aider/cluster_summarizer.py`: per-cluster summarization via the model cascade (`simple_send_with_retries`)
- `aider/chat_summary_uf.py`: `ChatSummaryUF(ChatSummary)`, a drop-in subclass with incremental feeding and mandatory fallback
- `tests/basic/test_chat_summary_uf.py`
- `tests/basic/test_smoke_uf.py`

Modified files (15 lines changed):
- `aider/args.py`: adds the `--chat-history-summarizer` argument (default `recursive`, choices `[recursive, union-find]`)
- `aider/main.py`: constructs `ChatSummaryUF` when `union-find`, `ChatSummary` otherwise

What doesn't change
- Threading model (`summarize_start`/`summarize_worker`/`summarize_end`)
- Output format (`[summary_msg, "Ok.", *hot_messages]`)
- `summarize_all()` behavior (delegates to parent)
- Commands (`/clear`, `/drop`, `/tokens`, `/reset`)

How it works
Messages flow through a hot zone and a cold forest:
- When the hot zone exceeds `graduate_at` (26 messages), the oldest message graduates to the forest
- When the forest exceeds `max_cold_clusters` (10), the closest pair is force-merged
- `render()` returns cold summaries + hot contents, formatted as `[summary_msg, "Ok.", *hot_messages]`

The overlap window (graduate_at=26, evict_at=30) gives `resolve_dirty()` time to summarize before eviction.

Safety
- If output exceeds `max_tokens` or is ≥ the input tokens, it falls back to `super().summarize()` (recursive). Worst case, you get the current system.
- The `summarize_end()` stale check works identically. The `_fed_count` mechanism triggers a full forest rebuild when `done_messages` changes (shrinks, is cleared, or is replaced by a previous summarization result).
- Same tokenizer: `self.token_count = self.models[0].token_count` from `ChatSummary.__init__()`.
- `roots()` returns clusters in insertion order via a tracked `_root_order` list, ensuring deterministic `render()` output across calls.
- `union()` weights centroids by cluster size (`emb * size / total`), preventing small clusters from distorting large ones over repeated merges.

Test coverage
49 unit tests in `tests/basic/test_chat_summary_uf.py`, organized by area:

- `summarize_all()` parity
- `_fed_count` shrink triggers
- `_init_context_window()`, incremental feeding tracks correctly

8 smoke tests in `tests/basic/test_smoke_uf.py` with a distilled Flask #1169 conversation (3 topics: path traversal, Windows drive letters, file descriptor leak):

- `[summary_msg, "Ok.", *hot_messages]` structure preserved
- `render()` returns cluster summaries followed by hot contents
- `render()` produces identical output on repeated calls

Benchmark
Tested on 17 real aider conversations (136 paired observation points where both backends triggered summarization).
The union-find backend produces quality-equivalent summaries. The value is structured context: visibility into what the model remembers, and selective control over what it forgets.
Follow-up (not in this PR)
- `/topics`: read-only command showing topic clusters with token counts and previews
- `/drop-topic N`: selective topic removal with `done_messages` sync and threading guard
The new code is fully contained: 4 new files, no modifications to base_coder.py, no changes to the threading model, no new state that other systems depend on. The opt-in flag keeps all three options cheap:
- Promote to default: change `default="recursive"` to `default="union-find"` in `args.py`. One line.
- Remove entirely: revert `args.py` + `main.py`. Five minutes.