Refactor task history persistence to use file-based storage #3785

Draft
wants to merge 5 commits into
base: main
from

Conversation

KJ7LNW
Collaborator

@KJ7LNW KJ7LNW commented May 21, 2025

Context

The current task history persistence mechanism stores all history items directly in VSCode's globalState, which is causing several critical issues:

  1. VSCode Warnings: The extension triggers VSCode warnings about excessive globalState usage:

    WARN [mainThreadStorage] large extension state detected (extensionId: RooVeterinaryInc.roo-cline, global: true): 6173.2822265625kb. Consider to use 'storageUri' or 'globalStorageUri' to store this data on disk instead.
    
  2. Extension Crashes and UI Issues: Users experience various issues that may be related to memory management and globalState limitations:

    • Complete extension crashes or grey/white screens
    • Memory leaks consuming all available RAM
    • Possible globalState race conditions between VS Code instances leading to hangs, crashes, and VS Code instability
  3. Performance Degradation: Even before crashing, the extension suffers from performance issues due to loading and processing large amounts of history data at once.

  4. Scale Problem: A busy developer can accumulate tens of thousands of tasks over the course of a year. At this scale, the globalState approach becomes completely unsustainable.

Implementation

This PR implements a new architecture for task history persistence:

  1. File-based Storage System:

    • Store individual history items as JSON files in a directory structure organized by year and month (task_history/YYYY/MM/)
    • Use globalState only for monthly indexes to enable efficient lookups
    • Implement safeWriteJson utility for data integrity during file writes
  2. Frontend Changes:

    • Modify ClineProvider to provide availableHistoryMonths list instead of full taskHistory
    • Update frontend to fetch history data month-by-month based on available months
    • Add loading indicators and empty state handling in history view
  3. Migration Process:

    • Automatically migrate existing taskHistory data from globalState to the new file-based system
    • Create a backup of the old data before migration
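
The storage layout described above can be sketched as follows. This is only an illustration under stated assumptions, not the PR's actual code: the `HistoryItemSketch` shape, the `<id>.json` file name, the use of UTC for bucketing, and the `YYYY-MM` index key are all hypothetical.

```typescript
// Hypothetical, trimmed-down item shape; the real HistoryItem has more fields.
interface HistoryItemSketch {
  id: string;
  ts: number; // creation time in epoch milliseconds
}

// Zero-padded month ("05") taken from the timestamp, in UTC (an assumption).
function monthOf(ts: number): { year: string; month: string } {
  const d = new Date(ts);
  return {
    year: String(d.getUTCFullYear()),
    month: String(d.getUTCMonth() + 1).padStart(2, "0"),
  };
}

// Relative path of the form task_history/YYYY/MM/<id>.json.
// The "<id>.json" file name is assumed; the PR only specifies the directories.
function historyItemPath(item: HistoryItemSketch): string {
  const { year, month } = monthOf(item.ts);
  return ["task_history", year, month, `${item.id}.json`].join("/");
}

// A stand-in for the monthly index kept in globalState:
// "YYYY-MM" -> ids of the items stored under that month.
function buildMonthlyIndex(items: HistoryItemSketch[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const item of items) {
    const { year, month } = monthOf(item.ts);
    const key = `${year}-${month}`;
    const ids = index.get(key) ?? [];
    ids.push(item.id);
    index.set(key, ids);
  }
  return index;
}
```

With a layout like this, globalState would hold only the small per-month id lists, while the item bodies live on disk.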

How to Test

  1. Install the extension and verify that existing history is migrated correctly
  2. Create new tasks and verify they are saved properly
  3. View history and verify it loads correctly
  4. Check that large history sets no longer cause VSCode warnings or crashes

Get in Touch

Discord: KJ7LNW

Fixes: #3784

Implements a robust JSON file writing utility that:
- Prevents concurrent writes to the same file using in-memory locks
- Ensures atomic operations with temporary file and backup strategies
- Handles error cases with proper rollback mechanisms
- Cleans up temporary files even when operations fail
- Provides comprehensive test coverage for success and failure scenarios

Signed-off-by: Eric Wheeler <[email protected]>
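
For readers unfamiliar with the pattern, here is a minimal sketch of the write-to-temp-then-rename idea behind such a utility. It is not the `safeWriteJson` implementation from this PR: the name `atomicWriteJsonSketch`, the temp-file naming, and the choice to queue concurrent writes to the same path (rather than reject them) are assumptions, and the backup and rollback steps are omitted.

```typescript
import * as fs from "fs/promises";
import * as path from "path";

// One promise chain per target path: a new write waits for the previous one.
// This is an in-memory lock only; it does not protect against other processes.
const writeQueues = new Map<string, Promise<void>>();

async function atomicWriteJsonSketch(filePath: string, data: unknown): Promise<void> {
  const target = path.resolve(filePath);
  const previous = writeQueues.get(target) ?? Promise.resolve();

  // Run after the previous write settles, whether it succeeded or failed.
  const run = previous
    .catch(() => undefined)
    .then(async () => {
      const tmp = `${target}.tmp-${process.pid}-${Date.now()}`;
      try {
        await fs.mkdir(path.dirname(target), { recursive: true });
        await fs.writeFile(tmp, JSON.stringify(data, null, 2), "utf8");
        // Within a single directory, rename replaces the target in one step,
        // so readers see either the old file or the new one.
        await fs.rename(tmp, target);
      } finally {
        // Clean up the temp file if it is still there (no-op after a rename).
        await fs.rm(tmp, { force: true });
      }
    });

  writeQueues.set(target, run);
  try {
    await run;
  } finally {
    // Drop the queue entry once the last queued write for this path is done.
    if (writeQueues.get(target) === run) writeQueues.delete(target);
  }
}
```

Two overlapping calls for the same path complete in call order here, so the later payload ends up on disk.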
@KJ7LNW
Collaborator Author

KJ7LNW commented May 21, 2025

@cte @mrubens @hannesrudolph

This is a first pass to fix the growing bloat from task history that is stored in the global extension state. It assumes that #3772 has been merged for JSON safety.

The biggest point for review is that it converts existing global state task history into on-disk JSON structures. It converts my global state of 2800+ historical tasks quite nicely; I have verified the file output, and all existing tasks and search functionality appear to function. Creating new tasks or modifying existing ones adds to the new JSON files, and everything seems to proceed as expected.

There are still some optimizations to do:

  1. history preview does not need to load all task history, as only 3 items are shown on the front page, so that will substantially speed up extension load time
  2. the entire task list could theoretically load tasks in chunks by month, and then automatically load more as you scroll down, but I am not sure if I want to add that in an initial commit because it is complicated for things like searching that expect everything to be loaded. For now, it just loads all tasks into the task history interface (which is the same behavior as before)
  3. there should probably be a first-load spinner indicating conversion progress. On my system it converts so fast that I do not notice the difference, but people who may have tens of thousands of boomerang tasks might notice.
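
The first two optimizations could share one helper, sketched here as a rough idea rather than as code from this PR. It loads months newest-first and stops once enough items are available, for example 3 for the preview. The `MonthKey` shape, the `FetchMonth` callback, and the `loadAtLeast` name are hypothetical; only the `availableHistoryMonths` concept comes from the PR.

```typescript
// Hypothetical month identifier; month is 1-12.
interface MonthKey {
  year: number;
  month: number;
}

// Hypothetical loader that returns every item stored for one month.
type FetchMonth<T> = (key: MonthKey) => Promise<T[]>;

// Fetch whole months, newest first, until at least minItems are loaded.
// Returns the loaded items plus the months not yet fetched, so a caller
// can request more when the user scrolls further down.
async function loadAtLeast<T>(
  months: MonthKey[],
  fetchMonth: FetchMonth<T>,
  minItems: number,
): Promise<{ items: T[]; remaining: MonthKey[] }> {
  const newestFirst = [...months].sort(
    (a, b) => b.year - a.year || b.month - a.month,
  );
  const items: T[] = [];
  let fetched = 0;
  for (const key of newestFirst) {
    if (items.length >= minItems) break;
    items.push(...(await fetchMonth(key)));
    fetched++;
  }
  return { items, remaining: newestFirst.slice(fetched) };
}
```

Search is the hard part mentioned above: a search over partially loaded history would still need to visit the `remaining` months, which this sketch does not address.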

This is an intentional break from backwards compatibility, specifically for task history storage. However, the old global state is not deleted, so if you downgrade to an older version (or if you are a developer testing things prior to this PR), you will still have everything that you used to have.

Eric Wheeler added 4 commits May 21, 2025 21:19
This commit implements a new architecture for task history persistence:

- Create file-based storage system for HistoryItem objects in task_history/YYYY/MM/
- Use globalState for monthly indexes to ensure efficient lookups
- Implement safeWriteJson utility for data integrity during file writes
- Add comprehensive tests for the new task history service
- Modify ClineProvider to provide availableHistoryMonths list instead of full taskHistory
- Update frontend to fetch history data month-by-month based on available months
- Add loading indicators and empty state handling in history view

This change significantly improves scalability by:
1. Reducing memory usage in globalState
2. Enabling efficient month-based retrieval
3. Improving data integrity with atomic file operations
4. Supporting incremental loading of history data

Fixes: cline#3784

Signed-off-by: Eric Wheeler <[email protected]>
Remove references to the taskHistory property in ClineProvider.test.ts since it has been replaced with availableHistoryMonths in the ExtensionState type.

This fixes TypeScript errors that were preventing the push to GitHub.

Signed-off-by: Eric Wheeler <[email protected]>
The existing test file for HistoryView.tsx has been replaced with a new,
more comprehensive suite. The new suite (formerly HistoryView.new.test.tsx)
includes 12 passing tests that cover:
- Rendering of loading, empty, and data-filled states.
- Interactions with search input and sort options.
- Clicking a task item to view details.
- Copying task content.
- Single and batch task deletion (using mocked dialogs).
- Workspace filter toggle interaction, including verification of
  changes to the displayed task list.
- Export functionality (using a mocked ExportButton).

This new suite improves test coverage and robustness for the
HistoryView component, particularly for its newer data loading
mechanisms and interactions, while adhering to testing constraints
by mocking child components and hooks where necessary.

Signed-off-by: Eric Wheeler <[email protected]>
Adds the initial version of the test suite for task history persistence.
This suite covers functionality including item creation, retrieval,
deletion, updates, month-based fetching, search, and migration logic.

The tests utilize mocks for 'vscode', 'fs/promises', and 'safeWriteJson'
to isolate the taskHistory module during testing.

Signed-off-by: Eric Wheeler <[email protected]>
@KJ7LNW KJ7LNW force-pushed the refactor-use-files-for-history branch from 3692f2d to b916aa4 Compare May 22, 2025 04:21
@hannesrudolph hannesrudolph moved this from New to PR [Draft/WIP] in Roo Code Roadmap May 22, 2025
Labels
None yet
Projects
Status: PR [Draft/WIP]
Development

Successfully merging this pull request may close these issues.

Extension crashes and performance issues due to excessive globalState usage for task history
1 participant