What would you like to be added?

Replace the current fdir-based recursive filesystem crawler (used by the @ file mention / file search feature) with a two-tier strategy (a sketch follows this list):
Primary: git ls-files — read tracked files directly from the git index
Fallback: ripgrep --files — for non-git directories, with timeout and buffer caps
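A minimal sketch of the two-tier lookup, assuming a Node/TypeScript CLI with git and rg on PATH; the function names, limits, and error handling are illustrative, not a final implementation:

```ts
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Illustrative caps, mirroring the values suggested further down (20s / 20 MB).
const RG_TIMEOUT_MS = 20_000;
const MAX_STDOUT_BYTES = 20 * 1024 * 1024;

/** Primary: read tracked files straight from the git index. */
async function listTrackedFiles(cwd: string): Promise<string[]> {
  // -z NUL-terminates entries, so paths containing spaces or newlines survive.
  const { stdout } = await execFileAsync("git", ["ls-files", "-z"], {
    cwd,
    maxBuffer: MAX_STDOUT_BYTES,
  });
  return stdout.split("\0").filter(Boolean);
}

/** Fallback: ripgrep's file lister, bounded by a timeout and a stdout cap. */
async function listFilesWithRipgrep(cwd: string): Promise<string[]> {
  const { stdout } = await execFileAsync("rg", ["--files"], {
    cwd,
    timeout: RG_TIMEOUT_MS,      // execFile sends killSignal when this expires
    killSignal: "SIGTERM",       // explicit SIGKILL escalation is sketched below
    maxBuffer: MAX_STDOUT_BYTES, // hard cap: rejects instead of OOMing
  });
  return stdout.split("\n").filter(Boolean);
}

export async function listWorkspaceFiles(cwd: string): Promise<string[]> {
  try {
    return await listTrackedFiles(cwd); // fails fast outside a git repo
  } catch {
    return listFilesWithRipgrep(cwd);
  }
}
```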
Additionally, adopt the following supporting improvements (a combined sketch follows this list):
Background untracked file merge — fetch untracked files asynchronously with a timeout (e.g. 10s), merge them into the index in the background so the main search stays fast
Mtime-based change detection — watch .git/index mtime to detect when the file list has changed, instead of re-crawling on every refresh
Refresh throttling — throttle index rebuilds (e.g. once per 5s) to prevent thrashing
Async chunked indexing — yield to the event loop periodically during indexing so the UI stays responsive even with 200k+ files
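A combined sketch of these four improvements, under the same assumptions as above; addToIndex stands in for whatever fuzzy-match (fzf) index the CLI feeds, the module path is hypothetical, and the thresholds are the example values from the list:

```ts
import { execFile } from "node:child_process";
import { stat } from "node:fs/promises";
// Hypothetical module: the two-tier lister from the sketch above.
import { listWorkspaceFiles } from "./workspace-files";

const UNTRACKED_TIMEOUT_MS = 10_000; // background untracked-file fetch
const REFRESH_THROTTLE_MS = 5_000;   // rebuild at most once per 5s
const CHUNK_SIZE = 1_000;            // paths indexed per event-loop turn

// Stand-in for feeding the fuzzy-match index; hypothetical API.
function addToIndex(paths: string[]): void {
  void paths; // the real implementation would append these to the fzf index
}

let lastIndexMtimeMs = 0;
let lastRefreshAt = 0;

/** Background untracked-file merge: never blocks the initial tracked list. */
function mergeUntrackedInBackground(cwd: string): void {
  execFile(
    "git",
    ["ls-files", "-z", "--others", "--exclude-standard"],
    { cwd, timeout: UNTRACKED_TIMEOUT_MS, maxBuffer: 20 * 1024 * 1024 },
    (err, stdout) => {
      // On timeout or error the index simply stays tracked-files-only.
      if (!err) addToIndex(stdout.split("\0").filter(Boolean));
    },
  );
}

/**
 * Mtime-based change detection: compare .git/index instead of re-crawling.
 * (Worktrees and submodules keep the index elsewhere; in practice resolve
 * the path with `git rev-parse --git-path index` rather than hardcoding it.)
 */
async function gitIndexChanged(repoRoot: string): Promise<boolean> {
  try {
    const { mtimeMs } = await stat(`${repoRoot}/.git/index`);
    const changed = mtimeMs !== lastIndexMtimeMs;
    lastIndexMtimeMs = mtimeMs;
    return changed;
  } catch {
    return true; // no index file (non-git directory): assume changed
  }
}

/** Async chunked indexing: yield between chunks so the UI stays responsive. */
async function indexInChunks(files: string[]): Promise<void> {
  for (let i = 0; i < files.length; i += CHUNK_SIZE) {
    addToIndex(files.slice(i, i + CHUNK_SIZE));
    await new Promise<void>((resolve) => setImmediate(resolve));
  }
}

/** Refresh throttling: drop rebuild requests inside the throttle window. */
export async function maybeRefresh(repoRoot: string): Promise<void> {
  const now = Date.now();
  if (now - lastRefreshAt < REFRESH_THROTTLE_MS) return;
  if (!(await gitIndexChanged(repoRoot))) return;
  lastRefreshAt = now;
  await indexInChunks(await listWorkspaceFiles(repoRoot));
  mergeUntrackedInBackground(repoRoot);
}
```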
Why is this needed?
The current fdir-based crawler performs an unbounded recursive filesystem walk. This causes problems in several real-world scenarios:
Very large repositories (200k+ files): fdir materializes every path into a JS array, consuming significant memory. The downstream fzf indexing amplifies this further.
Missing or incomplete .gitignore: without proper ignore rules, the crawler walks into node_modules, .cache, build artifacts, etc., potentially discovering millions of files and causing OOM crashes (see #3130: special @latest text in user input causes the CLI to error out and exit after processing).
Switching to git ls-files addresses the root cause:
Inherently bounded: only returns tracked files, so the result set is always reasonable regardless of what's on disk
Fast: reads the file list from .git/index (the on-disk index git itself maintains), no filesystem walk needed
No ignore rule complexity: git already knows what's tracked; no need to parse and apply .gitignore rules ourselves
Battle-tested: git handles edge cases (submodules, sparse checkout, large repos) that a naive filesystem walk does not
The ripgrep fallback for non-git directories brings its own protections (an escalation sketch follows this list):
Timeout (e.g. 20s) — kills runaway searches; escalates from SIGTERM to SIGKILL if the process doesn't exit gracefully
Buffer cap (e.g. 20 MB on stdout) — prevents OOM when ripgrep discovers 200k+ files
Built-in ignore support — respects .gitignore, .ignore, .rgignore out of the box
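The simple execFile fallback in the first sketch leans on Node's built-in timeout/maxBuffer handling, which only sends a single signal. Here is a variant that makes the SIGTERM → SIGKILL escalation and the streaming stdout cap explicit; the limits and grace period are assumed values:

```ts
import { spawn } from "node:child_process";

// Assumed limits: 20s timeout, 2s grace before SIGKILL, 20 MB stdout cap.
const RG_TIMEOUT_MS = 20_000;
const KILL_GRACE_MS = 2_000;
const MAX_STDOUT_BYTES = 20 * 1024 * 1024;

/** Run `rg --files`, escalating SIGTERM -> SIGKILL and capping stdout. */
export function ripgrepFiles(cwd: string): Promise<string[]> {
  return new Promise((resolve, reject) => {
    const child = spawn("rg", ["--files"], { cwd });
    const chunks: Buffer[] = [];
    let size = 0;
    let escalate: NodeJS.Timeout | undefined;

    const timeout = setTimeout(() => {
      child.kill("SIGTERM"); // ask politely first...
      escalate = setTimeout(() => child.kill("SIGKILL"), KILL_GRACE_MS); // ...then force it
    }, RG_TIMEOUT_MS);

    child.stdout.on("data", (chunk: Buffer) => {
      size += chunk.length;
      if (size > MAX_STDOUT_BYTES) {
        child.kill("SIGKILL"); // cap output before it can OOM the CLI
        reject(new Error(`rg --files output exceeded ${MAX_STDOUT_BYTES} bytes`));
        return;
      }
      chunks.push(chunk);
    });

    child.on("error", reject); // e.g. rg not installed
    child.on("close", (code) => {
      clearTimeout(timeout);
      if (escalate !== undefined) clearTimeout(escalate);
      if (code === 0) {
        resolve(Buffer.concat(chunks).toString("utf8").split("\n").filter(Boolean));
      } else {
        reject(new Error(`rg --files exited with code ${code}`)); // no-op if already rejected
      }
    });
  });
}
```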
Additional context
None.