
Replace fdir filesystem crawler with git ls-files + ripgrep for file search #3137

@tanzhenxin

What would you like to be added?

Replace the current fdir-based recursive filesystem crawler (used by the @ file mention / file search feature) with a two-tier strategy:

  1. Primary: git ls-files, which reads tracked files directly from the git index
  2. Fallback: ripgrep (rg --files) for non-git directories, with a timeout and an output buffer cap
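Here is a minimal sketch of the two-tier lookup. It assumes a Node.js/TypeScript codebase, since the issue does not show the existing code. `listProjectFiles`, `splitNulPaths`, and `FileListing` are hypothetical names.

The sketch treats any failure of `git ls-files` as the signal to fall back to ripgrep. That covers a non-zero exit such as "not a git repository" as well as git not being installed. Both commands are asked for NUL-terminated output (`-z` for git, `--null` for rg), so paths containing newlines or quoting-sensitive characters survive intact.

Node's `execFile` `timeout` sends only a single `SIGTERM`, so this sketch does not yet implement the SIGTERM-to-SIGKILL escalation described for the ripgrep fallback.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Both tiers emit NUL-terminated paths, so one parser serves both.
export function splitNulPaths(output: string): string[] {
  return output.split("\0").filter((p) => p.length > 0);
}

export type FileListing = { source: "git" | "ripgrep"; files: string[] };

// Hypothetical entry point for the @ file mention search.
export async function listProjectFiles(cwd: string): Promise<FileListing> {
  try {
    // Tier 1: tracked files straight from the git index.
    // `-z` prints paths verbatim and NUL-terminated (no C-style quoting).
    const { stdout } = await execFileAsync("git", ["ls-files", "-z"], {
      cwd,
      maxBuffer: 64 * 1024 * 1024, // assumption: generous cap for huge repos
    });
    return { source: "git", files: splitNulPaths(stdout) };
  } catch {
    // git exited non-zero (e.g. "not a git repository") or is not installed.
    // Tier 2: ripgrep walks the directory, applying its own ignore rules.
    const { stdout } = await execFileAsync(
      "rg",
      // --null: NUL-terminate paths; --no-require-git: honor .gitignore files
      // even though this directory is not inside a git repository.
      ["--files", "--null", "--no-require-git"],
      { cwd, timeout: 20_000, maxBuffer: 20 * 1024 * 1024 }, // example caps from the issue
    );
    return { source: "ripgrep", files: splitNulPaths(stdout) };
  }
}
```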

Additionally, adopt the following supporting improvements:

  • Background untracked-file merge: fetch untracked files asynchronously with a timeout (e.g. 10s) and merge them into the search index in the background, so the main search stays fast
  • Mtime-based change detection: track the mtime of .git/index to detect when the tracked-file list has changed, instead of re-crawling on every refresh
  • Refresh throttling: rebuild the index at most once per interval (e.g. every 5s) to prevent thrashing
  • Async chunked indexing: yield to the event loop periodically during indexing so the UI stays responsive even with 200k+ files
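The four supporting improvements fit together in one refresh path. This is a minimal sketch under stated assumptions, not the planned implementation.

- `FileIndex`, `addChunked`, and `gitPaths` are hypothetical names.
- The 5s throttle and 10s untracked timeout are the example values above. The 1,000-path chunk size and the 64 MB cap on git's output are assumptions.
- Untracked files come from `git ls-files --others --exclude-standard`, which omits paths matched by .gitignore and the other standard exclude sources.
- The code assumes `.git` is a directory. In a linked worktree or submodule it is a file, and `git rev-parse --git-path index` would locate the real index.

```typescript
import { execFile } from "node:child_process";
import { stat } from "node:fs/promises";
import { join } from "node:path";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

const THROTTLE_MS = 5_000; // example value from the issue
const UNTRACKED_TIMEOUT_MS = 10_000; // example value from the issue
const CHUNK_SIZE = 1_000; // assumption: paths added per event-loop turn
const MAX_BUFFER = 64 * 1024 * 1024; // assumption: cap on git's stdout

// Add paths in chunks, yielding via setImmediate between chunks so pending
// I/O and UI callbacks can run while a 200k+ path list is indexed.
export async function addChunked(target: Set<string>, paths: string[]): Promise<void> {
  for (let i = 0; i < paths.length; i += CHUNK_SIZE) {
    for (const p of paths.slice(i, i + CHUNK_SIZE)) target.add(p);
    await new Promise<void>((resolve) => setImmediate(resolve));
  }
}

// Run `git ls-files -z <extraArgs>` and return the NUL-separated paths.
async function gitPaths(cwd: string, extraArgs: string[], timeout = 0): Promise<string[]> {
  const { stdout } = await execFileAsync("git", ["ls-files", "-z", ...extraArgs], {
    cwd,
    timeout, // 0 means no timeout
    maxBuffer: MAX_BUFFER,
  });
  return stdout.split("\0").filter((p) => p.length > 0);
}

export class FileIndex {
  entries = new Set<string>(); // what the file search queries
  private generation = 0;
  private lastIndexMtimeMs = -1;
  private lastRefreshAt = -Infinity;

  // `now` is injectable so the throttle can be exercised without waiting.
  constructor(
    private readonly repoRoot: string,
    private readonly now: () => number = Date.now,
  ) {}

  // Resolves true only when the tracked-file list was actually rebuilt.
  async refresh(): Promise<boolean> {
    const t = this.now();
    if (t - this.lastRefreshAt < THROTTLE_MS) return false; // throttled
    this.lastRefreshAt = t;

    // Assumes `.git` is a directory (not a worktree/submodule `.git` file).
    const { mtimeMs } = await stat(join(this.repoRoot, ".git", "index"));
    if (mtimeMs === this.lastIndexMtimeMs) return false; // index unchanged

    const gen = ++this.generation;
    const next = new Set<string>();
    await addChunked(next, await gitPaths(this.repoRoot, []));
    if (gen !== this.generation) return false; // a newer rebuild superseded this one
    this.entries = next; // swap in the complete set; searches never see a half-built one
    this.lastIndexMtimeMs = mtimeMs;

    void this.mergeUntracked(gen); // background: does not delay this call
    return true;
  }

  private async mergeUntracked(gen: number): Promise<void> {
    try {
      const untracked = await gitPaths(
        this.repoRoot,
        ["--others", "--exclude-standard"],
        UNTRACKED_TIMEOUT_MS,
      );
      if (gen === this.generation) await addChunked(this.entries, untracked);
    } catch {
      // Timed out or failed: keep serving tracked files only.
    }
  }
}
```

The rebuild fills a fresh `Set` and then swaps it in, so a search running mid-rebuild still sees the previous complete list. The generation counter stops a slow, superseded rebuild or untracked merge from overwriting newer results.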

Why is this needed?

The current fdir-based crawler performs an unbounded recursive filesystem walk. This causes problems in several real-world scenarios:

Switching to git ls-files addresses the root cause:

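  • Inherently bounded: it returns only tracked files, so the result set is bounded by what the repository tracks rather than by everything on disk (e.g. an ignored node_modules directory)
  • Fast: reads the file list from the git index (.git/index, a binary file git already maintains), so no filesystem walk is needed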
  • No ignore rule complexity: git already knows what's tracked; no need to parse and apply .gitignore rules ourselves
  • Battle-tested: git handles edge cases (submodules, sparse checkout, large repos) that a naive filesystem walk does not

The ripgrep fallback for non-git directories brings its own protections:

  • Timeout (e.g. 20s): kills runaway searches, escalating from SIGTERM to SIGKILL if the process doesn't exit gracefully
  • Buffer cap (e.g. 20 MB on stdout): prevents out-of-memory failures when ripgrep lists 200k+ files
  • Built-in ignore support: respects .ignore and .rgignore files out of the box; since this tier only runs outside git repositories, ripgrep honors .gitignore files there only when --no-require-git is passed
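Node's `execFile` can enforce a timeout and `maxBuffer`, but its timeout sends a single signal. Below is a sketch of the SIGTERM-then-SIGKILL escalation and a byte cap on stdout, assuming `child_process.spawn`.

`runCapped` and its option names are hypothetical. The 20s timeout and 20 MB cap are the example values above, and the 2s grace period is an assumption.

When the cap is hit, this version stops the process and returns the paths read so far with `truncated: true`. It drops the last, possibly cut-off line. Failing outright instead is an equally valid choice that the issue leaves open.

```typescript
import { spawn } from "node:child_process";

export interface CappedResult {
  lines: string[]; // complete output lines (a cut-off final line is dropped)
  truncated: boolean; // true if the byte cap was hit
  timedOut: boolean; // true if the timeout fired
}

// Hypothetical helper: run a command, capping stdout bytes and wall-clock time.
export function runCapped(
  command: string,
  args: string[],
  { cwd = process.cwd(), timeoutMs = 20_000, maxBytes = 20 * 1024 * 1024, graceMs = 2_000 } = {},
): Promise<CappedResult> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { cwd, stdio: ["ignore", "pipe", "ignore"] });
    const chunks: Buffer[] = [];
    let bytes = 0;
    let truncated = false;
    let timedOut = false;
    let killTimer: ReturnType<typeof setTimeout> | undefined;

    // Send SIGTERM once; if the process is still alive after graceMs, SIGKILL it.
    const stop = () => {
      if (killTimer !== undefined) return;
      child.kill("SIGTERM");
      killTimer = setTimeout(() => child.kill("SIGKILL"), graceMs);
    };

    const timeoutTimer = setTimeout(() => {
      timedOut = true;
      stop();
    }, timeoutMs);

    child.stdout!.on("data", (chunk: Buffer) => {
      if (truncated) return; // keep draining the pipe, but discard the data
      const room = maxBytes - bytes;
      if (chunk.length > room) {
        chunks.push(chunk.subarray(0, room));
        bytes = maxBytes;
        truncated = true;
        stop();
        return;
      }
      chunks.push(chunk);
      bytes += chunk.length;
    });

    const clearTimers = () => {
      clearTimeout(timeoutTimer);
      clearTimeout(killTimer);
    };

    child.on("error", (err) => {
      clearTimers();
      reject(err); // e.g. ENOENT when the binary is not installed
    });

    child.on("close", () => {
      clearTimers();
      const lines = Buffer.concat(chunks).toString("utf8").split("\n");
      // After a trailing "\n" the last element is ""; after a cut-off it is a partial path.
      const last = lines.pop();
      if (last && !truncated && !timedOut) lines.push(last);
      resolve({ lines: lines.filter((l) => l.length > 0), truncated, timedOut });
    });
  });
}
```

For example, `runCapped("rg", ["--files"], { cwd: projectDir })` would list the files under `projectDir` with these limits applied.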

Additional context

None.
