Skip to content

⚡ Bolt: Chunk concurrency in file-history to prevent EMFILE errors#91

Merged
iotserver24 merged 1 commit into
mainfrom
bolt-bounded-concurrency-file-history-14400971778822315170
May 17, 2026
Merged

⚡ Bolt: Chunk concurrency in file-history to prevent EMFILE errors#91
iotserver24 merged 1 commit into
mainfrom
bolt-bounded-concurrency-file-history-14400971778822315170

Conversation

@iotserver24
Copy link
Copy Markdown
Owner

💡 What: Replaced unbounded Promise.all with chunked batching (CONCURRENCY_LIMIT = 20) in fileHistoryCapture within packages/core/src/file-history.ts.
🎯 Why: Iterating over thousands of trackedFiles concurrently with fs.stat easily exhausts OS file descriptors (causing EMFILE errors), crashing the app on large codebases.
📊 Impact: Eliminates OS-level crash risks and reduces memory spikes during snapshot generation by limiting concurrent open file handles to 20.
🔬 Measurement: Verify by executing a snapshot save operation on a large directory; it will complete safely rather than crashing Node.js with EMFILE: too many open files.


PR created automatically by Jules for task 14400971778822315170 started by @iotserver24

@google-labs-jules
Copy link
Copy Markdown

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@xibe-review
Copy link
Copy Markdown

xibe-review Bot commented May 16, 2026

Hey @iotserver24! 👋

I'll go through the changes and help you out with an automated review! 🔍

Starting the review now...

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 16, 2026

Warning

Rate limit exceeded

@iotserver24 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 59 minutes and 34 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b56ff928-6ebd-4cee-a5f4-f23f74d864f5

📥 Commits

Reviewing files that changed from the base of the PR and between 151c75e and a98bb74.

📒 Files selected for processing (1)
  • packages/core/src/file-history.ts
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch bolt-bounded-concurrency-file-history-14400971778822315170

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@xibe-review
Copy link
Copy Markdown

xibe-review Bot commented May 16, 2026

🔍 Code Review for @iotserver24

Thank you for submitting this PR. The change is well-motivated and correctly addresses the root cause of EMFILE crashes by chunking Promise.all calls. The review below focuses first on security, then addresses the functionality you asked about, and finally gives actionable recommendations and a comprehensive analysis.


🔴 Security Concerns

  • No new security vulnerabilities introduced.
    The diff does not add any hardcoded credentials, API keys, or secrets. File system calls (stat, createBackup, etc.) are reused from the original code and are not exposed to injection or authorization bypasses. The change purely modifies concurrency control.

  • However, one existing pattern could be hardened:
    The catch {} on line ~210 silently swallows all stat errors (including permission denied, disk full, etc.). This is not introduced by this PR, but it is a risk that a malicious file or permissions issue could be hidden. Consider logging a warning for unexpected errors.


🎯 User-Specific Analysis (Addressing what you asked about)

You asked for a change that prevents EMFILE by limiting concurrent file handles to 20. The implementation achieves this exactly:

  • Correct chunking logic: uses for (let i = 0; i < allFiles.length; i += CONCURRENCY_LIMIT) and Promise.all(chunk.map(...)).
  • Constant chosen appropriately: 20 concurrent fs.stat calls is a safe default for most OS environments (common hard/soft limits are 256–1024).
  • Minimal diff: Only the outer loop structure changed; inner logic is untouched, reducing risk.

Potential improvements for your specific use case:

  • The hardcoded CONCURRENCY_LIMIT = 20 may need tuning for containerized environments with very low file-descriptor limits (e.g., ulimit -n = 128) or very high I/O subsystems (SSD arrays). Consider making it configurable via an environment variable or module-level constant.

  • Missing error isolation: If any single file operation inside a chunk throws (e.g., checkOriginFileChanged or createBackup), the entire chunk’s Promise.all rejects, aborting the snapshot for all remaining files. This is a regression from the original code (which also had this problem, but with all files). You should wrap each operation in a try-catch so a single failure doesn’t block the whole chunk.


Recommended Changes

1. Move concurrency limit to a module-level constant (optional but better for maintainability)

// packages/core/src/file-history.ts
const CONCURRENCY_LIMIT = 20; // ⚡ Adjustable: max concurrent fs.stat calls for EMFILE prevention

Then use it in the function.

2. Isolate per-file failures with try-catch

File: packages/core/src/file-history.ts (around line 201–235)

Current code (simplified):

await Promise.all(
  chunk.map(async (trackingPath) => {
    // ... operations that may throw ...
  })
);

Suggested fix:

await Promise.all(
  chunk.map(async (trackingPath) => {
    try {
      // ... existing logic ...
    } catch (err) {
      // Log error and continue – prevents one failure from aborting the chunk
      console.error(`Failed to backup ${trackingPath}:`, err);
      // Optionally store the error for later reporting
    }
  })
);

3. Improve the stat error suppression (existing code, but now more impactful)

File: line ~210

Current:

try { fileStats = await stat(trackingPath); } catch { fileStats = undefined; }

Suggested:

try {
  fileStats = await stat(trackingPath);
} catch (err) {
  // Log non-ENOENT errors so silent failures are visible
  if ((err as NodeJS.ErrnoException).code !== 'ENOENT') {
    console.warn(`stat failed for ${trackingPath}: ${err.message} (code: ${(err as NodeJS.ErrnoException).code})`);
  }
  fileStats = undefined;
}

📊 Comprehensive Analysis

Functionality

  • The chunked concurrency correctly reduces simultaneous fs.stat calls from thousands to 20.
  • The snapshot creation logic remains unchanged inside the chunk – no behavioral regression expected.

Performance

  • Memory: Reduced peak memory usage because fewer promises and file handles are in flight.
  • Throughput: Chunking introduces a minor serialization overhead, but for I/O-bound tasks the bottleneck is usually disk latency, not parallelism. 20 concurrent operations is often optimal.

Maintainability

  • The code is easy to understand and document (the comment explains the reason).
  • Hardcoding the concurrency limit inside the function is a minor debt – consider making it a named constant or configurable.

Testability

  • The change does not affect the function’s signature, so existing tests should pass.
  • Consider adding a unit test that simulates a large number of files to verify chunking behavior (mock stat to resolve slowly).

📎 Summary

Aspect Status
Security ✅ No new vulnerabilities introduced
Bug Fix ✅ Effectively prevents EMFILE crashes
Error Handling ⚠️ Existing silent catch and missing per-file error isolation could cause silent data loss or incomplete snapshots
Code Quality ✅ Minimal, clear diff – easy to review
Configurability ⚠️ Hardcoded concurrency limit – consider making it a module constant

Recommendation: ⚠️ Request changes – the two error-handling improvements above are low-cost changes that make the code more robust. Without them, a single transient failure (e.g., permission denied on one file) can abort the entire snapshot, which is worse than the original unbounded concurrency (which would also abort everything, but now with fewer files processed before failure). After applying those fixes, this PR is ready to approve.


🤖 Powered by Xibe AI • Auto-generated
📊 Analysis: 2896 characters analyzed across 1 file
💙 Real-time Analytics📚 Documentation

@iotserver24 iotserver24 marked this pull request as ready for review May 17, 2026 06:19
Copilot AI review requested due to automatic review settings May 17, 2026 06:19
@iotserver24 iotserver24 merged commit 950b462 into main May 17, 2026
5 checks passed
@iotserver24 iotserver24 deleted the bolt-bounded-concurrency-file-history-14400971778822315170 branch May 17, 2026 06:19
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the robustness of file-history snapshot generation in packages/core by bounding filesystem I/O concurrency during snapshot creation, preventing EMFILE: too many open files crashes on large projects.

Changes:

  • Replaced an unbounded Promise.all over all trackedFiles with chunked batching at a fixed concurrency limit (20).
  • Processes tracked files in sequential chunks, still parallelizing within each chunk to retain I/O throughput while respecting OS file descriptor limits.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants