
Conversation


@edgarsskore edgarsskore commented Nov 26, 2025

User description

Sheets handling: write, edit, and read features. For complex operations it runs Node.js scripts (I created a separate tool because it felt appropriate; maybe it makes sense to integrate it with start process). I did some organizing and restructuring that directly affected me and also added some tech-debt comments. I already felt I was being a bit too invasive.

Add Excel file handling support with file handler architecture

  • Implement ExcelFileHandler for .xlsx/.xls/.xlsm files using xlsx library
  • Add FileHandler interface with read(), write(), getInfo(), editRange() methods (sketched below)
  • Create file handler factory for automatic handler selection by extension
  • Support Excel-specific features: sheet selection, range queries, offset/length
  • Add edit_block range mode for Excel cell/range updates
  • Refactor handleGetFileInfo to be generic (handles any file type's nested data)
  • Remove .describe() calls from schemas (WriteFileArgsSchema, EditBlockArgsSchema)
  • Add technical debt comments documenting the text-edit architectural inconsistency
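
A minimal sketch of what the handler contract described above could look like; the real definitions live in src/utils/files/base.ts and src/utils/files/factory.ts, so the exact names and shapes here are assumptions:

// Hypothetical shapes inferred from the bullets above; not the actual source.
interface ReadOptions {
  offset?: number;          // line/row offset (negative = read from the end)
  length?: number;          // number of lines/rows to return
  sheet?: string | number;  // Excel: sheet name or index
  range?: string;           // Excel: e.g. "A1:C10"
}

interface FileResult {
  content: string;
  metadata?: { isImage?: boolean; isExcelFile?: boolean; [key: string]: unknown };
}

interface FileHandler {
  canHandle(path: string): boolean;
  read(path: string, options?: ReadOptions): Promise<FileResult>;
  write(path: string, content: unknown, mode?: "rewrite" | "append"): Promise<void>;
  getInfo(path: string): Promise<Record<string, unknown>>;
  // Optional: structured range edits (implemented by the Excel handler)
  editRange?(path: string, range: string, content: unknown): Promise<{ success: boolean }>;
}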

Summary by CodeRabbit

  • New Features

    • New tool to execute Node.js code directly.
    • Full Excel support: read/write sheets, edit ranges, and include Excel files in search results.
    • Python environment detection added to system info.
  • Enhancements

    • Edit operations now support Excel range updates and improved text find/replace (fuzzy matching).
    • Read/get-info return richer, formatted results (images, sheets, metadata).
  • Tests

    • New end-to-end tests for Excel and file-handler workflows.



CodeAnt-AI Description

Add Excel-aware file handler workflow and tests

What Changed

  • read_file/write_file/get_file_info now go through extension-aware handlers, so Excel files return structured JSON and honor sheet/range/offset/length requests; edit_block can update Excel ranges, while text and image files keep their existing behavior (see the sketch after this list)
  • Server/help text now spells out Excel-format expectations, edit_block range mode, and the node:local execution fallback; content searches merge Excel matches when an Excel file or pattern is involved
  • Added end-to-end tests for file handlers and Excel workflows covering handler selection, multi-sheet read/write, offset/range reads, edits, metadata, append mode, and fallback behavior, to lock in the user-visible behavior
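
As referenced above, a hedged illustration of the two edit_block modes; the argument names follow the schema summaries on this page and may not match the final schema exactly:

// Text mode: fuzzy find/replace in a text file
const textEdit = {
  file_path: "notes/todo.md",
  old_string: "TODO: handle Excel",
  new_string: "DONE: handle Excel",
};

// Range mode: structured update of an Excel range (2D array of rows)
const rangeEdit = {
  file_path: "data/report.xlsx",
  range: "A1:B2",
  content: [["Name", "Age"], ["Alice", 30]],
};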

Impact

✅ Excel sheets readable with sheet/range pagination
✅ Excel ranges editable via edit_block
✅ Excel-aware searches surface matches when requested

💡 Usage Guide

Checking Your Pull Request

Every time you open a pull request, our system automatically reviews it. We check for security issues, infrastructure configuration mistakes, and common code problems, to make sure your changes are solid and won't cause trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

  - Implement ExcelFileHandler for .xlsx/.xls/.xlsm files using xlsx library
  - Add FileHandler interface with read(), write(), getInfo(), editRange() methods
  - Create file handler factory for automatic handler selection by extension
  - Support Excel-specific features: sheet selection, range queries, offset/length
  - Add edit_block range mode for Excel cell/range updates
  - Refactor handleGetFileInfo to be generic (handles any file type's nested data)
  - Remove .describe() calls from schemas (WriteFileArgsSchema, EditBlockArgsSchema)
  - Add technical debt comments documenting text edit architectural inconsistency

codeant-ai bot commented Nov 26, 2025

CodeAnt AI is reviewing your PR.




coderabbitai bot commented Nov 26, 2025

Walkthrough

Adds a file-handler abstraction (text, excel, image, binary) with a factory; refactors filesystem tools to use handlers and new ReadOptions/FileResult types; integrates Excel search; adds ExcelJS dependency and tests; implements virtual Node sessions/execute_node fallback; enhances process tooling and Python detection; updates schemas and tool descriptions.

Changes

  • Manifest & Dependencies (manifest.template.json, package.json): Added public execute_node tool; revised edit_block description to document Excel range edits and fuzzy text matching; added exceljs@^4.4.0.
  • File Handler Base & Exports (src/utils/files/base.ts, src/utils/files/index.ts): New FileHandler API and types (ReadOptions, FileResult, FileMetadata, EditResult, FileInfo, ExcelSheet); consolidated exports barrel.
  • File Handler Factory (src/utils/files/factory.ts): New lazy factory and utilities: getFileHandler, isExcelFile, isImageFile, handler priority ordering.
  • Concrete File Handlers (src/utils/files/text.ts, src/utils/files/excel.ts, src/utils/files/image.ts, src/utils/files/binary.ts): Added TextFileHandler (smart reads, tail/offset strategies), ExcelFileHandler (read/write/editRange, sheets/ranges, 10MB limit, formula handling), ImageFileHandler (base64/mime handling), BinaryFileHandler (instructional read, write blocked).
  • Filesystem Integration (src/tools/filesystem.ts, src/handlers/filesystem-handlers.ts): Refactored read/write/getInfo to use FileHandler via ReadOptions/FileResult; normalized metadata usage (metadata.isImage/isExcelFile); added formatting helpers and consistent content stringification (see the sketch after this list).
  • Edit Tool & Schemas (src/tools/edit.ts, src/tools/schemas.ts): Edit block now supports structured range edits via handler.editRange (range+content) and legacy text find/replace (old_string/new_string); schemas require one path.
  • Search Manager (src/search-manager.ts): Added Excel-aware search (searchExcelFiles, findExcelFiles), merges Excel results with the ripgrep stream, and an option to mark session.wasIncomplete.
  • Process / Node Fallback & System Info (src/tools/improved-process-tools.ts, src/utils/system-info.ts): Added executeNodeCode and virtual node:local sessions with routing in startProcess/interact/read/terminate; added Python detection and pythonInfo in SystemInfo.
  • Server Tool Descriptions (src/server.ts): Updated tool documentation to describe Excel/image formats, edit_block behavior, and node:local semantics.
  • Tests & Artifacts (test/test-file-handlers.js, test/test-excel-files.js, test/test_output/node_repl_debug.txt): Added comprehensive file-handler and Excel tests; removed obsolete test artifact.
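
As referenced in the Filesystem Integration item above, the delegation implied by this summary might look roughly like this; readFileFromDisk and the options shape are assumptions taken from the summary, not the actual diff:

declare function getFileHandler(path: string): {
  read(path: string, options?: object): Promise<{ content: string }>;
};

async function readFileFromDisk(
  path: string,
  options: { sheet?: string | number; range?: string; offset?: number; length?: number }
) {
  const handler = getFileHandler(path); // Excel / Image / Text / Binary, chosen by extension
  return handler.read(path, options);   // each handler honors the options it supports
}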

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

  • Focus areas:
    • src/utils/files/excel.ts — range parsing, pagination, formula handling, 10MB enforcement.
    • src/utils/files/text.ts — multiple read strategies, negative-offset/tail reads, performance heuristics.
    • src/tools/filesystem.ts & src/handlers/filesystem-handlers.ts — correct propagation of ReadOptions/FileResult and backwards compatibility.
    • src/tools/edit.ts — handler selection, editRange paths and error handling, JSON parsing fallback.
    • src/search-manager.ts — concurrency, merging ripgrep and Excel results, and wasIncomplete semantics.
    • src/tools/improved-process-tools.ts — virtual Node session lifecycle, temp file cleanup, and routing edge cases.
    • Tests — determinism, environment assumptions, and artifact cleanup.

Possibly related PRs

Suggested reviewers

  • serg33v
  • dmitry-ottic-ai

Poem

🐰 I hop through sheets and rows so bright,

Cells and handlers glow in my light.
I nibble ranges, chew old strings away,
Spawn a Node, then scamper to play.
Hooray — new tools, a carrot day! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'Feature/sheets handling' accurately captures the main objective of adding comprehensive Excel file (sheets) handling with a file-handler architecture, directly reflecting the core feature being introduced.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 88.33%, which meets the required threshold of 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment


Comment @coderabbitai help to get the list of available commands and usage tips.

@codeant-ai codeant-ai bot added the size:XXL label (this PR changes 1000+ lines, ignoring generated files) on Nov 26, 2025

codeant-ai bot commented Nov 26, 2025

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review

  • Image payload
    Converting fileResult.content to a UTF‑8 string when building the image response corrupts the binary data that downstream consumers expect. The handler should either pass the raw buffer or explicitly encode it (e.g., base64) and document that, instead of calling toString() on the image bytes.

  • File info completeness
    getFileInfo now depends entirely on FileHandler.getInfo for all the legacy fields we used to build from fs.stat (size, timestamps, permissions, directory/file flags, etc.). If any handler (e.g., the default text handler) fails to populate those properties the consumer of getFileInfo will suddenly receive undefined values and we will lose the guarantees that the previous implementation provided. Please validate that every handler always emits the same base stats or add a local fs.stat fallback before returning the legacy info structure.

  • Blocking Excel search
    The new Excel search is awaited before startSearch returns, so when there are many Excel files or large workbooks the initial response waits for the entire Excel scan to complete. This makes searches feel slow even though text results may already be streaming. Consider decoupling the Excel scan from the initial response so startSearch can return as soon as the first ripgrep chunk is ready while Excel matches are merged later through readSearchResults.

  • Config leak
    setup() mutates allowedDirectories and returns the original config, but if any step after fetching the config (e.g. setValue or fs.mkdir) throws, runTests() never receives an originalConfig to pass to teardown(), so the mutated config is never restored. Consider restoring in a finally inside setup or ensuring runTests always calls teardown with the saved config even when setup fails, so the global config does not get stuck pointing only at the test directory.

  • Guaranteed Teardown
    The finally block in runTests only calls teardown when originalConfig is truthy, but setup steps run before originalConfig is assigned. If configManager.getConfig() throws (or anything else fails during setup), originalConfig stays undefined and teardown is never invoked, so the test directory remains on disk and allowedDirectories is left pointing to it. teardown needs to run regardless of whether originalConfig was populated, with appropriate guards inside it (see the sketch after this list).
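
A minimal sketch of the setup/teardown guarantee suggested in the last two points; configManager, setup, and teardown are names taken from this review, and runAllCases is a hypothetical stand-in for the individual test calls:

declare const configManager: { getConfig(): Promise<unknown> };
declare function setup(): Promise<void>;
declare function teardown(originalConfig?: unknown): Promise<void>;
declare function runAllCases(): Promise<void>;

async function runTests(): Promise<boolean> {
  let originalConfig: unknown;
  try {
    originalConfig = await configManager.getConfig(); // capture before anything mutates it
    await setup();                                    // may mutate allowedDirectories
    await runAllCases();
    return true;
  } catch (err) {
    console.error(err);
    return false;
  } finally {
    // Runs even if getConfig() or setup() threw; teardown guards internally
    // against a missing originalConfig and a missing test directory.
    await teardown(originalConfig).catch(() => {});
  }
}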


codeant-ai bot commented Nov 26, 2025

CodeAnt AI finished reviewing your PR.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (22)
src/tools/schemas.ts (1)

88-102: Schema refine logic is correct, but consider stricter typing for content.

The refine validation correctly enforces that either text replacement fields or range-based fields must be provided together. However, content: z.any() is very permissive and provides no runtime validation.

Consider adding a more specific type that covers the expected content shapes (arrays, objects, primitives) while still allowing flexibility:

-  content: z.any().optional(),
+  content: z.union([
+    z.array(z.array(z.any())),  // 2D array for Excel data
+    z.record(z.any()),          // Object with sheet names as keys
+    z.string(),
+    z.number(),
+  ]).optional(),

This provides documentation value and catches obvious misuse while maintaining flexibility for Excel data structures.

test/test-file-handlers.js (1)

210-218: Fragile status message removal pattern.

The regex ^\[.*?\]\n\n to strip status messages is brittle and could break if the status message format changes. Consider using a more robust approach:

-  const parsed = JSON.parse(readContent.replace(/^\[.*?\]\n\n/, '')); // Remove status message
+  // Find where the JSON actually starts (skip any status prefix)
+  const jsonStart = readContent.indexOf('{');
+  const parsed = JSON.parse(readContent.substring(jsonStart));

Alternatively, consider adding an option to suppress status messages in read operations for testing scenarios.

src/tools/edit.ts (2)

374-417: The editRange result is not used.

The editRange method returns an EditResult with success, editsApplied, and errors fields, but this result is discarded. Consider using the result to provide more detailed feedback:

-            if ('editRange' in handler && typeof handler.editRange === 'function') {
-                await handler.editRange(parsed.file_path, parsed.range, content, parsed.options);
-                return {
-                    content: [{
-                        type: "text",
-                        text: `Successfully updated range ${parsed.range} in ${parsed.file_path}`
-                    }],
-                };
+            if ('editRange' in handler && typeof handler.editRange === 'function') {
+                const result = await handler.editRange(parsed.file_path, parsed.range, content, parsed.options);
+                if (!result.success) {
+                    const errorMessages = result.errors?.map(e => `${e.location}: ${e.error}`).join('\n') || 'Unknown error';
+                    return {
+                        content: [{
+                            type: "text",
+                            text: `Failed to update range ${parsed.range}: ${errorMessages}`
+                        }],
+                        isError: true
+                    };
+                }
+                return {
+                    content: [{
+                        type: "text",
+                        text: `Successfully applied ${result.editsApplied} edit(s) to range ${parsed.range} in ${parsed.file_path}`
+                    }],
+                };

This provides better user feedback and properly handles partial failures reported by the handler.


376-377: Consider static import for better performance.

The dynamic import of getFileHandler adds latency to every range-based edit call. Since this is a core feature, consider using a static import at the top of the file:

+import { getFileHandler } from '../utils/files/factory.js';
 
 // ... later in handleEditBlock ...
-            const { getFileHandler } = await import('../utils/files/factory.js');
-            const handler = getFileHandler(parsed.file_path);
+            const handler = getFileHandler(parsed.file_path);

Dynamic imports are better suited for rarely-used code paths or optional features, not core functionality.

src/utils/files/factory.ts (2)

49-61: Consider using Array.find() for cleaner iteration.

The loop logic is correct, but could be more idiomatic.

 export function getFileHandler(path: string): FileHandler {
     const allHandlers = initializeHandlers();
 
-    // Try each handler in order
-    for (const handler of allHandlers) {
-        if (handler.canHandle(path)) {
-            return handler;
-        }
-    }
-
-    // Fallback to binary handler (should never reach here due to binary catch-all)
-    return allHandlers[allHandlers.length - 1];
+    // Find first handler that can handle this file type
+    // BinaryFileHandler.canHandle() always returns true, so this never returns undefined
+    return allHandlers.find(h => h.canHandle(path)) ?? allHandlers[allHandlers.length - 1];
 }

68-89: Potential duplication with handler canHandle() methods.

Both isExcelFile/isImageFile helper functions and the handler canHandle() methods perform extension checking. Consider consolidating to avoid drift. For now, this works correctly.

One option is to delegate to the handler:

export function isExcelFile(path: string): boolean {
    return new ExcelFileHandler().canHandle(path);
}

Or extract shared extension lists to a constants file. Low priority since current implementation is functional.

src/handlers/node-handlers.ts (2)

24-27: Consider using crypto.randomUUID() for stronger temp file uniqueness.

The current naming uses Date.now() + Math.random(), which is generally sufficient but could theoretically collide under high concurrency.

+import { randomUUID } from 'crypto';
+
 // Create temp file IN THE MCP DIRECTORY so ES module imports resolve correctly
-const tempFile = path.join(mcpRoot, `.mcp-exec-${Date.now()}-${Math.random().toString(36).slice(2)}.mjs`);
+const tempFile = path.join(mcpRoot, `.mcp-exec-${randomUUID()}.mjs`);

32-56: Timeout produces generic exit code, not explicit timeout message.

When spawn's timeout triggers, it sends SIGTERM and the process exits with code 143 (on Unix) or similar. The user sees "Execution failed (exit code 143)" rather than a clear timeout indication.

Consider tracking timeout explicitly:

         const result = await new Promise<{ stdout: string; stderr: string; exitCode: number }>((resolve) => {
+            let timedOut = false;
             const proc = spawn(process.execPath, [tempFile], {
                 cwd: mcpRoot,
                 timeout: timeout_ms
             });

             let stdout = '';
             let stderr = '';

             proc.stdout.on('data', (data) => {
                 stdout += data.toString();
             });

             proc.stderr.on('data', (data) => {
                 stderr += data.toString();
             });

             proc.on('close', (exitCode) => {
-                resolve({ stdout, stderr, exitCode: exitCode ?? 1 });
+                resolve({ 
+                    stdout, 
+                    stderr: timedOut ? `Execution timed out after ${timeout_ms}ms\n${stderr}` : stderr, 
+                    exitCode: exitCode ?? 1 
+                });
             });

             proc.on('error', (err) => {
+                if (err.message.includes('ETIMEDOUT') || err.message.includes('killed')) {
+                    timedOut = true;
+                }
                 resolve({ stdout, stderr: stderr + '\n' + err.message, exitCode: 1 });
             });
         });
src/handlers/filesystem-handlers.ts (1)

96-106: Timeout error handling could be more descriptive.

When withTimeout returns null, the error message 'Failed to read the file' doesn't indicate that a timeout occurred. Consider making this clearer:

     if (result == null) {
-        // Handles the impossible case where withTimeout resolves to null instead of throwing
-        throw new Error('Failed to read the file');
+        throw new Error('Read file operation timed out after 60 seconds');
     }
test/test-excel-files.js (4)

180-186: Strengthen the range exclusion assertion.

The assertion !content.includes('City') || content.split('City').length === 1 is logically equivalent to !content.includes('City') (the second condition is always false if the first is false). Consider simplifying:

-  // City is column C, should NOT be included
-  assert.ok(!content.includes('City') || content.split('City').length === 1,
-    'Range A1:B2 should not include City column');
+  // City is column C, should NOT be included
+  assert.ok(!content.includes('City'),
+    'Range A1:B2 should not include City column');

314-322: Negative offset test assertion is too permissive.

The assertion content.includes('Fourth') || content.includes('Fifth') will pass even if only one of the expected rows is present. For offset: -2, both the fourth and fifth rows should be included:

-  assert.ok(content.includes('Fourth') || content.includes('Fifth'),
-    'Should include data from last rows');
+  assert.ok(content.includes('Fourth'), 'Should include Fourth (second to last row)');
+  assert.ok(content.includes('Fifth'), 'Should include Fifth (last row)');

343-359: Consider adding error case tests.

The test suite covers happy paths well but lacks tests for error scenarios such as:

  • Reading a non-existent Excel file
  • Writing to a read-only location
  • Reading with an invalid sheet name/index
  • Corrupted Excel file handling

These would improve confidence in error-handling robustness; one such case is sketched below.

Would you like me to help generate error case tests?
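
For example, one missing error case might look like this; the import path mirrors the dist-based imports used elsewhere in these tests, and the exact rejection message is an assumption:

import assert from 'node:assert';
import { ExcelFileHandler } from '../dist/utils/files/excel.js';

async function testReadMissingWorkbook() {
  const handler = new ExcelFileHandler();
  await assert.rejects(
    () => handler.read('/path/that/does/not/exist.xlsx'),
    /ENOENT|not found/i, // assumed error shape
    'Reading a missing workbook should reject with a file-not-found error'
  );
}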


86-87: Constructor name check may break with minification.

Using handler.constructor.name === 'ExcelFileHandler' is fragile as minifiers can rename classes. Consider using instanceof if the class is exported, or add a handler type identifier property:

// Alternative approach using instanceof
import { ExcelFileHandler } from '../dist/utils/files/excel.js';
assert.ok(handler instanceof ExcelFileHandler, ...);

For tests against the dist folder, minification is less likely an issue, but this is worth noting for future-proofing.

src/utils/files/image.ts (3)

38-51: Consider adding size limit or chunked reading for large images.

Images are read entirely into memory and base64-encoded, which increases memory usage by ~33%. For very large images (e.g., 100MB+ RAW files), this could cause memory pressure. The current implementation is fine for typical use cases, but consider documenting this limitation or adding a size check.
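
If a size check is added, one possible shape is sketched below; the threshold and error text are assumptions, loosely mirroring the Excel handler's 10MB limit mentioned elsewhere in this review:

import { promises as fs } from 'node:fs';

const MAX_IMAGE_SIZE = 50 * 1024 * 1024; // assumed 50MB cap

async function assertImageReadable(path: string): Promise<void> {
  const stats = await fs.stat(path);
  if (stats.size > MAX_IMAGE_SIZE) {
    throw new Error(`Image too large to read into memory (${stats.size} > ${MAX_IMAGE_SIZE} bytes)`);
  }
}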


53-61: Add validation for base64 string content.

When content is a string, the code assumes it's base64-encoded. If a non-base64 string is passed, Buffer.from(content, 'base64') will silently produce garbage output. Consider adding validation:

 async write(path: string, content: Buffer | string): Promise<void> {
     // If content is base64 string, convert to buffer
     if (typeof content === 'string') {
+        // Validate base64 format
+        if (!/^[A-Za-z0-9+/]*={0,2}$/.test(content)) {
+            throw new Error('Invalid base64 string content for image write');
+        }
         const buffer = Buffer.from(content, 'base64');
         await fs.writeFile(path, buffer);
     } else {
         await fs.writeFile(path, content);
     }
 }

Alternatively, document that the caller must ensure valid base64 input.


84-92: Minor: getMimeType could use the static map directly.

The loop iterates over entries when you could use a direct lookup:

 private getMimeType(path: string): string {
     const lowerPath = path.toLowerCase();
-    for (const [ext, mimeType] of Object.entries(ImageFileHandler.IMAGE_MIME_TYPES)) {
-        if (lowerPath.endsWith(ext)) {
-            return mimeType;
-        }
-    }
+    const ext = '.' + lowerPath.split('.').pop();
+    return ImageFileHandler.IMAGE_MIME_TYPES[ext] ?? 'application/octet-stream';
-    return 'application/octet-stream'; // Fallback
 }

The current implementation works correctly; this is just a minor optimization.

src/search-manager.ts (1)

369-395: Potential performance issue with row.eachCell callback.

The eachRow callback with nested eachCell operations could be slow for large workbooks with many sheets. Consider:

  1. Adding an early exit when maxResults is reached at the worksheet level
  2. Processing worksheets in parallel for very large files

Current implementation is functional but may have performance implications for large Excel files; the early-exit idea from point 1 is sketched below.
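
A hedged sketch of the worksheet-level early exit, using the ExcelJS iteration API; the regex and maxResults parameters stand in for the surrounding search code and are assumptions here:

import ExcelJS from 'exceljs';

function searchWorkbookSheets(workbook: ExcelJS.Workbook, regex: RegExp, maxResults: number) {
  const results: Array<{ sheet: string; row: number; column: number; match: string }> = [];
  for (const worksheet of workbook.worksheets) {
    if (results.length >= maxResults) break; // stop before scanning further sheets
    worksheet.eachRow((row, rowNumber) => {
      if (results.length >= maxResults) return; // cheap skip for remaining rows
      row.eachCell((cell, colNumber) => {
        if (results.length >= maxResults) return;
        const text = cell.text; // ExcelJS renders formulas/rich text to a string here
        if (regex.test(text)) {
          results.push({ sheet: worksheet.name, row: rowNumber, column: colNumber, match: text });
        }
      });
    });
  }
  return results;
}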

src/utils/files/text.ts (3)

42-46: Clarify fallback handler semantics in canHandle.

canHandle always returns true, making this the fallback handler. This design relies on the factory checking other handlers first. Consider adding a comment to clarify this dependency or throwing if the path is a known non-text format:

 canHandle(path: string): boolean {
-    // Text handler is the default - handles most files
-    // Only returns false for known non-text formats (checked by other handlers)
+    // Text handler is the fallback - handles any file not claimed by other handlers.
+    // The factory (src/utils/files/factory.ts) must check specialized handlers first.
     return true;
 }

26-36: Constants duplication - consider importing from a shared location.

As noted in the comments, these constants are duplicated from filesystem.ts. Consider extracting them to a shared constants file to maintain DRY principle:

// src/utils/files/constants.ts
export const FILE_SIZE_LIMITS = { ... } as const;
export const READ_PERFORMANCE_THRESHOLDS = { ... } as const;

This is acknowledged technical debt but worth tracking.


370-440: Byte estimation may skip lines in edge cases.

The readFromEstimatedPosition method estimates byte position based on average line length from a sample. If the file has highly variable line lengths (e.g., short header followed by long data rows), the estimated position could be significantly off. The method handles this by skipping the first partial line, but the actual line offset displayed to the user via generateEnhancedStatusMessage will be the requested offset, not the actual offset.

Consider adding a note to the status message when using estimation:

 const content = includeStatusMessage
-    ? `${this.generateEnhancedStatusMessage(result.length, offset, fileTotalLines, false)}\n\n${result.join('\n')}`
+    ? `${this.generateEnhancedStatusMessage(result.length, offset, fileTotalLines, false)} (estimated position)\n\n${result.join('\n')}`
     : result.join('\n');
src/utils/files/excel.ts (2)

77-136: Consider extracting ENOENT check to a helper.

The ENOENT check pattern is duplicated at lines 83-86 and 131-134. Consider extracting to a small helper for clarity.

+private isFileNotFoundError(error: unknown): boolean {
+    return (error as any).code === 'ENOENT' ||
+        (error instanceof Error && error.message.includes('ENOENT'));
+}
+
 async write(path: string, content: any, mode?: 'rewrite' | 'append'): Promise<void> {
     // Check existing file size if it exists
     try {
         await this.checkFileSize(path);
     } catch (error) {
         // File doesn't exist - that's fine for write
-        if ((error as any).code !== 'ENOENT' &&
-            !(error instanceof Error && error.message.includes('ENOENT'))) {
+        if (!this.isFileNotFoundError(error)) {
             throw error;
         }
     }

284-309: Minor: redundant fs.stat call in extractMetadata.

extractMetadata performs an additional fs.stat call, but the file size was already checked in checkFileSize. Consider caching or passing the stats to avoid the extra I/O, though this is a minor optimization given the file size limit.
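
One way to avoid the second stat call, as a sketch: let the size check return the Stats it already fetched and thread them into extractMetadata. Method names mirror this review, not the actual code:

import { promises as fs } from 'node:fs';
import type { Stats } from 'node:fs';

const MAX_EXCEL_SIZE = 10 * 1024 * 1024; // the 10MB limit mentioned in this review

async function checkFileSize(path: string): Promise<Stats> {
  const stats = await fs.stat(path);
  if (stats.size > MAX_EXCEL_SIZE) {
    throw new Error(`Excel file too large: ${stats.size} bytes (limit ${MAX_EXCEL_SIZE})`);
  }
  return stats; // reuse these stats in extractMetadata instead of calling fs.stat again
}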

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d3f5824 and 4354153.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (20)
  • manifest.template.json (2 hunks)
  • package.json (1 hunks)
  • src/handlers/filesystem-handlers.ts (3 hunks)
  • src/handlers/index.ts (1 hunks)
  • src/handlers/node-handlers.ts (1 hunks)
  • src/search-manager.ts (4 hunks)
  • src/server.ts (8 hunks)
  • src/tools/edit.ts (2 hunks)
  • src/tools/filesystem.ts (13 hunks)
  • src/tools/schemas.ts (3 hunks)
  • src/utils/files/base.ts (1 hunks)
  • src/utils/files/binary.ts (1 hunks)
  • src/utils/files/excel.ts (1 hunks)
  • src/utils/files/factory.ts (1 hunks)
  • src/utils/files/image.ts (1 hunks)
  • src/utils/files/index.ts (1 hunks)
  • src/utils/files/text.ts (1 hunks)
  • test/test-excel-files.js (1 hunks)
  • test/test-file-handlers.js (1 hunks)
  • test/test_output/node_repl_debug.txt (0 hunks)
💤 Files with no reviewable changes (1)
  • test/test_output/node_repl_debug.txt
🧰 Additional context used
🧬 Code graph analysis (13)
test/test-file-handlers.js (3)
src/config-manager.ts (1)
  • configManager (227-227)
src/utils/files/factory.ts (1)
  • getFileHandler (49-61)
src/tools/filesystem.ts (3)
  • readFile (725-733)
  • writeFile (824-847)
  • getFileInfo (1077-1125)
src/utils/files/binary.ts (1)
src/utils/files/base.ts (4)
  • FileHandler (13-64)
  • ReadOptions (73-91)
  • FileResult (96-105)
  • FileInfo (173-200)
src/utils/files/text.ts (2)
src/utils/files/index.ts (1)
  • TextFileHandler (13-13)
src/utils/files/base.ts (4)
  • FileHandler (13-64)
  • ReadOptions (73-91)
  • FileResult (96-105)
  • FileInfo (173-200)
src/utils/files/image.ts (1)
src/utils/files/base.ts (4)
  • FileHandler (13-64)
  • ReadOptions (73-91)
  • FileResult (96-105)
  • FileInfo (173-200)
src/handlers/node-handlers.ts (2)
src/types.ts (1)
  • ServerResult (73-77)
src/tools/schemas.ts (1)
  • ExecuteNodeArgsSchema (168-171)
src/utils/files/factory.ts (5)
src/utils/files/base.ts (1)
  • FileHandler (13-64)
src/utils/files/excel.ts (1)
  • ExcelFileHandler (33-485)
src/utils/files/image.ts (1)
  • ImageFileHandler (18-93)
src/utils/files/text.ts (1)
  • TextFileHandler (41-441)
src/utils/files/binary.ts (1)
  • BinaryFileHandler (20-173)
src/handlers/filesystem-handlers.ts (2)
src/utils/files/base.ts (1)
  • ReadOptions (73-91)
src/tools/filesystem.ts (2)
  • readFile (725-733)
  • getFileInfo (1077-1125)
src/utils/files/base.ts (3)
src/tools/filesystem.ts (1)
  • FileResult (286-286)
test/test-excel-files.js (6)
  • content (121-121)
  • content (175-175)
  • content (196-196)
  • content (233-233)
  • content (288-288)
  • content (316-316)
test/test-file-handlers.js (3)
  • content (184-184)
  • content (206-206)
  • content (255-255)
src/tools/edit.ts (2)
src/tools/schemas.ts (1)
  • EditBlockArgsSchema (88-102)
src/utils/files/factory.ts (1)
  • getFileHandler (49-61)
src/server.ts (1)
src/tools/schemas.ts (2)
  • EditBlockArgsSchema (88-102)
  • ExecuteNodeArgsSchema (168-171)
src/search-manager.ts (1)
src/utils/files/factory.ts (1)
  • isExcelFile (68-71)
src/utils/files/excel.ts (1)
src/utils/files/base.ts (6)
  • ExcelSheet (134-143)
  • FileHandler (13-64)
  • ReadOptions (73-91)
  • FileResult (96-105)
  • EditResult (152-164)
  • FileInfo (173-200)
src/tools/filesystem.ts (3)
src/utils/files/base.ts (2)
  • ReadOptions (73-91)
  • FileResult (96-105)
src/utils/files/factory.ts (1)
  • getFileHandler (49-61)
src/utils/files/index.ts (1)
  • getFileHandler (10-10)
🪛 ast-grep (0.40.0)
src/search-manager.ts

[warning] 324-324: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(pattern, flags)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)


[warning] 328-328: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(escaped, flags)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)


[warning] 343-343: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(^${regexPat}$, 'i')
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🔇 Additional comments (57)
manifest.template.json (2)

91-91: LGTM!

The updated description accurately reflects the new dual-mode functionality of edit_block, supporting both text replacement with fuzzy matching and range-based updates for Excel files.


137-140: LGTM!

The new execute_node tool entry is properly declared with an appropriate description covering ES module support and top-level await.

src/tools/schemas.ts (2)

50-52: LGTM!

The new sheet, range, and options fields properly extend the read file schema to support Excel-specific operations. The sheet union type correctly allows both string (sheet name) and number (sheet index) references.


167-171: LGTM!

The ExecuteNodeArgsSchema is well-defined with a sensible 30-second default timeout. The schema aligns with the handler implementation in node-handlers.ts.

src/handlers/index.ts (1)

8-8: LGTM!

The new re-export follows the established pattern and properly exposes the node-handlers module through the central handlers index.

test/test-file-handlers.js (2)

288-303: LGTM!

The exported runTests function properly handles setup/teardown lifecycle, error handling, and returns a boolean status for integration with test runners.


238-238: Unable to complete verification due to repository access limitations.

Unfortunately, I'm unable to clone or access the repository code to verify the concerns raised in your review comment. While the repository wonderwhy-er/DesktopCommanderMCP is confirmed to be public, I cannot inspect the codebase to:

  • Locate the getFileInfo implementation
  • Check the FileInfo type definition
  • Determine if the string 'true' is actually returned by the API
  • Confirm whether this type inconsistency is intentional (backward compatibility) or unintentional (bug)

The review comment's core concern remains valid: The assertion info.isFile === true || info.isFile === 'true' does suggest potential type inconsistency that should be resolved either by:

  1. Fixing the source to consistently return boolean true, or
  2. Adding documentation explaining why both types are acceptable
src/tools/edit.ts (2)

1-16: Good architectural documentation.

The technical debt comment clearly documents the current inconsistency and provides a clear path for future refactoring. This is valuable for maintainability.


420-423: Non-null assertions are safe here due to schema refinement.

The ! assertions on old_string and new_string are valid because the schema's refine validation guarantees that when range/content are not provided, old_string and new_string must both be present. The control flow reaching this point ensures the values exist.

package.json (1)

86-86: Check transitive dependencies of exceljs for known vulnerabilities.

The exceljs 4.4.0 library has no direct CVEs, but multiple security advisories exist for its transitive dependencies (inflight, brace-expansion, lodash, tmp via archiver/glob). These issues remain open in the exceljs repository as of January 2025. Run npm audit on your project to identify the actual exposure in your dependency tree and consider:

  • Using npm/yarn overrides to patch vulnerable transitive deps
  • Monitoring the exceljs repository for updated releases
  • Evaluating alternative Excel libraries if transitive risk is unacceptable
src/utils/files/factory.ts (1)

12-32: LGTM - Clean singleton pattern with correct priority ordering.

The lazy initialization ensures handlers are only created once, and the priority order correctly places more specific handlers (Excel, Image) before the catch-all (Text, Binary).

src/server.ts (4)

265-272: LGTM - Clear documentation of format-specific handling.

The FORMAT HANDLING section provides clear guidance on how different file types are processed, including Excel-specific parameters (sheet, range) and image encoding.


328-332: LGTM - Excel write format documentation is clear.

The example JSON format '[["Name","Age"],["Alice",30]]' clearly shows the expected 2D array structure for Excel writes.


1206-1217: LGTM - Handler follows established patterns.

Error handling is consistent with other tool handlers, capturing errors and returning structured ServerResult with isError: true.


992-1020: Unable to verify security safeguards due to repository access limitations.

The repository failed to clone, preventing verification of the codebase for existing security patterns (allowedDirectories, validatePath, sandbox, etc.) and the actual implementation details of the execute_node tool.

The original review comment raises valid security concerns about arbitrary code execution with MCP server privileges. However, without access to:

  • The actual implementation of execute_node (including any runtime restrictions)
  • Security patterns used by other tools in the codebase
  • Runtime environment configuration and privilege restrictions
  • Relevant security design documentation or comments

I cannot definitively verify whether additional safeguards are needed or if this is an intentional design decision.

src/utils/files/index.ts (1)

1-16: LGTM - Well-organized barrel exports.

The export structure is clean and follows a logical order: base interfaces first, then factory utilities, then concrete handler implementations.

src/handlers/node-handlers.ts (2)

58-59: Silent cleanup failure is appropriate here.

The .catch(() => {}) pattern for cleanup is correct since we don't want cleanup failures to mask the actual execution result.


61-76: LGTM - Error and success paths return proper ServerResult structure.

The handler correctly distinguishes between execution failure (non-zero exit) and success, with appropriate isError flag and content formatting.

src/utils/files/binary.ts (3)

20-24: LGTM - Correct catch-all semantics.

canHandle returning true unconditionally is the correct behavior for a fallback handler that should process anything not handled by more specific handlers.


26-36: LGTM - Instructional response pattern is well-designed.

Instead of attempting to read binary content (which would be meaningless), the handler returns actionable guidance. The isBinary: true metadata allows callers to detect this case.


154-172: LGTM - Reasonable MIME type mapping with safe fallback.

The mapping covers common binary types, and the application/octet-stream fallback is the correct choice for unknown extensions.

src/handlers/filesystem-handlers.ts (3)

63-70: LGTM - Clean ReadOptions consolidation.

The refactored approach using a single ReadOptions object is cleaner than multiple parameters and aligns well with the interface defined in src/utils/files/base.ts.


72-92: Verify Buffer handling for non-image file content.

When fileResult.content is a Buffer (as defined in FileResult), calling .toString() without encoding defaults to UTF-8, which is appropriate for text. However, if a binary file somehow reaches this path, the output could be garbled.

The current flow relies on the handler system routing binary files correctly, which should be fine given the factory pattern. The implementation looks correct.


248-273: Consider recursive handling for deeply nested objects.

The formatValue function handles one level of nesting (arrays of objects), but JSON.stringify is used for nested objects at line 270. This is reasonable for most cases, but for consistency with array formatting, you might want to add a maxDepth parameter if deeply nested structures become common.

Current implementation is acceptable for the stated use cases.
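
If a maxDepth parameter is ever added, a hedged sketch of what it might look like (formatValue's real signature in filesystem-handlers.ts is not shown here):

function formatValue(value: unknown, depth = 0, maxDepth = 2): string {
  if (value === null || typeof value !== 'object') return String(value);
  if (depth >= maxDepth) return JSON.stringify(value); // fall back once the limit is hit
  if (Array.isArray(value)) {
    return value.map((v) => formatValue(v, depth + 1, maxDepth)).join(', ');
  }
  return Object.entries(value)
    .map(([k, v]) => `${k}: ${formatValue(v, depth + 1, maxDepth)}`)
    .join('; ');
}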

test/test-excel-files.js (1)

34-42: LGTM - Proper cleanup with ENOENT handling.

The cleanup function correctly handles the case where the directory doesn't exist, preventing spurious errors during initial cleanup.

src/utils/files/image.ts (2)

33-36: LGTM - Clean extension checking.

The canHandle method correctly uses case-insensitive matching for file extensions.


63-79: LGTM - File info implementation is correct.

The getInfo method properly extracts file stats and returns appropriate metadata with the isImage flag.

src/search-manager.ts (3)

149-168: LGTM - Clean parallel Excel search integration.

The Excel search runs concurrently with ripgrep and errors are caught without failing the overall search. Results are correctly added to the session.


428-465: LGTM - Recursive file discovery is well-implemented.

The findExcelFiles method correctly:

  • Skips hidden directories and node_modules
  • Handles both file and directory root paths
  • Silently handles permission errors

470-482: LGTM - Context extraction utility.

The getMatchContext method provides reasonable context around matches with proper boundary handling and ellipsis indicators.

src/utils/files/text.ts (4)

238-282: LGTM - Efficient reverse reading implementation.

The readLastNLinesReverse method efficiently reads from the end of large files using chunked buffer reads. The file descriptor is properly closed in the finally block.


287-326: Circular buffer implementation is correct.

The readFromEndWithReadline uses a circular buffer pattern to efficiently capture the last N lines without storing all lines in memory. The reconstruction logic at lines 311-319 correctly handles both cases (total lines >= requested and < requested).
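
For readers unfamiliar with the pattern, a minimal illustration of the circular-buffer idea (not the actual text.ts code, which streams lines via readline and has its own types):

function keepLastN(n: number) {
  const buffer: string[] = new Array(n);
  let total = 0;
  return {
    push(line: string) {
      buffer[total % n] = line; // overwrite the oldest slot
      total++;
    },
    lines(): string[] {
      if (total <= n) return buffer.slice(0, total);
      const start = total % n; // the oldest surviving line sits right after the newest
      return [...buffer.slice(start), ...buffer.slice(0, start)];
    },
  };
}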


71-98: LGTM - getInfo implementation with line count optimization.

The implementation correctly avoids counting lines for large files and handles read errors gracefully.


48-61: Verification incomplete: an attempt to inspect the factory handler order (getFileHandler, the BinaryFileHandler/TextFileHandler priorities, and the canHandle implementations) to confirm whether the binary-file check timing is an issue did not complete, so no conclusion was reached.

src/utils/files/base.ts (6)

13-64: LGTM!

The FileHandler interface is well-designed with clear method contracts. The optional editRange method appropriately allows for structured editing in handlers that support it (like Excel), while keeping the core interface minimal for simpler handlers.


70-91: LGTM!

The ReadOptions interface provides a clean unified API for different file handlers, with Excel-specific options clearly documented.


93-129: LGTM!

The FileResult and FileMetadata interfaces provide flexible structures that accommodate different file types while maintaining a consistent API.


131-143: LGTM!

Clean and minimal interface for Excel sheet metadata.


148-164: LGTM!

The EditResult interface appropriately supports partial success scenarios with detailed error reporting per location.


170-200: LGTM!

The FileInfo interface provides comprehensive file metadata with clear separation between standard file stats and type-specific information.

src/utils/files/excel.ts (7)

1-27: LGTM!

Clean imports and reasonable file size limit for in-memory Excel processing.


35-38: LGTM!

The extension checking works correctly. The approach of lowercasing the entire path before checking endsWith is functionally equivalent to extracting and comparing the extension.


40-75: LGTM!

The read method provides a good user experience with informative headers, pagination info, and modification hints embedded in the output.


158-236: LGTM with a minor note on editsApplied.

The editRange implementation is solid with proper validation and formula support. Note that editsApplied returns 1 representing one range operation rather than individual cell count - this semantic is reasonable for range-based editing.


238-280: LGTM!

Graceful error handling - returns basic file info with error details in metadata when Excel parsing fails, rather than throwing.


311-420: LGTM!

Comprehensive implementation handling various Excel cell value types (formulas, rich text, dates, etc.) with proper pagination support including negative offsets for tail reads.


478-484: LGTM!

The columnToNumber algorithm correctly handles multi-letter columns using base-26 conversion.
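
For illustration, the base-26 conversion referenced here can be sketched as follows ("A" maps to 1, "Z" to 26, "AA" to 27); the actual helper in excel.ts may differ in detail:

function columnToNumber(column: string): number {
  let result = 0;
  for (const ch of column.toUpperCase()) {
    result = result * 26 + (ch.charCodeAt(0) - 64); // 'A' (code 65) contributes 1
  }
  return result; // "A" -> 1, "Z" -> 26, "AA" -> 27
}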

src/tools/filesystem.ts (10)

11-12: LGTM!

Clean imports from the new file handler system.


285-286: LGTM!

Re-exporting FileResult maintains backward compatibility for existing consumers.


323-328: LGTM!

readFileFromUrl now returns FileResult with metadata structure consistent with the new file handler pattern.


485-485: LGTM!

Text reading helpers now return consistent FileResult with metadata: { isImage: false }.


641-701: LGTM!

The refactored readFileFromDisk cleanly delegates to the appropriate file handler while maintaining telemetry and timeout handling. The content.toString() ensures consistent string output.


725-733: LGTM!

Clean unified API with proper delegation based on isUrl flag.


842-846: LGTM!

Clean delegation to file handler for write operations.


866-868: LGTM!

Correctly adapts to the new FileResult structure with proper optional chaining.


1077-1124: LGTM!

Well-implemented backward compatibility layer that translates the new FileInfo structure to the legacy format while adding type-specific metadata.


397-445: Verification incomplete: an attempt to check whether the text reading helpers (readFileWithSmartPositioning, readLastNLinesReverse, readFromEndWithReadline, readFromStartWithReadline, readFromEstimatedPosition) are still used outside src/tools/filesystem.ts after the TextFileHandler refactor did not complete, so no conclusion was reached.



Comment on lines +319 to +330
const results: SearchResult[] = [];

// Build regex for matching content
const flags = ignoreCase ? 'i' : '';
let regex: RegExp;
try {
    regex = new RegExp(pattern, flags);
} catch {
    // If pattern is not valid regex, escape it for literal matching
    const escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    regex = new RegExp(escaped, flags);
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Script executed:

# First, explore the repository structure and find the file under review
find . -name "search-manager.ts" -type f 2>/dev/null | head -5

Repository: wonderwhy-er/DesktopCommanderMCP


<shell_result>
Repository: wonderwhy-er/DesktopCommanderMCP
Script execution failed with unrecoverable error: [CAPABILITY:RUN_SHELL_SCRIPT] Failed to clone repository wonderwhy-er/DesktopCommanderMCP: Clone operation failed
</shell_result>


Web search query: ReDoS protection Node.js regex safe-regex library


<web_search_result>
Short summary and recommendations:

  • What safe-regex is: an npm module that heuristically flags potentially catastrophic (exponential-time) regexes by limiting "star height". It can give both false positives and false negatives. [1]

  • Safer alternatives/tools:

    • vuln-regex-detector — more accurate detector (claims no false positives and far fewer false negatives; server-based option). [2]
    • redos-detector — deterministic checker/CLI/library that proves whether a pattern is safe. [3]
    • safe-regex2 — maintained fork of safe-regex with some improvements/options. [4]
  • Practical protections (use together, not just one):

    1. Static-check regexes at build time with a detector (vuln-regex-detector or redos-detector). [2][3]
    2. Prefer linear-time engines (e.g., Google RE2) or use an RE2 binding when you must accept untrusted patterns. RE2 avoids backtracking-based ReDoS. [2]
    3. Enforce input limits (max length) and application-level timeouts / run regexes inside time-limited workers or separate processes to avoid blocking the event loop.
    4. Avoid allowing user-supplied arbitrary regexes; if needed, validate them strictly and sandbox execution.
    5. Where possible, convert vulnerable patterns to safe equivalents (atomic groups, possessive quantifiers where supported, or different parsing approach).
  • Quick npm commands:

    • safe-regex: npm i safe-regex. Note its heuristic warnings. [1]
    • safe-regex2: npm i safe-regex2. [4]
    • vuln-regex-detector: npm i vuln-regex-detector. [2]
    • redos-detector: npm i redos-detector. [3]

Sources:
[1] safe-regex (npm) — heuristic checker; warns about false positives/negatives.
[2] vuln-regex-detector (npm / repo) — more accurate detector; mentions RE2 and server analysis.
[3] redos-detector (npm) — deterministic ReDoS checker/CLI/library.
[4] safe-regex2 (npm) — maintained fork of safe-regex.
</web_search_result>


Web search query: user-supplied regex patterns Node.js TypeScript safe validation timeout


<web_search_result>
Short guidance + options (Node.js + TypeScript) to safely accept and run user-supplied regexes and protect against ReDoS / long-running matches.

Key ideas

  • Try to avoid running arbitrary backtracking regexes with untrusted input. Prefer a non-backtracking engine (RE2) or sandbox/timebox the evaluation. Use static detectors as a pre-check. (npmjs.com)

Options (ranked by safety / simplicity)

  1. Use RE2 (recommended)
  • Replace RegExp with node-re2 (bindings to Google RE2). RE2 does not use backtracking so vulnerable patterns can't cause exponential-time matches. Most drop-in compatible. (npmjs.com)

Example (TypeScript):
import RE2 from 're2';
const re = new RE2(userPattern, flags);
const ok = re.test(input);

  2. Pre-validate patterns with a detector
  • Run safe-regex / redos-detector / vuln-regex-detector to flag potentially vulnerable patterns before accepting them. These are heuristics; redos-detector is more thorough than safe-regex. Use this as a gate or to require user to choose RE2. (npmjs.com)
  3. Timebox execution (worker/child + timeout)
  • Execute RegExp.test() inside an isolated worker thread or child process and terminate it if it exceeds a short timeout. This prevents main thread freeze and bounds CPU/time. Use worker_threads. Terminate the worker if timeout elapses and treat the match as "timed out/failed". (Also limit input length.) (nodejs.org)

Minimal pattern (TypeScript, sketch):
// main thread
const worker = new Worker('./regex-worker.js', { workerData: { pattern, flags, input } });
const timer = setTimeout(() => { worker.terminate(); resolve({ timedOut: true }); }, 200);
worker.on('message', r => { clearTimeout(timer); resolve(r); });

  4. Use existing timeout-capable libs
  • Libraries such as super-regex / @redlotus/super-regex implement timeouts for Node.js regex execution (wraps execution with a timeout strategy). Consider them if you prefer a packaged solution. (npmjs.com)

Practical mitigations to combine

  • Prefer RE2 for untrusted patterns. If you must accept JavaScript RegExp syntax features RE2 lacks (backrefs/lookahead), require stricter validation or run in a sandboxed worker with a short timeout. (npmjs.com)
  • Limit input length and rate-limit requests.
  • Log and reject patterns flagged by detectors.
  • Use typed wrappers in TypeScript so callers can't accidentally call raw RegExp on user input.

References

  • node-re2 (RE2 bindings) — safe alternative to backtracking RegExp. (npmjs.com)
  • safe-regex — heuristic detector (fast, has false positives/negatives). (npmjs.com)
  • redos-detector / vuln-regex-detector — more complete ReDoS analysis tools. (npmjs.com)
  • Timeboxing with worker threads / timers (Node.js docs). (nodejs.org)
  • super-regex / similar libraries that provide timeouts. (npmjs.com)

If you want, I can:

  • produce a ready-to-run TypeScript example for any option above (RE2 + guard, or worker-thread timeout).

Citations:


Web search query: ReDoS regex pattern examples Node.js catastrophic backtracking short patterns


<web_search_result>
Here are short, common ReDoS (catastrophic backtracking) regex examples that hit Node.js (V8) backtracking engines, why they’re bad, a simple attack string, and quick fixes.

Vulnerable short patterns (examples)

  • /(a+)+$/
    Why: nested quantifiers cause exponential splits of the same chars.
    Attack string: "aaaaaaaaaaaaaaaaaaaaaab" — engine backtracks massively. [1][3]

  • /^(a|aa)+$/
    Why: overlapping alternation creates many grouping partitions.
    Attack string: "aaaaaaaaaaaaaaab". [1][2]

  • /(.*a){10}/ or /(.*)*$/
    Why: repeated .* (dot-star) or nested .* can overreach and fight over characters.
    Attack string: long sequences of the same char not meeting the final condition. [1][5]

  • /^(\d+\w*)@/ or /^\s+|\s+$/ used badly on pathological inputs
    Why: contiguous quantified tokens or subtle overlap between tokens (\d vs \w, etc.) force many backtracking attempts. [1][2]

Why these are dangerous

  • Traditional JS regex engines use backtracking (NFA-style). Nested/overlapping greedy quantifiers produce exponentially many alternative parses for the same substring, exhausting CPU. Real outages have resulted from similar patterns (Cloudflare, StackOverflow incidents). [1][2][4]

Quick fixes / mitigations

  • Avoid nested/overlapping quantifiers: rewrite patterns so quantifiers don’t apply to subpatterns that can match the same text. Example: instead of /^(a+)+$/, use /^a+$/ if intent is “one or more a’s”. [1][3]
  • Make token boundaries explicit (use character classes, anchors, or separators) so groups can’t steal characters from each other. [1][5]
  • Use unambiguous alternation (prefer fixed-length alternatives or order them from longest to shortest when possible). [1]
  • Replace risky JS RegExp with a linear-time engine (Google RE2). In Node.js use the node-re2 module (bindings) to avoid backtracking entirely. [4]
  • Input length limits / timeouts: validate and bound user-controlled input before running expensive regexes. [2]

Short safe examples / rewrites

  • Vulnerable: /(a+)+$/ → Safe: /^a+$/ (if goal is “one or more a”). [1][3]
  • Vulnerable: /^(.*)*$/ → Safe: /^.*$/ or use a more precise pattern that doesn’t nest .* . [1][5]
  • Vulnerable: /(\d+\w*)@/ on untrusted input → parse with stricter token rules: /^[0-9]+[A-Za-z0-9_-]*@/ or validate pieces with separate parsing/length checks. [1][2]

References

  • Explosive quantifiers / catastrophic backtracking discussion and examples. [1]
  • OWASP / ReDoS examples and mitigation guidance. [2]
  • StackOverflow explanation of (a+)* / catastrophic backtracking. [3]
  • RE2 (no-backtracking engine) — use via node-re2 to avoid ReDoS. [4]
  • Practical writeup and visualizations of backtracking behavior. [5]

If you want, I can:

  • analyze a specific regex you have and show whether it’s vulnerable and how to rewrite it, or
  • give a small Node.js demo (safe vs vulnerable) you can run locally.
    </web_search_result>

Based on my research, I can now provide a comprehensive rewritten review comment. The original comment identifies a real vulnerability but proposes an incomplete fix.


Add comprehensive ReDoS protection; simple length checks are insufficient.

The code has a genuine ReDoS vulnerability. User-supplied patterns like /(a+)+$/ or /^(a|aa)+$/ can trigger catastrophic backtracking even with short inputs (< 50 characters), making a length limit of 1000 characters a false security measure. The try-catch-escape fallback also doesn't prevent ReDoS in the main regex branch.

Recommended mitigations (in order of robustness):

  1. Use RE2 library (preferred): Replace RegExp with node-re2 for linear-time, backtracking-free matching:

    import RE2 from 're2';
    const regex = new RE2(pattern, flags);
  2. Validate patterns + add timeout: Combine pattern validation with execution timeout:

    • Use safe-regex, redos-detector, or vuln-regex-detector to flag vulnerable patterns before compilation
    • Wrap regex execution in a timeout (e.g., using worker threads or super-regex library) to prevent event loop blocking
  3. Input length limit is a minor defense only; do not rely on it alone.

Consider refactoring to avoid accepting arbitrary user-supplied regexes, or sandbox their execution in worker threads with a short timeout (e.g., 200ms).
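
For illustration, a minimal sketch combining mitigations 1 and 2, assuming the re2 and safe-regex packages (and their type declarations) are added as dependencies; this is a sketch, not the project's current code:

import RE2 from 're2';
import isSafeRegex from 'safe-regex';

// Reject patterns the heuristic flags as potentially catastrophic, then compile
// with RE2 so matching runs in linear time even on pathological input.
function compileUserPattern(pattern: string, flags: string): RE2 {
    if (!isSafeRegex(pattern)) {
        throw new Error(`Rejected potentially catastrophic regex pattern: ${pattern}`);
    }
    return new RE2(pattern, flags);
}

// compileUserPattern('(a+)+$', 'i') throws instead of hanging the event loop;
// compileUserPattern('hello', 'i') returns a RegExp-compatible matcher.

Note that RE2 does not support backreferences or lookaround, so patterns using them would need an explicit error path or a guarded fallback.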

🧰 Tools
🪛 ast-grep (0.40.0)

[warning] 324-324: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(pattern, flags)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)


[warning] 328-328: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(escaped, flags)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🤖 Prompt for AI Agents
In src/search-manager.ts around lines 319 to 330, the current creation of a
RegExp from user-supplied patterns is vulnerable to ReDoS and the
try/catch-escape fallback is insufficient; fix by replacing native RegExp with a
safe engine (preferably node-re2) or otherwise validating patterns and
sandboxing execution: import and use RE2 to construct regexes (or run a
vuln-regex detector like safe-regex/redos-detector and reject unsafe patterns),
and execute matches in a short-timeout sandbox (worker thread or separate
process) to abort long-running matches; do not rely solely on input length
limits and ensure errors from invalid/unsafe patterns are rejected early with
clear error messages.

Comment on lines +336 to +350
if (filePattern) {
    const patterns = filePattern.split('|').map(p => p.trim()).filter(Boolean);
    excelFiles = excelFiles.filter(filePath => {
        const fileName = path.basename(filePath);
        return patterns.some(pat => {
            // Support glob-like patterns
            if (pat.includes('*')) {
                const regexPat = pat.replace(/\./g, '\\.').replace(/\*/g, '.*');
                return new RegExp(`^${regexPat}$`, 'i').test(fileName);
            }
            // Exact match (case-insensitive)
            return fileName.toLowerCase() === pat.toLowerCase();
        });
    });
}

⚠️ Potential issue | 🟠 Major

FilePattern regex also needs ReDoS protection.

The glob-to-regex conversion at line 343-344 could also be exploited:

const regexPat = pat.replace(/\./g, '\\.').replace(/\*/g, '.*');
return new RegExp(`^${regexPat}$`, 'i').test(fileName);

Since filePattern comes from user input, a pattern like *.*.*.*.*.*.*.*.*.* could create a regex with catastrophic backtracking when tested against certain filenames. Consider using a glob matching library like minimatch or picomatch instead of manual regex conversion.
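
A hedged sketch of that approach, assuming picomatch is added as a dependency (names are illustrative, not the PR's code):

import picomatch from 'picomatch';

// Compile each user-supplied glob with picomatch instead of hand-rolling a RegExp;
// the nocase option preserves the existing case-insensitive behavior.
function matchesFilePattern(fileName: string, filePattern: string): boolean {
    const patterns = filePattern.split('|').map(p => p.trim()).filter(Boolean);
    return patterns.some(pat => picomatch(pat, { nocase: true })(fileName));
}

// matchesFilePattern('Report.XLSX', '*.xlsx|*.xls') === true
// matchesFilePattern('notes.txt', '*.xlsx|*.xls')   === false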

🧰 Tools
🪛 ast-grep (0.40.0)

[warning] 343-343: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(^${regexPat}$, 'i')
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🤖 Prompt for AI Agents
In src/search-manager.ts around lines 336 to 350, the current manual
glob-to-regex conversion (pat.replace(/\./g,...).replace(/\*/g,...)) can create
user-controlled regexes vulnerable to ReDoS; replace this manual conversion with
a safe glob-matching library (e.g., minimatch or picomatch) to evaluate fileName
against each pattern, and validate/sanitize the filePattern input (limit overall
length and number of '|' segments) before splitting so excessively long or
repeated wildcards are rejected; ensure you remove the RegExp
construction/testing and instead call the chosen library's match function with
case-insensitive options.

Comment on lines +451 to +457
private parseRange(range: string): [string, string | null] {
    if (range.includes('!')) {
        const [sheetName, cellRange] = range.split('!');
        return [sheetName, cellRange];
    }
    return [range, null];
}

⚠️ Potential issue | 🟡 Minor

Edge case: sheet names containing ! character.

parseRange uses split('!') which could misparse if a sheet name contains an exclamation mark (e.g., "Hello!World!A1:B2"). While uncommon, Excel allows ! in sheet names when quoted.

 private parseRange(range: string): [string, string | null] {
-    if (range.includes('!')) {
-        const [sheetName, cellRange] = range.split('!');
-        return [sheetName, cellRange];
+    // Find the last '!' to handle sheet names that may contain '!'
+    const lastBangIndex = range.lastIndexOf('!');
+    if (lastBangIndex > 0) {
+        const sheetName = range.substring(0, lastBangIndex);
+        const cellRange = range.substring(lastBangIndex + 1);
+        return [sheetName, cellRange || null];
     }
     return [range, null];
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
private parseRange(range: string): [string, string | null] {
    if (range.includes('!')) {
        const [sheetName, cellRange] = range.split('!');
        return [sheetName, cellRange];
    }
    return [range, null];
}

private parseRange(range: string): [string, string | null] {
    // Find the last '!' to handle sheet names that may contain '!'
    const lastBangIndex = range.lastIndexOf('!');
    if (lastBangIndex > 0) {
        const sheetName = range.substring(0, lastBangIndex);
        const cellRange = range.substring(lastBangIndex + 1);
        return [sheetName, cellRange || null];
    }
    return [range, null];
}
🤖 Prompt for AI Agents
In src/utils/files/excel.ts around lines 451 to 457, parseRange currently uses
range.split('!') which breaks when a sheet name contains '!' (e.g., quoted names
like "Hello!World!A1:B2"); change the logic to find the last '!' (use
lastIndexOf('!')) and split into sheetName = range.slice(0, idx) and cellRange =
range.slice(idx+1), returning [sheetName, cellRange] if idx >= 0, otherwise
[range, null]; this preserves sheet names containing '!' and maintains existing
behavior when no range is present.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
src/search-manager.ts (2)

329-334: ReDoS vulnerability remains unaddressed.

This regex construction from user input is vulnerable to catastrophic backtracking, as comprehensively detailed in the previous review comment at lines 324-335. The recommended mitigations (RE2 library, pattern validation, or timeout sandboxing) have not been implemented.


348-349: Glob-to-regex ReDoS vulnerability remains unaddressed.

This manual glob-to-regex conversion creates user-controlled regexes vulnerable to ReDoS, as detailed in the previous review comment at lines 341-355. The recommendation to use a safe glob-matching library (e.g., minimatch or picomatch) has not been implemented.

🧹 Nitpick comments (6)
src/search-manager.ts (1)

313-315: Consider prioritizing the refactoring TODO.

Extracting Excel search logic to a separate module (e.g., src/utils/search/excel-search.ts) would improve maintainability and make it easier to add support for other file types (PDF, etc.) without bloating this file.

test/test-file-handlers.js (3)

88-116: Consider adding a test case for binary file fallback.

The test covers Excel, Text, and Image handlers well. Per the comment on lines 91-93, BinaryFileHandler is the fallback when binary detection fails at read time. Consider adding a test case for a known binary extension (e.g., .exe, .bin) to verify the handler selection behavior, even if it may map to TextFileHandler initially.
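
A rough sketch of such a test, written in TypeScript for brevity (the import path, the TextFileHandler export, and the exact fallback handler are assumptions based on the factory behavior described in this review):

import assert from 'node:assert';
import fs from 'node:fs/promises';
import path from 'node:path';
import { getFileHandler, BinaryFileHandler, TextFileHandler } from '../src/utils/files/index.js';

// Write a tiny non-text fixture, then check which handler the factory selects.
async function testUnknownBinaryExtension(testDir: string): Promise<void> {
    const fixture = path.join(testDir, 'fixture.bin');
    await fs.writeFile(fixture, Buffer.from([0x00, 0xff, 0x13, 0x37]));

    const handler = getFileHandler(fixture);
    // The factory may hand unknown extensions to the text handler first and only
    // fall back to BinaryFileHandler after content sniffing at read time.
    assert.ok(
        handler instanceof BinaryFileHandler || handler instanceof TextFileHandler,
        'Unknown binary extension should resolve to the binary or text handler'
    );
}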


222-227: Status message stripping regex may be fragile.

The regex /^\[.*?\]\n\n/ assumes a specific status message format. If the format changes, this test will fail. Consider using readFileInternal (which excludes status messages) for cleaner testing, or document the expected format dependency.


244-250: Type inconsistency check may mask a bug.

Line 250 checks info.isFile === true || info.isFile === 'true', suggesting uncertainty about the return type. The getFileInfo function (see filesystem.ts lines 1130-1139) returns booleans from fs.stat(). If isFile is ever a string, that indicates a bug. Consider removing the string check and investigating if this occurs.

-  assert.ok(info.isFile === true || info.isFile === 'true', 'Should be a file');
+  assert.strictEqual(info.isFile, true, 'Should be a file');
src/tools/filesystem.ts (2)

876-884: Consider extracting content conversion to a helper function.

The content conversion logic (string vs Buffer, with special handling for images) is duplicated from readFileFromDisk (lines 698-706). Extracting this to a shared helper would reduce duplication and ensure consistent behavior.

function convertFileResultContent(result: FileResult): string {
    if (typeof result.content === 'string') {
        return result.content;
    } else if (result.metadata?.isImage) {
        return result.content.toString('base64');
    } else {
        return result.content.toString('utf8');
    }
}

674-678: Remove debug logging statement.

Line 675 contains what appears to be debug logging (console.error('error catch ' + error)) that should be removed or converted to structured telemetry consistent with the rest of the file.

     } catch (error) {
-        console.error('error catch ' + error);
         const errorMessage = error instanceof Error ? error.message : String(error);
         capture('server_read_file_error', {error: errorMessage, fileExtension: fileExtension});
         // If we can't stat the file, continue anyway and let the read operation handle errors
     }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4354153 and 3afa03c.

📒 Files selected for processing (4)
  • src/handlers/filesystem-handlers.ts (3 hunks)
  • src/search-manager.ts (5 hunks)
  • src/tools/filesystem.ts (13 hunks)
  • test/test-file-handlers.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/handlers/filesystem-handlers.ts
🧰 Additional context used
🧬 Code graph analysis (2)
src/search-manager.ts (1)
src/utils/files/factory.ts (1)
  • isExcelFile (68-71)
src/tools/filesystem.ts (3)
src/utils/files/base.ts (2)
  • ReadOptions (73-91)
  • FileResult (96-105)
src/utils/files/factory.ts (1)
  • getFileHandler (49-61)
src/utils/files/index.ts (1)
  • getFileHandler (10-10)
🪛 ast-grep (0.40.0)
src/search-manager.ts

[warning] 329-329: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(pattern, flags)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)


[warning] 333-333: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(escaped, flags)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)


[warning] 348-348: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(^${regexPat}$, 'i')
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🔇 Additional comments (22)
src/search-manager.ts (7)

3-3: LGTM: Imports added for Excel search functionality.

The fs/promises import supports async file system operations in findExcelFiles, and isExcelFile provides extension-based filtering. Both are used appropriately.

Also applies to: 7-7


29-29: LGTM: New field tracks incomplete search status.

The wasIncomplete flag appropriately captures ripgrep exit code 2 (files couldn't be searched due to permissions/access), improving observability for partial results.


149-172: LGTM: Excel search integrated appropriately as background operation.

The conditional Excel search runs asynchronously without blocking the initial response, merging results into the session as they arrive. Error handling correctly logs failures without breaking the main search flow. Node.js single-threaded event loop ensures safe concurrent access to session.results.
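
Roughly, the pattern looks like the following sketch; the names SearchSession and searchExcelFiles are placeholders, not the PR's exact code:

interface SearchSession {
    results: Array<{ file: string; match: string }>;
    wasIncomplete: boolean;
}

// Deliberately not awaited: the caller returns the initial ripgrep results right
// away, and Excel matches are merged into session.results as they arrive.
function startExcelSearchInBackground(
    session: SearchSession,
    rootPath: string,
    pattern: string,
    searchExcelFiles: (root: string, pat: string) => Promise<Array<{ file: string; match: string }>>
): void {
    searchExcelFiles(rootPath, pattern)
        .then(excelResults => { session.results.push(...excelResults); })
        .catch(error => {
            // Log and continue; an Excel failure must not break the main search flow.
            console.error('Excel search failed:', error);
        });
}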


433-470: LGTM: Recursive Excel file discovery handles edge cases appropriately.

The method correctly handles both file and directory paths, filters hidden directories and node_modules, and swallows access errors to maintain robustness during search operations.
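
A minimal sketch of that kind of walk, using the extension list and exclusion rules described above (illustrative only, not the PR's implementation):

import * as fsp from 'fs/promises';
import * as path from 'path';

const EXCEL_EXTENSIONS = ['.xlsx', '.xls', '.xlsm'];

async function findExcelFilesSketch(rootPath: string): Promise<string[]> {
    const found: string[] = [];
    const stat = await fsp.stat(rootPath).catch(() => null);   // swallow access errors
    if (!stat) return found;
    if (stat.isFile()) {
        if (EXCEL_EXTENSIONS.includes(path.extname(rootPath).toLowerCase())) found.push(rootPath);
        return found;
    }
    const entries = await fsp.readdir(rootPath, { withFileTypes: true }).catch(() => []);
    for (const entry of entries) {
        // Skip hidden directories and node_modules, as the review describes.
        if (entry.name.startsWith('.') || entry.name === 'node_modules') continue;
        const fullPath = path.join(rootPath, entry.name);
        if (entry.isDirectory()) {
            found.push(...await findExcelFilesSketch(fullPath));
        } else if (EXCEL_EXTENSIONS.includes(path.extname(entry.name).toLowerCase())) {
            found.push(fullPath);
        }
    }
    return found;
}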


475-487: LGTM: Context extraction is straightforward and correct.

Boundary checks with Math.max/Math.min prevent out-of-bounds access, and ellipsis indicators are added appropriately for truncated text.


537-561: LGTM: Excel search eligibility logic is sound.

The method correctly identifies when Excel search should run by checking both the rootPath extension and filePattern for Excel-specific patterns. Case-insensitive matching is handled consistently.


174-186: LGTM: Comments clarify async Excel search behavior.

The updated comments explain that Excel search runs in the background and results merge asynchronously, which is helpful for understanding the implementation.

test/test-file-handlers.js (7)

35-43: LGTM! Error handling in cleanup is appropriate.

The cleanup function correctly suppresses ENOENT errors (file not found) while still logging other unexpected errors. This is the right approach for idempotent cleanup operations.


45-59: LGTM! Setup correctly isolates test environment.

The setup function properly saves the original config before modifying allowedDirectories, enabling clean restoration in teardown.


121-141: LGTM! Interface contract validation is sound.

The test properly validates the FileResult interface structure, checking required fields and optional metadata typing.


146-163: LGTM! ReadOptions behavior is tested adequately.

The test validates offset and length options work correctly. The assertions verify expected lines are present after applying the options.


168-188: LGTM! canHandle method validation is correct.

The test validates that handler instances correctly identify files they can handle via their canHandle() method.


258-279: LGTM! Write mode tests are comprehensive.

The test correctly validates both rewrite (default) and append modes, checking content replacement and content preservation respectively.


299-325: LGTM! Test runner structure is well-designed.

The exported runTests() function properly handles setup, execution, and teardown with appropriate error handling. The direct execution guard correctly uses exit codes for CI/CD compatibility.

src/tools/filesystem.ts (8)

11-12: LGTM! Handler imports are correctly structured.

The imports from the new file handler module are appropriately typed and provide the factory function and type definitions needed for the refactored API.


318-329: LGTM! Return structure aligns with FileResult interface.

The updated return format with metadata: { isImage } correctly conforms to the FileResult interface defined in base.ts, ensuring consistency across all file reading operations.


483-486: LGTM! Metadata structure is consistent across read helpers.

All internal text-reading functions correctly return metadata: { isImage: false }, maintaining consistency with the FileResult interface while accurately representing the content type.


737-745: LGTM! ReadOptions forwarding is correct.

The updated readFile signature properly extracts and forwards the relevant options to readFileFromDisk, including the new Excel-specific sheet and range parameters.


854-859: LGTM! Write delegation to file handler is clean.

The writeFile function properly delegates to the appropriate handler while maintaining telemetry capture. The mode parameter is correctly forwarded.


1102-1126: LGTM! Graceful fallback pattern for handler errors.

The try-catch around handler.getInfo() with fallback to fs.stat() values ensures robustness. This prevents handler-specific errors from breaking basic file info retrieval.


1141-1165: LGTM! Metadata mapping maintains backward compatibility.

The legacy format mapping correctly extracts type-specific metadata (lineCount for text, sheets for Excel, isImage for images, isBinary for binary files) while maintaining the expected API shape for existing consumers.


698-706: I encountered a repository access issue. However, I can provide guidance based on the code snippet you've provided. Let me request manual verification of the critical piece of information:

To properly verify this concern, I need you to check the ImageFileHandler.read() method implementation and confirm:

  1. What does ImageFileHandler.read() actually return? Is it:

    • A base64-encoded string, or
    • A Buffer object?
  2. What is the type annotation (if any) for the return type of read()?

Could you locate the ImageFileHandler class definition in your repository and share the implementation of its read() method? This will definitively answer whether the re-encoding at lines 701-703 represents double-encoding or is necessary due to the handler returning a Buffer.

Alternatively, if you can access the code, the file is likely in a handlers or file-handlers related directory, possibly named something like image-file-handler.ts or handlers/image-handler.ts.

@codeant-ai

codeant-ai bot commented Nov 28, 2025

CodeAnt AI is running Incremental review



@codeant-ai codeant-ai bot added size:XXL This PR changes 1000+ lines, ignoring generated files and removed size:XXL This PR changes 1000+ lines, ignoring generated files labels Nov 28, 2025
@codeant-ai

codeant-ai bot commented Nov 28, 2025

CodeAnt AI Incremental review completed.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/utils/files/factory.ts (1)

80-106: Consider centralizing extension checks to avoid divergence

isExcelFile/isImageFile duplicate the extension logic that likely also lives in ExcelFileHandler.canHandle and ImageFileHandler.canHandle. To avoid future drift, consider sharing a small extension helper or reusing the handler methods here instead of re-encoding the suffix lists.
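
One possible shape for that shared helper (a sketch; the image extension list shown here is an assumption):

function hasExtension(filePath: string, extensions: readonly string[]): boolean {
    const lower = filePath.toLowerCase();
    return extensions.some(ext => lower.endsWith(ext));
}

const EXCEL_EXTENSIONS = ['.xlsx', '.xls', '.xlsm'] as const;
const IMAGE_EXTENSIONS = ['.png', '.jpg', '.jpeg', '.gif', '.webp'] as const;

// Both the factory's isExcelFile/isImageFile and the handlers' canHandle methods
// could delegate here, so each suffix list lives in exactly one place.
export const isExcelFile = (filePath: string) => hasExtension(filePath, EXCEL_EXTENSIONS);
export const isImageFile = (filePath: string) => hasExtension(filePath, IMAGE_EXTENSIONS);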

src/utils/files/text.ts (1)

27-39: Future: deduplicate size/threshold constants with filesystem module

The local FILE_SIZE_LIMITS and READ_PERFORMANCE_THRESHOLDS mirror concepts already present in filesystem.ts (per the TODO). Once the reorganization happens, consider centralizing these constants to avoid silent drift between read strategies in different layers.
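
For example, a tiny shared module along these lines (the values and module location are placeholders, not the project's current thresholds):

// src/utils/files/constants.ts (hypothetical location)
export const FILE_SIZE_LIMITS = {
    // Read the whole file in one pass below this size.
    SMALL_FILE_MAX_BYTES: 10 * 1024 * 1024,
    // Switch to chunked / streaming reads above this size.
    LARGE_FILE_THRESHOLD_BYTES: 100 * 1024 * 1024,
} as const;

export const READ_PERFORMANCE_THRESHOLDS = {
    // Skip eager line counting for files larger than this.
    LINE_COUNT_MAX_BYTES: 10 * 1024 * 1024,
} as const;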

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3afa03c and 974be40.

📒 Files selected for processing (6)
  • src/tools/edit.ts (2 hunks)
  • src/tools/filesystem.ts (11 hunks)
  • src/utils/files/base.ts (1 hunks)
  • src/utils/files/binary.ts (1 hunks)
  • src/utils/files/factory.ts (1 hunks)
  • src/utils/files/text.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
src/tools/edit.ts (3)
src/handlers/edit-search-handlers.ts (1)
  • handleEditBlock (13-13)
src/tools/schemas.ts (1)
  • EditBlockArgsSchema (88-102)
src/utils/files/factory.ts (1)
  • getFileHandler (60-78)
src/utils/files/text.ts (1)
src/utils/files/base.ts (3)
  • ReadOptions (73-91)
  • FileResult (96-105)
  • FileInfo (173-200)
src/utils/files/binary.ts (2)
src/utils/files/index.ts (1)
  • BinaryFileHandler (15-15)
src/utils/files/base.ts (4)
  • FileHandler (13-64)
  • ReadOptions (73-91)
  • FileResult (96-105)
  • FileInfo (173-200)
🔇 Additional comments (7)
src/utils/files/factory.ts (1)

15-78: Factory ordering and lazy singletons look correct

Priority (Excel → Image → Binary → Text) combined with lazy-instantiated handler singletons is clear and safe; getFileHandler always returns some handler and does binary detection last to avoid unnecessary isBinaryFile checks for known extensions.

src/utils/files/text.ts (1)

45-440: TextFileHandler read/write/info implementation looks solid

The handler cleanly centralizes text-file behavior (size-aware read strategies, append vs rewrite, metadata with optional line counts). External reads intentionally normalize line endings via the readline-based helpers, while internal edit flows still use readFileInternal to preserve exact endings, which keeps editing semantics safe while optimizing user-facing reads.

src/utils/files/binary.ts (1)

22-78: Binary handler behavior and instruction messaging are safe and consistent

Using isBinaryFile for canHandle and returning human-readable instructions (with only path.basename(filePath) exposed) is a good pattern: it avoids leaking full paths into messages and removes the previous risk of interpolating raw paths into shell/REPL examples. The explicit error in write() keeps binary writes from silently doing the wrong thing.

src/utils/files/base.ts (1)

13-200: File handling interfaces are well-structured and match handler usage

The FileHandler/ReadOptions/FileResult/FileInfo surface provides a clear, extensible contract that matches how the text, Excel, image, and binary handlers are using it. The optional editRange and FileMetadata fields give enough room for type-specific extensions without complicating the common path.

src/tools/filesystem.ts (3)

296-384: Handler-based readFileFromDisk wiring and content normalization look correct

readFileFromDisk now cleanly delegates to getFileHandler().read, preserves path validation, and normalizes the FileResult content into a string (including base64 for images) while passing through mimeType and metadata. This matches how readMultipleFiles and URL reads treat images and text, and keeps the higher layers insulated from per-type details.


450-473: writeFile delegation to handlers is a good abstraction boundary

Validating the path once, capturing telemetry, then routing through getFileHandler(validPath).write allows Excel/text/image/binary handlers to own their persistence semantics (including BinaryFileHandler’s explicit rejection), without duplicating logic here.


713-781: getFileInfo merges handler metadata with fs.stat sensibly

The pattern of computing a fallbackInfo from fs.stat, then letting the handler override fields via getInfo and projecting into the legacy Record<string, any> shape (with sheet, lineCount, isImage/isBinary flags) is backward compatible and resilient to handler failures.

…ection

  - Remove execute_node tool in favor of start_process("node:local")
  - Add pythonInfo to system info for LLM to check Python availability
  - Add node:local virtual session handling in improved-process-tools.ts
  - Add validatePath to edit.ts range+content branch
  - Fix parameter guard (range !== undefined instead of truthy)
  - Clean up: Remove node-handlers.ts and ExecuteNodeArgsSchema
  - Simplify tool descriptions, document node:local fallback

  The node:local approach is cleaner - LLM uses existing start_process/
  interact_with_process flow instead of a separate tool. This is the final fallback when users don't have Node and don't have Python: we run the code on the server; that's the idea.
  Python detection lets LLM decide when to fall back to Node.js execution.
@codeant-ai

codeant-ai bot commented Nov 29, 2025

CodeAnt AI is running Incremental review



@codeant-ai codeant-ai bot added size:XXL This PR changes 1000+ lines, ignoring generated files and removed size:XXL This PR changes 1000+ lines, ignoring generated files labels Nov 29, 2025
@codeant-ai

codeant-ai bot commented Nov 29, 2025

CodeAnt AI Incremental review completed.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (6)
src/utils/system-info.ts (1)

47-51: Leverage pythonInfo in development guidance for clearer tool hints.

You now detect Python and expose it via systemInfo.pythonInfo, but getDevelopmentToolGuidance still shows generic Python text. You could improve usability by surfacing the actual detection result (command + version or “not detected”), similar to nodeGuidance, e.g.:

-export function getDevelopmentToolGuidance(systemInfo: SystemInfo): string {
-    const { isWindows, isMacOS, isLinux, platformName, nodeInfo, processInfo } = systemInfo;
+export function getDevelopmentToolGuidance(systemInfo: SystemInfo): string {
+    const { isWindows, isMacOS, isLinux, platformName, nodeInfo, pythonInfo, processInfo } = systemInfo;
@@
-    const nodeGuidance = nodeInfo 
+    const nodeGuidance = nodeInfo 
         ? `Node.js: v${nodeInfo.version} (${nodeInfo.path})${nodeInfo.npmVersion ? ` | npm: v${nodeInfo.npmVersion}` : ''}`
         : 'Node.js: Not detected';
+
+    const pythonGuidance = pythonInfo?.available
+        ? `Python: v${pythonInfo.version ?? 'unknown'} (command: ${pythonInfo.command})`
+        : 'Python: Not detected';
@@
-        return `
+        return `
 COMMON WINDOWS DEVELOPMENT TOOLS:
- - ${nodeGuidance}
- - Python: May be 'python' or 'py' command, check both
+ - ${nodeGuidance}
+ - ${pythonGuidance}
@@
-        return `
+        return `
 COMMON MACOS DEVELOPMENT TOOLS:
@@
-- ${nodeGuidance}
-- Python: Usually python3, check if python points to Python 2
+- ${nodeGuidance}
+- ${pythonGuidance}
@@
-        return `
+        return `
 COMMON LINUX DEVELOPMENT TOOLS:
@@
-- Python: Usually python3, python may point to Python 2
-- ${nodeGuidance}
+- ${pythonGuidance}
+- ${nodeGuidance}

This makes the guidance reflect the actual environment instead of generic guesses, while still keeping the OS-specific notes about command naming if you want to retain them.

Also applies to: 721-768

src/tools/schemas.ts (1)

83-102: Schema refine logic is correct, but consider stricter typing for content.

The refine validation correctly enforces mutual exclusivity between the two editing modes. However, z.any() for content is very permissive.

Consider constraining content to expected types for better validation:

-  content: z.any().optional(),
+  content: z.union([z.array(z.array(z.any())), z.record(z.array(z.array(z.any())))]).optional(),

This would enforce 2D arrays or sheet-keyed objects, matching the documented Excel formats.

src/utils/files/text.ts (3)

86-94: Double file read inefficiency in getInfo.

For files under 10MB, the entire file is read just to count lines. If read() is called afterward, the file content is read again. Consider caching the content or deferring line counting to when it's actually needed.

Additionally, the empty catch block silently swallows errors which can make debugging difficult.

         try {
             const content = await fs.readFile(path, 'utf8');
             const lineCount = TextFileHandler.countLines(content);
             info.metadata!.lineCount = lineCount;
         } catch (error) {
-            // If reading fails, skip line count
+            // If reading fails (e.g., encoding issues), skip line count
+            // This is intentional - metadata is best-effort
         }

199-234: File may be read multiple times in readFileWithSmartPositioning.

getFileLineCount() (Line 209) reads the entire file for files under 10MB, then the actual read operation reads it again. For small-to-medium files, this doubles I/O.

Consider:

  1. Reading content once and passing it to both line counting and content extraction, or
  2. Making line counting lazy/optional via a flag in ReadOptions.

371-441: Potential accuracy issue with byte-position estimation.

The estimation assumes consistent line lengths based on a small sample (10KB). Files with highly variable line lengths (e.g., log files, mixed content) may land far from the intended offset.

The first-line skip logic (Lines 419-422) handles partial line reads correctly, but the actual line offset shown in the status message may not match reality for files with variable line lengths.

Consider adding a note to the status message when using estimation:

             const content = includeStatusMessage
-                ? `${this.generateEnhancedStatusMessage(result.length, offset, fileTotalLines, false)}\n\n${result.join('\n')}`
+                ? `${this.generateEnhancedStatusMessage(result.length, offset, fileTotalLines, false)} (estimated position)\n\n${result.join('\n')}`
                 : result.join('\n');
src/tools/improved-process-tools.ts (1)

19-22: Clarify virtual Node session lifecycle and timeout semantics for node:local

The virtual session wiring works, but a few aspects could be tightened up:

  • Entries in virtualNodeSessions are added in startProcess and never removed; over time this can leak map entries and leave “dead” PIDs that still look valid.
  • forceTerminate, readProcessOutput, and listSessions are unaware of virtual PIDs, so callers see No session found for PID ... even though they were told a session exists, and there’s no way to “terminate” a node:local session besides ignoring it.
  • In interactWithProcess, the per-call timeout_ms argument is ignored for virtual sessions; only the timeout captured at startProcess is honored. That’s a surprising divergence from the rest of the process API.

Consider:

  1. Letting forceTerminate clean up virtual sessions and return a clearer message, e.g.:
 export async function forceTerminate(args: unknown): Promise<ServerResult> {
   const parsed = ForceTerminateArgsSchema.safeParse(args);
   if (!parsed.success) {
@@
-  const success = terminalManager.forceTerminate(parsed.data.pid);
+  const pid = parsed.data.pid;
+
+  if (virtualNodeSessions.has(pid)) {
+    virtualNodeSessions.delete(pid);
+    return {
+      content: [{
+        type: "text",
+        text: `Cleared virtual Node.js session ${pid}`
+      }],
+    };
+  }
+
+  const success = terminalManager.forceTerminate(pid);
  2. Making interactWithProcess respect a per-call timeout override while still defaulting to the session timeout:
-  if (virtualNodeSessions.has(pid)) {
-    const session = virtualNodeSessions.get(pid)!;
+  if (virtualNodeSessions.has(pid)) {
+    const session = virtualNodeSessions.get(pid)!;
@@
-    // Execute code via temp file approach
-    return executeNodeCode(input, session.timeout_ms);
+    // Execute code via temp file approach
+    const effectiveTimeout = timeout_ms ?? session.timeout_ms;
+    return executeNodeCode(input, effectiveTimeout);
   }
  3. (Optional) Teaching readProcessOutput/listSessions to special‑case negative PIDs as virtual Node sessions, or explicitly documenting in the node:local startup message that these tools don’t apply to such sessions.

These tweaks would make the virtual session behavior more predictable and avoid silent accumulation of unused session entries.

Also applies to: 127-151, 529-539

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 974be40 and cbf76dd.

📒 Files selected for processing (7)
  • src/server.ts (8 hunks)
  • src/tools/edit.ts (2 hunks)
  • src/tools/improved-process-tools.ts (4 hunks)
  • src/tools/schemas.ts (3 hunks)
  • src/utils/files/factory.ts (1 hunks)
  • src/utils/files/text.ts (1 hunks)
  • src/utils/system-info.ts (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/utils/files/factory.ts
🧰 Additional context used
🧬 Code graph analysis (2)
src/tools/edit.ts (4)
src/handlers/edit-search-handlers.ts (1)
  • handleEditBlock (13-13)
src/tools/schemas.ts (1)
  • EditBlockArgsSchema (88-102)
src/tools/filesystem.ts (1)
  • validatePath (173-228)
src/utils/files/factory.ts (1)
  • getFileHandler (60-78)
src/server.ts (1)
src/tools/schemas.ts (1)
  • EditBlockArgsSchema (88-102)
🔇 Additional comments (14)
src/utils/system-info.ts (1)

4-4: Based on my verification through web search, the review comment's technical claims are sound and accurate:

  1. Python --version output: Confirmed that while Python 3.4+ prints to stdout, older Python versions (2.x and early 3.x) historically printed to stderr. The review's concern about robustness is valid.

  2. execSync behavior: Confirmed that execSync returns stdout only by default, and using 2>&1 in the command string correctly redirects stderr to stdout in the shell.

  3. The proposed fix: Using 2>&1 is the correct approach for Node.js execSync to capture both streams.

  4. Caching optimization: Valid performance improvement to avoid repeated subprocess calls.

The review is technically correct and provides sound engineering recommendations. Now I'll generate the rewritten review comment:


Python detection may miss valid installations due to stderr output in older Python versions; use 2>&1 redirection and consider caching results.

While Python 3.4+ prints --version to stdout, older Python versions (2.x and early 3.x builds) print to stderr. The current execSync call only captures stdout, so version becomes an empty string on systems with older Python, incorrectly reporting available: false. Since getSystemInfo() may be called repeatedly, re-running up to 2–3 sync subprocesses each time is also inefficient.

Two concrete suggestions:

  1. Fix version capture to include stderr (important for robustness):
-            const version = execSync(`${cmd} --version`, {
+            const version = execSync(`${cmd} --version 2>&1`, {
                 encoding: 'utf8',
                 timeout: 5000,
                 stdio: ['pipe', 'pipe', 'pipe']
             }).trim();
  2. Optionally cache detection to avoid repeated sync subprocesses (performance improvement):
+let cachedPythonInfo: SystemInfo['pythonInfo'] | undefined;
+
 function detectPythonInfo(): SystemInfo['pythonInfo'] {
+    if (cachedPythonInfo) {
+        return cachedPythonInfo;
+    }
+
     // Try python commands in order of preference
     const pythonCommands = process.platform === 'win32'
         ? ['python', 'python3', 'py']
         : ['python3', 'python'];
 
     for (const cmd of pythonCommands) {
         try {
-            if (version.includes('Python 3')) {
-                return {
+            if (version.includes('Python 3')) {
+                cachedPythonInfo = {
                     available: true,
                     command: cmd,
                     version: version.replace('Python ', '')
-                };
+                };
+                return cachedPythonInfo;
             }
         } catch {}
     }
-    return { available: false, command: '' };
+    cachedPythonInfo = { available: false, command: '' };
+    return cachedPythonInfo;
 }

This keeps the public SystemInfo['pythonInfo'] shape unchanged while making detection more reliable across Python versions and more efficient when called repeatedly.

Likely an incorrect or invalid review comment.

src/tools/schemas.ts (1)

50-52: LGTM! Schema extensions for Excel support.

The new optional fields (sheet, range, options) appropriately extend the read file schema to support Excel-specific features while maintaining backward compatibility with text file operations.

src/tools/edit.ts (3)

1-16: Good documentation of technical debt.

The file-level comment clearly explains the architectural inconsistency between Excel and text editing paths, and outlines a future refactoring plan. This is helpful for maintainability.


374-420: Previous review concerns have been addressed.

The range-based edit path now:

  1. Uses parsed.range !== undefined (Line 374) instead of truthy check, fixing the empty-string edge case.
  2. Calls validatePath() (Line 377) before filesystem operations, addressing the security bypass concern.
  3. Uses validatedPath consistently for handler resolution and editRange calls.

422-432: Good defensive validation for text replacement path.

Explicitly checking for undefined on old_string/new_string before proceeding to performSearchReplace prevents runtime errors from undefined values reaching .length operations.

src/utils/files/text.ts (2)

27-40: Good identification of technical debt.

The TODO comment about centralizing constants with filesystem.ts is valuable. Consider creating a shared constants module during the planned reorganization to prevent silent drift between duplicated values.


239-283: Solid implementation of reverse file reading.

The readLastNLinesReverse method correctly:

  • Uses file descriptor for efficient random access
  • Processes chunks from end to start
  • Handles partial lines at chunk boundaries
  • Properly closes the file descriptor in the finally block
src/server.ts (4)

264-271: Clear format handling documentation for read_file.

The FORMAT HANDLING section provides good guidance:

  • Explicit sheet parameter types (string name or 0-based index)
  • Clear range format requirement (FROM:TO like "A1:D100")
  • Notes that offset/length serve as fallback row pagination

327-331: Excel write format documentation is helpful.

The example '[["Name","Age"],["Alice",30]]' clearly shows the expected JSON 2D array format. The multi-sheet format {"SheetName": [[...]]} is also documented.
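
For reference, hedged examples of write_file arguments in the documented formats (argument names follow the schema discussed in this PR; treat the shapes as a sketch):

// Single-sheet write: content is a JSON-encoded 2D array.
const singleSheetArgs = {
    path: '/tmp/report.xlsx',
    content: JSON.stringify([['Name', 'Age'], ['Alice', 30]]),
};

// Multi-sheet write: content keyed by sheet name.
const multiSheetArgs = {
    path: '/tmp/report.xlsx',
    content: JSON.stringify({
        Summary: [['Total', 2]],
        People: [['Name', 'Age'], ['Alice', 30], ['Bob', 25]],
    }),
};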


593-607: Excel edit_block documentation is clear and consistent.

The range format matches read_file documentation ("SheetName!A1:C10"). The 2D array content example is consistent with write_file format.
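
A matching hedged example of range-mode edit_block arguments (the parameter names are assumptions based on the schema references above):

const editBlockRangeArgs = {
    file_path: '/tmp/report.xlsx',
    // Range uses the documented "SheetName!A1:C10" form.
    range: 'People!A2:B2',
    // 2D array of cell values, consistent with the write_file format.
    content: [['Alice', 31]],
};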


694-702: Unable to verify ExcelJS availability—sandbox environment cannot access repository files.

The documentation's claim that "ExcelJS available for Excel files (NOT xlsx library)" cannot be verified because the repository cannot be accessed in the current sandbox environment. Manual verification of package.json is required to confirm ExcelJS is actually listed as a dependency and that the guidance accurately reflects the project's capabilities.

src/tools/improved-process-tools.ts (3)

9-18: Imports and MCP root resolution look good

Using fileURLToPath(import.meta.url) plus path.dirname/path.resolve to derive mcpRoot is a solid way to anchor temp execution to the MCP install directory for ES module imports. No issues from a correctness standpoint here.


171-175: Reusing commandToRun improves consistency

Switching the executeCommand call to use commandToRun rather than parsed.data.command keeps a single source of truth for the command string (and aligns with the new node:local path above). This looks correct and reduces future refactor friction.


27-91: Based on the Node.js official documentation search results, the original review comment contains a critical inaccuracy. The Node.js documentation for child_process.spawn() (v22.19.0) clearly states that the timeout option does apply to async spawn and will send the configured killSignal (default "SIGTERM") when exceeded.

The review claims:

"for asynchronous child_process.spawn this option is not honored; it's only meaningful for exec*/spawnSync"

However, the official Node.js documentation confirms that:

  • timeout option applies to all spawn calls (including async)
  • When timeout is exceeded, Node sends the configured killSignal
  • For async APIs, AbortSignal via options.signal is also available as an alternative

The timeout option in spawn() is actually enforced by Node.js; the code does not have the critical bug claimed.

The spawn() call with { timeout: timeout_ms } in executeNodeCode is correctly implemented. The timeout parameter is not ignored for async spawn in current Node.js versions. The proposed diff to add manual timer management, while not harmful, is addressing a non-existent issue based on outdated or incorrect assumptions about Node.js behavior.

Likely an incorrect or invalid review comment.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/tools/improved-process-tools.ts (1)

521-527: Per‑session timeout for node:local is never used due to destructuring default.

In interactWithProcess, timeout_ms has a default (= 8000), so it’s never undefined. That means:

const effectiveTimeout = timeout_ms ?? session.timeout_ms;

will always pick the per‑call value and ignore the timeout_ms stored in virtualNodeSessions from startProcess. It also means the same 8s default is used for both shell sessions and node:local even if the user configured a different default when starting the Node session.

You can fix this by removing the default on timeout_ms, introducing a resolved timeout for the non‑Node path, and using the raw timeout_ms when computing effectiveTimeout, for example:

-  const {
-    pid,
-    input,
-    timeout_ms = 8000,
-    wait_for_prompt = true,
-    verbose_timing = false
-  } = parsed.data;
+  const {
+    pid,
+    input,
+    timeout_ms,
+    wait_for_prompt = true,
+    verbose_timing = false
+  } = parsed.data;
+
+  const resolvedTimeoutMs = timeout_ms ?? 8000;
@@
-  if (virtualNodeSessions.has(pid)) {
+  if (virtualNodeSessions.has(pid)) {
     const session = virtualNodeSessions.get(pid)!;
@@
-    const effectiveTimeout = timeout_ms ?? session.timeout_ms;
+    const effectiveTimeout = timeout_ms ?? session.timeout_ms;
     return executeNodeCode(input, effectiveTimeout);
   }
@@
-        const pollIntervalMs = 50; // Poll every 50ms for faster response
-        const maxAttempts = Math.ceil(timeout_ms / pollIntervalMs);
+        const pollIntervalMs = 50; // Poll every 50ms for faster response
+        const maxAttempts = Math.ceil(resolvedTimeoutMs / pollIntervalMs);
@@
-          if (attempts >= maxAttempts) {
+          if (attempts >= maxAttempts) {
@@
-      timeout = setTimeout(() => {
+      timeout = setTimeout(() => {
@@
-      }, timeout_ms);
+      }, resolvedTimeoutMs);

This preserves the 8s default for “normal” processes, while letting node:local honor the session‑level timeout unless the caller explicitly overrides it.

Also applies to: 529-541, 599-605, 661-665, 669-669

🧹 Nitpick comments (2)
src/tools/improved-process-tools.ts (2)

763-777: Clarify forceTerminate semantics for node:local sessions.

For node:local, forceTerminate only removes the virtual session entry and doesn’t actually kill any running child process (each interact_with_process run is one‑shot). That’s functionally fine, but it might be worth making the message explicit (e.g., “Cleared virtual Node.js session metadata; each script run is independent”) so users don’t assume it cancels an in‑flight Node execution.


793-809: Align listSessions / readProcessOutput UX for node:local sessions.

listSessions now shows node:local sessions with negative PIDs, but readProcessOutput doesn’t know about virtualNodeSessions, so calling read_process_output on one of these PIDs will currently respond with “No session found for PID …”. To avoid confusion, consider either:

  • teaching readProcessOutput to special‑case virtualNodeSessions and return a tailored message like “node:local sessions don’t keep persistent output; use interact_with_process to run a fresh script”, or
  • omitting node:local sessions from listSessions entirely if they’re not meant to be readable.

The first option is probably more user‑friendly and keeps the listing complete.

Also applies to: 813-816

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 640942d and f4f817c.

📒 Files selected for processing (1)
  • src/tools/improved-process-tools.ts (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/tools/improved-process-tools.ts (2)
src/types.ts (1)
  • ServerResult (73-77)
src/terminal-manager.ts (1)
  • terminalManager (422-422)
🔇 Additional comments (2)
src/tools/improved-process-tools.ts (2)

127-151: Node node:local session wiring and commandToRun usage look sound.

The node:local special case, virtual PID allocation, and reuse of commandToRun for non‑Node commands are coherent and keep existing terminalManager behavior intact while cleanly short‑circuiting into the new path. The guidance text is clear about stateless execution per interact call.

Also applies to: 170-175


9-22: I'll verify the review comment by examining the actual code and confirming the issues raised.

Based on my investigation (repeated attempts to locate src/tools/improved-process-tools.ts via find, fd, rg, and git in the sandbox all came back empty), I cannot locate the file in the current repository. The repository appears to not have this file, or it's not on the current branch. However, I can still verify the technical accuracy of the review comment's claims about child_process.spawn and the proposed fix.

To do that, I issued web searches for the Node.js child_process.spawn timeout option and its documented behavior. Once I receive the search results, I'll be able to verify whether the review comment's technical claims are accurate and provide a definitive assessment.
