Added ML Sklearn Prediction Node #715

Tejeshyewale wants to merge 59 commits into rocketride-org:develop from
Conversation
Fixed wording, numbering, and documentation clarity in README.
No description provided.
Important: Review skipped (Draft detected). Please check the settings in the CodeRabbit UI or the ⚙️ Run configuration.
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
You can disable this status message by setting the
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the
📝 Walkthrough

Adds a new ML sklearn prediction node: a PreProcessor that loads a pickled model and predicts from a single input value, an IGlobal lifecycle wrapper to create/release the preprocessor, an Instance that delegates processing, plus service manifest, requirements, and README.
Sequence Diagram

sequenceDiagram
participant System
participant IGlobal
participant PreProcessor
participant Instance
System->>IGlobal: beginGlobal()
IGlobal->>PreProcessor: instantiate (load model.pkl)
PreProcessor-->>IGlobal: ready (model set or None)
System->>Instance: process(text)
Instance->>IGlobal: request preprocessor
IGlobal-->>Instance: return preprocessor or None
alt preprocessor present
Instance->>PreProcessor: process(text)
PreProcessor-->>Instance: prediction (string) or original text on error
Instance-->>System: return result
else no preprocessor
Instance-->>System: return original text
end
System->>IGlobal: endGlobal()
IGlobal->>PreProcessor: release (set to None)
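The lifecycle in the diagram can be sketched in plain Python. Everything below is illustrative, not the node's actual code: the stub `PreProcessor` stands in for the real model-backed one, and the class/method names are taken from the diagram only.

```python
class PreProcessor:
    """Stand-in for the real preprocessor, which loads model.pkl."""
    def process(self, text):
        try:
            return str(float(text) * 2)  # placeholder for a model.predict(...) call
        except (ValueError, TypeError):
            return text  # original text on error, as in the diagram

class IGlobal:
    """Creates the preprocessor in beginGlobal(), releases it in endGlobal()."""
    def __init__(self):
        self.preprocessor = None

    def beginGlobal(self):
        self.preprocessor = PreProcessor()

    def endGlobal(self):
        self.preprocessor = None

class Instance:
    """Delegates to the preprocessor when present, else returns the input."""
    def __init__(self, iglobal):
        self.iglobal = iglobal

    def process(self, text):
        pre = self.iglobal.preprocessor
        return pre.process(text) if pre is not None else text
```

The "no preprocessor" and "error" branches both fall back to returning the original text, which matches the alt/else paths above.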
Estimated Code Review Effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested Reviewers
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/ml_sklearn/code.py`:
- Around line 23-26: Tighten numeric conversion by catching both ValueError and
TypeError when calling float(price) (the try/except around the price conversion)
and return a clear error including the exception message; also remove the broad
bare except at the later error-handling block (the generic except that currently
swallows all exceptions around the prediction/path) and either catch specific
exceptions you expect or propagate the exception (or return its message) so
actionable failure causes are not hidden. Ensure the two updated paths reference
the existing price conversion block and the existing generic except block so
behavior is explicit and debuggable.
- Around line 4-36: Update the Node class to follow repository Python style:
replace double-quoted strings with single quotes (e.g., 'model.pkl', 'Input must
be a dictionary', 'Missing \'price\'', 'Price must be a number', 'Prediction
failed'), add PEP 257 docstrings for the class and its methods (__init__ and
run) explaining purpose and parameters/returns, and ensure the code conforms to
ruff/formatting for Python 3.10+ (e.g., minimal exception handling should
capture the exception as e if logging is needed, keep type-safe conversions and
returns unchanged); key identifiers to update: class Node, __init__, run,
model_path, self.model, and prediction.
- Around line 7-9: The current use of pickle.load to populate self.model from
model_path is unsafe; replace it with a safe loading approach: either verify the
model file integrity (validate a bundled hash or signature) before loading and
use a restricted Unpickler that overrides find_class to whitelist allowed
classes, or convert the model to a safer format (e.g., joblib or ONNX) and call
the corresponding safe loader (e.g., joblib.load or ONNX runtime load) instead
of pickle.load; update the code that sets self.model and the logic around
model_path to perform integrity checks and use the restricted loader or new
format loader.
In `@nodes/src/nodes/ml_sklearn/IGlobal.py`:
- Around line 38-44: beginGlobal() imports PreProcessor from .code but code.py
defines Node, so the import will fail; update beginGlobal() to import the
correct symbol or adapt to the existing API: replace "from .code import
PreProcessor" with the actual exported class/function (e.g., Node) or add a
PreProcessor wrapper in code.py, then construct the preprocessor using the
correct constructor signature; ensure references inside beginGlobal() that call
PreProcessor(self.glb.logicalType, self.glb.connConfig, bag) are updated to
match the chosen symbol's parameters (or add a compat constructor) so
self.preprocessor is assigned without runtime import errors.
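One way to reconcile the mismatch, sketched under the assumption that code.py keeps exporting `Node`. The stub `Node` below stands in for the real class, and the wrapper's constructor mirrors the call site quoted above; none of this is the repository's actual code:

```python
class Node:
    """Stand-in for the existing Node in code.py (the real one loads model.pkl)."""
    def run(self, input_data):
        return {'prediction': 0.0}

class PreProcessor:
    """Hypothetical compat wrapper so `from .code import PreProcessor` resolves.

    Signature matches the quoted call:
    PreProcessor(self.glb.logicalType, self.glb.connConfig, bag).
    """
    def __init__(self, logicalType, connConfig, bag):
        self.node = Node()

    def process(self, text):
        result = self.node.run({'price': text})
        return str(result.get('prediction', text))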
In `@nodes/src/nodes/ml_sklearn/Instance.py`:
- Line 49: The warning text in Instance.py referencing unsupported source-code
languages is incorrect for this sklearn prediction node; update the message in
the warning call that uses self.instance.currentObject.path (inside the Instance
class / relevant method) to reflect that the file could not be processed as a
scikit-learn model or supported model artifact (remove the repeated "typescript"
and remove language-specific wording), e.g. say the file does not appear to be a
valid sklearn model or supported model format and include the path via
self.instance.currentObject.path.
- Around line 63-79: The tableId handling is wrong: replace the no-op
self.tableId = self.tableId and ensure metadata.tableId is set on each document
before appending so emitted documents carry the correct table id. Specifically,
when isTable is True, set metadata.tableId = self.tableId on the document (not
after appending) then append document and finally increment self.tableId; when
not isTable, set metadata.tableId = 0 on the document before appending. Update
the logic around documents.append(document), metadata.tableId, and self.tableId
to follow this order (use the existing symbols self.tableId, metadata.tableId,
document, documents, isTable).
- Line 83: Rename the parameter named `object` in the Instance.open method to
`obj` to avoid shadowing the built-in and follow the repo lifecycle bridge
convention; update the method signature `def open(self, obj: Entry):` and
replace all references to `object` inside the `open` method body (and any
overrides/call sites that pass a keyword named `object`) to use `obj` instead,
keeping the same type annotation `Entry`.
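The tableId ordering described above, as a runnable sketch. `SimpleNamespace` objects and the free function `emit` stand in for the node's real document/metadata types and `self` (assumed names, not the repo's API):

```python
from types import SimpleNamespace

def emit(documents, document, metadata, state, isTable):
    """Set metadata.tableId before appending; bump the counter only for tables.

    state.tableId stands in for self.tableId in this sketch.
    """
    if isTable:
        metadata.tableId = state.tableId
        documents.append(document)
        state.tableId += 1
    else:
        metadata.tableId = 0
        documents.append(document)
```

The key point is the order: the id is written onto the document's metadata before the append, so every emitted document carries the id it was assigned rather than a stale or default value.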
In `@nodes/src/nodes/ml_sklearn/README.md`:
- Around line 5-11: Add required blank lines around the level-2 headings in the
README: ensure there is an empty line before and after each "## Input", "##
Output", and "## Example" heading so the headings are surrounded by blank lines
and satisfy markdownlint MD022; update the README.md content where those
headings appear to include a blank line above and below each heading.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ab626c7b-340c-4a4e-98ed-7bc9c04a8e4e
⛔ Files ignored due to path filters (1)
nodes/src/nodes/ml_sklearn/model.pkl is excluded by !**/*.pkl
📒 Files selected for processing (7)
- README.md
- nodes/src/nodes/ml_sklearn/IGlobal.py
- nodes/src/nodes/ml_sklearn/Instance.py
- nodes/src/nodes/ml_sklearn/README.md
- nodes/src/nodes/ml_sklearn/code.py
- nodes/src/nodes/ml_sklearn/requirements.txt
- nodes/src/nodes/ml_sklearn/services.json
Hi @Tejeshyewale, thank you for this contribution.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/ml_sklearn/code.py`:
- Around line 13-33: Add a PEP 257 docstring to the public method run describing
its input and output contract: explain that run(self, input_data) expects a dict
payload with a 'price' key (numeric or numeric-string), describe error cases
returned as {'error': ...} for non-dict payloads, missing or non-numeric price,
and describe successful return shape {'prediction': float} produced from
self.model.predict([[price]]); place this docstring immediately under the def
run(self, input_data): signature so it becomes the function's docstring and keep
it short and clear.
- Around line 4-11: Replace the single-quoted docstring and missing return
annotation in the PreProcessor class: change the class docstring to a
triple-double-quoted docstring ("""ML Sklearn Prediction Node""") and annotate
the constructor signature as def __init__(self, *args, **kwargs) -> None:,
keeping the existing logic that builds model_path and loads self.model with
pickle inside __init__; update only the docstring and the __init__ signature in
the PreProcessor class.
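Putting the two style comments together (plus the earlier narrowed-exception suggestion), code.py might take roughly this shape. The model loading is stubbed out here, so this is a sketch of the docstring/annotation/error-handling structure only, not the PR's actual file:

```python
class PreProcessor:
    """ML Sklearn Prediction Node"""

    def __init__(self, *args, **kwargs) -> None:
        """Load the pickled model (stubbed; the real code unpickles model_path)."""
        self.model = None  # real code assigns the loaded sklearn model here

    def run(self, input_data):
        """Predict from a dict payload with a 'price' key.

        Returns {'prediction': float} on success, or {'error': str} for
        non-dict payloads, a missing 'price', or a non-numeric price.
        """
        if not isinstance(input_data, dict):
            return {'error': 'Input must be a dictionary'}
        if 'price' not in input_data:
            return {'error': 'Missing \'price\''}
        try:
            price = float(input_data['price'])
        except (ValueError, TypeError) as e:
            return {'error': f'Price must be a number: {e}'}
        if self.model is None:
            return {'error': 'Prediction failed: no model loaded'}
        return {'prediction': float(self.model.predict([[price]])[0])}
```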
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 56a97362-d70a-459d-b774-7e47da96ad3a
📒 Files selected for processing (1)
nodes/src/nodes/ml_sklearn/code.py
…#742) Replace `ROCKETRIDE_APIKEY: ${{ secrets.ROCKETRIDE_APIKEY }}` with a literal `MYAPIKEY` in the Test step env block.

Why this is unblocking the queue

PR #712 set up the env var to fix "No authentication configured" failures in client-python integration tests. Its own inline comment correctly noted that "the secret value itself doesn't matter — it just has to match between server and client in this single CI run." Sourcing it from `secrets.ROCKETRIDE_APIKEY` introduced an empty-string failure mode that we hit:

1. The secret was created on 2026-04-27, has not been updated since, and may be set to "" (or rotated to a value the engine no longer accepts).
2. When that happens, the workflow silently expands the expression to `ROCKETRIDE_APIKEY=""` for the Test step.
3. The test client reads it via `os.getenv('ROCKETRIDE_APIKEY', 'MYAPIKEY')`. `os.getenv` returns the empty string when the variable is set-but-empty — NOT the default — so the client authenticates with `""`.
4. The server (running in the same step) sees the same empty key and responds AuthenticationException.
5. All 48 client-python integration tests fail uniformly across Ubuntu, Windows, and macOS (which is what's been happening on develop's most recent runs and on PRs #715, #728, #738).

Using a literal value eliminates the entire failure mode without changing observable behaviour: the value still isn't a secret (the inline comment was always explicit on this), it never leaves the runner, and it matches the "MYAPIKEY" dev key the engine already recognises elsewhere in the codebase (`.env.template`).

Together with #734 (the sequential test execution flag, already on develop) this should clear both failure modes that have been blocking PRs since yesterday.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hi, thanks!
Thanks @Tejeshyewale for the contribution and the iteration on CI feedback.
The PR looks like a stub at this point — the process function does nothing, and services.json is invalid.
Review is paused until there is proof it actually works end-to-end (a sample model + a passing pipeline run, or a clear rewrite as a scaffolding example). Also, please add tests.
Converting it to Draft.
Hi @asclearuc, I've fixed the process function and services.json schema. Input lane: text → Expected output: answers → Result: empty. Could you point me to an existing node that:
I want to follow the exact same pattern. Thanks!
New Feature: ML Sklearn Prediction Node
This PR introduces a new node for performing predictions using a trained scikit-learn model.
Changes:
- New node ml_sklearn under src/nodes/
- Model artifact (model.pkl)

Purpose:
This adds basic ML inference capability to RocketRide pipelines and provides a foundation for future ML integrations.
Type
feature
Testing
- ./builder test passes

Checklist
Linked Issue
Fixes #0
Summary by CodeRabbit
New Features
Documentation
Chores