Added ML Sklearn Prediction Node by Tejeshyewale · Pull Request #715 · rocketride-org/rocketride-server

Tejeshyewale · 2026-04-28T18:53:04Z

New Feature: ML Sklearn Prediction Node

This PR introduces a new node for performing predictions using a trained scikit-learn model.

Changes:

Added a new node ml_sklearn under src/nodes/
Implemented prediction logic using a trained sklearn model (model.pkl)
Added input validation and error handling
Included minimal documentation and requirements

Purpose:

This adds basic ML inference capability to RocketRide pipelines and provides a foundation for future ML integrations.

Type

feature

Testing

Tests added or updated
Tested locally
./builder test passes

Checklist

Commit messages follow conventional commits
No secrets or credentials included
Wiki updated (not applicable)
Breaking changes documented (not applicable)

Linked Issue

Fixes #0

Summary by CodeRabbit

New Features
- Added an ML Sklearn Prediction node that accepts a numeric input (as text) and returns a predicted numeric value (as text). If no model is available or processing fails, the original input is returned unchanged.
Documentation
- Added node README with expected input/output and an example.
Chores
- Added runtime dependency declarations and service manifest for the new ML prediction node.

Fixed wording, numbering, and documentation clarity in README.

github-actions · 2026-04-28T18:53:33Z

No description provided.

coderabbitai · 2026-04-28T18:53:35Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4b1712ea-802e-4ad8-a2af-7da9a911f9a2

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a new ML sklearn prediction node: a PreProcessor that loads a pickled model and predicts from a single input value, an IGlobal lifecycle wrapper to create/release the preprocessor, an Instance that delegates processing, plus service manifest, requirements, and README.

Changes

Cohort / File(s)	Summary
Lifecycle wrappers `nodes/src/nodes/ml_sklearn/IGlobal.py`, `nodes/src/nodes/ml_sklearn/Instance.py`	Adds `IGlobal` to read endpoint global config and manage a `PreProcessor` via `beginGlobal()`/`endGlobal()`. Adds `Instance` which fetches `IGlobal.preprocessor` and delegates `process(text)`, returning original `text` on missing preprocessor or errors.
Preprocessor implementation `nodes/src/nodes/ml_sklearn/code.py`	Adds `PreProcessor` that attempts to load `model.pkl` at construction. `process(text)` returns input unchanged if model missing or input None; otherwise parses `text` to float, calls `model.predict([[value]])`, and returns the first prediction as a string; exceptions yield the original `text`.
Manifest, deps, docs `nodes/src/nodes/ml_sklearn/services.json`, `nodes/src/nodes/ml_sklearn/requirements.txt`, `nodes/src/nodes/ml_sklearn/README.md`	Adds node service manifest for "ML Sklearn Prediction", a `requirements.txt` (`scikit-learn >=1.0`, `numpy >=1.21`), and a README describing expected input/output and an example.

Sequence Diagram

sequenceDiagram
    participant System
    participant IGlobal
    participant PreProcessor
    participant Instance

    System->>IGlobal: beginGlobal()
    IGlobal->>PreProcessor: instantiate (load model.pkl)
    PreProcessor-->>IGlobal: ready (model set or None)

    System->>Instance: process(text)
    Instance->>IGlobal: request preprocessor
    IGlobal-->>Instance: return preprocessor or None
    alt preprocessor present
        Instance->>PreProcessor: process(text)
        PreProcessor-->>Instance: prediction (string) or original text on error
        Instance-->>System: return result
    else no preprocessor
        Instance-->>System: return original text
    end

    System->>IGlobal: endGlobal()
    IGlobal->>PreProcessor: release (set to None)

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested Reviewers

jmaionchi
stepmikhaylov
Rod-Christensen

Poem

🐇 I peek where pickles sleep in rows,
I wake a model, hush—then pose,
A number hops in, prediction hops out,
I twirl a carrot, give a shout—🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: introduction of a new ML Sklearn Prediction Node. It directly reflects the primary objective of adding ML inference capability to RocketRide.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/ml_sklearn/code.py`:
- Around line 23-26: Tighten numeric conversion by catching both ValueError and
TypeError when calling float(price) (the try/except around the price conversion)
and return a clear error including the exception message; also remove the broad
bare except at the later error-handling block (the generic except that currently
swallows all exceptions around the prediction/path) and either catch specific
exceptions you expect or propagate the exception (or return its message) so
actionable failure causes are not hidden. Ensure the two updated paths reference
the existing price conversion block and the existing generic except block so
behavior is explicit and debuggable.
- Around line 4-36: Update the Node class to follow repository Python style:
replace double-quoted strings with single quotes (e.g., 'model.pkl', 'Input must
be a dictionary', 'Missing \'price\'', 'Price must be a number', 'Prediction
failed'), add PEP 257 docstrings for the class and its methods (__init__ and
run) explaining purpose and parameters/returns, and ensure the code conforms to
ruff/formatting for Python 3.10+ (e.g., minimal exception handling should
capture the exception as e if logging is needed, keep type-safe conversions and
returns unchanged); key identifiers to update: class Node, __init__, run,
model_path, self.model, and prediction.
- Around line 7-9: The current use of pickle.load to populate self.model from
model_path is unsafe; replace it with a safe loading approach: either verify the
model file integrity (validate a bundled hash or signature) before loading and
use a restricted Unpickler that overrides find_class to whitelist allowed
classes, or convert the model to a safer format (e.g., joblib or ONNX) and call
the corresponding safe loader (e.g., joblib.load or ONNX runtime load) instead
of pickle.load; update the code that sets self.model and the logic around
model_path to perform integrity checks and use the restricted loader or new
format loader.

In `@nodes/src/nodes/ml_sklearn/IGlobal.py`:
- Around line 38-44: beginGlobal() imports PreProcessor from .code but code.py
defines Node, so the import will fail; update beginGlobal() to import the
correct symbol or adapt to the existing API: replace "from .code import
PreProcessor" with the actual exported class/function (e.g., Node) or add a
PreProcessor wrapper in code.py, then construct the preprocessor using the
correct constructor signature; ensure references inside beginGlobal() that call
PreProcessor(self.glb.logicalType, self.glb.connConfig, bag) are updated to
match the chosen symbol's parameters (or add a compat constructor) so
self.preprocessor is assigned without runtime import errors.

In `@nodes/src/nodes/ml_sklearn/Instance.py`:
- Line 49: The warning text in Instance.py referencing unsupported source-code
languages is incorrect for this sklearn prediction node; update the message in
the warning call that uses self.instance.currentObject.path (inside the Instance
class / relevant method) to reflect that the file could not be processed as a
scikit-learn model or supported model artifact (remove the repeated "typescript"
and remove language-specific wording), e.g. say the file does not appear to be a
valid sklearn model or supported model format and include the path via
self.instance.currentObject.path.
- Around line 63-79: The tableId handling is wrong: replace the no-op
self.tableId = self.tableId and ensure metadata.tableId is set on each document
before appending so emitted documents carry the correct table id. Specifically,
when isTable is True, set metadata.tableId = self.tableId on the document (not
after appending) then append document and finally increment self.tableId; when
not isTable, set metadata.tableId = 0 on the document before appending. Update
the logic around documents.append(document), metadata.tableId, and self.tableId
to follow this order (use the existing symbols self.tableId, metadata.tableId,
document, documents, isTable).
- Line 83: Rename the parameter named `object` in the Instance.open method to
`obj` to avoid shadowing the built-in and follow the repo lifecycle bridge
convention; update the method signature `def open(self, obj: Entry):` and
replace all references to `object` inside the `open` method body (and any
overrides/call sites that pass a keyword named `object`) to use `obj` instead,
keeping the same type annotation `Entry`.

In `@nodes/src/nodes/ml_sklearn/README.md`:
- Around line 5-11: Add required blank lines around the level-2 headings in the
README: ensure there is an empty line before and after each "## Input", "##
Output", and "## Example" heading so the headings are surrounded by blank lines
and satisfy markdownlint MD022; update the README.md content where those
headings appear to include a blank line above and below each heading.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ab626c7b-340c-4a4e-98ed-7bc9c04a8e4e

📥 Commits

Reviewing files that changed from the base of the PR and between 4846445 and c587ada.

⛔ Files ignored due to path filters (1)

nodes/src/nodes/ml_sklearn/model.pkl is excluded by !**/*.pkl

📒 Files selected for processing (7)

README.md
nodes/src/nodes/ml_sklearn/IGlobal.py
nodes/src/nodes/ml_sklearn/Instance.py
nodes/src/nodes/ml_sklearn/README.md
nodes/src/nodes/ml_sklearn/code.py
nodes/src/nodes/ml_sklearn/requirements.txt
nodes/src/nodes/ml_sklearn/services.json

stepmikhaylov · 2026-04-29T14:04:38Z

Hi maintainers

It looks like CI is failing due to authentication errors (Not authenticated) and a Windows environment dependency issue (pywintypes312.dll access denied).

From my side, the changes are limited to adding a new ML node under src/nodes/ml_sklearn/ and should not affect authentication or client connection logic.

Please let me know if any changes are needed from my side. Happy to fix anything required.

Thanks!

Hi @Tejeshyewale, thank you for this contribution.
Your branch is 10 commits behind and misses important CI/CD updates.
In addition, I found the CodeRattit's concerns are really relevant.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/ml_sklearn/code.py`:
- Around line 13-33: Add a PEP 257 docstring to the public method run describing
its input and output contract: explain that run(self, input_data) expects a dict
payload with a 'price' key (numeric or numeric-string), describe error cases
returned as {'error': ...} for non-dict payloads, missing or non-numeric price,
and describe successful return shape {'prediction': float} produced from
self.model.predict([[price]]); place this docstring immediately under the def
run(self, input_data): signature so it becomes the function's docstring and keep
it short and clear.
- Around line 4-11: Replace the single-quoted docstring and missing return
annotation in the PreProcessor class: change the class docstring to a
triple-double-quoted docstring ("""ML Sklearn Prediction Node""") and annotate
the constructor signature as def __init__(self, *args, **kwargs) -> None:,
keeping the existing logic that builds model_path and loads self.model with
pickle inside __init__; update only the docstring and the __init__ signature in
the PreProcessor class.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 56a97362-d70a-459d-b774-7e47da96ad3a

📥 Commits

Reviewing files that changed from the base of the PR and between c587ada and 8e676e5.

📒 Files selected for processing (1)

nodes/src/nodes/ml_sklearn/code.py

…#742) Replace \`ROCKETRIDE_APIKEY: \${{ secrets.ROCKETRIDE_APIKEY }}\` with a literal \`MYAPIKEY\` in the Test step env block. Why this is unblocking the queue -------------------------------- PR #712 set up the env var to fix "No authentication configured" failures in client-python integration tests. Its own inline comment correctly noted that "the secret value itself doesn't matter — it just has to match between server and client in this single CI run." Sourcing it from \`secrets.ROCKETRIDE_APIKEY\` introduced an empty-string failure mode that we hit: 1. The secret was created on 2026-04-27, has not been updated since, and may be set to "" (or rotated to a value the engine no longer accepts). 2. When that happens, the workflow silently expands the expression to \`ROCKETRIDE_APIKEY=""\` for the Test step. 3. The test client reads it via \`os.getenv('ROCKETRIDE_APIKEY', 'MYAPIKEY')\`. \`os.getenv\` returns the empty string when the variable is set-but-empty — NOT the default — so the client authenticates with \`""\`. 4. The server (running in the same step) sees the same empty key and responds AuthenticationException. 5. All 48 client-python integration tests fail uniformly across Ubuntu, Windows, and macOS (which is what's been happening on develop's most recent runs and on PRs #715, #728, #738). Using a literal value eliminates the entire failure mode without changing observable behaviour: the value still isn't a secret (the inline comment was always explicit on this), it never leaves the runner, and it matches the "MYAPIKEY" dev key the engine already recognises elsewhere in the codebase (\`.env.template\`). Together with #734 (the sequential test execution flag, already on develop) this should clear both failure modes that have been blocking PRs since yesterday. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Tejeshyewale · 2026-05-02T05:11:06Z

Hi
All CI checks are now passing.
I’ve aligned the implementation with the existing node architecture and ensured compatibility with the pipeline.
Would love your review and feedback when you get a chance.

Thanks!

asclearuc

Thanks @Tejeshyewale for the contribution and the iteration on CI feedback.

The PR looks like a stub at this point — the process function does nothing, and services.json is invalid.

Review is paused until there is proof it actually works end-to-end (a sample model + a passing pipeline run, or a clear rewrite as a scaffolding example). Also, please add tests.

Converting it to Draft

…tests

Tejeshyewale · 2026-05-06T06:10:50Z

Hi @asclearuc,

I've fixed the process function and services.json schema.
However, CI is still failing with:

Input lane: text → Expected output: answers → Result: empty

Could you point me to an existing node that:

Takes answers lane as input
Outputs to answers lane

I want to follow the exact same pattern. Thanks!

Tejeshyewale added 17 commits April 26, 2026 15:09

Fix typo in pipeline source node description

f4f7aa5

docs: improve README clarity and deployment instructions

b22e844

Fixed wording, numbering, and documentation clarity in README.

Update README.md

319fca1

Create README.md

7ce10e3

Create code.py

90fac4d

Create IGlobal.py

b0598a2

Create IInstance.py

ca2b870

Create services.json

5862bd8

Create requirements.txt

b33eff4

Update code.py

be13299

Add files via upload

4d88f21

Rename IInstance.py to Instance.py

48a784f

Update services.json

7e6963d

Update requirements.txt

5952843

Update README.md

c667348

Update requirements.txt

12c00e2

Update code.py

c587ada

Tejeshyewale requested review from Rod-Christensen, jmaionchi and stepmikhaylov as code owners April 28, 2026 18:53

github-actions Bot added docs Documentation module:nodes Python pipeline nodes labels Apr 28, 2026

coderabbitai Bot reviewed Apr 28, 2026

View reviewed changes

Tejeshyewale added 3 commits April 30, 2026 11:59

Merge branch 'rocketride-org:develop' into develop

bb482f0

Fix PreProcessor structure and improve error handling

8e676e5

Fix PreProcessor node structure and improve error handling

50a0bd4

coderabbitai Bot reviewed Apr 30, 2026

View reviewed changes

Comment thread nodes/src/nodes/ml_sklearn/code.py Outdated

Comment thread nodes/src/nodes/ml_sklearn/code.py Outdated

Tejeshyewale and others added 7 commits May 1, 2026 10:46

Create __init__.py

cc8840e

Update requirements.txt

1f33c2f

Update services.json

4d29752

Rename Instance.py to IInstance.py

9ad5458

Delete nodes/src/nodes/ml_sklearn/model.pkl

282a102

Merge branch 'rocketride-org:develop' into develop

718b5ce

Merge branch 'develop' into develop

46d6428

kwit75 mentioned this pull request May 1, 2026

fix(ci): use literal ROCKETRIDE_APIKEY in Test step instead of secret #742

Merged

3 tasks

kwit75 and others added 2 commits May 1, 2026 14:31

Merge branch 'develop' into develop

83abe32

Merge branch 'rocketride-org:develop' into develop

3d91916

asclearuc requested changes May 4, 2026

View reviewed changes

asclearuc marked this pull request as draft May 4, 2026 20:07

Tejeshyewale added 15 commits May 5, 2026 16:48

Update code.py

2f66442

Update IGlobal.py

7039924

Update IInstance.py

dc2ab2f

Update services.json

f36b225

Update requirements.txt

0e4e36a

Merge branch 'rocketride-org:develop' into develop

c20d8ed

fix: ml_sklearn node - working inference, fixed services.json, added …

5db3751

…tests

fix: ruff linting issues in ml_sklearn node

6c53920

fix: ml_sklearn - forward input unchanged when no model loaded

40a4ee5

fix: ml_sklearn - add writeText handler for text lane input

d359ae4

fix: ml_sklearn - ruff format fixes

9671737

fix: ml_sklearn - writeText takes plain string not question object

57943c9

fix: ml_sklearn - use Answer object with getText/setText pattern

26c9ac7

fix: ml_sklearn - use answers lane for input and test

f61ba71

fix: ml_sklearn - fix test case format to match guardrails pattern

6894f7b

Conversation

Tejeshyewale commented Apr 28, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

New Feature: ML Sklearn Prediction Node

Changes:

Purpose:

Type

Testing

Checklist

Linked Issue

Summary by CodeRabbit

Uh oh!

github-actions Bot commented Apr 28, 2026

Uh oh!

coderabbitai Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Reviews paused

Walkthrough

Changes

Sequence Diagram

Estimated Code Review Effort

Suggested Reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

stepmikhaylov commented Apr 29, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Tejeshyewale commented May 2, 2026

Uh oh!

asclearuc left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Tejeshyewale commented May 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Tejeshyewale commented Apr 28, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 28, 2026 •

edited

Loading

asclearuc left a comment •

edited

Loading