Added ML Sklearn Prediction Node#715

Draft
Tejeshyewale wants to merge 59 commits into rocketride-org:develop from Tejeshyewale:develop

Conversation

@Tejeshyewale
Contributor

@Tejeshyewale Tejeshyewale commented Apr 28, 2026

New Feature: ML Sklearn Prediction Node

This PR introduces a new node for performing predictions using a trained scikit-learn model.

Changes:

  • Added a new node ml_sklearn under src/nodes/
  • Implemented prediction logic using a trained sklearn model (model.pkl)
  • Added input validation and error handling
  • Included minimal documentation and requirements

Purpose:

This adds basic ML inference capability to RocketRide pipelines and provides a foundation for future ML integrations.


Type

feature


Testing

  • Tests added or updated
  • Tested locally
  • ./builder test passes

Checklist

  • Commit messages follow conventional commits
  • No secrets or credentials included
  • Wiki updated (not applicable)
  • Breaking changes documented (not applicable)

Linked Issue

Fixes #0

Summary by CodeRabbit

  • New Features

    • Added an ML Sklearn Prediction node that accepts a numeric input (as text) and returns a predicted numeric value (as text). If no model is available or processing fails, the original input is returned unchanged.
  • Documentation

    • Added node README with expected input/output and an example.
  • Chores

    • Added runtime dependency declarations and service manifest for the new ML prediction node.

@github-actions github-actions Bot added docs Documentation module:nodes Python pipeline nodes labels Apr 28, 2026

@coderabbitai
Contributor

coderabbitai Bot commented Apr 28, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4b1712ea-802e-4ad8-a2af-7da9a911f9a2

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

📝 Walkthrough

Adds a new ML sklearn prediction node: a PreProcessor that loads a pickled model and predicts from a single input value, an IGlobal lifecycle wrapper to create/release the preprocessor, an Instance that delegates processing, plus service manifest, requirements, and README.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Lifecycle wrappers: nodes/src/nodes/ml_sklearn/IGlobal.py, nodes/src/nodes/ml_sklearn/Instance.py | Adds IGlobal to read endpoint global config and manage a PreProcessor via beginGlobal()/endGlobal(). Adds Instance which fetches IGlobal.preprocessor and delegates process(text), returning original text on missing preprocessor or errors. |
| Preprocessor implementation: nodes/src/nodes/ml_sklearn/code.py | Adds PreProcessor that attempts to load model.pkl at construction. process(text) returns input unchanged if model missing or input None; otherwise parses text to float, calls model.predict([[value]]), and returns the first prediction as a string; exceptions yield the original text. |
| Manifest, deps, docs: nodes/src/nodes/ml_sklearn/services.json, nodes/src/nodes/ml_sklearn/requirements.txt, nodes/src/nodes/ml_sklearn/README.md | Adds node service manifest for "ML Sklearn Prediction", a requirements.txt (scikit-learn >=1.0, numpy >=1.21), and a README describing expected input/output and an example. |
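
For readers without access to the diff, the preprocessor behaviour summarised above could look roughly like the sketch below. This is not the PR's code: the `model` constructor argument and the `DoubleModel` toy class are assumptions added so the example runs without scikit-learn or a real `model.pkl`.

```python
import pickle
from pathlib import Path
from typing import Any, Optional


class PreProcessor:
    """Illustrative stand-in for the PreProcessor described in the walkthrough."""

    def __init__(self, model_path: str = 'model.pkl', model: Optional[Any] = None) -> None:
        # Assumption: allow injecting a model for testing; otherwise try the pickle.
        self.model = model
        if self.model is None:
            try:
                with Path(model_path).open('rb') as fh:
                    # Only unpickle files that come from a trusted source.
                    self.model = pickle.load(fh)
            except (OSError, pickle.UnpicklingError, EOFError):
                # Missing or unreadable model: fall back to pass-through mode.
                self.model = None

    def process(self, text: Optional[str]) -> Optional[str]:
        # Return the input unchanged when there is no model or no input.
        if self.model is None or text is None:
            return text
        try:
            value = float(text)
            prediction = self.model.predict([[value]])
            return str(prediction[0])
        except Exception:
            # Per the walkthrough, any failure yields the original text.
            return text


class DoubleModel:
    """Toy model (hypothetical) that doubles its single feature."""

    def predict(self, rows):
        return [row[0] * 2 for row in rows]
```

With the toy model, `PreProcessor(model=DoubleModel()).process('2.5')` returns `'5.0'`, while non-numeric text or a missing model file returns the input unchanged.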

Sequence Diagram

sequenceDiagram
    participant System
    participant IGlobal
    participant PreProcessor
    participant Instance

    System->>IGlobal: beginGlobal()
    IGlobal->>PreProcessor: instantiate (load model.pkl)
    PreProcessor-->>IGlobal: ready (model set or None)

    System->>Instance: process(text)
    Instance->>IGlobal: request preprocessor
    IGlobal-->>Instance: return preprocessor or None
    alt preprocessor present
        Instance->>PreProcessor: process(text)
        PreProcessor-->>Instance: prediction (string) or original text on error
        Instance-->>System: return result
    else no preprocessor
        Instance-->>System: return original text
    end

    System->>IGlobal: endGlobal()
    IGlobal->>PreProcessor: release (set to None)
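
The lifecycle in the diagram can be sketched in plain Python as follows. The class names mirror the diagram, but the base classes, constructor arguments, and the `EchoPreProcessor` placeholder are assumptions; the real RocketRide node interfaces are not shown in this thread.

```python
from typing import Callable, Optional


class EchoPreProcessor:
    """Placeholder preprocessor (hypothetical): upper-cases the text."""

    def process(self, text: str) -> str:
        return text.upper()


class IGlobal:
    """Owns the shared preprocessor between beginGlobal() and endGlobal()."""

    def __init__(self, factory: Callable[[], object] = EchoPreProcessor) -> None:
        self._factory = factory
        self.preprocessor: Optional[object] = None

    def beginGlobal(self) -> None:
        # Instantiate once; in the PR this is where model.pkl would be loaded.
        self.preprocessor = self._factory()

    def endGlobal(self) -> None:
        # Release the preprocessor so later calls fall back to pass-through.
        self.preprocessor = None


class Instance:
    """Delegates to the shared preprocessor, passing text through on failure."""

    def __init__(self, glb: IGlobal) -> None:
        self.IGlobal = glb

    def process(self, text: str) -> str:
        preprocessor = self.IGlobal.preprocessor
        if preprocessor is None:
            return text
        try:
            return preprocessor.process(text)
        except Exception:
            return text
```

Calling `Instance.process` before `beginGlobal()` or after `endGlobal()` returns the original text, which matches the "no preprocessor" branch of the diagram.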

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested Reviewers

  • jmaionchi
  • stepmikhaylov
  • Rod-Christensen

Poem

🐇 I peek where pickles sleep in rows,
I wake a model, hush—then pose,
A number hops in, prediction hops out,
I twirl a carrot, give a shout—🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: introduction of a new ML Sklearn Prediction Node. It directly reflects the primary objective of adding ML inference capability to RocketRide. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/ml_sklearn/code.py`:
- Around line 23-26: Tighten numeric conversion by catching both ValueError and
TypeError when calling float(price) (the try/except around the price conversion)
and return a clear error including the exception message; also remove the broad
bare except at the later error-handling block (the generic except that currently
swallows all exceptions around the prediction/path) and either catch specific
exceptions you expect or propagate the exception (or return its message) so
actionable failure causes are not hidden. Ensure the two updated paths reference
the existing price conversion block and the existing generic except block so
behavior is explicit and debuggable.
- Around line 4-36: Update the Node class to follow repository Python style:
replace double-quoted strings with single quotes (e.g., 'model.pkl', 'Input must
be a dictionary', 'Missing \'price\'', 'Price must be a number', 'Prediction
failed'), add PEP 257 docstrings for the class and its methods (__init__ and
run) explaining purpose and parameters/returns, and ensure the code conforms to
ruff/formatting for Python 3.10+ (e.g., minimal exception handling should
capture the exception as e if logging is needed, keep type-safe conversions and
returns unchanged); key identifiers to update: class Node, __init__, run,
model_path, self.model, and prediction.
- Around line 7-9: The current use of pickle.load to populate self.model from
model_path is unsafe; replace it with a safe loading approach: either verify the
model file integrity (validate a bundled hash or signature) before loading and
use a restricted Unpickler that overrides find_class to whitelist allowed
classes, or convert the model to a safer format (e.g., joblib or ONNX) and call
the corresponding safe loader (e.g., joblib.load or ONNX runtime load) instead
of pickle.load; update the code that sets self.model and the logic around
model_path to perform integrity checks and use the restricted loader or new
format loader.
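
One way to read the suggestion above is the sketch below, which checks a SHA-256 digest and then loads through an allow-listing `pickle.Unpickler` subclass. The `ALLOWED` set and the `load_verified` helper are hypothetical; a real scikit-learn model would need its own allow-list entries, and an allow-list narrows the attack surface rather than making untrusted pickles safe.

```python
import hashlib
import io
import pickle
from collections import OrderedDict

# Hypothetical allow-list of (module, name) pairs the loader may resolve.
ALLOWED = {('collections', 'OrderedDict')}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global not on the allow-list."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f'{module}.{name} is not allowed')


def load_verified(data: bytes, expected_sha256: str):
    """Check the payload hash first, then unpickle with the restricted loader."""
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256:
        raise ValueError('model file hash mismatch')
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Usage: an allow-listed object loads; anything else is rejected.
payload = pickle.dumps(OrderedDict(a=1))
restored = load_verified(payload, hashlib.sha256(payload).hexdigest())
```

The expected digest would have to be shipped separately from the model file (for example, committed alongside the code) for the check to mean anything.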

In `@nodes/src/nodes/ml_sklearn/IGlobal.py`:
- Around line 38-44: beginGlobal() imports PreProcessor from .code but code.py
defines Node, so the import will fail; update beginGlobal() to import the
correct symbol or adapt to the existing API: replace "from .code import
PreProcessor" with the actual exported class/function (e.g., Node) or add a
PreProcessor wrapper in code.py, then construct the preprocessor using the
correct constructor signature; ensure references inside beginGlobal() that call
PreProcessor(self.glb.logicalType, self.glb.connConfig, bag) are updated to
match the chosen symbol's parameters (or add a compat constructor) so
self.preprocessor is assigned without runtime import errors.

In `@nodes/src/nodes/ml_sklearn/Instance.py`:
- Line 49: The warning text in Instance.py referencing unsupported source-code
languages is incorrect for this sklearn prediction node; update the message in
the warning call that uses self.instance.currentObject.path (inside the Instance
class / relevant method) to reflect that the file could not be processed as a
scikit-learn model or supported model artifact (remove the repeated "typescript"
and remove language-specific wording), e.g. say the file does not appear to be a
valid sklearn model or supported model format and include the path via
self.instance.currentObject.path.
- Around line 63-79: The tableId handling is wrong: replace the no-op
self.tableId = self.tableId and ensure metadata.tableId is set on each document
before appending so emitted documents carry the correct table id. Specifically,
when isTable is True, set metadata.tableId = self.tableId on the document (not
after appending) then append document and finally increment self.tableId; when
not isTable, set metadata.tableId = 0 on the document before appending. Update
the logic around documents.append(document), metadata.tableId, and self.tableId
to follow this order (use the existing symbols self.tableId, metadata.tableId,
document, documents, isTable).
- Line 83: Rename the parameter named `object` in the Instance.open method to
`obj` to avoid shadowing the built-in and follow the repo lifecycle bridge
convention; update the method signature `def open(self, obj: Entry):` and
replace all references to `object` inside the `open` method body (and any
overrides/call sites that pass a keyword named `object`) to use `obj` instead,
keeping the same type annotation `Entry`.
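
The ordering requested for tableId can be illustrated with the sketch below. `Metadata`, `Doc`, and `TableTagger` are hypothetical stand-ins for the node's real document types, and the starting value of 1 is an assumption; only the set, append, increment order is taken from the comment.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Metadata:
    tableId: int = 0


@dataclass
class Doc:
    text: str
    metadata: Metadata = field(default_factory=Metadata)


class TableTagger:
    """Assigns table ids in the order the review comment describes."""

    def __init__(self, start: int = 1) -> None:
        self.tableId = start  # assumption: ids for tables start at 1
        self.documents: List[Doc] = []

    def add(self, document: Doc, isTable: bool) -> None:
        if isTable:
            # Set the id first, then append, and only then advance the counter.
            document.metadata.tableId = self.tableId
            self.documents.append(document)
            self.tableId += 1
        else:
            # Non-table documents always carry tableId 0.
            document.metadata.tableId = 0
            self.documents.append(document)
```

Adding a table, a paragraph, and another table in that order yields table ids 1, 0, and 2.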

In `@nodes/src/nodes/ml_sklearn/README.md`:
- Around line 5-11: Add required blank lines around the level-2 headings in the
README: ensure there is an empty line before and after each "## Input", "##
Output", and "## Example" heading so the headings are surrounded by blank lines
and satisfy markdownlint MD022; update the README.md content where those
headings appear to include a blank line above and below each heading.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ab626c7b-340c-4a4e-98ed-7bc9c04a8e4e

📥 Commits

Reviewing files that changed from the base of the PR and between 4846445 and c587ada.

⛔ Files ignored due to path filters (1)
  • nodes/src/nodes/ml_sklearn/model.pkl is excluded by !**/*.pkl
📒 Files selected for processing (7)
  • README.md
  • nodes/src/nodes/ml_sklearn/IGlobal.py
  • nodes/src/nodes/ml_sklearn/Instance.py
  • nodes/src/nodes/ml_sklearn/README.md
  • nodes/src/nodes/ml_sklearn/code.py
  • nodes/src/nodes/ml_sklearn/requirements.txt
  • nodes/src/nodes/ml_sklearn/services.json

Comment thread nodes/src/nodes/ml_sklearn/code.py Outdated
Comment thread nodes/src/nodes/ml_sklearn/code.py Outdated
Comment thread nodes/src/nodes/ml_sklearn/code.py Outdated
Comment thread nodes/src/nodes/ml_sklearn/IGlobal.py Outdated
Comment thread nodes/src/nodes/ml_sklearn/Instance.py Outdated
Comment thread nodes/src/nodes/ml_sklearn/Instance.py Outdated
Comment thread nodes/src/nodes/ml_sklearn/Instance.py Outdated
Comment thread nodes/src/nodes/ml_sklearn/README.md
@stepmikhaylov
Collaborator

Hi maintainers

It looks like CI is failing due to authentication errors (Not authenticated) and a Windows environment dependency issue (pywintypes312.dll access denied).

From my side, the changes are limited to adding a new ML node under src/nodes/ml_sklearn/ and should not affect authentication or client connection logic.

Please let me know if any changes are needed from my side. Happy to fix anything required.

Thanks!

Hi @Tejeshyewale, thank you for this contribution.
Your branch is 10 commits behind and is missing important CI/CD updates.
In addition, I find CodeRabbit's concerns really relevant.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/ml_sklearn/code.py`:
- Around line 13-33: Add a PEP 257 docstring to the public method run describing
its input and output contract: explain that run(self, input_data) expects a dict
payload with a 'price' key (numeric or numeric-string), describe error cases
returned as {'error': ...} for non-dict payloads, missing or non-numeric price,
and describe successful return shape {'prediction': float} produced from
self.model.predict([[price]]); place this docstring immediately under the def
run(self, input_data): signature so it becomes the function's docstring and keep
it short and clear.
- Around line 4-11: Replace the single-quoted docstring and missing return
annotation in the PreProcessor class: change the class docstring to a
triple-double-quoted docstring ("""ML Sklearn Prediction Node""") and annotate
the constructor signature as def __init__(self, *args, **kwargs) -> None:,
keeping the existing logic that builds model_path and loads self.model with
pickle inside __init__; update only the docstring and the __init__ signature in
the PreProcessor class.
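
The contract these two comments ask the docstring to describe might be written as in the sketch below. It is an illustration under assumptions: the model is injected through the constructor instead of being unpickled, and the error strings are taken from the earlier review comment, not from the actual file.

```python
from typing import Any, Dict


class PreProcessor:
    """ML Sklearn Prediction Node (illustrative sketch, not the PR's code)."""

    def __init__(self, model: Any) -> None:
        # Assumption: the model is passed in; the PR loads it from model.pkl.
        self.model = model

    def run(self, input_data: Any) -> Dict[str, Any]:
        """Predict from a payload that holds a 'price' value.

        Args:
            input_data: dict with a 'price' key (number or numeric string).

        Returns:
            {'prediction': float} on success, or {'error': str} when the
            payload is not a dict, 'price' is missing, or it is not numeric.
        """
        if not isinstance(input_data, dict):
            return {'error': 'Input must be a dictionary'}
        if 'price' not in input_data:
            return {'error': 'Missing \'price\''}
        try:
            price = float(input_data['price'])
        except (ValueError, TypeError) as exc:
            # Keep the cause visible instead of swallowing it.
            return {'error': f'Price must be a number: {exc}'}
        prediction = self.model.predict([[price]])
        return {'prediction': float(prediction[0])}


class PlusOne:
    """Toy model (hypothetical) that adds one to its single feature."""

    def predict(self, rows):
        return [rows[0][0] + 1]
```

For example, `PreProcessor(PlusOne()).run({'price': '2'})` returns `{'prediction': 3.0}`, while a list payload or a `None` price returns an `{'error': ...}` dict.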

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 56a97362-d70a-459d-b774-7e47da96ad3a

📥 Commits

Reviewing files that changed from the base of the PR and between c587ada and 8e676e5.

📒 Files selected for processing (1)
  • nodes/src/nodes/ml_sklearn/code.py

Comment thread nodes/src/nodes/ml_sklearn/code.py Outdated
Comment thread nodes/src/nodes/ml_sklearn/code.py Outdated
kwit75 added a commit that referenced this pull request May 1, 2026
…#742)

Replace `ROCKETRIDE_APIKEY: ${{ secrets.ROCKETRIDE_APIKEY }}` with a
literal `MYAPIKEY` in the Test step env block.

Why this is unblocking the queue
--------------------------------
PR #712 set up the env var to fix "No authentication configured" failures
in client-python integration tests. Its own inline comment correctly
noted that "the secret value itself doesn't matter — it just has to
match between server and client in this single CI run." Sourcing it
from `secrets.ROCKETRIDE_APIKEY` introduced an empty-string failure
mode that we hit:

  1. The secret was created on 2026-04-27, has not been updated since,
     and may be set to "" (or rotated to a value the engine no longer
     accepts).
  2. When that happens, the workflow silently expands the expression to
     `ROCKETRIDE_APIKEY=""` for the Test step.
  3. The test client reads it via `os.getenv('ROCKETRIDE_APIKEY',
     'MYAPIKEY')`. `os.getenv` returns the empty string when the
     variable is set-but-empty — NOT the default — so the client
     authenticates with `""`.
  4. The server (running in the same step) sees the same empty key and
     responds AuthenticationException.
  5. All 48 client-python integration tests fail uniformly across
     Ubuntu, Windows, and macOS (which is what's been happening on
     develop's most recent runs and on PRs #715, #728, #738).
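
The `os.getenv` behaviour in step 3 is easy to reproduce. The snippet below is a small illustration; the `resolve_api_key` helper is hypothetical and shows a client-side alternative for contrast only, since the commit itself fixes the problem in the workflow by using a literal value.

```python
import os


def resolve_api_key(name: str = 'ROCKETRIDE_APIKEY', default: str = 'MYAPIKEY') -> str:
    # `or` treats an empty string the same as a missing variable.
    return os.getenv(name) or default


# Simulate the workflow expanding an empty secret into the environment.
os.environ['ROCKETRIDE_APIKEY'] = ''

# The default is ignored because the variable is set, even though it is empty.
plain = os.getenv('ROCKETRIDE_APIKEY', 'MYAPIKEY')

# The hypothetical helper falls back to the default in the same situation.
fallback = resolve_api_key()

print(repr(plain), repr(fallback))
```

Here `plain` is the empty string while `fallback` is `'MYAPIKEY'`, which is the difference between "set but empty" and "unset" that the list describes.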

Using a literal value eliminates the entire failure mode without
changing observable behaviour: the value still isn't a secret (the
inline comment was always explicit on this), it never leaves the runner,
and it matches the "MYAPIKEY" dev key the engine already recognises
elsewhere in the codebase (`.env.template`).

Together with #734 (the sequential test execution flag, already on
develop) this should clear both failure modes that have been blocking
PRs since yesterday.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Tejeshyewale
Contributor Author

Hi
All CI checks are now passing.
I’ve aligned the implementation with the existing node architecture and ensured compatibility with the pipeline.
Would love your review and feedback when you get a chance.

Thanks!

Collaborator

@asclearuc asclearuc left a comment

Thanks @Tejeshyewale for the contribution and the iteration on CI feedback.

The PR looks like a stub at this point — the process function does nothing, and services.json is invalid.

Review is paused until there is proof it actually works end-to-end (a sample model + a passing pipeline run, or a clear rewrite as a scaffolding example). Also, please add tests.

Converting it to Draft

@asclearuc asclearuc marked this pull request as draft May 4, 2026 20:07
@Tejeshyewale
Contributor Author

Hi @asclearuc,

I've fixed the process function and services.json schema.
However, CI is still failing with:

Input lane: text → Expected output: answers → Result: empty

Could you point me to an existing node that:

  1. Takes answers lane as input
  2. Outputs to answers lane

I want to follow the exact same pattern. Thanks!

Labels

docs Documentation module:nodes Python pipeline nodes

4 participants