Skip to content

chore(python): apply ruff format and enforce in CI#744

Closed
TrishaReddygari wants to merge 3 commits intorocketride-org:developfrom
TrishaReddygari:chore/RR-432-ruff-format-and-ci
Closed

chore(python): apply ruff format and enforce in CI#744
TrishaReddygari wants to merge 3 commits intorocketride-org:developfrom
TrishaReddygari:chore/RR-432-ruff-format-and-ci

Conversation

@TrishaReddygari
Copy link
Copy Markdown

@TrishaReddygari TrishaReddygari commented May 3, 2026

Summary

Brings the repo into compliance with the formatting acceptance criteria in #432:

  1. Mechanical ruff format pass across all Python files (93 of 720 reformatted; whitespace + quote normalization only, no semantic changes).
  2. New lint-python job in .github/workflows/ci.yml that runs ruff format --check on every PR and push, pinned to ruff 0.15.12, wired into the ci-ok gatekeeper so format drift blocks merges.
  3. New "Formatting" section in CONTRIBUTING.md covering install, local commands, and CI behavior.

The [tool.ruff] config in pyproject.toml is unchanged — it was already set up correctly; only the format pass and CI enforcement were missing.

Scope notes

Full lint enforcement (ruff check, distinct from ruff format) is deferred to a follow-up PR. There are 480 pre-existing lint violations (347 single-quote, 49 incorrect-blank-line-after-class, 36 undocumented __init__, 26 unused imports, etc.). 400 are auto-fixable, 80 are not. Fixing them in this PR would mix mechanical formatting with code-touching changes that warrant their own review. Happy to follow up with a separate PR if maintainers would like.

Also worth a heads-up: ruff reports 3 F821 (undefined name) warnings in packages/ai/src/ai/common/models/transformers/sentence_transformers.py for np.ndarray annotations. They're false positives — the file uses from __future__ import annotations, so the names aren't evaluated at runtime. No action needed here, just flagging it.

Testing

  • ruff format --check . → 720 files clean (was: 93 would reformat)
  • YAML parse check on .github/workflows/ci.yml → valid; lint-python is in ci-ok's needs: array
  • Commit-by-commit structure for easy review: format pass / CI / docs

Breaking changes

None.

Fixes #432

Summary by CodeRabbit

  • New Features

    • Added Python code formatting verification to CI pipeline
    • Enhanced testing framework with new validation and runner components
    • Implemented agent utility helpers for text processing and payload normalization
  • Documentation

    • Updated code style guidelines and contributing documentation
  • Tests

    • Expanded test mock implementations for vector databases, LLMs, and cloud services
    • Improved test infrastructure and discovery mechanisms
  • Style

    • Applied consistent code formatting and quoting standards throughout codebase

Mechanical formatting pass using ruff 0.15.12 against the existing
[tool.ruff] config in pyproject.toml. 93 of 720 Python files reformatted;
changes are limited to whitespace and quote-style normalization, no
semantic edits.

This is the one-time baseline pass called for in the issue. Subsequent
commits in this PR add CI enforcement so formatting drift can no longer
re-enter the codebase.

Refs rocketride-org#432
Adds a new lint-python job to the CI workflow that installs ruff at the
same pinned version used for the baseline format pass (0.15.12) and runs
ruff format --check against the repo. Wired into the ci-ok gatekeeper so
formatting drift blocks merges.

The job does not depend on init or build, runs on ubuntu-latest with a
5-minute timeout, and skips on scheduled runs (matching the existing
build job's exclusion). Pinning ruff is intentional: an unpinned version
would let ruff releases silently change "passing" formatting and surprise
PR authors.

Lint enforcement (ruff check, separate from format) is deferred to a
follow-up because there are 480 pre-existing lint violations in the
codebase that warrant their own review and fix.

Refs rocketride-org#432
New 'Formatting' section documents the ruff version pin, install
command, format/check commands, and what CI enforces. The existing
single-quotes note in the Python style section now links to it.

Refs rocketride-org#432
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 3, 2026

📝 Walkthrough

Walkthrough

This PR applies the ruff code formatter across the entire Python codebase to establish a consistent formatting baseline, adds a CI job to enforce formatting on future PRs, and updates project documentation. Changes include a new lint-python workflow job, ruff configuration in pyproject.toml, and extensive formatting updates across 80+ Python files.

Changes

Code Formatting & Linting Infrastructure

Layer / File(s) Summary
CI Workflow Configuration
.github/workflows/ci.yml
New lint-python job runs ruff format --check . on non-scheduled events; ci-ok gatekeeper now depends on lint-python and includes its result in failure detection and printed summary.
Linting Configuration
pyproject.toml (implicit from changes)
Ruff configuration applied across the project (inferred from formatting results; specific config details managed via project configuration).
Project Documentation
CONTRIBUTING.md
Python code style guidelines updated to reference ruff as the project formatter; link added to formatting section for "single quotes" rule.
Core Python Formatting
nodes/src/nodes/*/ (50+ files), packages/ai/src/ai/*/ (10+ files), packages/client-python/src/rocketride/*/ (5+ files), packages/client-mcp/ (2+ files), scripts/, test/, testdata/
Ruff formatting applied consistently across all Python modules: string literal quoting normalized (double ↔ single quotes), line wrapping adjusted for readability, whitespace/blank-line standardization, import organization, and dict/list literal reformatting. All changes preserve semantic behavior and control flow.
Test & Mock Infrastructure
nodes/test/framework/, nodes/test/mocks/*/ (20+ files), packages/*/tests/
Ruff formatting applied to test fixtures, mock implementations, and test utilities; new module exports added (ExpectationValidator, NodeTestRunner in test framework; __all__ lists updated in mock submodules for clarity).
Utility & Script Updates
packages/ai/src/ai/common/agent/_internal/utils.py, packages/server/engine-lib/, scripts/check_gpu.py
Helper functions refactored with consistent quoting and string construction; new agent-framework utilities introduced (new_run_id, now_iso, safe_str, normalize_invocation_payload, messages_to_transcript, extract_text, truncate_at_stop_words).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~45 minutes

The PR applies a uniform formatter (ruff) across a large codebase (~80+ files), making individual changes highly repetitive and low-risk. However, the scope is substantial, requiring a reviewer to sample-check that no semantic logic was inadvertently altered, verify CI/configuration correctness, and confirm the formatter settings are project-appropriate.

Possibly related PRs

Suggested labels

ci/cd, docs, formatting

Suggested reviewers

  • jmaionchi
  • stepmikhaylov
  • Rod-Christensen

Poem

🐰 A rabbit's ode to tidy code:

Ruff did comb through every line,
Now all quotes and spaces align,
From node to test, from src to test,
The formatter has done its best! ✨
CI will guard the style so neat,
Making Python code complete.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

@github-actions github-actions Bot added docs Documentation ci/cd CI/CD and build system builder labels May 3, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 3, 2026

No description provided.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
nodes/test/mocks/psycopg2/__init__.py (1)

179-189: ⚠️ Potential issue | 🟠 Major

Boolean values misclassified as LIMIT candidates in two locations.

In Python, bool is a subclass of int, so isinstance(False, int) returns True. At line 185 in _handle_semantic_search, this causes boolean parameters like False (from isDeleted = %s) to be assigned to limit instead of added to filter_params, breaking WHERE filtering. The same bug exists at line 307 in _extract_limit.

Proposed fixes
-                elif isinstance(p, int) and p < 1000:
+                elif isinstance(p, int) and not isinstance(p, bool) and p < 1000:
                     limit = p
-                if isinstance(p, int) and p < 10000:
+                if isinstance(p, int) and not isinstance(p, bool) and p < 10000:
                     return p
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/test/mocks/psycopg2/__init__.py` around lines 179 - 189, The bug:
boolean values (since bool subclasses int) are being misclassified as LIMIT
candidates and assigned to limit in _handle_semantic_search and _extract_limit;
update the parameter type checks to exclude bools (e.g., check isinstance(p,
bool) first or use type(p) is int / isinstance(p, int) and not isinstance(p,
bool)) so booleans are appended to filter_params instead of overwriting limit;
apply the same change at both occurrences (functions _handle_semantic_search and
_extract_limit) and ensure tests that pass booleans (e.g., isDeleted) now route
to filter_params.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/test/mocks/chromadb/__init__.py`:
- Around line 169-170: In get() the check "if limit and len(results['ids']) >=
limit" treats limit=0 as falsy (no limit) and returns all rows; update get() to
explicitly handle limit being provided: if limit == 0 return an empty result
immediately, otherwise use "if limit is not None and len(results['ids']) >=
limit: break" (or equivalent) so zero is respected and only None means
unlimited. Reference the get() function and the results['ids'] accumulation when
making the change.

In `@nodes/test/mocks/weaviate/__init__.py`:
- Around line 267-270: Sorting uses a truthy check that collapses 0.0 into the
fallback 1.0; update the sort key used on results (the lambda passed to
results.sort in the QueryResult construction) to explicitly check for None
(e.g., use x.metadata.distance if x.metadata.distance is not None else 1.0) so
that valid 0.0 distances are preserved and only missing distances get the high
default.

---

Outside diff comments:
In `@nodes/test/mocks/psycopg2/__init__.py`:
- Around line 179-189: The bug: boolean values (since bool subclasses int) are
being misclassified as LIMIT candidates and assigned to limit in
_handle_semantic_search and _extract_limit; update the parameter type checks to
exclude bools (e.g., check isinstance(p, bool) first or use type(p) is int /
isinstance(p, int) and not isinstance(p, bool)) so booleans are appended to
filter_params instead of overwriting limit; apply the same change at both
occurrences (functions _handle_semantic_search and _extract_limit) and ensure
tests that pass booleans (e.g., isDeleted) now route to filter_params.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 655ef2e3-8431-420f-a635-95e8abba7cdb

📥 Commits

Reviewing files that changed from the base of the PR and between 5d9963f and b56cc47.

📒 Files selected for processing (95)
  • .github/workflows/ci.yml
  • CONTRIBUTING.md
  • nodes/src/nodes/accessibility_describe/accessibility_vision.py
  • nodes/src/nodes/agent_rocketride/IGlobal.py
  • nodes/src/nodes/anonymize/IInstance.py
  • nodes/src/nodes/anonymize/anonymize.py
  • nodes/src/nodes/anonymize/glinerRecognizer.py
  • nodes/src/nodes/audio_transcribe/IGlobal.py
  • nodes/src/nodes/audio_tts/IGlobal.py
  • nodes/src/nodes/db_mysql/IGlobal.py
  • nodes/src/nodes/db_postgres/IGlobal.py
  • nodes/src/nodes/index_search/IEndpoint.py
  • nodes/src/nodes/index_search/IGlobal.py
  • nodes/src/nodes/index_search/IInstance.py
  • nodes/src/nodes/index_search/__init__.py
  • nodes/src/nodes/index_search/constants.py
  • nodes/src/nodes/index_search/elasticsearch_store.py
  • nodes/src/nodes/index_search/opensearch_client.py
  • nodes/src/nodes/library/msgraph/application_permissions.py
  • nodes/src/nodes/library/permissions/utils.py
  • nodes/src/nodes/llamaparse/IInstance.py
  • nodes/src/nodes/llm_openai_api/openai_client.py
  • nodes/src/nodes/memory_internal/memory.py
  • nodes/src/nodes/ner/IGlobal.py
  • nodes/src/nodes/ner/IInstance.py
  • nodes/src/nodes/ocr/IGlobal.py
  • nodes/src/nodes/ocr/IInstance.py
  • nodes/src/nodes/tool_pipe/IInstance.py
  • nodes/src/nodes/vectordb_postgres/vectordb_postgres.py
  • nodes/test/__init__.py
  • nodes/test/framework/__init__.py
  • nodes/test/framework/expectations.py
  • nodes/test/framework/pipeline.py
  • nodes/test/framework/runner.py
  • nodes/test/milvus/test_search_semantic.py
  • nodes/test/mocks/__init__.py
  • nodes/test/mocks/astrapy/__init__.py
  • nodes/test/mocks/astrapy/data_types.py
  • nodes/test/mocks/astrapy/info.py
  • nodes/test/mocks/chromadb/__init__.py
  • nodes/test/mocks/chromadb/config.py
  • nodes/test/mocks/elasticsearch/__init__.py
  • nodes/test/mocks/langchain_anthropic/__init__.py
  • nodes/test/mocks/langchain_aws/__init__.py
  • nodes/test/mocks/langchain_google_vertexai/__init__.py
  • nodes/test/mocks/langchain_openai/__init__.py
  • nodes/test/mocks/langchain_xai/__init__.py
  • nodes/test/mocks/opensearchpy/__init__.py
  • nodes/test/mocks/pgvector/__init__.py
  • nodes/test/mocks/pgvector/psycopg2.py
  • nodes/test/mocks/pinecone/grpc.py
  • nodes/test/mocks/psycopg2/__init__.py
  • nodes/test/mocks/pymilvus/__init__.py
  • nodes/test/mocks/qdrant_client/__init__.py
  • nodes/test/mocks/qdrant_client/conversions/common_types.py
  • nodes/test/mocks/qdrant_client/models.py
  • nodes/test/mocks/weaviate/__init__.py
  • nodes/test/mocks/weaviate/classes/__init__.py
  • nodes/test/mocks/weaviate/classes/config.py
  • nodes/test/mocks/weaviate/classes/init.py
  • nodes/test/mocks/weaviate/classes/query.py
  • nodes/test/mocks/weaviate/client.py
  • nodes/test/mocks/weaviate/collections/__init__.py
  • nodes/test/mocks/weaviate/collections/collection.py
  • nodes/test/mocks/weaviate/util.py
  • nodes/test/test_contracts.py
  • nodes/test/test_dynamic.py
  • nodes/test/test_embedding_image.py
  • packages/ai/src/ai/__init__.py
  • packages/ai/src/ai/common/agent/_internal/utils.py
  • packages/ai/src/ai/common/dap/transport_tcpip.py
  • packages/ai/src/ai/common/database/db_global_base.py
  • packages/ai/src/ai/common/models/audio/kokoro_loader.py
  • packages/ai/src/ai/common/models/gliner/__init__.py
  • packages/ai/src/ai/common/models/vision/vision.py
  • packages/ai/src/ai/common/tools.py
  • packages/ai/src/ai/node.py
  • packages/ai/tests/ai/modules/profiler/test_auth.py
  • packages/ai/tests/ai/modules/task/test_env_var_exfil.py
  • packages/ai/tests/ai/modules/task/test_temp_file_security.py
  • packages/ai/tests/ai/web/test_server.py
  • packages/client-mcp/tests/conftest.py
  • packages/client-mcp/tests/test_server.py
  • packages/client-mcp/tests/test_tools.py
  • packages/client-python/src/rocketride/cli/__init__.py
  • packages/client-python/src/rocketride/cli/monitors/status.py
  • packages/client-python/src/rocketride/schema/__init__.py
  • packages/client-python/src/rocketride/schema/question.py
  • packages/client-python/src/rocketride/types/data.py
  • packages/client-python/tests/echo_pipeline.py
  • packages/client-python/tests/embedding_image_pipeline.py
  • packages/server/engine-lib/rocketlib-python/lib/depends.py
  • scripts/check_gpu.py
  • test/testdata/tests/tasktest.py
  • testdata/images/generate_test_images.py
💤 Files with no reviewable changes (11)
  • nodes/test/mocks/weaviate/classes/query.py
  • nodes/test/mocks/chromadb/config.py
  • packages/ai/src/ai/common/models/gliner/init.py
  • nodes/test/mocks/weaviate/collections/collection.py
  • nodes/test/mocks/weaviate/collections/init.py
  • nodes/test/mocks/pinecone/grpc.py
  • nodes/test/mocks/weaviate/util.py
  • nodes/test/init.py
  • nodes/test/mocks/weaviate/client.py
  • nodes/test/mocks/weaviate/classes/init.py
  • nodes/test/framework/init.py

Comment on lines +169 to 170
if limit and len(results['ids']) >= limit:
break
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

limit=0 is handled incorrectly in get().

Using if limit and ... makes limit=0 behave like “no limit”, which can return all rows instead of none.

Proposed fix
-            if limit and len(results['ids']) >= limit:
+            if limit is not None and len(results['ids']) >= limit:
                 break
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if limit and len(results['ids']) >= limit:
break
if limit is not None and len(results['ids']) >= limit:
break
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/test/mocks/chromadb/__init__.py` around lines 169 - 170, In get() the
check "if limit and len(results['ids']) >= limit" treats limit=0 as falsy (no
limit) and returns all rows; update get() to explicitly handle limit being
provided: if limit == 0 return an empty result immediately, otherwise use "if
limit is not None and len(results['ids']) >= limit: break" (or equivalent) so
zero is respected and only None means unlimited. Reference the get() function
and the results['ids'] accumulation when making the change.

Comment on lines 267 to 270
# Sort by distance (lower is better for cosine distance)
results.sort(key=lambda x: x.metadata.distance if x.metadata.distance else 1.0)

return QueryResult(objects=results[:limit])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Sort by distance without collapsing 0.0 to 1.0.

0.0 is a valid cosine distance for an exact match, but this sort key treats it as falsy and ranks it like a miss. That can mis-order nearest-neighbor results in tests.

Proposed fix
-        results.sort(key=lambda x: x.metadata.distance if x.metadata.distance else 1.0)
+        results.sort(
+            key=lambda x: x.metadata.distance if x.metadata.distance is not None else 1.0
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/test/mocks/weaviate/__init__.py` around lines 267 - 270, Sorting uses a
truthy check that collapses 0.0 into the fallback 1.0; update the sort key used
on results (the lambda passed to results.sort in the QueryResult construction)
to explicitly check for None (e.g., use x.metadata.distance if
x.metadata.distance is not None else 1.0) so that valid 0.0 distances are
preserved and only missing distances get the high default.

@TrishaReddygari
Copy link
Copy Markdown
Author

Thanks @coderabbitai — all three findings look real, but they're pre-existing in the test mocks rather than introduced by this format-only PR. To keep this PR's scope mechanical, I've filed them as #747 and opened a follow-up PR #748 that fixes all three with one commit per file.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 3, 2026

Tip

For best results, initiate chat on the files or code changes.

@TrishaReddygari That makes complete sense — keeping a mechanical formatting PR free of unrelated fixes is good practice. Glad the findings are tracked in #747 and addressed in #748. I'll note this for future reviews so I don't re-flag these same pre-existing issues.

ʕ •ᴥ•ʔ ~ makes a mental note


✏️ Learnings added
Learnt from: TrishaReddygari
URL: https://github.com/rocketride-org/rocketride-server/pull/744

Timestamp: 2026-05-03T09:29:21.868Z
Learning: In the rocketride-server repository, findings in test mock files (e.g. nodes/test/mocks/) that are pre-existing and not introduced by a given PR should not be flagged in format-only/mechanical PRs. Pre-existing issues in test mocks are tracked separately (e.g. issue `#747`) and fixed in dedicated follow-up PRs.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@stepmikhaylov
Copy link
Copy Markdown
Collaborator

done /w #722/#723

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

builder ci/cd CI/CD and build system docs Documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Run ruff on all Python files and enforce formatting in CI

2 participants