chore(wren-ai-service): fix eval issues #1987

cyyeh · 2025-10-07T02:34:16Z

Summary by CodeRabbit

Bug Fixes
- Improved SQL handling: backticks are normalized to double quotes, quoting errors are surfaced with clearer messages, and DB name parsing is more robust.
- Ensured eval data auto-loads for “spider_” datasets, aligning behavior with other datasets.
Refactor
- Minor non-functional cleanups (argument naming, formatting, and logging clarity).
Chores
- Updated internal component versions to latest releases.

coderabbitai · 2025-10-07T02:34:41Z

Walkthrough

Refactors SQL quoting/error handling across utils and engine by normalizing backticks to double quotes, improving error propagation/logging, and adjusting parameter passing. Extends prediction flow to load Spider datasets into Postgres. Updates environment version pins. Minor formatting and comment removals without functional impact.

Changes

Cohort / File(s)	Summary
SQL quoting and analysis `wren-ai-service/src/core/engine.py`, `wren-ai-service/eval/utils.py`, `wren-ai-service/eval/data_curation/utils.py`	Engine add_quotes now replaces backticks with double quotes before transpile; utils capture and propagate quoting errors with details, pass timeout as keyword, and make parse_db_name more robust; is_sql_valid treats quoting failures as errors and asserts accordingly.
Evaluation flow adjustments `wren-ai-service/eval/prediction.py`, `wren-ai-service/eval/preparation.py`	prediction: for "spider_" datasets, also loads eval DB into Postgres; preparation: passes api_endpoint as a named argument to get_contexts_from_sql; removed dead comments and minor formatting tweaks.
Dev environment versions `wren-ai-service/tools/dev/.env`	Bumped component versions: WREN_ENGINE 0.17.1→0.20.2, WREN_AI_SERVICE 0.24.3→0.27.14, IBIS_SERVER 0.17.1→0.20.2, WREN_UI 0.30.0→0.31.2.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor U as User/Runner
  participant P as eval/prediction.py
  participant FS as Filesystem
  participant PG as Postgres

  U->>P: run prediction(db_name, settings)
  P->>FS: set eval_data_db_path
  alt db_name starts with "spider_"
    P->>PG: load_eval_data_db_to_postgres(db_name, path)
    PG-->>P: load complete
  else db_name starts with "bird_"
    P->>PG: load_eval_data_db_to_postgres(db_name, path)
    PG-->>P: load complete
  end
  P-->>U: continue prediction pipeline

sequenceDiagram
  autonumber
  participant C as Caller (utils/prep)
  participant E as engine.add_quotes
  participant G as sqlglot
  note over E: Preprocess: replace ` with "
  C->>E: add_quotes(sql)
  E->>G: transpile(sql_normalized)
  alt success
    G-->>E: quoted_sql
    E-->>C: quoted_sql, no error
  else error
    G-->>E: exception
    E-->>C: original_sql, error
    note over C: Log error with details, assert/fallback paths
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

chore(wren-ai-service): minor updates #1782: Also modifies add_quotes behavior (dialect handling) alongside current normalization changes.
chore(wren-ai-service): improve text2sql #1855: Adjusts add_quotes transpile error handling, intersecting with this PR’s error propagation.
chore(wren-ai-service): improve add quotes #1913: Touches the same add_quotes implementation and related call sites.

Suggested labels

module/ai-service, ai-env-changed

Suggested reviewers

yichieh-lu
imAsterSun

Poem

I nibbled quotes, swapped ``` for "—so neat!
Hopped through Spider’s paths with Postgres treats.
Logs now squeak the errors clear,
Versions fresh like springtime cheer.
Thump-thump, tests align in tune—
Carrot commits will ship soon! 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.
Title Check	❓ Inconclusive	The title “chore(wren-ai-service): fix eval issues” is overly generic and does not clearly communicate the primary change or the most significant fix in the changeset, making it difficult for reviewers to understand the main purpose at a glance.	Please update the title to specify the key fix or refactor— for example, “fix SQL quoting errors in eval service” or “add missing eval data loading for spider_ paths”—so that it clearly conveys the main change.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch chore/ai-service/fix-eval

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 14e389f and 0c19010.

📒 Files selected for processing (6)

wren-ai-service/eval/data_curation/utils.py (1 hunks)
wren-ai-service/eval/prediction.py (1 hunks)
wren-ai-service/eval/preparation.py (2 hunks)
wren-ai-service/eval/utils.py (4 hunks)
wren-ai-service/src/core/engine.py (1 hunks)
wren-ai-service/tools/dev/.env (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (3)

wren-ai-service/eval/data_curation/utils.py (1)

wren-ai-service/src/core/engine.py (1)

add_quotes (52-67)

wren-ai-service/eval/prediction.py (1)

wren-ai-service/eval/utils.py (1)

load_eval_data_db_to_postgres (567-595)

wren-ai-service/eval/utils.py (1)

wren-ai-service/src/core/engine.py (1)

add_quotes (52-67)

🪛 dotenv-linter (3.3.0)

wren-ai-service/tools/dev/.env

[warning] 14-14: [UnorderedKey] The WREN_ENGINE_VERSION key should go before the WREN_PRODUCT_VERSION key

(UnorderedKey)

[warning] 15-15: [UnorderedKey] The WREN_AI_SERVICE_VERSION key should go before the WREN_ENGINE_VERSION key

(UnorderedKey)

[warning] 16-16: [UnorderedKey] The IBIS_SERVER_VERSION key should go before the WREN_AI_SERVICE_VERSION key

(UnorderedKey)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Analyze (go)

🔇 Additional comments (9)

wren-ai-service/eval/prediction.py (1)

110-112: LGTM! Consistent data loading across paths.

The addition of load_eval_data_db_to_postgres in the spider_ branch now mirrors the existing bird_ branch behavior, ensuring consistent evaluation data handling across both dataset types.

wren-ai-service/src/core/engine.py (1)

54-54: LGTM! Backtick normalization improves SQL compatibility.

Pre-processing backticks to double quotes before transpilation handles SQL dialects that use backticks for identifiers, improving robustness of the quoting logic.

wren-ai-service/tools/dev/.env (1)

14-17: LGTM! Version updates applied.

The version bumps for WREN_ENGINE, WREN_AI_SERVICE, IBIS_SERVER, and WREN_UI are straightforward configuration changes.

Note: The static analysis tool flags key ordering, but this is a stylistic concern and does not affect functionality.

wren-ai-service/eval/data_curation/utils.py (1)

47-48: LGTM! Improved error variable naming and messaging.

The change from no_error (boolean-like semantics) to error (actual error string) makes the code more explicit. Including the error details in the assertion message aids debugging.

wren-ai-service/eval/preparation.py (1)

413-413: LGTM! Keyword argument improves clarity.

Using api_endpoint=WREN_ENGINE_API_URL as a keyword argument makes the function call more explicit and easier to understand.

wren-ai-service/eval/utils.py (4)

37-37: LGTM! Enhanced error reporting.

Including the error details in the assertion message improves debuggability when SQL quoting fails.

160-163: LGTM! Improved error handling with context.

The refactored error handling now:

Captures the actual error message instead of just a boolean flag

Logs the error with SQL context before falling back

Provides better diagnostics for SQL quoting failures

178-180: LGTM! Explicit keyword argument.

Using timeout=timeout as a keyword argument improves code clarity and aligns with the pattern used elsewhere in the codebase.

195-195: LGTM! Pythonic boolean expression.

Using or operator is more concise and idiomatic than the ternary conditional for this use case.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

cyyeh added 5 commits October 3, 2025 11:56

fix bug

2ed4fd8

update

e0b978a

update

c06c968

update

cf4ef3a

fix eval issues

0c19010

cyyeh requested a review from yichieh-lu October 7, 2025 02:34

github-actions bot added the wren-ai-service label Oct 7, 2025

Merge branch 'main' into chore/ai-service/fix-eval

224fa13

yichieh-lu approved these changes Oct 7, 2025

View reviewed changes

cyyeh merged commit 9e06d06 into main Oct 7, 2025
11 checks passed

cyyeh deleted the chore/ai-service/fix-eval branch October 7, 2025 02:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

chore(wren-ai-service): fix eval issues #1987

chore(wren-ai-service): fix eval issues #1987

Uh oh!

cyyeh commented Oct 7, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Oct 7, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

chore(wren-ai-service): fix eval issues #1987

chore(wren-ai-service): fix eval issues #1987

Uh oh!

Conversation

cyyeh commented Oct 7, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Oct 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cyyeh commented Oct 7, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 7, 2025 •

edited

Loading