@killme2008 killme2008 commented Oct 17, 2025

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

#1168

What's changed and what's your intention?

Main changes:

  • Migrated type and related function tests from DuckDB.
  • Implemented format(fmt, arg1, arg2...) and regex_extract functions.
  • Added Date64 support for date_format.
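For context on the Date64 item: Arrow's Date64 type stores a date as milliseconds since the UNIX epoch, so formatting it comes down to converting that count into a calendar date. The sketch below is not the code from this PR; `format_date64` and `civil_from_days` are hypothetical names, and the fixed `YYYY-MM-DD` output stands in for a real format string.

```rust
/// Milliseconds per day. Arrow's Date64 stores milliseconds since the UNIX epoch.
const MILLIS_PER_DAY: i64 = 86_400_000;

/// Converts days since 1970-01-01 into (year, month, day) in the proleptic
/// Gregorian calendar, using the well-known days-to-civil algorithm.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // 0..=399
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // 0..=365, year starts in March
    let mp = (5 * doy + 2) / 153; // month index, 0 = March
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Hypothetical helper (not from the PR): renders a Date64 value as
/// `YYYY-MM-DD` in UTC. `div_euclid` keeps pre-1970 values on the right day.
fn format_date64(millis: i64) -> String {
    let (year, month, day) = civil_from_days(millis.div_euclid(MILLIS_PER_DAY));
    format!("{:04}-{:02}-{:02}", year, month, day)
}
```

Using `div_euclid` rather than `/` matters for negative inputs: one millisecond before the epoch belongs to 1969-12-31, not 1970-01-01.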

Remaining issues:

  1. DataFusion does not correctly support sampling; see apache/datafusion#13563 ("Support data source sampling with TABLESAMPLE").
  2. Subqueries are not supported, for example:
SELECT a, b,
    a - (SELECT MIN(a) FROM test WHERE a IS NOT NULL) AS diff_from_min
FROM test
WHERE a IS NOT NULL
ORDER BY a - (SELECT MIN(a) FROM test WHERE a IS NOT NULL);

Error: 1001 (Unsupported), This feature is not implemented: Physical plan does not support logical expression ScalarSubquery(<subquery>)

TODO:

  • Contribute the format(fmt, arg1, arg2...) and regex_extract functions to DataFusion and remove our custom implementations.

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

@killme2008 killme2008 requested review from a team, evenyag, v0y4g3r and waynexia as code owners October 17, 2025 08:07
@github-actions github-actions bot added size/XXL docs-not-required This change does not impact docs. labels Oct 17, 2025
WHERE a IS NOT NULL
ORDER BY a - (SELECT MIN(a) FROM test WHERE a IS NOT NULL);

Error: 1001(Unsupported), This feature is not implemented: Physical plan does not support logical expression ScalarSubquery(<subquery>)
Contributor Author

It's a known issue.

@killme2008 killme2008 requested a review from Copilot October 17, 2025 08:08
@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds enhanced string function support and comprehensive type testing to GreptimeDB. The primary purpose is to implement the FORMAT function with escape character support, the REGEXP_EXTRACT function for regex-based text extraction, and expand test coverage for various data types including Unicode strings, floating-point edge cases, and SQL operations.

Key Changes:

  • Implementation of FORMAT and REGEXP_EXTRACT string functions with proper escaping and regex support
  • Addition of comprehensive test suites for string functions, Unicode handling, NaN/infinity arithmetic, and SQL operations
  • Upgrade of regex dependency to version 1.12 for improved functionality
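As a rough illustration of what escape handling in a FORMAT-style function involves, here is a minimal sketch. It assumes `{}` positional placeholders with `{{` and `}}` as escapes, in the style of fmt-like formatters; the overview does not spell out the syntax the PR actually uses, and `format_positional` is a hypothetical name.

```rust
/// Hypothetical helper (not the PR's implementation): substitutes `{}`
/// placeholders with `args` in order. `{{` and `}}` are ASSUMED to be the
/// escapes for literal braces.
fn format_positional(fmt: &str, args: &[&str]) -> Result<String, String> {
    let mut out = String::with_capacity(fmt.len());
    let mut next_arg = 0;
    let mut chars = fmt.chars();
    while let Some(c) = chars.next() {
        match c {
            '{' => match chars.next() {
                // `{{` is an escaped literal `{`.
                Some('{') => out.push('{'),
                // `{}` consumes the next positional argument.
                Some('}') => {
                    let arg = args
                        .get(next_arg)
                        .ok_or_else(|| format!("missing argument for placeholder {}", next_arg))?;
                    out.push_str(arg);
                    next_arg += 1;
                }
                _ => return Err("unmatched '{' in format string".to_string()),
            },
            '}' => match chars.next() {
                // `}}` is an escaped literal `}`.
                Some('}') => out.push('}'),
                _ => return Err("unmatched '}' in format string".to_string()),
            },
            other => out.push(other),
        }
    }
    Ok(out)
}
```

With these assumptions, `format_positional("{} has {{{}}} items", &["cart", "3"])` yields `cart has {3} items`, and a placeholder without a matching argument is reported as an error rather than silently dropped.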

Reviewed Changes

Copilot reviewed 53 out of 54 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/common/function/src/scalars/string/* | New string function implementations, including format and regex_extract |
| tests/cases/standalone/common/function/string/* | Comprehensive string function test suites |
| tests/cases/standalone/common/types/float/* | Floating-point type tests covering NaN, infinity, and IEEE compliance |
| tests/cases/standalone/common/types/string/* | Unicode and large string handling tests |
| tests/cases/standalone/common/order/* | ORDER BY functionality tests |
| tests/cases/standalone/common/sample/* | Table sampling tests |
| Cargo.toml and related | Regex dependency upgrade to version 1.12 |
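For readers unfamiliar with the floating-point edge cases mentioned for the float tests, the following sketch shows the standard IEEE 754 behaviors such tests usually pin down, using plain Rust `f64`. It is illustrative only and is not taken from the PR's test files; `edge_cases` and `sort_total` are hypothetical names.

```rust
/// Sorts floats with the IEEE 754 totalOrder predicate (`f64::total_cmp`),
/// which gives NaN a defined position instead of an undefined comparison.
fn sort_total(mut values: Vec<f64>) -> Vec<f64> {
    values.sort_by(|a, b| a.total_cmp(b));
    values
}

/// Collects a few IEEE 754 edge cases as booleans.
fn edge_cases() -> (bool, bool, bool) {
    let nan = f64::NAN;
    let inf = f64::INFINITY;
    (
        nan == nan,           // false: NaN never compares equal, even to itself
        (inf - inf).is_nan(), // true: infinity minus infinity is NaN
        1.0 / 0.0 == inf,     // true: dividing a positive float by zero gives +infinity
    )
}
```

Under the total order, a NaN with a positive sign bit sorts after positive infinity, which is one reason ORDER BY results on columns containing NaN need explicit test coverage.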

Signed-off-by: Dennis Zhuang <[email protected]>
@github-actions github-actions bot added docs-required This change requires docs update. and removed docs-not-required This change does not impact docs. labels Oct 17, 2025
@github-actions github-actions bot added docs-not-required This change does not impact docs. and removed docs-required This change requires docs update. labels Oct 17, 2025
Signed-off-by: Dennis Zhuang <[email protected]>
Signed-off-by: Dennis Zhuang <[email protected]>
Comment on lines 174 to 183
if let Some(name) = name_opt
    && needed_names.contains(name)
{
    if i + 1 >= values.len() {
        return Err(DataFusionError::Execution(format!(
            "FORMAT: named argument '{}' has no corresponding value",
            name
        )));
    }
    named_map.insert(name.clone(), &values[i + 1]);
Contributor

What if a positional value is the same string as one of the names in needed_names? We may not be able to tell the two apart, because needed_names contains only the names, not the positions of the named placeholders in the format string.

Contributor Author

Good catch! We should require named parameters to appear at the end of the parameter list, similar to Python or Rust macros.

Let me fix it.
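One way the "named parameters at the end" rule could remove the ambiguity is sketched below. This is not the fix that was written for this PR; `split_args` is a hypothetical helper, arguments are simplified to strings, and it assumes the number of positional placeholders is already known from parsing the format string.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not from the PR): splits a flat argument list into
/// leading positional values and trailing `name, value` pairs.
/// `positional_count` is ASSUMED to come from parsing the format string.
fn split_args<'a>(
    values: &'a [String],
    positional_count: usize,
) -> Result<(&'a [String], HashMap<&'a str, &'a str>), String> {
    if values.len() < positional_count {
        return Err(format!(
            "expected at least {} positional arguments, got {}",
            positional_count,
            values.len()
        ));
    }
    // Everything before the split point is positional, whatever its content,
    // so a positional value that equals a placeholder name is never misread.
    let (positional, rest) = values.split_at(positional_count);
    if rest.len() % 2 != 0 {
        return Err("named argument has no corresponding value".to_string());
    }
    let mut named = HashMap::new();
    for pair in rest.chunks_exact(2) {
        named.insert(pair[0].as_str(), pair[1].as_str());
    }
    Ok((positional, named))
}
```

Because the split is decided by position rather than by matching values against needed_names, the arguments `["name", "name", "Alice"]` with one positional placeholder resolve to the positional value `name` and the named pair `name = Alice` without any guessing.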

Contributor Author

@evenyag I removed the format function for now. It should not be added until we have considered its design carefully.

PTAL.

Signed-off-by: Dennis Zhuang <[email protected]>
Labels

docs-not-required This change does not impact docs. size/XXL

3 participants