@renecannao renecannao commented Jan 7, 2026

Summary

Fixes a parsing error in the MySQL SET statement parser that occurred when processing SET time_zone statements with:

  1. Three-component IANA timezone names (e.g., America/Argentina/Buenos_Aires, America/Indiana/Indianapolis)
  2. Timezone names containing hyphens (e.g., America/Port-au-Prince, America/Blanc-Sablon)

Previously, the regex pattern (?:\w+/\w+) only matched 2-component timezone names and did not support hyphens. This caused parsing errors in the logs:

[ERROR] Unable to parse query. If correct, report it as a bug: 
SET time_zone="America/Argentina/Buenos_Aires";

When multiplexing is enabled, this bug causes timestamps to be incorrectly written to the database.

Changes

Core Fix

  • File: lib/MySQL_Set_Stmt_Parser.cpp:298
  • Old regex: (?:\w+/\w+) - matches only 2 components, no hyphens
  • New regex: (?:[\w-]+(?:/[\w-]+){1,2}) - matches 2-3 components with hyphens
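The difference between the two patterns can be sketched in a few lines. This is only an illustration: it uses `std::regex` as a stand-in (the engine used by the real parser may differ), it matches each name in isolation with `std::regex_match` rather than as part of the full SET statement grammar, and the helper names `matches_fully` and `check_old_vs_new` are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Patterns copied from the PR description. std::regex (ECMAScript grammar)
// is an assumption made for this sketch, not necessarily the parser's engine.
static const std::regex kOldTz(R"((?:\w+/\w+))");
static const std::regex kNewTz(R"((?:[\w-]+(?:/[\w-]+){1,2}))");

// True only if the whole name matches the pattern.
bool matches_fully(const std::regex& re, const std::string& name) {
    return std::regex_match(name, re);
}

// Illustrative checks using names quoted in this PR.
void check_old_vs_new() {
    // Two components, no hyphens: accepted by both patterns.
    assert(matches_fully(kOldTz, "Europe/London"));
    assert(matches_fully(kNewTz, "Europe/London"));

    // Three components: rejected by the old pattern, accepted by the new one.
    assert(!matches_fully(kOldTz, "America/Argentina/Buenos_Aires"));
    assert(matches_fully(kNewTz, "America/Argentina/Buenos_Aires"));

    // Hyphenated component: rejected by the old pattern, accepted by the new one.
    assert(!matches_fully(kOldTz, "America/Port-au-Prince"));
    assert(matches_fully(kNewTz, "America/Port-au-Prince"));
}
```

Calling `check_old_vs_new()` from a small test program should complete without tripping any assertion if the patterns behave as described above.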

Documentation

  • Added comprehensive Doxygen documentation for the timezone parsing code
  • Documents both numeric offset format (+08:00) and IANA timezone name format
  • Clearly explains the 2-3 component limitation and character set support
  • References MySQL and IANA timezone documentation

Tests

  • Extended time_zone test array in test/tap/tests/setparser_test_common.h
  • Added test cases for:
    • 3-component timezone names (the bug fix)
    • Timezone names with hyphens (additional improvement)
    • UTC and SYSTEM special values (already worked, now explicitly tested)
    • Various numeric offsets

Examples of Supported Timezones

Format           Examples
Numeric offset   +08:00, -05:30, +00:00
2 components     Europe/London, America/New_York, Asia/Tokyo
3 components     America/Argentina/Buenos_Aires, America/Indiana/Indianapolis
With hyphens     America/Port-au-Prince, America/Blanc-Sablon
Special values   SYSTEM, UTC
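
As a rough illustration of the categories in the table, the sketch below sorts a value into one of the listed formats. It does not reflect how the parser is structured. The numeric offset pattern is an assumption written for this example (the PR does not quote the real one), `std::regex` is a stand-in engine, and `TzForm` and `classify_time_zone` are hypothetical names.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical categories mirroring the rows of the table above.
enum class TzForm { NumericOffset, SpecialValue, IanaName, Unrecognized };

TzForm classify_time_zone(const std::string& value) {
    // Assumption: a sign, 1-2 hour digits, a colon, 2 minute digits.
    static const std::regex offset_re(R"([+-]\d{1,2}:\d{2})");
    // IANA name pattern quoted in this PR (2-3 components, hyphens allowed).
    static const std::regex iana_re(R"((?:[\w-]+(?:/[\w-]+){1,2}))");

    if (value == "SYSTEM" || value == "UTC") return TzForm::SpecialValue;
    if (std::regex_match(value, offset_re)) return TzForm::NumericOffset;
    if (std::regex_match(value, iana_re)) return TzForm::IanaName;
    return TzForm::Unrecognized;
}

// Illustrative checks using the examples from the table.
void check_classification() {
    assert(classify_time_zone("+08:00") == TzForm::NumericOffset);
    assert(classify_time_zone("-05:30") == TzForm::NumericOffset);
    assert(classify_time_zone("Asia/Tokyo") == TzForm::IanaName);
    assert(classify_time_zone("America/Indiana/Indianapolis") == TzForm::IanaName);
    assert(classify_time_zone("America/Blanc-Sablon") == TzForm::IanaName);
    assert(classify_time_zone("SYSTEM") == TzForm::SpecialValue);
    assert(classify_time_zone("UTC") == TzForm::SpecialValue);
}
```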

Addresses Review Feedback

This PR incorporates feedback from gemini-code-assist on the original PR, #4993:

  • ✅ Fixed the 3-component timezone name parsing issue
  • ✅ Added support for hyphens in timezone names (e.g., America/Port-au-Prince)
  • ✅ Updated code comments to reflect the new pattern capabilities
  • ✅ Added comprehensive Doxygen documentation

Limitations

The regex pattern limits matching to 2-3 components (e.g., Area/Location or Area/Country/Location). While IANA timezone names with 4+ components are theoretically possible, they are extremely rare and not currently supported. This is documented in the code.
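
The 2-3 component bound can be checked directly. The sketch below is an illustration under assumptions: `std::regex` stands in for the parser's engine, the helper names are hypothetical, and the 4-component name is made up for the test rather than taken from the IANA database.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Number of '/'-separated components in a timezone name.
std::size_t component_count(const std::string& name) {
    return 1 + static_cast<std::size_t>(
                   std::count(name.begin(), name.end(), '/'));
}

// Whole-string match against the new pattern quoted in this PR.
bool accepted_by_new_pattern(const std::string& name) {
    static const std::regex re(R"((?:[\w-]+(?:/[\w-]+){1,2}))");
    return std::regex_match(name, re);
}

void check_component_limits() {
    assert(component_count("Europe/London") == 2);
    assert(accepted_by_new_pattern("Europe/London"));

    assert(component_count("America/Argentina/Buenos_Aires") == 3);
    assert(accepted_by_new_pattern("America/Argentina/Buenos_Aires"));

    // Hypothetical 4-component name: one component more than {1,2} allows.
    assert(component_count("Area/Country/Region/City") == 4);
    assert(!accepted_by_new_pattern("Area/Country/Region/City"));

    // A bare word has no '/', so this particular pattern does not cover it.
    assert(!accepted_by_new_pattern("UTC"));
}
```

The last assertion only says that this one pattern does not match bare words; per the commit message, values such as SYSTEM and UTC are handled by a different pattern in the parser.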

Related

  • Fixes #4993 (Correct regex for timezone)
  • Addresses gemini-code-assist review comments on the original PR
  • Fresh start on v3.0 branch with comprehensive documentation

Credit to @pbrydzinski for the original PR

This commit fixes a parsing error in the MySQL SET statement parser that
occurred when processing `SET time_zone` statements with:

1. Three-component IANA timezone names (e.g., America/Argentina/Buenos_Aires)
2. Timezone names containing hyphens (e.g., America/Port-au-Prince)

Previously, the regex pattern `(?:\w+/\w+)` only matched 2-component
timezone names and did not support hyphens. This caused parsing errors
logged as:
  "[ERROR] Unable to parse query. If correct, report it as a bug:
   SET time_zone=\"America/Argentina/Buenos_Aires\";"

When multiplexing is enabled, this bug causes timestamps to be incorrectly
written to the database.

Changes:
- Updated timezone regex from `(?:\w+/\w+)` to `(?:[\w-]+(?:/[\w-]+){1,2})`
  - Supports 2-3 components: Area/Location or Area/Country/Location
  - Supports hyphens in component names (e.g., Port-au-Prince)
- Added comprehensive Doxygen documentation for timezone parsing
- Extended TAP test cases with new timezone formats

Note: Bare words like 'SYSTEM' and 'UTC' were already supported via
other patterns in the parser (vp2 pattern for word matching).

Fixes: #4993
Related: gemini-code-assist review comments
@gemini-code-assist

Summary of Changes

Hello @renecannao, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical parsing bug in the MySQL SET time_zone statement, which previously failed to recognize IANA timezone names containing three components or hyphens. The updated regex pattern now correctly processes these formats, preventing data inconsistencies when multiplexing is active. The changes are supported by new, comprehensive Doxygen documentation explaining the parsing logic and expanded test cases to ensure robustness.

Highlights

  • Enhanced Timezone Parsing: The MySQL SET time_zone statement parser now correctly handles IANA timezone names with three components (e.g., "America/Argentina/Buenos_Aires") and those containing hyphens (e.g., "America/Port-au-Prince").
  • Regex Update: The core fix involves updating the regex pattern in lib/MySQL_Set_Stmt_Parser.cpp from "(?:\w+/\w+)" to "(?:[\w-]+(?:/[\w-]+){1,2})" to accommodate these new formats.
  • Comprehensive Documentation: Extensive Doxygen documentation has been added to lib/MySQL_Set_Stmt_Parser.cpp, detailing the numeric offset and IANA timezone name formats, including component structure, examples, special values, and current 2-3 component limitations.
  • Expanded Test Coverage: The time_zone test array in test/tap/tests/setparser_test_common.h has been significantly extended with new test cases for 3-component names, hyphenated names, UTC, SYSTEM, and various numeric offsets.
  • Addressed Review Feedback: This PR directly incorporates feedback from a previous review, specifically fixing the 3-component parsing issue and adding hyphen support.



@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This is an excellent pull request. The fix to the timezone parsing regex is correct and effectively resolves the issue with 3-component IANA names and names containing hyphens. The addition of comprehensive Doxygen documentation is a fantastic improvement for maintainability, and the new test cases are thorough, ensuring the fix is well-covered. I have one minor suggestion to improve the clarity of one of the new test cases.

// Timezone names with hyphens (additional fix)
{ "SET time_zone = 'America/Port-au-Prince'", { Expected("time_zone", {"America/Port-au-Prince"}) } },
{ "SET time_zone = 'America/Blanc-Sablon'", { Expected("time_zone", {"America/Blanc-Sablon"}) } },
{ "SET time_zone = \"Atlantic/Canary\"", { Expected("time_zone", {"Atlantic/Canary"}) } },

medium

The test case for Atlantic/Canary is valid, but it doesn't test the new hyphen support as the comment on line 156 suggests, since the name contains no hyphens and would have been matched by the old regex. To make this test more specific to the hyphen-support feature, consider replacing it with a name that does contain a hyphen, such as US/East-Indiana.

Suggested change
- { "SET time_zone = \"Atlantic/Canary\"", { Expected("time_zone", {"Atlantic/Canary"}) } },
+ { "SET time_zone = \"US/East-Indiana\"", { Expected("time_zone", {"US/East-Indiana"}) } },

Replace 'Atlantic/Canary' with 'US/East-Indiana' in the hyphen test
cases. The previous test case (Atlantic/Canary) did not contain a
hyphen and would have been matched by the old regex, so it did not
actually test the new hyphen support.

The new test case (US/East-Indiana) contains a hyphen and properly
validates the hyphen support feature.
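
The reasoning behind this swap can be expressed as a small check: a test name only exercises the fix if the new pattern accepts it and the old one does not. The sketch below assumes `std::regex` as a stand-in engine and whole-string matching; `distinguishes_old_from_new` and `check_hyphen_test_cases` are hypothetical helpers, not part of the test suite.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if the name is accepted by the new pattern but rejected by the old
// one, i.e. a test using this name would have failed before the fix.
bool distinguishes_old_from_new(const std::string& name) {
    static const std::regex old_re(R"((?:\w+/\w+))");
    static const std::regex new_re(R"((?:[\w-]+(?:/[\w-]+){1,2}))");
    return std::regex_match(name, new_re) && !std::regex_match(name, old_re);
}

void check_hyphen_test_cases() {
    // No hyphen, two components: already matched by the old pattern.
    assert(!distinguishes_old_from_new("Atlantic/Canary"));
    // Contains a hyphen: only the new pattern matches.
    assert(distinguishes_old_from_new("US/East-Indiana"));
}
```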

Thanks to @gemini-code-assist for the excellent review feedback!
@renecannao

@gemini-code-assist Thank you for the excellent review! I've addressed your feedback by replacing Atlantic/Canary with US/East-Indiana in the hyphen test cases.

You were absolutely right - Atlantic/Canary doesn't actually test the hyphen support since "Canary" contains no hyphens. The new test case US/East-Indiana properly validates that the regex handles hyphens in timezone names.

@gemini-code-assist

You're very welcome, @renecannao! I appreciate you taking the time to update the test cases. Replacing Atlantic/Canary with US/East-Indiana is indeed a more accurate and effective way to validate the hyphen support in the regex. This change significantly strengthens the test coverage for this specific scenario. Excellent work!

sonarqubecloud bot commented Jan 7, 2026

@renecannao renecannao added this to the Release 3.0.5 milestone Jan 7, 2026
@renecannao

retest this please

@renecannao renecannao merged commit ef872c7 into v3.0 Jan 8, 2026
2 of 3 checks passed