Skip to content

Conversation

kosiew
Copy link
Contributor

@kosiew kosiew commented Oct 12, 2025

Which issue does this PR close?

Closes #17998


Rationale for this change

Currently, to_timestamp() and its precision variants (to_timestamp_seconds, to_timestamp_millis, to_timestamp_micros, to_timestamp_nanos) ignore the execution timezone provided in ScalarFunctionArgs.config_options.
As a result, timestamp strings without explicit timezone information are always interpreted as UTC, regardless of the session’s datafusion.execution.time_zone setting.

This behavior is incorrect and can lead to inconsistent or unexpected timestamp conversions when users rely on a configured execution timezone.

This PR ensures that all to_timestamp* functions correctly respect and apply the configured execution timezone for naïve (timezone-free) datetime strings.


What changes are included in this PR?

  • Introduced a new utility type:

    • ConfiguredTimeZone, encapsulating both named timezones (via arrow::array::timezone::Tz) and fixed offsets (FixedOffset).
  • Added timezone parsing helpers:

    • ConfiguredTimeZone::parse() to resolve IANA names or ±HH:MM offsets.
    • parse_fixed_offset() to safely interpret offset strings.
  • Updated all to_timestamp* implementations to:

    • Extract the execution timezone from config_options.execution.time_zone.
    • Use string_to_timestamp_nanos_with_timezone() or
      string_to_timestamp_nanos_formatted_with_timezone() when parsing naïve datetime strings.
  • Added robust conversion helpers for naïve vs. localized datetimes:

    • timestamp_to_naive, datetime_to_timestamp, and
      local_datetime_to_timestamp (handling ambiguous/invalid times).
  • Comprehensive test coverage:

    • to_timestamp_respects_execution_timezone verifies that configured offsets shift timestamps as expected.
    • to_timestamp_formats_respect_timezone ensures format-based parsing respects named zones (e.g., "Asia/Tokyo").

Are these changes tested?

Yes.
Two new unit tests were added:

  • to_timestamp_respects_execution_timezone
  • to_timestamp_formats_respect_timezone

Existing to_timestamp_* tests also pass with the new timezone logic.


Are there any user-facing changes?

Yes — intended behavioral improvement:

  • Naïve datetime strings passed to to_timestamp() and its variants will now be interpreted relative to the configured execution timezone (from datafusion.execution.time_zone), rather than defaulting to UTC.

There are no breaking API changes, only corrected behavior aligning with user expectations.

Implement timezone-aware handling for to_timestamp functions by
adding the ConfiguredTimeZone utilities. Refactor shared helpers
to ensure naïve strings are interpreted using the configured
execution zone. Extend unit tests to cover naïve and formatted
inputs respecting non-UTC execution timezones.
Deferred execution of timezone parsing ensures that the
configured timezone string is only interpreted when dealing
with UTF-8 inputs. This change keeps numeric arguments
unaffected by any invalid session timezone values.
@github-actions github-actions bot added the functions Changes to functions implementation label Oct 12, 2025
@kosiew kosiew marked this pull request as ready for review October 12, 2025 12:04
@Weijun-H
Copy link
Member

I ran a simple test, but it appears this PR doesn't resolve the issue:

> SET TIMEZONE="+05";
0 row(s) fetched. 
Elapsed 0.001 seconds.

> SELECT arrow_typeof(to_timestamp('2023-01-31T09:26:56.123456789'));
+-------------------------------------------------------------------+
| arrow_typeof(to_timestamp(Utf8("2023-01-31T09:26:56.123456789"))) |
+-------------------------------------------------------------------+
| Timestamp(Nanosecond, None)                                       |
+-------------------------------------------------------------------+
1 row(s) fetched. 
Elapsed 0.006 seconds.

The timezone is still None despite setting TIMEZONE='+05'.

@kosiew
Copy link
Contributor Author

kosiew commented Oct 14, 2025

Pausing this to continue after #18017 to see how we approach modifying return_field_from_args

Modifying return_field_from_args() to incorporate ConfigOptions would require extensive changes across multiple components with questionable return on investment. Alternatives such as special casing or schema deferral could be considered, but their effectiveness is uncertain.

@Omega359
Copy link
Contributor

Thank you. I'm leaning towards handling the return_field_from_args issue via a new constructor or some other mechanism rather than updating the args to that function but we'll have to fix that first I think before PR's like this can move forward.


fn parse_fixed_offset(tz: &str) -> Option<FixedOffset> {
let tz = tz.trim();
if tz.eq_ignore_ascii_case("utc") || tz.eq_ignore_ascii_case("z") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tz::from_str(tz) doesn't account for these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked to confirm that it does not handle lower case:

    fn parse_fixed_offset_accepts_lowercase_and_z() -> Result<()> {
        use std::str::FromStr;

        assert!(!Tz::from_str("utc").is_err());
        ConfiguredTimeZone::parse("utc")?; // succeeds via parse_fixed_offset fallback
        ConfiguredTimeZone::parse("Z")?; // succeeds via parse_fixed_offset fallback
        Ok(())
    }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. While I expect it's to spec I can't see a reason why that impl shouldn't be case insensitive. I'll file a ticket for that as well.

}
}

fn parse_fixed_offset(tz: &str) -> Option<FixedOffset> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it is a private fn

error[E0603]: function `parse_fixed_offset` is private
  --> datafusion/functions/src/datetime/common.rs:18:29
   |
18 | use arrow::array::timezone::parse_fixed_offset;
   |                             ^^^^^^^^^^^^^^^^^^ private function

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll file a ticket to make that pub.


pub(crate) fn parse(tz: &str) -> Result<Self> {
if tz.trim().is_empty() {
return Ok(Self::utc());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we may want to change this to allow for None - see the start of discussion @ #17993 (comment) and #18017 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make to_timestamp aware of execution timezone

3 participants