Skip to content

Spark SQL time dimensions use TRUNC(..., 'DAY'), which returns NULL #140

@vsevolodbazhan

Description

@vsevolodbazhan

Problem

When using Sidemantic with Spark SQL / HiveServer2, time dimensions with day granularity are compiled with TRUNC(column, 'DAY').

Example semantic model:

models:
  - name: bookings
    sql: |
      SELECT *
      FROM integrations.bookings
      WHERE pdate > '2026-01-01'
    primary_key: booking_id
    dimensions:
      - name: state
        type: categorical
      - name: created_at
        type: time
        granularity: day
    metrics:
      - name: bookings_count
        agg: count
      - name: paid_bookings_count
        agg: count
        filters:
          - state = 'paid'

A query such as:

select bookings.created_at, bookings.paid_bookings_count
from bookings
where bookings.created_at > '2026-05-01'

generates SQL similar to:

WITH bookings_cte AS (
  SELECT
    booking_id AS booking_id,
    state AS state,
    TRUNC(created_at, 'DAY') AS created_at,
    CASE WHEN state = 'paid' THEN 1 ELSE NULL END AS paid_bookings_count_raw
  FROM integrations.bookings
  WHERE created_at > '2026-05-01'
)
SELECT
  bookings_cte.created_at AS created_at,
  COUNT(bookings_cte.paid_bookings_count_raw) AS paid_bookings_count
FROM bookings_cte
GROUP BY
  1

In Spark SQL, this does not work as expected. TRUNC(timestamp_col, 'DAY') returns NULL.

Expected behavior

For Spark SQL, day-level timestamp truncation should use DATE_TRUNC, for example:

DATE_TRUNC('DAY', created_at)

Why it matters

Spark’s TRUNC function only supports date truncation to larger calendar units such as year, month, week, and quarter. It does not support DAY as a format for TRUNC.

Spark’s DATE_TRUNC function is the appropriate function for timestamp truncation to day/hour/minute/etc.

Because of this, Sidemantic-generated SQL for type: time dimensions with granularity: day currently produces unusable results on Spark: the dimension becomes NULL and grouping breaks.

Sources

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions