Revival: Composable time-series primitives (post-index removal)#60

Merged
ezmiller merged 76 commits into main from ethan/reviving-post-tmd-index-removal
Feb 28, 2026
@kingkongbot
Collaborator

Context

The indexing mechanism in tech.ml.dataset was removed as part of the v7 simplification effort (see 7.000-beta-5 changelog: "Integration of ham-fisted deeply into dtype-next, tmd, and tablecloth"). This broke the original tablecloth.time API, which relied on that indexing.

The SciCloj community discussed approaches to temporal indexing in this Zulip thread. Key insight from Chris Nuernberger (dtype-next author):

"Just sorting the dataset and using binary search will outperform most/all tree structures in this scenario as it is faster to sort than to construct trees."

This PR

A rethink of tablecloth.time for a post-index world. Rather than reimplementing complex indexing, we provide composable primitives that work with explicit column arguments:

New API

Dataset-level (tablecloth.time.api):

  • add-time-columns — batch extract datetime fields (year, month, day-of-week, etc.)
  • slice — time-range selection using binary search
  • add-lag / add-lead — single lag/lead with auto-naming
  • add-lags / add-leads — batch lag/lead (vector or map form)

Column-level (tablecloth.time.column.api):

  • 21 functions: year, month, epoch-day, epoch-week, floor-to-*, down-to-nearest, etc.

Philosophy

  • Explicit over implicit — no metadata-based time index; pass column names directly
  • Composition over magic — use with standard tablecloth group-by, aggregate, etc.
  • Binary search over trees — sorting + binary search is fast and simple
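To make the "sort, then binary search" idea concrete, here is a minimal sketch in plain Clojure. It is not the PR's `slice` implementation: the names `first-index-where` and `slice-rows` are hypothetical, rows are modeled as a vector of maps already sorted by the time key, and the range is assumed to be inclusive on both ends.

```clojure
(import '(java.time LocalDate))

(defn first-index-where
  "Smallest index i in [0, n) for which (pred i) is true, assuming pred is
  false for a prefix of the indices and true for the rest. Returns n if
  pred is never true. Runs in O(log n) calls to pred."
  [n pred]
  (loop [lo 0 hi n]
    (if (< lo hi)
      (let [mid (quot (+ lo hi) 2)]
        (if (pred mid)
          (recur lo mid)
          (recur (inc mid) hi)))
      lo)))

(defn slice-rows
  "Rows whose time value lies in [from, to] (inclusive; an assumption of
  this sketch). `rows` must be a vector sorted ascending by `time-key`."
  [rows time-key from to]
  (let [n     (count rows)
        t     (fn [i] (time-key (nth rows i)))
        ;; first row with time >= from
        start (first-index-where n (fn [i] (not (neg? (compare (t i) from)))))
        ;; first row with time > to
        end   (first-index-where n (fn [i] (pos? (compare (t i) to))))]
    ;; guard against from > to, which would otherwise give end < start
    (subvec rows start (max start end))))
```

Each call performs two binary searches and one `subvec`, so no tree or other auxiliary index has to be built or kept in sync with the data.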

Testing

  • 65 tests, 252 assertions, 0 failures
  • clj-kondo: 0 errors, 0 warnings

Documentation

  • Updated README with design philosophy and indexing rationale
  • doc/zulip-indexing-discussion-summary.md — full context on indexing decisions
  • notebooks/chapter_02_time_series_graphics.clj — fpp3 walkthrough examples

Housekeeping

  • Removed _archive/ folder (old index-based code preserved in git history)
  • Bumped Clojure 1.10.2 → 1.12.0
  • Updated Clay + tableplot for notebooks

New fields:
- :hour-fractional — decimal hour (13.5 for 13:30)
- :daily-phase — position in day, 0→1
- :weekly-phase — position in week, 0→1
- :date-string — date as YYYY-MM-DD string
- :year-string, :month-string, :week-string, :day-of-week-string — for categorical colors

Updated notebook to use new fields instead of manual computation.

- :week-index — continuous week (0-52), avoids ISO boundary issues
- :year-week-string — 'YYYY-Www' format for weekly seasonal grouping
- doc/design-decisions.md — rationale for composable helpers vs metadata
- Updated notebook to use all new fields (no more manual tc/add-column)

- Move require into ns form (was standalone)
- Add tcc and dtype requires
- Use dtype/emap for string conversions (faster, proper column type)
- Keep mapv only where Java interop or formatting needed
- Add :yearly-phase field
- All string functions now return proper columns

Following the existing pattern of local-date->epoch-month/year/quarter:
- Added local-date->epoch-day and local-date->epoch-week (private)
- Added public epoch-day and epoch-week functions to column.api
- Added :epoch-day and :epoch-week to add-time-columns extractors

Note: week-index in api.clj is different from epoch-week:
- week-index = (day-of-year - 1) / 7 — position within year (for seasonal plots)
- epoch-week = days-since-epoch / 7 — continuous across years

- Renamed to week-of-year-index for clarity
- Moved implementation to column.api built on epoch-week
- Computed as: epoch-week(date) - epoch-week(Jan 1 of same year)
- Properly handles year boundaries (Dec 31 = week 52, not ISO week 1)
- Updated notebook and year-week-string to use new function
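The relationship between the three quantities can be sketched on a single `java.time.LocalDate`. This is an illustration of the formulas quoted above, not the column-level code in column.api. It assumes floor division, so that dates before 1970 map to negative weeks.

```clojure
(import '(java.time LocalDate))

(defn epoch-day
  "Days since 1970-01-01 (negative for earlier dates)."
  [^LocalDate d]
  (.toEpochDay d))

(defn epoch-week
  "Weeks since the epoch, using floor division (an assumption of this
  sketch) so the value keeps decreasing for pre-1970 dates."
  [^LocalDate d]
  (Math/floorDiv (long (epoch-day d)) (long 7)))

(defn week-of-year-index
  "epoch-week(date) - epoch-week(Jan 1 of the same year): a 0-based week
  position that resets at every year boundary."
  [^LocalDate d]
  (- (epoch-week d)
     (epoch-week (LocalDate/of (.getYear d) 1 1))))
```

Under these assumptions, 2023-12-31 maps to 52 and 2024-01-01 maps back to 0. Note that weeks in this sketch are aligned to the epoch (1970-01-01, a Thursday) rather than to January 1, so the index advances on a fixed weekday instead of exactly seven days into the year.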
- Use tc/mean directly instead of dfn/mean with custom function
- Add daily demand plot showing the result
- Emphasize composability with standard tablecloth ops

The previous example cheated by using the pre-existing Date column.
Now properly demonstrates extracting date from datetime via
add-time-columns, then grouping with standard tablecloth.

Legacy index-based code is preserved in git history if needed.
Cleaning up for merge to main.

- Prefix unused ds param with _ in rolling.clj stub
- Remove unused tablecloth.api require from column/api.clj
- Fix malformed .clj-kondo/config.edn

Column-level functions now have explicit test coverage:
- epoch-day: days since 1970-01-01, including negative
- epoch-week: weeks since epoch, year boundary handling
- week-of-year-index: 0-based week within year, resets at boundary
- lag: shift forward with nil at start
- lead: shift backward with nil at end
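The lag and lead semantics described above can be sketched on plain vectors. The two-argument signatures here are hypothetical and do not reflect the library's column-level API; the sketch only shows the shifting and nil-padding behavior.

```clojure
(defn lag
  "Shift values forward by k positions, padding the start with nil.
  Element i of the result is element (i - k) of the input."
  [xs k]
  (let [n (count xs)
        k (min k n)]
    (into (vec (repeat k nil)) (take (- n k) xs))))

(defn lead
  "Shift values backward by k positions, padding the end with nil.
  Element i of the result is element (i + k) of the input."
  [xs k]
  (let [n (count xs)
        k (min k n)]
    (into (vec (drop k xs)) (repeat k nil))))
```

Both functions preserve length, so the shifted result can sit alongside the original series as an extra column.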

Test count: 60 -> 65, assertions: 232 -> 252

- Links to Zulip discussion where the decision was made
- Clarifies this doesn't preclude future indexing
- Notes SciCloj is experimenting with index-free approaches
- References the full discussion summary in doc/
@ezmiller merged commit 310ea81 into main on Feb 28, 2026
2 checks passed
@ezmiller deleted the ethan/reviving-post-tmd-index-removal branch on February 28, 2026 23:41
2 participants