Revival: Composable time-series primitives (post-index removal) #60
Merged
Not sure if we'll keep this.
New fields:
- :hour-fractional: decimal hour (13.5 for 13:30)
- :daily-phase: position in day, 0→1
- :weekly-phase: position in week, 0→1
- :date-string: date as YYYY-MM-DD string
- :year-string, :month-string, :week-string, :day-of-week-string: for categorical colors

Updated notebook to use new fields instead of manual computation.

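The fractional fields above can be sketched with plain java.time interop. This is only an illustration of the definitions given in the commit message, not the code in this PR: the function names are hypothetical, and the choice of Monday 00:00 as the start of the week for the weekly phase is an assumption the source does not state.

```clojure
(import '(java.time LocalDateTime))

(defn hour-fractional
  "Decimal hour of the day, e.g. 13:30 -> 13.5."
  [^LocalDateTime dt]
  (+ (.getHour dt)
     (/ (.getMinute dt) 60.0)
     (/ (.getSecond dt) 3600.0)))

(defn daily-phase
  "Position within the day, from 0 (midnight) up to, but excluding, 1."
  [dt]
  (/ (hour-fractional dt) 24.0))

(defn weekly-phase
  "Position within the week, 0 up to 1.
  ASSUMPTION: the week starts on Monday at 00:00."
  [^LocalDateTime dt]
  (let [days-since-monday (dec (.getValue (.getDayOfWeek dt)))]
    (/ (+ days-since-monday (daily-phase dt)) 7.0)))

;; Usage: 2024-01-01 is a Monday.
(hour-fractional (LocalDateTime/of 2024 1 1 13 30)) ; 13.5
(daily-phase     (LocalDateTime/of 2024 1 1 12 0))  ; 0.5
(weekly-phase    (LocalDateTime/of 2024 1 4 12 0))  ; 0.5 (Thursday noon)
```

Phases in the range 0 to 1 are convenient for seasonal plots because every day (or week) maps onto the same axis regardless of the calendar date.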
- :week-index: continuous week (0-52), avoids ISO boundary issues
- :year-week-string: 'YYYY-Www' format for weekly seasonal grouping
- doc/design-decisions.md: rationale for composable helpers vs metadata
- Updated notebook to use all new fields (no more manual tc/add-column)

- Move require into ns form (was standalone)
- Add tcc and dtype requires
- Use dtype/emap for string conversions (faster, proper column type)
- Keep mapv only where Java interop or formatting needed
- Add :yearly-phase field
- All string functions now return proper columns

Following the existing pattern of local-date->epoch-month/year/quarter:
- Added local-date->epoch-day and local-date->epoch-week (private)
- Added public epoch-day and epoch-week functions to column.api
- Added :epoch-day and :epoch-week to add-time-columns extractors

Note: week-index in api.clj is different from epoch-week:
- week-index = (day-of-year - 1) / 7, the position within the year (for seasonal plots)
- epoch-week = days-since-epoch / 7, continuous across years

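The difference between the two week numbers in the note above can be made concrete with a small sketch on java.time.LocalDate values. The function bodies here are illustrative rather than the PR's implementation; in particular, integer division and flooring for dates before the epoch are assumptions.

```clojure
(import '(java.time LocalDate))

(defn epoch-day
  "Days since 1970-01-01; negative for earlier dates."
  [^LocalDate d]
  (.toEpochDay d))

(defn epoch-week
  "Whole weeks since the epoch, continuous across years.
  ASSUMPTION: floor division, so dates before 1970 land in negative weeks."
  [^LocalDate d]
  (Math/floorDiv (long (epoch-day d)) 7))

(defn week-index
  "Position within the year: (day-of-year - 1) / 7.
  ASSUMPTION: integer division, giving 0 through 52."
  [^LocalDate d]
  (quot (dec (.getDayOfYear d)) 7))

;; Usage
(epoch-day  (LocalDate/of 1969 12 31)) ; -1
(epoch-week (LocalDate/of 1970 1 8))   ; 1
(week-index (LocalDate/of 2023 12 31)) ; 52 (day-of-year 365)
(week-index (LocalDate/of 2024 1 1))   ; 0  (resets each year)
```

The week index restarts every January 1, which suits seasonal plots, while the epoch week keeps counting and suits grouping across year boundaries.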
- Renamed to week-of-year-index for clarity
- Moved implementation to column.api built on epoch-week
- Computed as: epoch-week(date) - epoch-week(Jan 1 of same year)
- Properly handles year boundaries (Dec 31 = week 52, not ISO week 1)
- Updated notebook and year-week-string to use new function

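The formula in this commit, epoch-week(date) minus epoch-week(Jan 1 of the same year), can be sketched as follows. The helper below is a stand-in written for this example (floor division of the epoch day by 7 is an assumption), so treat it as a reading of the commit message rather than the merged code.

```clojure
(import '(java.time LocalDate))

(defn epoch-week
  "Whole weeks since 1970-01-01 (ASSUMPTION: floor division by 7)."
  [^LocalDate d]
  (Math/floorDiv (.toEpochDay d) 7))

(defn week-of-year-index
  "0-based week within the year, computed as
  epoch-week(date) - epoch-week(Jan 1 of the same year)."
  [^LocalDate d]
  (- (epoch-week d)
     (epoch-week (.withDayOfYear d 1))))

;; Usage
(week-of-year-index (LocalDate/of 2023 1 1))   ; 0
(week-of-year-index (LocalDate/of 2023 12 31)) ; 52
(week-of-year-index (LocalDate/of 2024 12 30)) ; 52, although ISO assigns this date to week 1 of 2025
(week-of-year-index (LocalDate/of 2024 1 1))   ; 0
```

Because both terms are measured on the same continuous epoch-week scale, the result never jumps back to 1 in late December the way ISO week numbers can.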
- Use tc/mean directly instead of dfn/mean with custom function
- Add daily demand plot showing the result
- Emphasize composability with standard tablecloth ops

The previous example cheated by using the pre-existing Date column. Now properly demonstrates extracting date from datetime via add-time-columns, then grouping with standard tablecloth.
Legacy index-based code is preserved in git history if needed. Cleaning up for merge to main.
- Prefix unused ds param with _ in rolling.clj stub
- Remove unused tablecloth.api require from column/api.clj
- Fix malformed .clj-kondo/config.edn

Column-level functions now have explicit test coverage:
- epoch-day: days since 1970-01-01, including negative
- epoch-week: weeks since epoch, year boundary handling
- week-of-year-index: 0-based week within year, resets at boundary
- lag: shift forward with nil at start
- lead: shift backward with nil at end

Test count: 60 -> 65, assertions: 232 -> 252

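The lag and lead behaviour described in the list above (nil padding at the start and at the end, respectively) can be illustrated on plain vectors. This is a minimal sketch in core Clojure; the argument order and the use of vectors instead of dataset columns are assumptions made for the example and may not match the column.api functions.

```clojure
(defn lag
  "Shift values forward by n positions, padding the start with nil.
  The result has the same length as xs."
  [n xs]
  (vec (take (count xs) (concat (repeat n nil) xs))))

(defn lead
  "Shift values backward by n positions, padding the end with nil.
  The result has the same length as xs."
  [n xs]
  (vec (concat (drop n xs) (repeat (min n (count xs)) nil))))

;; Usage
(lag 1 [10 20 30])  ; [nil 10 20]
(lead 1 [10 20 30]) ; [20 30 nil]
```

Keeping the output the same length as the input is what lets a lagged column sit next to the original in a dataset.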
- Links to Zulip discussion where the decision was made
- Clarifies this doesn't preclude future indexing
- Notes SciCloj is experimenting with index-free approaches
- References the full discussion summary in doc/

Context
The indexing mechanism in tech.ml.dataset was removed as part of the v7 simplification effort (see 7.000-beta-5 changelog: "Integration of ham-fisted deeply into dtype-next, tmd, and tablecloth"). This broke the original tablecloth.time API, which relied on that indexing.

The SciCloj community discussed approaches to temporal indexing in this Zulip thread. Key insight from Chris Nuernberger (dtype-next author):
This PR
A rethink of tablecloth.time for a post-index world. Rather than reimplementing complex indexing, we provide composable primitives that work with explicit column arguments:
New API
Dataset-level (tablecloth.time.api):
- add-time-columns: batch extract datetime fields (year, month, day-of-week, etc.)
- slice: time-range selection using binary search
- add-lag / add-lead: single lag/lead with auto-naming
- add-lags / add-leads: batch lag/lead (vector or map form)

Column-level (tablecloth.time.column.api):
- year, month, epoch-day, epoch-week, floor-to-*, down-to-nearest, etc.

Philosophy
group-by, aggregate, etc.

Testing

Documentation
- doc/zulip-indexing-discussion-summary.md: full context on indexing decisions
- notebooks/chapter_02_time_series_graphics.clj: fpp3 workthrough examples

Housekeeping
_archive/ folder (old index-based code preserved in git history)
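
As an illustration of the binary-search time-range selection that the New API section attributes to slice, here is a minimal sketch over a sorted vector of dates. It is not the tablecloth.time implementation: the names partition-point and slice-sorted are hypothetical, the input is assumed to be sorted ascending, and treating both bounds as inclusive is an assumption.

```clojure
(import '(java.time LocalDate))

(defn partition-point
  "Smallest index i in sorted vector v at which (pred (nth v i)) is false.
  Assumes every element satisfying pred comes before every element that does not."
  [pred v]
  (loop [lo 0
         hi (count v)]
    (if (< lo hi)
      (let [mid (quot (+ lo hi) 2)]
        (if (pred (nth v mid))
          (recur (inc mid) hi)
          (recur lo mid)))
      lo)))

(defn slice-sorted
  "Elements of the ascending vector `times` within [from, to], both inclusive
  (ASSUMPTION). Uses two binary searches instead of a linear scan."
  [times from to]
  (let [start (partition-point #(neg? (compare % from)) times)
        end   (partition-point #(not (pos? (compare % to))) times)]
    (subvec times start end)))

;; Usage
(def days (mapv #(LocalDate/parse %)
                ["2024-01-01" "2024-01-05" "2024-01-09" "2024-02-01"]))

(slice-sorted days (LocalDate/parse "2024-01-02") (LocalDate/parse "2024-01-09"))
;; keeps 2024-01-05 and 2024-01-09
```

Two binary searches locate the range boundaries in logarithmic time without any persistent index, which fits the explicit-column approach this PR describes.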