Time date processing for riak_ts
There are several overlapping tasks that arise from recent discussions regarding desired time features. These include:
- Allow users to enter ISO 8601 time/date strings (including timezones) in input and query via riak-shell.
  - Dates will be converted to UNIX epoch for storage.
  - If a timezone is not supplied, a decision must be made: reject, assume UTC, or assume system default timezone.
    - Complication: what if the system where riak-shell is running has a different default timezone than Riak itself?
  - Leap seconds make queries much trickier since the same UNIX timestamp occurs on two different days.
  - A failure by the operators to maintain updated system timezone files would potentially result in unexpected behavior.
- Allow users to use time operations in queries (see the PostgreSQL documentation for examples).
- Display resulting timestamps in riak-shell in ISO 8601 format.
  - The output could be in UTC, the system's timezone, a timezone drawn from the table definition, or even one specified as part of the query.
- Allow operators to specify alternative timezones as the default for a table.
  - Impacts quantum boundaries.
  - Impacts output from riak-shell if timestamps are rendered as ISO 8601.
  - Impacts the default timezone for input and query dates parsed via riak-shell.
  - Would not impact storage: dates would still be stored as UTC.
  - A failure by the operators to maintain updated system timezone files would potentially result in unexpected behavior.
- Allow operators to specify explicit integer values as a quantum function.
  - Allows the database to have any zero point that makes sense for the application.
  - Makes it easier to semantically treat the values as something other than time values (e.g. geolocation).
- Allow operators to define the meaning of a timestamp increment.
  - Removes the assumption that all time values are in milliseconds.
  - Might, for simplicity, require an integer-based quantum function.
- Allow clients to determine any of the above table-specific time-related metadata for display purposes.
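As a rough illustration of the first two tasks, the Python sketch below converts an ISO 8601 string into integer milliseconds since the UNIX epoch and back again. It is not riak-shell code: the function names and the missing_tz policy values ("reject", "utc", "local") are hypothetical, chosen to mirror the three options listed above, and only the standard datetime module is used.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional

# Reference point for all conversions: 1970-01-01T00:00:00Z.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MS = timedelta(milliseconds=1)


def iso8601_to_epoch_ms(text: str, missing_tz: str = "reject") -> int:
    """Parse an ISO 8601 string into integer milliseconds since the epoch.

    missing_tz (hypothetical policy) decides what happens when the string
    carries no offset: "reject", "utc", or "local" (the timezone of the
    machine running this code, not necessarily Riak's).
    """
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:
        if missing_tz == "reject":
            raise ValueError(f"no timezone in {text!r}")
        if missing_tz == "utc":
            dt = dt.replace(tzinfo=timezone.utc)
        elif missing_tz == "local":
            # astimezone() on a naive datetime interprets it in the system timezone.
            dt = dt.astimezone()
        else:
            raise ValueError(f"unknown missing_tz policy {missing_tz!r}")
    # Integer timedelta division avoids float rounding errors.
    return (dt - EPOCH) // ONE_MS


def epoch_ms_to_iso8601(ms: int, tz: Optional[tzinfo] = None) -> str:
    """Render epoch milliseconds as ISO 8601 in tz (UTC when tz is None)."""
    dt = EPOCH + timedelta(milliseconds=ms)
    return dt.astimezone(tz or timezone.utc).isoformat(timespec="milliseconds")
```

Passing a named zone such as zoneinfo.ZoneInfo("Asia/Kolkata") (Python 3.9+) as tz would depend on the host's timezone database, which is the operator-maintenance risk noted above. Likewise, the "local" policy resolves against whichever machine runs the code, which is the riak-shell versus Riak complication in miniature.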
Also, there's clearly a significant documentation task: we need to make certain the current assumptions and implications are captured for end users, along with the impact of changes from the above tasks. We have made the implementation almost irrelevant to end users, but many of them will still wish to understand how this fits together.
It is also important to document that use of the riak-shell translations to and from user-friendly time strings will impact performance.
Currently, users must enter times as integer values, which quanta calculations assume to be UNIX epoch values in UTC.
- Client libraries are free to accept any time/date format and process it into UNIX epoch time, much as riak-shell could do.
- The timestamp field in the database is a 64-bit unsigned integer.
- The epoch for UNIX time is midnight January 1, 1970 UTC (effectively GMT).
- The specification of a DDL quantum function in terms of days will result in quanta boundaries at midnight UTC.
  - Timezones with partial offsets from UTC (e.g. IST) will have quanta boundaries that do not fall on the local hour.
- The time values are assumed to be in milliseconds, which impacts all quanta calculations.
- The system clock and default timezone are currently completely irrelevant. All time values are either calculated (quantum boundaries) or specified as input by the user/application.
- Leap seconds are silently swallowed into the timestamp.
  - If a second is added at the end of a day, both that second and the one that follows are represented by a single timestamp (see Wikipedia).
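A small Python sketch of the current behavior described above, assuming (as the text states) that quanta are counted in milliseconds from the UNIX epoch. quantum_start is a hypothetical helper, not Riak's implementation. It shows a one-day quantum starting at midnight UTC, where that boundary lands in IST, and how a leap second collapses onto the first second of the next day.

```python
import calendar

MS_PER_HOUR = 60 * 60 * 1000
MS_PER_DAY = 24 * MS_PER_HOUR
IST_OFFSET_MS = 5 * MS_PER_HOUR + 30 * 60 * 1000  # India Standard Time, UTC+05:30


def quantum_start(ts_ms: int, quantum_ms: int) -> int:
    """Start of the quantum containing ts_ms, with quanta counted from the epoch."""
    # Python's % is non-negative for a positive divisor, so this floors.
    return ts_ms - ts_ms % quantum_ms


# 2016-06-01T10:00:00Z as epoch milliseconds.
ts = calendar.timegm((2016, 6, 1, 10, 0, 0)) * 1000

# A one-day quantum starts at midnight UTC...
day_start = quantum_start(ts, MS_PER_DAY)
print(day_start == calendar.timegm((2016, 6, 1, 0, 0, 0)) * 1000)  # True

# ...which is 05:30 local time in IST, not on a local hour.
print(((day_start + IST_OFFSET_MS) % MS_PER_DAY) / MS_PER_HOUR)  # 5.5

# calendar.timegm is plain arithmetic, so the leap second 2016-12-31T23:59:60Z
# yields the same integer as 2017-01-01T00:00:00Z: two days, one timestamp.
print(calendar.timegm((2016, 12, 31, 23, 59, 60)) ==
      calendar.timegm((2017, 1, 1, 0, 0, 0)))  # True
```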
Other than determining quantum boundaries, the fact that timestamps are assumed to be UNIX epoch values in milliseconds has zero consequences for Riak or the user. This means that the integer values currently stored in Riak can have any semantic meaning applied that the application/user so chooses.
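Because the quantum is the only place where the time interpretation matters, the proposed integer quantum function and configurable increment could plausibly reduce to plain integer bucketing. The Python sketch below is hypothetical: generalized_quantum_start and its width/origin parameters are illustrative names, not a proposed DDL syntax. With origin=0 and a width in milliseconds it reproduces today's behavior; setting origin to minus a fixed UTC offset aligns day quanta to local midnight for a table default timezone, although a fixed offset cannot follow daylight-saving changes.

```python
def generalized_quantum_start(value: int, width: int, origin: int = 0) -> int:
    """Start of the bucket of `width` units (counted from `origin`) holding value.

    Hypothetical generalization: origin=0 with width in milliseconds matches
    the current epoch-based, millisecond quanta.
    """
    if width <= 0:
        raise ValueError("width must be a positive integer")
    # Floor division keeps values below origin in the correct bucket.
    return origin + ((value - origin) // width) * width


SECONDS_PER_DAY = 24 * 60 * 60
IST_OFFSET_S = 5 * 60 * 60 + 30 * 60  # fixed UTC+05:30 offset

# Today's behavior: millisecond values, one-day quanta from the epoch.
print(generalized_quantum_start(1464775200000, SECONDS_PER_DAY * 1000))
# prints 1464739200000, i.e. 2016-06-01T00:00:00Z

# Hypothetical table: increments are seconds, days start at IST midnight.
print(generalized_quantum_start(1464775200, SECONDS_PER_DAY, origin=-IST_OFFSET_S))
# prints 1464719400, i.e. 2016-05-31T18:30:00Z == 2016-06-01T00:00:00+05:30
```

Nothing in the bucketing depends on the values being times, which is what would let an application apply other semantics (e.g. geolocation) to the same integers.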
All of these features are largely independent, although there is some sequencing that can be done based on utility and level of effort.
The task "Allow clients to determine any of the above table-specific time-related metadata for display purposes" is effectively a subtask of the previously defined task to expose table metadata to client libraries.
- Allow users to enter ISO 8601 date/time strings in input and query via riak-shell.
- Display resulting timestamps in riak-shell in ISO 8601 format.
- Allow users to use time operations in queries.
- Allow operators to specify explicit integer values as a quantum function.
- Allow operators to define the meaning of a timestamp increment.
- Allow operators to specify alternative timezones as the default for a table.
We can limit system impact by performing timestamp translations directly in riak-shell instead of shipping them to Riak for processing (much as client libraries can do today), but that assumes riak-shell will have access to table metadata, should any exist. Also see the system default timezone complication noted under the first task at the top.
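A minimal Python sketch of such a shell-side translation, assuming riak-shell can read a table's default timezone from metadata. The metadata key default_utc_offset_minutes, the function name, and the precedence order are hypothetical design choices, not existing Riak behavior. Rejecting when neither the literal nor the table supplies a timezone sidesteps the question of whose system default (the shell host's or Riak's) should win.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def resolve_input_ms(text: str, table_meta: Optional[Mapping[str, int]] = None) -> int:
    """Translate an ISO 8601 literal to epoch milliseconds inside the shell.

    Hypothetical precedence:
      1. an offset written in the literal itself;
      2. the table's default offset, if the shell can see table metadata;
      3. otherwise reject, rather than guess the shell host's timezone.
    """
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:
        # "default_utc_offset_minutes" is a hypothetical metadata key.
        offset_min = (table_meta or {}).get("default_utc_offset_minutes")
        if offset_min is None:
            raise ValueError(f"{text!r} has no timezone and the table defines none")
        dt = dt.replace(tzinfo=timezone(timedelta(minutes=offset_min)))
    return (dt - EPOCH) // timedelta(milliseconds=1)
```

For example, with metadata {"default_utc_offset_minutes": 330}, the literal "2016-06-01T12:00:00" is read as IST, while "2016-06-01T12:00:00Z" keeps its explicit UTC offset.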
Gordon has an existing date parsing library that handles some subset of the necessary changes.