Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
[Avro] Add
logicalType
support for somejava.time
types; addAvroJavaTimeModule
for native ser/deser #283New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Avro] Add
logicalType
support for somejava.time
types; addAvroJavaTimeModule
for native ser/deser #283Changes from 8 commits
62570b4
1bd02ba
8f52e52
ba76375
956f365
170525f
e02185d
707f3a4
a05b4ab
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should "is at serialization" instead be "is not included at serialization" (or something like that)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right,
I meant "is lost at serialization"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of
OffsetDateTime
andZonedDateTime
, aren't they deserialized using whateverdefaultZoneId
is passed toAvroInstantDeserializer.fromLong()
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is correct.
They are deserialized using whatever `defaultZoneId1 is passed to AvroInstantDeserializer.fromLong()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be clearer to say something like, "Note that time zone & offset information is not serialized—the serialized representation is only a point in time. For local time types (
OffsetDateTime
andZonedDateTime
) the time zonedefaultZoneId
in the Avro context is used for converting to & from the serialized representation. The samedefaultZoneId
must be used at serialization & deserialization time to obtain meaningful results."There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having written that, the behavior feels weird to me. Would it be possible to store the offset/zoneId for the
OffsetDateTime
andZonedDateTime
types?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, zoned things absolutely should retain time zone and/or offset, and changing that to something else feels very much Wrong.
For local variants it may be necessary to do interim binding if (but only if) representation uses a fixed timepoint (like
long
for "milliseconds since 1. 1. 1970") -- but if so, it should be something really fixed like UTC and not whatever user happens to configure (because "default" zone id would otherwise have to match on writer and reader).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cowtowncoder
Avro specification does not aim to preserve time zone for non
local-XYZ
logical types.For correct deserialization into
OffsetDateTime
orZonedDateTime
reader and writer have to agree on time zone.Support of
OffsetDateTime
andZonedDateTime
is here for convenience - I can drop it.For local variants, contextual time zone is not used at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MichalFoksa Ok, I think I better go over the changes once again & try to find what Avro specification says. Although handling of local/zoned types has expected semantics in Java 8, I vaguely recall Avro proscribing behvavior that seemed to differ... and I think for interoperability the letter of Avro spec should usually have precedence (even if I disagreed with how it was defined).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even after reading and re-reading what Avro spec says about timestamp, regular and local, I have no idea what are those supposed to mean -- it seems nonsense to be blunt.
Not the part about physical storage itself (although why on earth are there separate milli- vs micro-second types?) but ... well, if NEITHER stores timezone information NOR is there ANY WAY to sync send/receiver zones, then... there seems to be no actual reason for 2 types. At all. I mean, timestamp in this sense can NEITHER be local (it is concrete physical time offset) NOR non-local (no time offset or time zone!). It is much like
java.util.Date
yet named in a confusing way, using 4 different but rather non-distinct types.What a mess!
But to try to untangle the mess I guess there is only the one question of how would read and write operations handle these differently.
On writing side there probably cannot be any difference: physical timestamp is what it is. Whatever "local" timezone could be thought to be does not matter; change of zone/offset would not change that value.
On reading side timezone/offset is sort of arbitrary as well: value itself is concrete, although we can use whatever zone we might want.
I guess use of "context timezone" could make sense for "local" variant? For non-local I don't have strong opinion: either UTC or context timezone would be fine as far as I see it.
@MichalFoksa WDYT? Apologies for this taking long -- but I have some time now and will get this merged during this week :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"local" variants do not contain time zone.
I would leave it on user.
Yeah, Avro is Avro ... :) But it is not bad.