-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] add arrow spooling and adbc support #24587
base: master
Are you sure you want to change the base?
Conversation
a5d9b2f
to
73cf355
Compare
@@ -146,6 +146,14 @@ | |||
<artifactId>junit-jupiter-api</artifactId> | |||
<scope>test</scope> | |||
</dependency> | |||
<dependency> | |||
<groupId>org.apache.arrow</groupId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unfortunately, this is incompatible with java 8. need to extract arrow decoding to a separate module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was essentially duplicated in the jdbc package. extracting it out so i can use it i the adbc package
import java.util.stream.Collectors; | ||
|
||
|
||
public class ArrowIpcDataDecoder implements QueryDataDecoder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
implemented this to fit with the current row based client and to let me run the existing AbstractTestEngineOnlyQueries tests. not sure if this is actually desirable for arrow data though
@@ -239,6 +239,11 @@ | |||
<artifactId>trino-array</artifactId> | |||
</dependency> | |||
|
|||
<dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i implemented the arrow writing functionality in a new library, trino-arrow. but to use that as a spooling format, i need to include that in trino-main. is that reasonable?
} | ||
private static ArrowType toArrowType(Type type) | ||
{ | ||
return switch (type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this switch covers the types in the spi package, but from running tests it seems that other types are sometimes produced. is there some way to know what types could be produced as output?
}; | ||
case TimestampWithTimeZoneType t -> new TimestampWithValueTimezoneType(t.getPrecision()); | ||
case DecimalType t -> new ArrowType.Decimal(t.getPrecision(), t.getScale()); | ||
case UuidType t -> new ArrowType.FixedSizeBinary(16); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a "standard" arrow extension type for uuids, but it doesnt seem to be implemented for java weirdly
@@ -0,0 +1,18 @@ | |||
package io.trino.arrow; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hack: copied from io.trino.server.protocol to avoid circular dependency
import org.apache.arrow.vector.FieldVector; | ||
import org.apache.arrow.vector.complex.StructVector; | ||
|
||
public class PicosecondTimeVector extends ExtensionTypeVector<StructVector> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
represent picosecond times with a microsecond time and an offset. arrow has support for arbitrary fixed size types, could instead use a 12 byte type. that might be less usable by clients that dont have support for these extension types, but would be more forward compatible if we can convince the arrow project to add picosecond times (same applies to picosecond timestamps and the pico precision zoned time and timestamps)
import org.apache.arrow.vector.types.pojo.ArrowType; | ||
import org.apache.arrow.vector.types.pojo.FieldType; | ||
|
||
public class TimeWithValueTimezoneType extends ArrowType.ExtensionType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems like zoned time and timestamps in trino handle the zone at the row level (ie different rows can have different zones). arrow scopes the zone to the column.
Description
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: