Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] add arrow spooling and adbc support #24587

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

dysn
Copy link

@dysn dysn commented Dec 26, 2024

Description

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Dec 26, 2024
@github-actions github-actions bot added the jdbc Relates to Trino JDBC driver label Dec 26, 2024
@github-actions github-actions bot added the ui Web UI label Dec 30, 2024
@@ -146,6 +146,14 @@
<artifactId>junit-jupiter-api</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.arrow</groupId>
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately, this is incompatible with java 8. need to extract arrow decoding to a separate module

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was essentially duplicated in the jdbc package. extracting it out so i can use it i the adbc package

import java.util.stream.Collectors;


public class ArrowIpcDataDecoder implements QueryDataDecoder {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implemented this to fit with the current row based client and to let me run the existing AbstractTestEngineOnlyQueries tests. not sure if this is actually desirable for arrow data though

@@ -239,6 +239,11 @@
<artifactId>trino-array</artifactId>
</dependency>

<dependency>
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i implemented the arrow writing functionality in a new library, trino-arrow. but to use that as a spooling format, i need to include that in trino-main. is that reasonable?

}
private static ArrowType toArrowType(Type type)
{
return switch (type) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this switch covers the types in the spi package, but from running tests it seems that other types are sometimes produced. is there some way to know what types could be produced as output?

};
case TimestampWithTimeZoneType t -> new TimestampWithValueTimezoneType(t.getPrecision());
case DecimalType t -> new ArrowType.Decimal(t.getPrecision(), t.getScale());
case UuidType t -> new ArrowType.FixedSizeBinary(16);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a "standard" arrow extension type for uuids, but it doesnt seem to be implemented for java weirdly

@@ -0,0 +1,18 @@
package io.trino.arrow;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hack: copied from io.trino.server.protocol to avoid circular dependency

import org.apache.arrow.vector.FieldVector;
import org.apache.arrow.vector.complex.StructVector;

public class PicosecondTimeVector extends ExtensionTypeVector<StructVector> {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

represent picosecond times with a microsecond time and an offset. arrow has support for arbitrary fixed size types, could instead use a 12 byte type. that might be less usable by clients that dont have support for these extension types, but would be more forward compatible if we can convince the arrow project to add picosecond times (same applies to picosecond timestamps and the pico precision zoned time and timestamps)

import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.FieldType;

public class TimeWithValueTimezoneType extends ArrowType.ExtensionType {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like zoned time and timestamps in trino handle the zone at the row level (ie different rows can have different zones). arrow scopes the zone to the column.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed jdbc Relates to Trino JDBC driver ui Web UI
Development

Successfully merging this pull request may close these issues.

1 participant