Member

@luohao luohao commented Oct 16, 2025

Description

Initial Lance connector implementation. It adds 2 modules:

  • trino-lance-file: a Java implementation of the Lance file format (this PR includes a reader for the latest version, 2.1). It also includes round-trip tests that validate the Lance file reader against the JNI writer provided by lancedb.
  • trino-lance: a Java implementation of the Lance table format. It is a basic connector with read-only capability, plus basic connector tests on TPCH data for end-to-end coverage.

Additional context and related issues

Benchmark results show that this implementation is up to 5x faster than the other implementation, which uses the JNI library. The following results were collected on a MacBook Pro with an Apple M2 Max CPU. Some notes about the results:

  • I have implemented a vectorized reader for the FastLanes encoding (https://github.com/luohao/fastlanes-java) for integers. Its performance is on par with the Rust implementation. I believe the savings mostly come from eliminating unnecessary overhead, including JNI overhead, memory copies, and Arrow-to-Trino Block conversion.
  • readVarchar is slower than the JNI version because my current FSST decoder doesn't have any optimizations (e.g., vectorization, loop unrolling, and reduced memory copying). I have a TODO for this.
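
For context on the FSST note, here is a minimal sketch of what an unoptimized decoder loop looks like. This is not the decoder from this PR. It assumes the standard FSST scheme, in which each compressed byte is a code into a table of 1 to 8 byte symbols and code 255 escapes a single literal byte. The class name `NaiveFsstDecoder` and the `symbols[code]` table layout are hypothetical.

```java
import java.io.ByteArrayOutputStream;

public class NaiveFsstDecoder
{
    // Assumption: code 255 marks an escaped literal byte, as in standard FSST
    private static final int ESCAPE = 255;

    // symbols[code] holds the byte expansion for that code (hypothetical layout)
    public static byte[] decode(byte[][] symbols, byte[] compressed)
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream(compressed.length * 2);
        int i = 0;
        while (i < compressed.length) {
            int code = compressed[i++] & 0xFF;
            if (code == ESCAPE) {
                // The byte after an escape is copied to the output verbatim
                if (i >= compressed.length) {
                    throw new IllegalArgumentException("truncated escape sequence");
                }
                out.write(compressed[i++]);
            }
            else {
                // One branch and one small copy per code: the per-byte overhead
                // that vectorization and loop unrolling would aim to reduce
                byte[] symbol = symbols[code];
                out.write(symbol, 0, symbol.length);
            }
        }
        return out.toByteArray();
    }
}
```

The branch per code and the copy through an intermediate buffer are the kind of costs the optimizations listed above would target.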
Benchmark                              Mode  Cnt    Score    Error  Units
BenchmarkColumnReaders.readBigInt      avgt   60    7.700 ±  0.196  ns/op
BenchmarkColumnReaders.readBigIntJNI   avgt   60    9.684 ±  0.251  ns/op
BenchmarkColumnReaders.readList        avgt   60  147.698 ±  1.692  ns/op
BenchmarkColumnReaders.readListJNI     avgt   60  396.676 ± 15.791  ns/op
BenchmarkColumnReaders.readStruct      avgt   60   21.180 ±  0.345  ns/op
BenchmarkColumnReaders.readStructJNI   avgt   60  110.885 ±  0.581  ns/op
BenchmarkColumnReaders.readVarchar     avgt   60  131.577 ±  1.683  ns/op
BenchmarkColumnReaders.readVarcharJNI  avgt   60   37.094 ±  0.228  ns/op
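
To make the comparison easier to read, the per-reader speedups can be derived from the scores above as JNI ns/op divided by Java ns/op. The sketch below only reuses the numbers from the table. The class name `BenchmarkSpeedups` is hypothetical and not part of this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BenchmarkSpeedups
{
    // Speedup of the pure-Java reader relative to the JNI reader;
    // values above 1.0 mean the Java reader is faster
    public static double speedup(double javaNsPerOp, double jniNsPerOp)
    {
        return jniNsPerOp / javaNsPerOp;
    }

    public static void main(String[] args)
    {
        // {Java ns/op, JNI ns/op}, copied from the JMH table
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("readBigInt", new double[] {7.700, 9.684});
        scores.put("readList", new double[] {147.698, 396.676});
        scores.put("readStruct", new double[] {21.180, 110.885});
        scores.put("readVarchar", new double[] {131.577, 37.094});
        scores.forEach((name, s) ->
                System.out.printf("%-12s %.2fx%n", name, speedup(s[0], s[1])));
    }
}
```

By this measure readStruct shows the largest gain (a bit over 5x), while readVarchar comes out below 1x, matching the FSST note above.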

More details about the design and implementation of Lance file/table format can be found at

This is my first PR to kick off the Lance connector work. My immediate TODO list (aka roadmap) includes:

  • Lance File Format
    • Enhance Reader
      • Support FixedSizeList
      • Support FullZip encoding
    • Implement Writer
  • Lance Table Format
    • Support multiple data files per fragment
    • Support delete files
    • Support index files
    • Support other catalogs

And more to be added...

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Oct 16, 2025

@sourcery-ai sourcery-ai bot left a comment

Sorry @luohao, your pull request is larger than the review limit of 150000 diff characters

@luohao luohao force-pushed the hluo/lance-init-pr branch 2 times, most recently from 874c64e to 2b2a035 Compare October 16, 2025 19:54
@ebyhr ebyhr self-requested a review October 16, 2025 22:47
@chenjian2664 chenjian2664 self-requested a review October 17, 2025 03:46
import static com.google.common.io.RecursiveDeleteOption.ALLOW_INSECURE;
import static java.nio.file.Files.createTempDirectory;

public class TempFile
Member

Is there a specific reason for this TempFile class? We can use Files.createTempFile to create the files on demand, right?
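
For reference, a minimal sketch of the on-demand approach using only `java.nio.file.Files`. The class name `TempFileExample`, the `lance-test-` prefix, and the `.lance` suffix are hypothetical and not taken from this PR.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempFileExample
{
    // Writes the payload to a fresh temporary file, reads it back,
    // and deletes the file whether or not the read succeeds
    public static String roundTrip(String payload)
    {
        try {
            // Hypothetical prefix and suffix; the file is created in the default temp directory
            Path file = Files.createTempFile("lance-test-", ".lance");
            try {
                Files.write(file, payload.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            }
            finally {
                Files.deleteIfExists(file);
            }
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A dedicated helper class mostly saves repeating the cleanup in `finally` at each call site.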

Member Author

@ebyhr ebyhr changed the title Add initial Lance connector implementation Add Lance connector Oct 17, 2025
@luohao luohao force-pushed the hluo/lance-init-pr branch from 0b965b4 to e058e68 Compare October 18, 2025 03:49
@ebyhr ebyhr added the needs-docs This pull request requires changes to the documentation label Oct 18, 2025
@ebyhr ebyhr force-pushed the hluo/lance-init-pr branch from 49d22af to 06913f5 Compare October 18, 2025 09:59
@ebyhr ebyhr force-pushed the hluo/lance-init-pr branch from 06913f5 to 29a0e87 Compare October 18, 2025 10:18
Comment on lines +32 to +34
// A simple product test to ensure no class loader issue.
QueryResult result = onTrino().executeQuery("SHOW SCHEMAS FROM lance");
assertThat(result).containsOnly(row("default"), row("information_schema"));
Member

We should test a SELECT statement.

Member Author

The connector is read-only right now, so adding a SELECT test would require additional setup to prepare the data.

Do you have an example that prepares data for a read-only connector?

Member

In such cases, we add static resource files or use the data source's library.

Member Author

I tried to add files to the MinIO object store by copying them into the MinIO container's /data/test-bucket/ directory, but that doesn't seem to work.

Do you know of any example of copying directories to an object store?

Member

(Answered in Slack DM)

@github-actions github-actions bot added the docs label Oct 20, 2025
@ebyhr ebyhr removed the needs-docs This pull request requires changes to the documentation label Oct 22, 2025