[bug] Cannot perform table scan on V1 table #1194

Open
kevinjqliu opened this issue Sep 23, 2024 · 4 comments

@kevinjqliu
Contributor

Apache Iceberg version

main (development)

Please describe the bug 🐞

While working with a V1 table, I noticed a few bugs that prevent a table scan on V1 tables.

  1. Reading the manifest list defaults to V2

    MANIFEST_LIST_FILE_SCHEMAS[DEFAULT_READ_VERSION],

  2. Accessing fields not available in V1.

    if manifest.content == ManifestContent.DATA

    The content field is not available in V1, according to the spec.
    There are multiple places where something like this occurs.
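
The second symptom can be illustrated without PyIceberg. The sketch below is only an illustration: it models a manifest-list entry as a plain dict and assumes a V1 entry simply lacks the `content` key. `V1_ENTRY`, `is_data_manifest_strict`, and `is_data_manifest_tolerant` are hypothetical names, not PyIceberg APIs, and the `ManifestContent` enum here is a local stand-in.

```python
from enum import Enum


class ManifestContent(Enum):
    # Local stand-in for the content enum; 0 = data, 1 = deletes.
    DATA = 0
    DELETES = 1


# Hypothetical V1 manifest-list entry: there is no "content" key.
V1_ENTRY = {
    "manifest_path": "s3://bucket/metadata/m0.avro",
    "manifest_length": 1024,
    "partition_spec_id": 0,
    "added_snapshot_id": 1,
}


def is_data_manifest_strict(entry: dict) -> bool:
    # Assumes the field always exists; raises KeyError on a V1 entry.
    return ManifestContent(entry["content"]) == ManifestContent.DATA


def is_data_manifest_tolerant(entry: dict) -> bool:
    # Treats a missing "content" as DATA, since V1 has no delete files.
    raw = entry.get("content", ManifestContent.DATA.value)
    return ManifestContent(raw) == ManifestContent.DATA
```

With this model, the strict check fails on `V1_ENTRY`, while the tolerant check treats it as a data manifest.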

Add a test to verify table scan on a V1 table

@kevinjqliu kevinjqliu self-assigned this Sep 23, 2024
@sungwy sungwy added this to the PyIceberg 0.8.0 release milestone Sep 24, 2024
@Fokko
Contributor

Fokko commented Oct 30, 2024

@kevinjqliu Thanks for raising this. Can you elaborate on what you encountered when reading a V1 table? The Iceberg metadata is forward compatible, meaning we can turn any V1 table into a V2 (or V3) without issues.

The content field you mention will always be DATA in V1 (since there are no delete files). This can be solved easily with initial-default values. We do this in other places, such as sequence numbers.
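
As a rough sketch of the initial-default idea (not PyIceberg's actual reader): a V1 record is projected onto a V2 read schema, and any field missing from the record is filled from that field's initial default. `Field`, `V2_READ_FIELDS`, and `project` are hypothetical names, and the field list is a trimmed-down assumption rather than the full manifest-list schema.

```python
from dataclasses import dataclass
from typing import Any

_MISSING = object()  # sentinel: the field has no initial default


@dataclass(frozen=True)
class Field:
    name: str
    required: bool = True
    initial_default: Any = _MISSING


# Trimmed, illustrative V2 read schema for a manifest-list entry.
V2_READ_FIELDS = [
    Field("manifest_path"),
    Field("manifest_length"),
    Field("partition_spec_id"),
    Field("content", initial_default=0),              # 0 = DATA
    Field("sequence_number", initial_default=0),
    Field("min_sequence_number", initial_default=0),
    Field("added_snapshot_id"),
]


def project(record: dict, fields: list) -> dict:
    """Project a written record onto the read schema, applying defaults."""
    out = {}
    for field in fields:
        if field.name in record:
            out[field.name] = record[field.name]
        elif field.initial_default is not _MISSING:
            # Field did not exist when the record was written.
            out[field.name] = field.initial_default
        elif field.required:
            raise ValueError(f"missing required field: {field.name}")
        else:
            out[field.name] = None
    return out


v1_record = {
    "manifest_path": "s3://bucket/metadata/m0.avro",
    "manifest_length": 1024,
    "partition_spec_id": 0,
    "added_snapshot_id": 1,
}
v2_view = project(v1_record, V2_READ_FIELDS)
```

Here `v2_view["content"]` resolves to `0` even though the V1 record never stored it, so downstream code can compare against `DATA` without special-casing the format version.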

It would be great to get a test that uncovers the issue so we can get this fixed :)

@kevinjqliu
Contributor Author

Added a reproducible test in #1483. I had to save the BigLake Iceberg table locally. Please take a look.

@kevinjqliu
Contributor Author

kevinjqliu commented Jan 4, 2025

#1484 is a better, isolated test. It uses the minimal required schema for a V1 table manifest list.
In general, we should do this for all V1 schemas.

@kevinjqliu kevinjqliu removed this from the PyIceberg 0.9.0 release milestone Feb 1, 2025

3 participants