[bug] Cannot perform table scan on V1 table #1194

kevinjqliu · 2024-09-23T22:15:43Z

Apache Iceberg version

main (development)

Please describe the bug 🐞

While working with a V1 table, I noticed a few bugs that prevent table scans on V1 tables.

Reading the manifest list defaults to V2

iceberg-python/pyiceberg/manifest.py

Line 635 in 620ad9f

MANIFEST_LIST_FILE_SCHEMAS[DEFAULT_READ_VERSION],

Accessing fields not available in V1.

iceberg-python/pyiceberg/table/__init__.py

Line 1311 in 620ad9f

if manifest.content == ManifestContent.DATA

The content field is not available in V1, according to the spec.
There are multiple places where something like this occurs.

Add a test to verify table scan on a V1 table
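
As a rough illustration of the first point, the sketch below picks the manifest list read schema from the table's format version instead of a fixed default. The names `MANIFEST_LIST_FIELDS` and `fields_for_version` are hypothetical and are not PyIceberg's API, and the field lists are an abbreviated subset based on my reading of the spec rather than the full schemas.

```python
# Illustrative only: abbreviated field names per format version.
# These tuples are an assumption for the sketch, not the complete spec schemas.
MANIFEST_LIST_FIELDS = {
    1: (
        "manifest_path",
        "manifest_length",
        "partition_spec_id",
        "added_snapshot_id",
    ),
    2: (
        "manifest_path",
        "manifest_length",
        "partition_spec_id",
        "content",
        "sequence_number",
        "min_sequence_number",
        "added_snapshot_id",
    ),
}


def fields_for_version(format_version: int) -> tuple:
    """Return the field names to read for the given table format version."""
    try:
        return MANIFEST_LIST_FIELDS[format_version]
    except KeyError:
        raise ValueError(f"Unsupported format version: {format_version}") from None


# A V1 read schema chosen this way never asks for the V2-only `content` field.
v1_fields = fields_for_version(1)
v2_fields = fields_for_version(2)
```

Keying the lookup on the version recorded in the table metadata, rather than on a constant such as `DEFAULT_READ_VERSION`, is the design choice the sketch is meant to show.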


Fokko · 2024-10-30T06:13:15Z

@kevinjqliu Thanks for raising this. Can you elaborate on what you encountered when reading a V1 table? The Iceberg metadata is forward compatible, meaning we can turn any V1 table into a V2 (or V3) without issues.

The content field you mention will always be DATA in V1 (since there are no delete files). This can be solved easily with initial-default values. We do this in other places, such as sequence numbers.

It would be great to get a test that uncovers the issue so we can get this fixed :)
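
The initial-default idea in the comment above can be sketched roughly as follows: when a V1 manifest list entry has no `content` value, treat it as DATA. This is a minimal sketch on plain dictionaries, not PyIceberg's implementation; the helper name `manifest_content` is hypothetical, and the enum values (0 for data, 1 for deletes) follow my understanding of the spec.

```python
from enum import Enum


class ManifestContent(Enum):
    DATA = 0
    DELETES = 1


def manifest_content(entry: dict) -> ManifestContent:
    """Return the entry's content type, defaulting to DATA when the field is absent.

    V1 tables have no delete files, so a missing `content` can only mean DATA.
    """
    raw = entry.get("content")
    if raw is None:
        return ManifestContent.DATA
    return ManifestContent(raw)


# A V1-style entry without `content`, and a V2-style entry marking deletes.
v1_entry = {"manifest_path": "s3://bucket/metadata/m0.avro"}
v2_entry = {"manifest_path": "s3://bucket/metadata/m1.avro", "content": 1}

is_data = manifest_content(v1_entry) == ManifestContent.DATA
```

With a default like this, a filter such as `manifest.content == ManifestContent.DATA` would keep working for V1 entries without special casing at each call site.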

kevinjqliu · 2025-01-04T07:58:29Z

Added a reproducible test in #1483. I had to save the BigLake Iceberg table locally. Please take a look.

kevinjqliu · 2025-01-04T19:07:29Z

#1484 is a better, isolated test. It uses the minimal required schema for a V1 table manifest list.
In general, we should do this for all V1 schemas.

kevinjqliu · 2025-01-04T19:16:20Z

Spark's V1 manifest list writer writes the optional added_rows_count field.

https://github.com/apache/iceberg/blob/fcd5dd932a21066d6127c94c50f3de43e8c2d80c/core/src/main/java/org/apache/iceberg/ManifestListWriter.java#L166-L167

https://github.com/apache/iceberg/blob/fcd5dd932a21066d6127c94c50f3de43e8c2d80c/core/src/main/java/org/apache/iceberg/V1Metadata.java#L31-L44

kevinjqliu self-assigned this Sep 23, 2024

sungwy added this to the PyIceberg 0.8.0 release milestone Sep 24, 2024

kevinjqliu modified the milestones: PyIceberg 0.8.0 release, PyIceberg 0.9.0 release Oct 30, 2024

kevinjqliu mentioned this issue Jan 4, 2025

Add reproducible test for #1194 #1483

Closed

kevinjqliu mentioned this issue Jan 4, 2025

use minimal required fields #1484

Draft

kevinjqliu removed this from the PyIceberg 0.9.0 release milestone Feb 1, 2025
