Support get partition table with filter #24
Comments
Hello @Fokko, here is my use case:
Thank you! |
@puchengy The problem with Iceberg is that a partition is a logical concept rather than a physical path, as it is in a Hive table. What do you think of passing in a predicate and letting the Airflow sensor pass if any rows match it? For example, you could go from a daily to an hourly partition. Then you would get:
|
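A minimal sketch of the predicate idea discussed above, not the example from the original comment. It assumes `table` behaves like a PyIceberg `Table`, that is, it exposes `scan(row_filter=..., limit=...)` whose result has `to_arrow()`. The function name `partition_has_rows` is hypothetical.

```python
from typing import Any


def partition_has_rows(table: Any, row_filter: str) -> bool:
    """Return True if at least one row matches the predicate.

    Assumption: `table` is duck-typed after a PyIceberg Table, so
    table.scan(row_filter=..., limit=...).to_arrow() yields an object
    with a `num_rows` attribute.
    """
    # limit=1 keeps the check cheap: one matching row is enough to pass.
    result = table.scan(row_filter=row_filter, limit=1).to_arrow()
    return result.num_rows > 0
```

An Airflow sensor could call this from its poke method with a predicate such as `"dt = '2023-01-01'"` (an illustrative column name and value). Because the check is expressed on column values, it should keep working if the table's partition spec later changes from daily to hourly.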
@Fokko That works. This is actually what we are doing (but for legacy_python): pinterest/iceberg@7d8d65d. Would this be something we can implement upstream? Thanks |
@Fokko gentle ping, thanks ^ |
@puchengy Yes, certainly. Would this be something that you're interested in working on? From the snapshot, we can load the manifest list, and from there the manifests themselves, which contain the partition information. |
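The traversal described above (snapshot, then manifest list, then manifest entries) could be sketched roughly as follows. This is an illustration under assumptions, not the implementation that was eventually merged: `table` is assumed to behave like a PyIceberg `Table` with `current_snapshot()` and `io`, the snapshot is assumed to expose `manifests(io)`, and each manifest `fetch_manifest_entry(io)`. The helper name `data_files_per_partition` is hypothetical.

```python
from collections import Counter
from typing import Any


def data_files_per_partition(table: Any) -> Counter:
    """Count data files per partition value in the current snapshot.

    Assumption: `table` is duck-typed after a PyIceberg Table.
    """
    counts: Counter = Counter()
    snapshot = table.current_snapshot()
    if snapshot is None:
        # A table with no snapshots has no data files yet.
        return counts
    # The snapshot points at a manifest list; each manifest lists data files.
    for manifest in snapshot.manifests(table.io):
        for entry in manifest.fetch_manifest_entry(table.io):
            # Each data file records the partition tuple it belongs to.
            counts[str(entry.data_file.partition)] += 1
    return counts
```

Keying on the string form of the partition record is a simplification for the sketch; a real implementation would more likely keep the structured partition values and apply the filter before reading every manifest.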
@Fokko Yes, I can help. Thanks. |
I was looking for something comparable to Spark's partitions metadata table, which lets me do something like this
to determine if and when a partition was updated, and came across this issue. It sounds like this could be provided by this Feature Request if it includes a ManifestReader with filtering features like the linked code in legacy_python. Is that correct? If not, I will try to raise as a separate issue. |
Partitions table was added in: #603 |
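As a rough sketch of how the "if and when a partition was updated" question might be answered once partition rows are available: the helper below works on plain dictionaries. It assumes the rows come from something like `table.inspect.partitions().to_pylist()` in PyIceberg and that each row carries `partition` and `last_updated_at` fields, mirroring Spark's partitions metadata table. Both the call and the column names are assumptions to verify against the installed version, and `last_update` is a hypothetical name.

```python
from typing import Any, Dict, List, Optional


def last_update(rows: List[Dict[str, Any]], partition: Dict[str, Any]) -> Optional[Any]:
    """Return the last-updated timestamp for a partition, or None if absent.

    Assumption: each row is a dict with a `partition` dict and a
    `last_updated_at` value, as in a partitions metadata table.
    """
    for row in rows:
        if row.get("partition") == partition:
            return row.get("last_updated_at")
    return None
```

A `None` result means no row matched the requested partition, which a sensor could treat as "not yet written".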
Feature Request / Improvement
Migration of issue apache/iceberg#8619