Open
Description
Question
I'm looking for a tutorial to make a query on one subfield of a struct field.
I scrolled all internet but failed to find a way to do it simply with pyiceberg.
To make it concret, for instance how to get the row with "employment.status = 'Employed'"
:
[{'id': 1,
'name': 'Alice',
'age': 28,
'address': {'street': '123 Maple St',
'city': 'Springfield',
'postal_code': '12345'},
'contact': {'email': '[email protected]', 'phone': '555-1234'},
'employment': {'status': 'Employed',
'position': 'Software Engineer',
'company': {'name': 'Tech Corp', 'location': 'Silicon Valley'}},
'preferences': {'newsletter': True,
'notifications': {'email': True, 'sms': False}}},
{'id': 2,
'name': 'Bob',
'age': 35,
'address': {'street': '456 Oak St',
'city': 'Metropolis',
'postal_code': '67890'},
'contact': {'email': '[email protected]', 'phone': '555-5678'},
'employment': {'status': 'Self-employed',
'position': 'Consultant',
'company': {'name': 'Freelance', 'location': 'Remote'}},
'preferences': {'newsletter': False,
'notifications': {'email': True, 'sms': True}}}]
With the following schema:
import pyarrow as pa
schema = pa.schema([
('id', pa.int32()),
('name', pa.string()),
('age', pa.int32()),
('address', pa.struct([
('street', pa.string()),
('city', pa.string()),
('postal_code', pa.string())
])),
('contact', pa.struct([
('email', pa.string()),
('phone', pa.string())
])),
('employment', pa.struct([
pa.field('status', pa.string(), nullable=True),
pa.field('position', pa.string(), nullable=True),
pa.field('company', pa.struct([
('name', pa.string()),
('location', pa.string())
]), nullable=True)
])),
('preferences', pa.struct([
('newsletter', pa.bool_()),
('notifications', pa.struct([
('email', pa.bool_()),
('sms', pa.bool_())
]))
]))
])
I tried this kind of query, but without success:
row_filter = "employment.status = 'Employed'"
table.scan(
row_filter=row_filter,
selected_fields=["age", "employment", 'contact.email']
).to_pandas()
The command raises the error:
ValueError: Could not find field with name status, case_sensitive=True
The backend is supported by SQLite.
versions:
$ pip list | grep 'iceberg\|arrow\|sqlite'
arrow 1.3.0
pyarrow 15.0.2
pyiceberg 0.6.1
Metadata
Metadata
Assignees
Labels
No labels