Query on nested struct field with PyIceberg?

### Question

I'm looking for a tutorial to make a query on one subfield of a struct field.
I scrolled all internet but failed to find a way to do it simply with pyiceberg.

To make it concret, for instance how to get the row with `"employment.status = 'Employed'"` : 

```python
[{'id': 1,
  'name': 'Alice',
  'age': 28,
  'address': {'street': '123 Maple St',
   'city': 'Springfield',
   'postal_code': '12345'},
  'contact': {'email': 'alice@example.com', 'phone': '555-1234'},
  'employment': {'status': 'Employed',
   'position': 'Software Engineer',
   'company': {'name': 'Tech Corp', 'location': 'Silicon Valley'}},
  'preferences': {'newsletter': True,
   'notifications': {'email': True, 'sms': False}}},
 {'id': 2,
  'name': 'Bob',
  'age': 35,
  'address': {'street': '456 Oak St',
   'city': 'Metropolis',
   'postal_code': '67890'},
  'contact': {'email': 'bob@example.com', 'phone': '555-5678'},
  'employment': {'status': 'Self-employed',
   'position': 'Consultant',
   'company': {'name': 'Freelance', 'location': 'Remote'}},
  'preferences': {'newsletter': False,
   'notifications': {'email': True, 'sms': True}}}]
```
   
With the following schema:
```
 import pyarrow as pa
 
 schema = pa.schema([
  ('id', pa.int32()),
  ('name', pa.string()),
  ('age', pa.int32()),
  ('address', pa.struct([
      ('street', pa.string()),
      ('city', pa.string()),
      ('postal_code', pa.string())
  ])),
  ('contact', pa.struct([
      ('email', pa.string()),
      ('phone', pa.string())
  ])),
  ('employment', pa.struct([
      pa.field('status', pa.string(), nullable=True),
      pa.field('position', pa.string(), nullable=True),
      pa.field('company', pa.struct([
          ('name', pa.string()),
          ('location', pa.string())
      ]), nullable=True)
  ])),
  ('preferences', pa.struct([
      ('newsletter', pa.bool_()),
      ('notifications', pa.struct([
          ('email', pa.bool_()),
          ('sms', pa.bool_())
      ]))
  ]))
])
```

I tried this kind of query, but without success:

```python
row_filter = "employment.status = 'Employed'"

table.scan(
    row_filter=row_filter,
    selected_fields=["age", "employment", 'contact.email']
).to_pandas()
```
The command raises the error:
```Python
ValueError: Could not find field with name status, case_sensitive=True
```

The backend is supported by SQLite.

versions:
```bash
$ pip list | grep 'iceberg\|arrow\|sqlite'
arrow                     1.3.0
pyarrow                   15.0.2
pyiceberg                 0.6.1
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Query on nested struct field with PyIceberg? #953

Question

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Query on nested struct field with PyIceberg? #953

Description

Question

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions