[BUG] Parquet column selection by name with schemas including list<struct<X, Y>> does not work. #14539

@nvdbaranec

Description

If a schema contains a list-of-struct column, selecting a subset of the inner columns by name doesn't work. Example:

list<struct<int, float>>
Suppose the schema for this column is:

A           (list)
   B        (struct)
       C    (int)
       D    (float)

Attempting to select "A.B.C" would not work. I believe this is caused by schema preprocessing we do that injects fake schema elements to ease schema interpretation. Essentially, we end up seeing a schema that looks like this:

A            (list)
  list       (the fake element)
     B       (struct)
        C    (int)
        D    (float)

So "A.B.C" doesn't actually exist, only "A.list.B.C" does, and the code returns 0 columns.
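
To make the mismatch concrete, here is a toy sketch in plain Python. It is not libcudf code: the nested-dict schema, the function names `select_exact` and `select_skipping_fake`, and the idea of recognising injected elements by the name "list" are all assumptions made for illustration. It shows a literal path walk failing on "A.B.C", and one possible way a lookup could step through injected elements that the user's path omits.

```python
# Toy model of the preprocessed schema tree; NOT the libcudf implementation.
# Keys are schema element names, values are their children.
SCHEMA = {
    "A": {
        "list": {          # the injected (fake) element
            "B": {
                "C": {},
                "D": {},
            }
        }
    }
}

# Assumption for this sketch: injected elements can be recognised by name.
FAKE_NAMES = {"list"}


def select_exact(schema, path):
    """Match every path component literally; return the path or None."""
    node = schema
    for part in path.split("."):
        if part not in node:
            return None
        node = node[part]
    return path


def select_skipping_fake(schema, path):
    """Match the path, descending through injected elements it omits.

    Returns the fully resolved path (including injected names) or None.
    """
    node = schema
    resolved = []
    for part in path.split("."):
        # Step through fake children until the requested name is visible.
        while part not in node:
            fake = [name for name in node if name in FAKE_NAMES]
            if not fake:
                return None
            resolved.append(fake[0])
            node = node[fake[0]]
        resolved.append(part)
        node = node[part]
    return ".".join(resolved)


print(select_exact(SCHEMA, "A.B.C"))          # None
print(select_exact(SCHEMA, "A.list.B.C"))     # A.list.B.C
print(select_skipping_fake(SCHEMA, "A.B.C"))  # A.list.B.C
```

Note that name-based skipping like this would be ambiguous if a real user column were itself called "list"; an actual fix would presumably need to track which elements were injected rather than rely on their names.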

Metadata

Assignees

No one assigned

    Labels

    0 - Backlog: In queue waiting for assignment
    Python: Affects Python cuDF API.
    bug: Something isn't working
    cuIO: cuIO issue
    libcudf: Affects libcudf (C++/CUDA) code.

    Type

    No type

    Projects

    Status

    To be revisited

    Status

    Todo

    Relationships

    None yet

    Development

    No branches or pull requests
