Generate enum classes as part of DataSchema #118

@koperagen

Imagine a CSV with a "day_of_week" column containing string values like "monday", "friday", etc. If this column were converted to an enum, you could rely on code completion when, for example, filtering on it.

It can be done the same way as generating data schemas:

  1. after cell execution in notebooks
  2. on data schema import in a Gradle project
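To make the idea concrete, here is a minimal sketch of what generated code and its use might look like. Everything in it is hypothetical: the issue does not define the enum's name or shape, and `Row` is only a plain stand-in for a generated data schema, not the library's actual API.

```kotlin
// Hypothetical output of code generation for the "day_of_week" column.
enum class DayOfWeek { MONDAY, FRIDAY }

// Plain stand-in for a typed row; a real data schema would be generated by the library.
data class Row(val dayOfWeek: DayOfWeek, val visitors: Int)

// With an enum, the IDE can complete `DayOfWeek.` and the compiler rejects typos,
// which a string comparison such as `it.dayOfWeek == "mondey"` would not catch.
fun mondayVisitors(rows: List<Row>): Int =
    rows.filter { it.dayOfWeek == DayOfWeek.MONDAY }.sumOf { it.visitors }
```

Calling `mondayVisitors` on rows for Monday (10), Friday (25) and Monday (7) would sum only the two Monday rows.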

There are some design questions:

  1. What if I don't need an enum?

  2. What about normalization? "monday" and "Monday" aren't the same thing.
    In Jupyter, you can normalize values however you want and get a nice enum.
    In a Gradle project, code generation happens once, at build time, so the values must already be normalized. How?

  3. How many values in an enum are too many?

  4. What if not all possible values are present in the column? What should happen if the generated schema knows about 2 enum values, but the actual column at runtime has more?
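One possible set of answers to questions 2 to 4 can be sketched in plain Kotlin. This is only an illustration under assumptions, not a proposed implementation: the names `toEnumConstant`, `enumCandidates`, `MAX_ENUM_VALUES` and `Day` are hypothetical, the limit of 20 is arbitrary, and the `UNKNOWN` fallback is just one of several ways to handle unseen values (throwing an exception or keeping the raw string are others).

```kotlin
// Question 2: map raw strings such as "monday" and " Monday " to a single constant name.
// Not exhaustive: empty strings or values starting with a digit would need extra handling.
fun toEnumConstant(raw: String): String =
    raw.trim().uppercase().replace(Regex("[^A-Z0-9]+"), "_")

// Question 3: an arbitrary cut-off above which the column would stay a String.
const val MAX_ENUM_VALUES = 20

// Returns the constant names to generate, or null if the column is a poor enum candidate.
fun enumCandidates(column: List<String>, limit: Int = MAX_ENUM_VALUES): List<String>? {
    val names = column.map { toEnumConstant(it) }.distinct()
    return if (names.isEmpty() || names.size > limit) null else names
}

// Question 4: hypothetical generated enum for a column that contained only two values
// at generation time, with a fallback constant for values that first appear at runtime.
enum class Day {
    MONDAY, FRIDAY, UNKNOWN;

    companion object {
        fun parse(raw: String): Day {
            val name = toEnumConstant(raw)
            return values().firstOrNull { it.name == name } ?: UNKNOWN
        }
    }
}
```

Under these assumptions, "monday" and "Monday" collapse into the same constant, and a value such as "sunday" that was absent at generation time parses to `Day.UNKNOWN` instead of failing.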
