Closing the loop of serialization (lists, sets, maps) #37

Open
pereferrera opened this issue Dec 30, 2013 · 0 comments

Something that remains on my mind is the possibility of closing the loop and giving Pangool all the convenient serialization features. What is still missing is Lists and Maps (a Set being a particular case of a Map).

Currently it is possible to serialize them using Avro, but the integration code required doesn't look very nice. Pangool could add a wrapper to make this a little nicer, delegating the serialization to Avro, but then it wouldn't be possible to serialize Lists of arbitrary Objects.

While it is true that providing full built-in serialization was never the main idea of Pangool, there is no reason not to implement new features that pay off, are easy to implement and fit the existing codebase.

What's more, taking a look at the current code, it doesn't seem difficult to add proper built-in serialization support for (typed) Lists or Maps. A custom FieldSerialization could be implemented, which writes the list length first and then calls the delegate code in SimpleTupleSerializer to serialize each typed value in the list.

This would allow for arbitrarily typed lists, with the element type defined by a Pangool Field, so the method in Field would be something like:

public static Field createListField(String name, Field type)

It would therefore be possible to serialize lists of lists of lists, or lists of Tuples, or anything else this recursion allows.
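To make the idea concrete, here is a minimal sketch of a length-prefixed list serializer that delegates each element to another serializer. None of these names exist in Pangool: `ElementSerde`, `ListSerde`, `StringSerde` and `ListSerdeDemo` are hypothetical stand-ins for the FieldSerialization and SimpleTupleSerializer delegate mentioned above, and the sketch assumes homogeneous lists with no null elements.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface (not a Pangool type): stands in for whatever
// per-field delegate the tuple serializer would expose.
interface ElementSerde<T> {
  void write(T value, DataOutput out) throws IOException;
  T read(DataInput in) throws IOException;
}

// Hypothetical list serializer: writes the element count first, then
// hands every element to the delegate. Because it is itself an
// ElementSerde, it can be nested to get lists of lists.
final class ListSerde<T> implements ElementSerde<List<T>> {
  private final ElementSerde<T> delegate;

  ListSerde(ElementSerde<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public void write(List<T> list, DataOutput out) throws IOException {
    out.writeInt(list.size());
    for (T element : list) {
      delegate.write(element, out);
    }
  }

  @Override
  public List<T> read(DataInput in) throws IOException {
    int size = in.readInt();
    List<T> list = new ArrayList<>();
    for (int i = 0; i < size; i++) {
      list.add(delegate.read(in));
    }
    return list;
  }
}

// Leaf serializer used only for illustration.
final class StringSerde implements ElementSerde<String> {
  @Override
  public void write(String value, DataOutput out) throws IOException {
    out.writeUTF(value);
  }

  @Override
  public String read(DataInput in) throws IOException {
    return in.readUTF();
  }
}

final class ListSerdeDemo {
  // Serializes a value to a byte array and reads it back.
  static <T> T roundTrip(ElementSerde<T> serde, T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      serde.write(value, new DataOutputStream(bytes));
      return serde.read(
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In this sketch, a list of lists of strings would be handled by `new ListSerde<>(new ListSerde<>(new StringSerde()))`, which mirrors what a recursive `createListField(name, type)` would describe at the schema level.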

Open questions would then be:

  • How to deal efficiently with null values.
  • Whether heterogeneous lists should be considered at all or discarded (serialization would then be inefficient and complex).
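
For the first question, one possible approach (an assumption for discussion, not something Pangool does today) is to write a null bitmap right after the list length: one bit per element, so nulls cost roughly one extra byte per eight elements instead of a marker byte per element. The sketch below is specialized to strings to stay short, and `NullableStringList` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: layout is [int size][null bitmap][non-null values].
final class NullableStringList {

  static byte[] write(List<String> list) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      int size = list.size();
      out.writeInt(size);

      // One bit per element; a set bit means "this element is null".
      byte[] nullMask = new byte[(size + 7) / 8];
      for (int i = 0; i < size; i++) {
        if (list.get(i) == null) {
          nullMask[i / 8] |= (byte) (1 << (i % 8));
        }
      }
      out.write(nullMask);

      // Only non-null elements are written to the stream.
      for (String value : list) {
        if (value != null) {
          out.writeUTF(value);
        }
      }
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static List<String> read(byte[] data) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      int size = in.readInt();
      byte[] nullMask = new byte[(size + 7) / 8];
      in.readFully(nullMask);

      List<String> list = new ArrayList<>();
      for (int i = 0; i < size; i++) {
        boolean isNull = (nullMask[i / 8] & (1 << (i % 8))) != 0;
        list.add(isNull ? null : in.readUTF());
      }
      return list;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The trade-off is that lists without any nulls still pay for the bitmap; a single flag before the bitmap could skip it in that case, at the price of a slightly more complex format.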