Implement nan_value_counts && distinct_counts metrics in parquet writer #417
Comments
I can take this up @liurenjie1024
Thanks!
Welcome!
Just checking in @vaibhawvipul if you're still interested in adding this :)
Hey @Fokko! 👋🏻 As the original author has not replied, I am interested in taking it up :) A few points, regardless of who this gets assigned to:
We will have to keep track of it on our own, so I think we would go through each
Hi @feniljain, I also couldn't find how distinct counts are implemented in Java, but according to the spec it's supposed to be an estimated value computed with a sketch. I think we could start with NaN value counts and leave distinct counts out for now.
That sounds interesting, thanks for the link to the spec!
Yup, let me work out a PR for nan_values first. Also, just confirming: is the method I mentioned above correct for nan_values?
Yes, exactly.
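For anyone following along, here is a minimal sketch of what tracking NaN counts "on our own" could look like, assuming the writer scans each float value as it is written and keeps a running count per field id. `NanValueCounter` and its methods are hypothetical names for illustration only; they are not part of iceberg-rust or the parquet crate, and the actual PR may be structured quite differently.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not an iceberg-rust API): accumulates the number of
/// NaN values seen per field id while batches are being written.
#[derive(Debug, Default)]
pub struct NanValueCounter {
    // field id -> number of NaN values observed so far
    counts: HashMap<i32, u64>,
}

impl NanValueCounter {
    pub fn new() -> Self {
        Self::default()
    }

    /// Scan one batch of nullable f64 values for `field_id`.
    /// Nulls (`None`) are skipped; only non-null NaNs are counted.
    pub fn observe_f64(&mut self, field_id: i32, values: &[Option<f64>]) {
        let nans = values
            .iter()
            .filter(|v| matches!(v, Some(x) if x.is_nan()))
            .count() as u64;
        *self.counts.entry(field_id).or_insert(0) += nans;
    }

    /// Same as `observe_f64`, for f32 columns.
    pub fn observe_f32(&mut self, field_id: i32, values: &[Option<f32>]) {
        let nans = values
            .iter()
            .filter(|v| matches!(v, Some(x) if x.is_nan()))
            .count() as u64;
        *self.counts.entry(field_id).or_insert(0) += nans;
    }

    /// Consume the counter and return the per-field NaN counts, which a
    /// writer could then attach to the data file metadata on close.
    pub fn finish(self) -> HashMap<i32, u64> {
        self.counts
    }
}
```

The idea would be to call `observe_f64` / `observe_f32` once per written batch for every float column and call `finish` when the file is closed. A float column with no NaNs ends up with an entry of 0 in this sketch; whether such entries should be kept or omitted is a design choice left open here.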
For the Parquet writer, we are still missing the following fields in DataFile: nan_value_counts and distinct_counts.