-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-28673: Fix issues in JSON SerDe implementations related to Decimal #5584
base: master
Are you sure you want to change the base?
Conversation
@@ -131,7 +143,7 @@ POSTHOOK: type: QUERY | |||
POSTHOOK: Input: default@json_serde3_2 | |||
#### A masked pattern was here #### | |||
-2 FALSE NULL TRUE 1.23E45 value | |||
-2 false NULL true 1.23E45 value | |||
-2 false NULL true 1.23E+45 value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would this expression change? Is it a valid and expected change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@okumin Thank you so much for the review!
This was not an expected change but definitely a valid one. It is a java issue as we are now using java.math.BigDecimal for float values. BigDecimal parses scientific notations and adds a '+' to our number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am concerned that all applications using Hive, e.g., BI, can decode both formats or not, especially when they are developed not by Java. I could be too defensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your concern is understandable, so I looked into some BI tools that connect to Hive, specifically Power BI. Power BI offers DAX (Data Analysis Expressions) functions that allow users to format scientific notations in any required format, as both E
and E+
are valid notations.
Additionally, any BI tool capable of decoding scientific notation with a negative exponent (e.g., 1.23E-45
) should also handle positive exponents without issues.
@@ -172,7 +173,7 @@ public HiveJsonReader(ObjectInspector oi) { | |||
public HiveJsonReader(ObjectInspector oi, TimestampParser tsParser) { | |||
this.tsParser = tsParser; | |||
this.oi = oi; | |||
this.objectMapper = new ObjectMapper(); | |||
this.objectMapper = new ObjectMapper().enable(DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this change be made applicable only for Primitive decimal type ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Indhumathi27 Thank you so much for your review! The issue is not really specific to primitive data types while we are deserializing them, rather jackson parser is rounding off our json values when we are creating json node using it(jackson parser).
@Indhumathi27 @okumin I also added a new commit to fix the empty(null byte) output for hcatalog serde table. Please have a look if/when you have the time! Thank You! |
c936947
to
7e755ae
Compare
Quality Gate passedIssues Measures |
What changes were proposed in this pull request?
Fixing jackson parser code to increase precision of decimal values during deserialisation.
Why are the changes needed?
Jackson tree parser incorrectly parses decimal values, that leads to data issues. [Please see jira to reproduce steps and see the wrong values]
Does this PR introduce any user-facing change?
Yes
Is the change a dependency upgrade?
No
How was this patch tested?
My test case and current testclasses