Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-28673: Fix issues in JSON SerDe implementations related to Decimal #5584

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

armitage420
Copy link
Contributor

@armitage420 armitage420 commented Dec 19, 2024

What changes were proposed in this pull request?

Fixing jackson parser code to increase precision of decimal values during deserialisation.

Why are the changes needed?

Jackson tree parser incorrectly parses decimal values, that leads to data issues. [Please see jira to reproduce steps and see the wrong values]

Does this PR introduce any user-facing change?

Yes

Is the change a dependency upgrade?

No

How was this patch tested?

My test case and current testclasses

@@ -131,7 +143,7 @@ POSTHOOK: type: QUERY
POSTHOOK: Input: default@json_serde3_2
#### A masked pattern was here ####
-2 FALSE NULL TRUE 1.23E45 value
-2 false NULL true 1.23E45 value
-2 false NULL true 1.23E+45 value
Copy link
Contributor

@okumin okumin Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would this expression change? Is it a valid and expected change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@okumin Thank you so much for the review!
This was not an expected change but definitely a valid one. It is a java issue as we are now using java.math.BigDecimal for float values. BigDecimal parses scientific notations and adds a '+' to our number.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned that all applications using Hive, e.g., BI, can decode both formats or not, especially when they are developed not by Java. I could be too defensive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your concern is understandable, so I looked into some BI tools that connect to Hive, specifically Power BI. Power BI offers DAX (Data Analysis Expressions) functions that allow users to format scientific notations in any required format, as both E and E+ are valid notations.

Additionally, any BI tool capable of decoding scientific notation with a negative exponent (e.g., 1.23E-45) should also handle positive exponents without issues.

@@ -172,7 +173,7 @@ public HiveJsonReader(ObjectInspector oi) {
public HiveJsonReader(ObjectInspector oi, TimestampParser tsParser) {
this.tsParser = tsParser;
this.oi = oi;
this.objectMapper = new ObjectMapper();
this.objectMapper = new ObjectMapper().enable(DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this change be made applicable only for Primitive decimal type ?

Copy link
Contributor Author

@armitage420 armitage420 Jan 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Indhumathi27 Thank you so much for your review! The issue is not really specific to primitive data types while we are deserializing them, rather jackson parser is rounding off our json values when we are creating json node using it(jackson parser).

@armitage420
Copy link
Contributor Author

armitage420 commented Jan 19, 2025

@Indhumathi27 @okumin I also added a new commit to fix the empty(null byte) output for hcatalog serde table. Please have a look if/when you have the time! Thank You!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants