Allow exposing CBOR simple values as VALUE_EMBEDDED_OBJECT #590

Merged

@@ -60,7 +60,21 @@ public enum Feature implements FormatFeature
*
* @since 2.20
*/
- READ_UNDEFINED_AS_EMBEDDED_OBJECT(false)
+ READ_UNDEFINED_AS_EMBEDDED_OBJECT(false),
+
+ /**
+ * Feature that determines how a CBOR "simple value" of major type 7 is exposed by parser.
+ * <p>
+ * When enabled, the parser returns {@link JsonToken#VALUE_EMBEDDED_OBJECT} with
+ * an embedded value of type {@link CBORSimpleValue}, allowing the caller to distinguish
+ * these values from actual {@link JsonToken#VALUE_NUMBER_INT}s.
+ * When disabled, simple values are returned as {@link JsonToken#VALUE_NUMBER_INT}.
+ *<p>
+ * The default value is {@code false} for backwards compatibility (with versions prior to 2.20).
+ *
+ * @since 2.20
+ */
+ READ_SIMPLE_VALUE_AS_EMBEDDED_OBJECT(false)
;

private final boolean _defaultState;
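
The new constant follows the same shape as the existing one: each constant carries a default state, and the parser later tests it with `enabledIn(_formatFeatures)`. Below is a minimal, self-contained sketch of that flag pattern. `SketchFeature` is an illustrative stand-in rather than the real `CBORParser.Feature`, and deriving the mask from the enum ordinal is an assumption of this sketch.

```java
// Illustrative stand-in for a format-feature enum; NOT the real CBORParser.Feature.
enum SketchFeature {
    READ_UNDEFINED_AS_EMBEDDED_OBJECT(false),
    READ_SIMPLE_VALUE_AS_EMBEDDED_OBJECT(false);

    private final boolean defaultState;
    private final int mask;

    SketchFeature(boolean defaultState) {
        this.defaultState = defaultState;
        // Assumption of this sketch: one bit per constant, in declaration order.
        this.mask = 1 << ordinal();
    }

    boolean enabledByDefault() { return defaultState; }

    int getMask() { return mask; }

    // True if this feature's bit is set in the given flag set.
    boolean enabledIn(int flags) { return (flags & mask) != 0; }

    // Builds the bit set of all features that are on by default.
    static int collectDefaults() {
        int flags = 0;
        for (SketchFeature f : values()) {
            if (f.enabledByDefault()) {
                flags |= f.getMask();
            }
        }
        return flags;
    }
}
```

Because both constants default to `false`, the default flag set in this sketch is empty, and a caller has to opt in explicitly to get the new behavior.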
@@ -363,6 +377,14 @@ public int getFirstTag() {
*/
protected TagList _tagValues = new TagList();

+ /**
+ * When major type 7 value is encountered and exposed as {@link JsonToken#VALUE_EMBEDDED_OBJECT},
+ * the value will be stored here.
+ *
+ * @since 2.20
+ */
+ protected CBORSimpleValue _simpleValue;
+
/**
* Flag that indicates that the current token has not yet
* been fully processed, and needs to be finished for
@@ -824,9 +846,9 @@ public JsonToken nextToken() throws IOException
_skipIncomplete();
}
_tokenInputTotal = _currInputProcessed + _inputPtr;
- // also: clear any data retained so far
- _numTypesValid = NR_UNKNOWN;
- _binaryValue = null;
+
+ // also: clear any data retained for previous token
+ clearRetainedValues();
Member (@cowtowncoder), May 10, 2025

There are a few other places where this must be done (everywhere that `_binaryValue = null;` is done).

Contributor Author

I thought clearing it once at the beginning of nextToken could be enough, given it's used only once for major 7, and there is no partial assignment (always one byte).

Member

I wouldn't want to count on state not getting cleared for nextFieldName() variants, as they won't call nextToken().
It may be that a combination of things could keep things consistent (as in, not exposing stale simple value) but when adding new features invariants might not hold.

But if you are confident after looking, that's fine, just LMK.

Contributor Author

On the side: so far I've been exploring an async implementation for CBOR, and I'm almost done with my draft, which builds almost all major types as a way to explore the spec. In the coming weeks, I will start work on the actual, organized PR. However, I'd like to know if you think there are higher-priority issues that the community would like to see fixed. I'm enjoying contributing to the project in my free time, so aligning with priorities will make it even more valuable.

After the async parser, I believe looking at Avro issues is worth it, as I think it's the most used (just a guess from the number of issues).

Would appreciate your thoughts.

Contributor Author

> I wouldn't want to count on state not getting cleared for nextFieldName() variants, as they won't call nextToken(). It may be that a combination of things could keep things consistent (as in, not exposing stale simple value) but when adding new features invariants might not hold.
>
> But if you are confident after looking, that's fine, just LMK.

Aha, I got it. I'm not 100% confident about the different combinations that could lead to exposing stale values, so yes, better not to count on that. I will commit a change to clear it everywhere the binary value is cleared.

Member

@iifawzi On the async impl: that sounds like something that would be well received, so +1 for that. Avro does seem to get more attention than I'd have expected, so that's a good thing to follow up on too. Logical type support, and whatever "new" features there are for CBOR (I haven't been following spec development), would be good as well (Avro, too, has a logical type concept).

Ideally, Proto3 would also be supported by the protobuf module. There's also the question of replacing protoc-parser (the schema reader), which is deprecated, with the library that is meant to replace it (by the same authors). I haven't had time to follow up on this; I think there's an issue filed to request it.

Contributor Author

Thanks a lot for the insights @cowtowncoder! I’ll keep those points in mind as I move forward after the async impl


// First: need to keep track of lengths of defined-length Arrays and
// Objects (to materialize END_ARRAY/END_OBJECT as necessary);
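
The concern in the review thread is that the parser has more than one way to advance (`nextToken()` and the `nextFieldName()` variants), so state retained for one token has to be reset on every such path. The sketch below shows that invariant in miniature. `TinyParser` and its fields are hypothetical and only mimic the shape of the change; they are not Jackson code.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical miniature parser with two ways to advance, as discussed in the review.
class TinyParser {
    private final Iterator<Object> input;
    private byte[] binaryValue;   // retained for the current token only
    private Integer simpleValue;  // retained for the current token only

    TinyParser(List<Object> tokens) {
        this.input = tokens.iterator();
    }

    // Single place that resets everything retained for the previous token.
    private void clearRetainedValues() {
        binaryValue = null;
        simpleValue = null;
    }

    // General entry point.
    Object nextToken() {
        clearRetainedValues();
        return advance();
    }

    // Entry point that does not go through nextToken(); it has to clear
    // retained state itself, otherwise a stale value could be exposed.
    String nextFieldName() {
        clearRetainedValues();
        Object t = advance();
        return (t instanceof String) ? (String) t : null;
    }

    private Object advance() {
        if (!input.hasNext()) {
            return null;
        }
        Object t = input.next();
        if (t instanceof Integer) {
            simpleValue = (Integer) t;
        } else if (t instanceof byte[]) {
            binaryValue = (byte[]) t;
        }
        return t;
    }

    // Same lookup order as in the diff: simple value first, then binary.
    Object getEmbeddedObject() {
        return (simpleValue != null) ? simpleValue : binaryValue;
    }
}
```

Keeping the reset in one helper means a newly added retained field only has to be cleared in one place, which is the design the PR settles on with `clearRetainedValues()`.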
@@ -1453,12 +1475,12 @@ public boolean nextFieldName(SerializableString str) throws IOException
{
// Two parsing modes; can only succeed if expecting field name, so handle that first:
if (_streamReadContext.inObject() && _currToken != JsonToken.FIELD_NAME) {
- _numTypesValid = NR_UNKNOWN;
if (_tokenIncomplete) {
_skipIncomplete();
}
_tokenInputTotal = _currInputProcessed + _inputPtr;
- _binaryValue = null;
+ // need to clear retained values for previous token
+ clearRetainedValues();
_tagValues.clear();
// completed the whole Object?
if (!_streamReadContext.expectMoreValues()) {
@@ -1506,19 +1528,19 @@ public boolean nextFieldName(SerializableString str) throws IOException
}
}
// otherwise just fall back to default handling; should occur rarely
- return (nextToken() == JsonToken.FIELD_NAME) && str.getValue().equals(getCurrentName());
+ return (nextToken() == JsonToken.FIELD_NAME) && str.getValue().equals(currentName());
}

@Override
public String nextFieldName() throws IOException
{
if (_streamReadContext.inObject() && _currToken != JsonToken.FIELD_NAME) {
- _numTypesValid = NR_UNKNOWN;
if (_tokenIncomplete) {
_skipIncomplete();
}
_tokenInputTotal = _currInputProcessed + _inputPtr;
- _binaryValue = null;
+ // need to clear retained values for previous token
+ clearRetainedValues();
_tagValues.clear();
// completed the whole Object?
if (!_streamReadContext.expectMoreValues()) {
@@ -1843,7 +1865,10 @@ public Object getEmbeddedObject() throws IOException
if (_tokenIncomplete) {
_finishToken();
}
- if (_currToken == JsonToken.VALUE_EMBEDDED_OBJECT ) {
+ if (_currToken == JsonToken.VALUE_EMBEDDED_OBJECT) {
+ if (_simpleValue != null) {
+ return _simpleValue;
+ }
return _binaryValue;
}
return null;
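
With this change, `getEmbeddedObject()` can return either binary data or a simple value for the same token type, so a caller has to check the runtime type of the result. Here is a minimal sketch of such a check. `SimpleValueStandIn` is a hypothetical placeholder for `CBORSimpleValue`, used only so the snippet compiles without Jackson on the classpath.

```java
// Hypothetical stand-in for CBORSimpleValue, so the sketch needs no Jackson dependency.
final class SimpleValueStandIn {
    private final int value;

    SimpleValueStandIn(int value) { this.value = value; }

    int getValue() { return value; }
}

class EmbeddedObjectSketch {
    // Describes what an embedded-object token carried, based on its runtime type.
    static String describe(Object embedded) {
        if (embedded instanceof SimpleValueStandIn) {
            return "simple(" + ((SimpleValueStandIn) embedded).getValue() + ")";
        }
        if (embedded instanceof byte[]) {
            return "binary[" + ((byte[]) embedded).length + "]";
        }
        return "none";
    }
}
```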
@@ -1933,11 +1958,11 @@ private final byte[] _getBinaryFromString(Base64Variant variant) throws IOExcept
/**
* Checking whether the current token represents an `undefined` value (0xF7).
* <p>
- * This method allows distinguishing between real {@code null} and `undefined`,
+ * This method allows distinguishing between real {@code null} and {@code undefined},
* even if {@link CBORParser.Feature#READ_UNDEFINED_AS_EMBEDDED_OBJECT} is disabled
* and the token is reported as {@link JsonToken#VALUE_NULL}.
*
- * @return {@code true} if current token is an `undefined`, {@code false} otherwise
+ * @return {@code true} if current token is an {@code undefined}, {@code false} otherwise
*
* @since 2.20
*/
@@ -3713,38 +3738,50 @@ protected JsonToken _decodeUndefinedValue() {
* Helper method that deals with details of decoding unallocated "simple values"
* and exposing them as expected token.
* <p>
- * As of Jackson 2.12, simple values are exposed as
- * {@link JsonToken#VALUE_NUMBER_INT}s,
- * but in later versions this is planned to be changed to separate value type.
+ * Starting with Jackson 2.20, this behavior can be changed by enabling the
+ * {@link CBORParser.Feature#READ_SIMPLE_VALUE_AS_EMBEDDED_OBJECT}
+ * feature, in which case simple values are returned as {@link JsonToken#VALUE_EMBEDDED_OBJECT} with an
+ * embedded {@link CBORSimpleValue} instance.
*
* @since 2.12
*/
public JsonToken _decodeSimpleValue(int lowBits, int ch) throws IOException {
if (lowBits > 24) {
_invalidToken(ch);
}
+ final boolean simpleAsEmbedded = Feature.READ_SIMPLE_VALUE_AS_EMBEDDED_OBJECT.enabledIn(_formatFeatures);
if (lowBits < 24) {
- _numberInt = lowBits;
+ if (simpleAsEmbedded) {
+ _simpleValue = new CBORSimpleValue(lowBits);
+ } else {
+ _numberInt = lowBits;
+ }
} else { // need another byte
if (_inputPtr >= _inputEnd) {
loadMoreGuaranteed();
}
- _numberInt = _inputBuffer[_inputPtr++] & 0xFF;

// As per CBOR spec, values below 32 not allowed to avoid
// confusion (as well as guarantee uniqueness of encoding)
- if (_numberInt < 32) {
+ int value = _inputBuffer[_inputPtr++] & 0xFF;
+ if (value < 32) {
throw _constructError("Invalid second byte for simple value: 0x"
- +Integer.toHexString(_numberInt)+" (only values 0x20 - 0xFF allowed)");
+ +Integer.toHexString(value)+" (only values 0x20 - 0xFF allowed)");
}
+
+ if (simpleAsEmbedded) {
+ _simpleValue = new CBORSimpleValue(value);
+ } else {
+ _numberInt = value;
+ }
}

- // 25-Nov-2020, tatu: Although ideally we should report these
- // as `JsonToken.VALUE_EMBEDDED_OBJECT`, due to late addition
- // of handling in 2.12, simple value in 2.12 will be reported
- // as simple ints.
+ if (simpleAsEmbedded) {
+ return JsonToken.VALUE_EMBEDDED_OBJECT;
+ }
+
_numTypesValid = NR_INT;
- return (JsonToken.VALUE_NUMBER_INT);
+ return JsonToken.VALUE_NUMBER_INT;
}

/*
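
The decoding rule in this hunk can be restated outside the parser: low bits below 24 are the value itself, low bits equal to 24 mean the value is in the next byte and must be at least 32, and anything above 24 is not a simple value. The sketch below is a simplified restatement for illustration only. `SimpleValueDecodeSketch` and `decode` are made-up names, and unlike the parser it does not route the assigned values 20..23 elsewhere.

```java
class SimpleValueDecodeSketch {
    // Returns the simple value number encoded at doc[offset], or throws if the
    // bytes are not a valid one- or two-byte simple value encoding.
    // Note: values 20..23 (false, true, null, undefined) are returned as plain
    // numbers here; a real parser handles them before reaching this step.
    static int decode(byte[] doc, int offset) {
        int ch = doc[offset] & 0xFF;
        if ((ch >> 5) != 7) {
            throw new IllegalArgumentException("Not major type 7: 0x" + Integer.toHexString(ch));
        }
        int lowBits = ch & 0x1F;
        if (lowBits > 24) {
            // 25..27 are floats, 28..30 are reserved, 31 is the "break" marker
            throw new IllegalArgumentException("Not a simple value: 0x" + Integer.toHexString(ch));
        }
        if (lowBits < 24) {
            return lowBits; // value is carried in the initial byte
        }
        // lowBits == 24: value is carried in the following byte
        if (offset + 1 >= doc.length) {
            throw new IllegalArgumentException("Truncated simple value");
        }
        int value = doc[offset + 1] & 0xFF;
        if (value < 32) {
            // Two-byte form must not duplicate what the one-byte form can express
            throw new IllegalArgumentException("Invalid second byte for simple value: 0x"
                    + Integer.toHexString(value) + " (only values 0x20 - 0xFF allowed)");
        }
        return value;
    }
}
```

Reading the second byte into a local variable before validating it, as the PR does with `value`, avoids writing an invalid number into parser state when the check fails.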
@@ -4101,4 +4138,11 @@ private void createChildObjectContext(final int len) throws IOException {
_streamReadContext = _streamReadContext.createChildObjectContext(len);
_streamReadConstraints.validateNestingDepth(_streamReadContext.getNestingDepth());
}
+
+ // @since 2.20
+ private void clearRetainedValues() {
+ _numTypesValid = NR_UNKNOWN;
+ _binaryValue = null;
+ _simpleValue = null;
+ }
}
@@ -30,6 +30,23 @@ public void testTinySimpleValues() throws Exception
}
}

+ @Test
+ public void testTinySimpleValuesAsEmbeddedObjectWhenEnabled() throws Exception
+ {
+ CBORFactory f = CBORFactory.builder()
+ .enable(CBORParser.Feature.READ_SIMPLE_VALUE_AS_EMBEDDED_OBJECT)
+ .build();
+ // Values 0..19 are unassigned, valid to encounter
+ for (int v = 0; v <= 19; ++v) {
+ byte[] doc = new byte[1];
+ doc[0] = (byte) (CBORConstants.PREFIX_TYPE_MISC + v);
+ try (CBORParser p = cborParser(f, doc)) {
+ assertToken(JsonToken.VALUE_EMBEDDED_OBJECT, p.nextToken());
+ assertEquals(new CBORSimpleValue(v), p.getEmbeddedObject());
+ }
+ }
+ }
+
@Test
public void testValidByteLengthMinimalValues() throws Exception {
// Values 32..255 are unassigned, valid to encounter
@@ -44,6 +61,21 @@ public void testValidByteLengthMinimalValues() throws Exception {
}
}

+ @Test
+ public void testValidByteLengthMinimalValuesAsEmbeddedObjectWhenEnabled() throws Exception {
+ // Values 32..255 are unassigned, valid to encounter
+ CBORFactory f = CBORFactory.builder()
+ .enable(CBORParser.Feature.READ_SIMPLE_VALUE_AS_EMBEDDED_OBJECT)
+ .build();
+ for (int v = 32; v <= 255; ++v) {
+ byte[] doc = { (byte) (CBORConstants.PREFIX_TYPE_MISC + 24), (byte) v };
+ try (CBORParser p = cborParser(f, doc)) {
+ assertToken(JsonToken.VALUE_EMBEDDED_OBJECT, p.nextToken());
+ assertEquals(new CBORSimpleValue(v), p.getEmbeddedObject());
+ }
+ }
+ }
+
@Test
public void testInvalidByteLengthMinimalValues() throws Exception {
// Values 0..31 are invalid for variant that takes 2 bytes...
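
The test documents in these hunks are built by hand from the major type 7 prefix: one byte for small values, and prefix + 24 followed by the value byte for 32..255. A small helper producing the same byte layouts might look like the sketch below. `encodeSimpleValue` is a hypothetical name, and `0xE0` (7 << 5) is what `CBORConstants.PREFIX_TYPE_MISC` is assumed to hold.

```java
class SimpleValueEncodeSketch {
    // Major type 7 shifted into the top 3 bits of the initial byte (0xE0).
    // Assumed to match CBORConstants.PREFIX_TYPE_MISC.
    static final int PREFIX_MAJOR_7 = 7 << 5;

    // Encodes a simple value number in its shortest valid form.
    // Note: 20..23 are the assigned values false, true, null and undefined;
    // the tests above only use the unassigned range 0..19.
    static byte[] encodeSimpleValue(int v) {
        if (v >= 0 && v <= 23) {
            return new byte[] { (byte) (PREFIX_MAJOR_7 + v) };
        }
        if (v >= 32 && v <= 255) {
            return new byte[] { (byte) (PREFIX_MAJOR_7 + 24), (byte) v };
        }
        // 24..31 have no valid encoding; anything else is out of range
        throw new IllegalArgumentException("Not encodable as simple value: " + v);
    }
}
```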
4 changes: 4 additions & 0 deletions release-notes/CREDITS-2.x
@@ -397,3 +397,7 @@ Fawzi Essam (@iifawzi)
* Contributed fix for #431: (cbor) Negative `BigInteger` values not encoded/decoded
correctly
(2.20.0)
+ * Contributed implementation of #587: (cbor) Allow exposing CBOR Simple values as
+ `JsonToken.VALUE_EMBEDDED_OBJECT` with a feature flag
+ (2.20.0)
+
3 changes: 3 additions & 0 deletions release-notes/VERSION-2.x
@@ -22,6 +22,9 @@ Active maintainers:
#431: (cbor) Negative `BigInteger` values not encoded/decoded correctly
(reported by Brian G)
(fix contributed by Fawzi E)
+ #587: (cbor) Allow exposing CBOR Simple values as `JsonToken.VALUE_EMBEDDED_OBJECT`
+ with a feature flag
+ (implementation contributed by Fawzi E)
- Generate SBOMs [JSTEP-14]

2.19.0 (24-Apr-2025)