Shared keys can cause unescaped write of BYTE_MARKER_END_OF_CONTENT #18
Comments
Thank you for reporting this; -1 (i.e. |
Also: I am guessing that since that byte acts as an end-of-input, it may also avoid getting caught by unit tests... nasty. I will need to see if this also occurs with 2.3; if so, obviously need to backport fix for 2.3.4 as well. |
Thx for looking into it. I actually ran into it on the 2.3 branch. On Tue, Jun 17, 2014 at 6:06 PM, Tatu Saloranta [email protected]
@bleskes Ok. I am guessing it probably exists in 2.4, but we'll see. |
Hmmh. I could not reproduce the issue immediately. But I have one thought on problem -- since -1 is indeed the end-marker, is it possible that Test I tried actually uses |
.... ok. Interesting; yes, I can actually see 0xFF there in the middle. So I can reproduce it, although parser had no trouble with it. Fascinating... |
Ouch. This just went from unfortunate to really, really bad. I need to triple-check, but it is possible this is a flaw in the specification itself, if it does not reserve that value. While it will be possible for the parser to be forgiving, it will be tricky to figure out a way to have seamless support between old and new implementations. |
So: basically the format would need to reserve a byte range for long back-references as well. The problem here is that while a new parser can accept old values (for backwards-compatibility), and we also have the format's version number to use (which we probably now have to start using), it will require careful choreography to make things work in the least intrusive manner. It is possible that there are some combinations that won't work -- new encoder, old decoder, most likely -- but what I do not want are mysterious failures half-way through processing. |
@bleskes: Actually I think I figured out a simple(r) short-term solution for the encoding problem. It will not resolve the problem of existing data that may contain unexpected markers, but it will stop generation of such data, without requiring any changes to decoders.

So: the change needed is to add special handling for cases where an invalid byte marker would be produced as part of a back-reference. We cannot omit writing the values themselves (both because they are needed as first-time values, and so that the "invalid" index itself gets skipped); but we can omit adding the value into the back-referenceable lookup map. Given this, such a back-reference will never be used, and no invalid byte value will be written.

The downside of such a change is that values added into these slots will end up being duplicated, even where theoretically we could use a back-reference. But this is a minor sub-optimality; and in fact encoders are not required to have 100% reliable detection of back-referenceable keys or values.

Whether there should be a change to the specification itself is an open question; the first step could be to document the need for this work-around in the generator. It should then be possible to define "alternate" codes to refer to blocked values (ones that cannot be used because they would result in use of an invalid byte), as a sort of alias. But doing that will require an update of the version value, and a bit of future-proofing in the codec.
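To make the work-around concrete, here is a minimal sketch of the idea, not the actual `SmileGenerator` code. The class `SharedNameTable`, its methods, the 1024-entry limit with reset, and the assumption that only a low byte of 0xFF is problematic are all hypothetical choices for illustration; the real set of reserved bytes should be taken from the format specification.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical encoder-side table of names that may be back-referenced.
class SharedNameTable {
    // Assumption: the table holds at most 1024 slots and is reset when full.
    static final int MAX_SHARED = 1024;

    private final Map<String, Integer> lookup = new HashMap<>();
    private int nextIndex = 0;

    // Assumption: a slot is "blocked" when the low byte of its index would be
    // written as 0xFF (-1), the end-of-content marker discussed in this issue.
    static boolean isBlockedIndex(int index) {
        return (index & 0xFF) == 0xFF;
    }

    // Returns the slot index of a previously seen name, or -1 if the name
    // cannot be back-referenced and must be written out in full.
    int findReference(String name) {
        Integer index = lookup.get(name);
        return (index == null) ? -1 : index;
    }

    // Called after a name has been written in full. The slot is always
    // consumed so encoder and decoder numbering stay aligned, but a blocked
    // slot is never entered into the lookup map, so it is never referenced.
    void addSeenName(String name) {
        if (nextIndex == MAX_SHARED) {
            lookup.clear();
            nextIndex = 0;
        }
        if (!isBlockedIndex(nextIndex)) {
            lookup.put(name, nextIndex);
        }
        nextIndex++;
    }

    public static void main(String[] args) {
        SharedNameTable table = new SharedNameTable();
        for (int i = 0; i < 300; i++) {
            table.addSeenName("key" + i);
        }
        System.out.println("key254 -> " + table.findReference("key254"));
        System.out.println("key255 -> " + table.findReference("key255"));
        System.out.println("key256 -> " + table.findReference("key256"));
    }
}
```

In this sketch the name in slot 255 is simply written in full every time it occurs, which is the duplication cost mentioned above; decoders need no change because they still assign slot 255 to that name and just never see a reference to it.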
Nice! I think it's a very good pragmatic solution. I agree it's not "clean" from a spec perspective but it's effective and simple |
Yes. I need to make sure to update the specification to mention this issue, and this work-around, so that other codecs can handle it too. |
This fixes issue 18 in Smile (FasterXML/jackson-dataformat-smile#18) closes elastic#7327
If you write >256 shared keys, the output stream contains a -1 (0xFF) byte even though it's not the end of the document. This is a problem for us, as we rely on the -1 marker to separate docs in a binary stream without needing to parse them.
Here is a failing junit reproduction:
I believe the problem lies in this line, which writes an unescaped byte as the second parameter:
jackson-dataformat-smile/src/main/java/com/fasterxml/jackson/dataformat/smile/SmileGenerator.java
Line 820 in 284cca9
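As an illustration of why this bites once enough shared keys are in play, the sketch below is not the referenced line from `SmileGenerator.java`. It assumes a long shared-key reference is written as two bytes, a prefix of 0x30 plus the top two bits of a 10-bit index followed by the low eight bits written raw, and the class and method names are hypothetical.

```java
// Hypothetical illustration of a two-byte long shared-key reference.
class LongSharedNameRef {
    // The -1 byte that readers of the stream treat as end-of-content.
    static final int END_OF_CONTENT = 0xFF;

    // Assumed layout: first byte is 0x30 | (index >> 8), second byte is the
    // low 8 bits of the index, written without any escaping.
    static byte[] encodeLongRef(int index) {
        if (index < 64 || index > 1023) {
            throw new IllegalArgumentException("index out of long-reference range: " + index);
        }
        return new byte[] { (byte) (0x30 | (index >> 8)), (byte) (index & 0xFF) };
    }

    static boolean containsEndMarker(byte[] bytes) {
        for (byte b : bytes) {
            if ((b & 0xFF) == END_OF_CONTENT) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // List every index whose encoded reference contains the marker byte.
        for (int index = 64; index <= 1023; index++) {
            if (containsEndMarker(encodeLongRef(index))) {
                System.out.println("index " + index + " emits 0xFF");
            }
        }
    }
}
```

Under these assumptions the first affected slot is index 255, so the marker byte can only appear once a document has accumulated a few hundred shared keys and one of them is referenced again.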