eval, cli: better support for binary data #483
ESC currently handles binary data by coercing binary values to (potentially invalid) strings. This works okay internally, since Go does not validate the well-formedness of strings when converting from binary data. However, the standard Go JSON encoder ensures that all strings are coerced to valid UTF-8 during marshaling, which replaces any invalid bytes with the Unicode replacement character (U+FFFD). This corrupts the data.
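The corruption can be reproduced outside of ESC with the standard library alone. The following is a minimal sketch, not code from this PR: `roundTripAsString` is a hypothetical helper that coerces bytes to a string, marshals it with `encoding/json`, and decodes it again.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// roundTripAsString (hypothetical helper) mimics the lossy path: the bytes
// are coerced to a Go string, marshaled to JSON, and then decoded again.
func roundTripAsString(data []byte) ([]byte, error) {
	// The conversion itself is lossless; json.Marshal is where each
	// invalid byte is replaced with U+FFFD.
	encoded, err := json.Marshal(string(data))
	if err != nil {
		return nil, err
	}
	var decoded string
	if err := json.Unmarshal(encoded, &decoded); err != nil {
		return nil, err
	}
	return []byte(decoded), nil
}

func main() {
	original := []byte{0xdc, 0xc5, 0x43, 0xbc} // not valid UTF-8
	got, err := roundTripAsString(original)
	if err != nil {
		panic(err)
	}
	fmt.Printf("original:      %x\n", original)
	fmt.Printf("round-tripped: %x\n", got)
}
```

Each of the three invalid bytes comes back as the three-byte sequence `ef bf bd`, while the valid `43` (`C`) survives unchanged.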
For example, the following environment should write the bytes

dcc5 43bc 1b4f f4fb 8263 d630 a199 2688 166b c084 5b71 1240 24c3 a944 d937 e2fc

to a temporary file:

Instead, it writes the bytes

efbf bdef bfbd 43ef bfbd 1b4f efbf bdef bfbd efbf bd63 efbf bd30 efbf bdef bfbd 26ef bfbd 166b efbf bdef bfbd 5b71 1240 24c3 a944 efbf bd37 efbf bdef bfbd

because the decoded value of `MY_FILE` is marshaled as `"��C�\u001bO���c�0��\u0026�\u0016k��[q\u0012@$éD�7��"` due to the presence of invalid UTF-8 bytes in the decoded value.

This commit addresses these challenges through three related changes:
- `[]byte` values are now allowed inside of `eval.value`. This allows binary data to be faithfully represented within the evaluator.
- `[]byte` values are now allowed inside of `esc.Value`. If an `esc.Value` contains a `[]byte`, its contents are marshaled as a base64-encoded string and a new `binary` property is set to `true`. This allows binary data to be safely round-tripped without complicated encoding schemes.
- The `files` stanza now supports binary data. If the value of an entry in `files` is binary data, it is written to the filesystem as-is.

Strictly speaking, this is a breaking change: prior to these changes, binary data (which can only be manufactured by a provider, rotator, or `fn::fromBase64`) would be marshaled as a plain old JSON string. With these changes, binary data is still marshaled as a JSON string, but its contents are now base64-encoded bytes. If users had binary data that happened to be valid UTF-8, that data will now be encoded using base64 during marshaling instead of being represented verbatim. If this is a major concern, we could change our marshaling behavior to only base64-encode if the binary data is not valid UTF-8.

Fixes #480.
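To make the marshaling trade-off concrete, here is a rough sketch of both behaviors. It is illustrative only: `wireValue`, `encodeAlways`, `encodeIfInvalid`, and `decode` are hypothetical names, the JSON field names are assumptions, and the choice of standard (rather than URL-safe) base64 is also an assumption. The real `esc.Value` may differ in all of these.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"unicode/utf8"
)

// wireValue is a hypothetical stand-in for the serialized form of a value.
// It is NOT the actual esc.Value definition.
type wireValue struct {
	Value  string `json:"value"`
	Binary bool   `json:"binary,omitempty"`
}

// encodeAlways follows the behavior described in this PR: every []byte is
// base64-encoded and flagged with binary=true.
func encodeAlways(data []byte) ([]byte, error) {
	return json.Marshal(wireValue{
		Value:  base64.StdEncoding.EncodeToString(data),
		Binary: true,
	})
}

// encodeIfInvalid sketches the alternative floated above: valid UTF-8 is
// kept verbatim, and base64 is used only when the bytes are not valid UTF-8.
func encodeIfInvalid(data []byte) ([]byte, error) {
	if utf8.Valid(data) {
		return json.Marshal(wireValue{Value: string(data)})
	}
	return encodeAlways(data)
}

// decode reverses either encoding by consulting the binary flag.
func decode(raw []byte) ([]byte, error) {
	var w wireValue
	if err := json.Unmarshal(raw, &w); err != nil {
		return nil, err
	}
	if !w.Binary {
		return []byte(w.Value), nil
	}
	return base64.StdEncoding.DecodeString(w.Value)
}

func main() {
	data := []byte{0xdc, 0xc5, 0x43, 0xbc} // not valid UTF-8
	raw, err := encodeAlways(data)
	if err != nil {
		panic(err)
	}
	back, err := decode(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("wire: %s\nbytes: %x\n", raw, back)
}
```

Under the conditional variant, consumers must still check the `binary` flag, but values that were valid UTF-8 before this change would keep their previous representation.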