SMILE format specification - bug in "safe binary" encoding #37

Open
@SheliakLyr

The SMILE specification contains the following statements:

  • "Big" decimal/integer values use "safe" binary encoding
  • "Safe" binary encoding simply uses 7 LSB: data is left aligned (i.e. any padding of the last byte is in its rightmost, least-significant, bits).

Consider a one-byte array: Array(0x01). Encoded as 7LSB, its 8 data bits take 2 output bytes, because each output byte carries only 7 bits. According to the specification, "any padding of the last byte is in its rightmost, least-significant, bits", so it should look like this:

_0000000 _1pppppp   (hex: 0x0040)
where:
_ - unused bit (0)
p - padding (0)
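
The layout above can be reproduced with a minimal sketch of left-aligned 7-bit packing, following the wording of the specification. The object and method names (SafeBinarySpec, encodeLeftAligned, hex) are hypothetical and are not part of Jackson or of the specification.

```scala
object SafeBinarySpec {
  // Hypothetical helper: packs the input bits into 7-bit groups, MSB first.
  // The data is left-aligned, so padding zeros fill the low bits of the last byte.
  def encodeLeftAligned(data: Array[Byte]): Array[Byte] = {
    val totalBits = data.length * 8
    val outLen = (totalBits + 6) / 7 // ceil(totalBits / 7)
    val out = new Array[Byte](outLen)
    var i = 0
    while (i < outLen) {
      var chunk = 0
      var b = 0
      while (b < 7) {
        val bitIndex = i * 7 + b
        val bit =
          if (bitIndex < totalBits) (data(bitIndex / 8) >> (7 - bitIndex % 8)) & 1
          else 0 // padding bit
        chunk = (chunk << 1) | bit
        b += 1
      }
      out(i) = chunk.toByte // top bit of every output byte stays 0
      i += 1
    }
    out
  }

  // Upper-case hex dump, two digits per byte.
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xFF}%02X").mkString
}
```

For Array(0x01) this sketch yields the two bytes 0x00 0x40, matching the diagram.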

However, the Jackson library behaves differently. The following Scala code encodes a BigInteger with the value 1:

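import java.io.ByteArrayOutputStream
import java.math.BigInteger
// javax.xml.bind ships with Java 8 and earlier.
import javax.xml.bind.DatatypeConverter
import com.fasterxml.jackson.dataformat.smile.SmileFactory

object SmileTestDataGenerator {
  def main(args: Array[String]): Unit = {
    val sf = new SmileFactory()
    val os = new ByteArrayOutputStream()
    val gen = sf.createGenerator(os)
    // Write a single BigInteger value (1) as the whole document.
    gen.writeNumber(BigInteger.valueOf(1))
    gen.close()
    // Print the encoded bytes as upper-case hex.
    println(DatatypeConverter.printHexBinary(os.toByteArray))
  }
}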

Output:

3A290A0126810001

After removing the header, the type token, and the content length, we are left with 0001.
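
A small sketch of that split, assuming the output consists of a 4-byte header (":)\n" plus one version/flags byte), a 1-byte type token (0x26), and a 1-byte content length (0x81). The object and value names are hypothetical; the offsets are my reading of this particular output, not a general SMILE parser.

```scala
object SmileOutputBreakdown {
  // Hex output printed by the generator program.
  val outputHex = "3A290A0126810001"

  // Parse a hex string into bytes, two hex digits per byte.
  def parseHex(s: String): Array[Byte] =
    s.grouped(2).map(h => Integer.parseInt(h, 16).toByte).toArray

  // Upper-case hex dump, two digits per byte.
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xFF}%02X").mkString

  val bytes: Array[Byte] = parseHex(outputHex)
  val header: Array[Byte] = bytes.slice(0, 4) // assumed: ":)\n" + version/flags byte
  val token: Byte = bytes(4)                  // assumed: type token
  val length: Byte = bytes(5)                 // assumed: single-byte content length
  val payload: Array[Byte] = bytes.drop(6)    // the "safe binary" data under discussion
}
```

With these assumed offsets, SmileOutputBreakdown.hex(SmileOutputBreakdown.payload) gives 0001.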
It turns out that the padding is located in the last byte, but in its LEFT-MOST, most-significant bits. So Array(0x01) is encoded as:
_0000000 _pppppp1
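
For comparison, here is a minimal sketch of a packing variant that reproduces the observed bytes 0001 for this input: the leftover bits of the final group are kept right-aligned instead of being shifted left. This is not Jackson's actual code; the names SafeBinaryObserved and encodeLastChunkRightAligned are hypothetical, and the behavior is inferred only from the single output shown above.

```scala
object SafeBinaryObserved {
  // Hypothetical helper: packs the input bits into 7-bit groups, MSB first.
  // A short final group is left right-aligned, so padding zeros end up in
  // the high bits of the last byte.
  def encodeLastChunkRightAligned(data: Array[Byte]): Array[Byte] = {
    val totalBits = data.length * 8
    val outLen = (totalBits + 6) / 7 // ceil(totalBits / 7)
    val out = new Array[Byte](outLen)
    var i = 0
    while (i < outLen) {
      val start = i * 7
      val end = math.min(start + 7, totalBits) // final group may be shorter
      var chunk = 0
      var bitIndex = start
      while (bitIndex < end) {
        val bit = (data(bitIndex / 8) >> (7 - bitIndex % 8)) & 1
        chunk = (chunk << 1) | bit
        bitIndex += 1
      }
      out(i) = chunk.toByte // no left shift for a short final group
      i += 1
    }
    out
  }

  // Upper-case hex dump, two digits per byte.
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xFF}%02X").mkString
}
```

For Array(0x01) this sketch yields 0x00 0x01, whereas left-aligned packing as described in the specification would yield 0x00 0x40.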

Am I right? Is the specification wrong, or is this a bug in the implementation?
