MOD-14010 support Homogenues array floating point forcing(deserializa…#17

Open
AvivDavid23 wants to merge 22 commits into master from MOD-13577-datatypes-json-homogeneous-fp-arrays-declare-fp-type

@AvivDavid23 AvivDavid23 commented Feb 9, 2026

…tion path only)

  • Add an option to try to enforce a specific float type on the values of a homogeneous array
  • Add binary encoding and decoding, which will be used by RedisJSON (to easily preserve the tag per path)
  • Add unit tests and fuzz tests

No production-quality Rust CBOR library implements RFC 8746 natively (the only candidate, cbor_enhanced, has been unmaintained since 2020 and lacks F16-LE and BF16 support), so a thin IValue ↔ ciborium::Value conversion layer is still needed.

Size comparison (vs JSON baseline)

| Document | JSON | CBOR | CBOR+zstd |
|---|---|---|---|
| FP32 array (1000 elements) | 19 180 B | 4 005 B (−79%) | 3 593 B (−81%) |
| FP64 array (1000 elements) | 5 891 B | 8 005 B (+36%) | 2 359 B (−60%) |
| Heterogeneous array (1000 nums) | 5 891 B | 8 005 B (+36%) | 2 359 B (−60%) |
| String-heavy object (50 keys) | 2 181 B | 2 032 B (−7%) | 191 B (−91%) |
| Mixed object | 94 B | 59 B (−37%) | 68 B (−28%) |
| Nested FP32 arrays + string | 3 110 B | 826 B (−73%) | 338 B (−89%) |
| Big mixed JSON (200 records) | 77 190 B | 66 068 B (−14%) | 17 777 B (−77%) |
| Repeated strings / RED-141886 | 49 276 B | 37 190 B (−25%) | 2 418 B (−95%) |
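
To make the CBOR column easier to sanity-check, here is a minimal hand-rolled sketch of how a typed FP32 array could be laid out under RFC 8746. It assumes the array is written as tag 85 (IEEE 754 binary32, little-endian) wrapping a single byte string of packed floats. The PR does not state which tag it emits, and the helper names `write_head` and `encode_f32_typed_array` are illustrative only, not the PR's API (which goes through ciborium).

```rust
/// RFC 8746 tag for a typed array of IEEE 754 binary32 values, little-endian.
/// Assumption: the PR may use a different tag for its F32 arrays.
const TAG_F32_LE: u64 = 85;

/// Append a CBOR head (3-bit major type plus unsigned argument), shortest form.
fn write_head(out: &mut Vec<u8>, major: u8, arg: u64) {
    let m = major << 5;
    match arg {
        0..=23 => out.push(m | arg as u8),
        24..=0xFF => {
            out.push(m | 24);
            out.push(arg as u8);
        }
        0x100..=0xFFFF => {
            out.push(m | 25);
            out.extend_from_slice(&(arg as u16).to_be_bytes());
        }
        0x1_0000..=0xFFFF_FFFF => {
            out.push(m | 26);
            out.extend_from_slice(&(arg as u32).to_be_bytes());
        }
        _ => {
            out.push(m | 27);
            out.extend_from_slice(&arg.to_be_bytes());
        }
    }
}

/// Encode a slice of f32 as tag(85) followed by one byte string of packed floats.
fn encode_f32_typed_array(values: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 4 + 11);
    write_head(&mut out, 6, TAG_F32_LE); // major type 6: tag
    write_head(&mut out, 2, (values.len() * 4) as u64); // major type 2: byte string
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}
```

Under these assumptions, 1 000 elements cost 2 bytes of tag head, 3 bytes of byte-string head, and 4 000 bytes of payload, which is the 4 005 B shown for the FP32 row. The same layout with 8-byte elements gives 8 005 B.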

Example payloads

1 — FP32 typed array (1 000 elements, stored with FPHA F32 tag)

[0.0, 0.001, 0.002, 0.003, 0.004, ...]   // 1 000 floats total

2 — FP64 typed array (1 000 elements, stored with FPHA F64 tag)

[0.0, 0.001, 0.002, ...]   // 1 000 doubles total

3 — Heterogeneous float array (same data, no FPHA hint)

Same JSON as above; without an FPHA hint the array is stored as ArrayHetero
(each element tagged individually). zstd still achieves the same ratio because
the repeated tag bytes compress well.

4 — String-heavy object (50 keys)

{
  "key_0": "value_0_some_longer_string_here",
  "key_1": "value_1_some_longer_string_here",
  "key_2": "value_2_some_longer_string_here",
  ...   // 50 keys total
}

5 — Small mixed object

{
  "name": "Alice",
  "age": 30,
  "scores": [1, 2, 3, null, true, "bonus"],
  "meta": {"active": true, "level": 42}
}

6 — Nested FP32 arrays + string

{
  "a": [0.0, 0.1, 0.2, ...],   // 100 F32 elements
  "b": [0.0, 0.1, 0.2, ...],   // 100 F32 elements
  "label": "test"
}

7 — Big mixed JSON (200 records, heterogeneous embeddings)

[
  {
    "id": 0, "name": "user_0", "active": true, "score": 0.0,
    "tags": ["alpha", "beta", "gamma"],
    "embedding": [0.0, 0.001, 0.002, ...]   // 32 floats
  },
  // ... 200 records total, repeated schema
]

The repeated key names ("id", "name", "active", "score", "tags", "embedding")
across 200 records are what zstd compresses so aggressively here.

8 — Repeated-string records (500 records, RED-141886 scenario)

[
  {"id": 0, "status": "active",   "region": "us-east-1", "tier": "free",     "owner": "team-a", "count": 0},
  {"id": 1, "status": "inactive", "region": "eu-west-1", "tier": "standard", "owner": "team-b", "count": 10},
  // ... 500 records; status/region/tier/owner values repeat from a small fixed set
]

Note

Medium Risk
Adds a new serialization format and changes core IArray error types/behavior, which could affect downstream APIs and edge cases around numeric range/typed-array handling.

Overview
Adds CBOR serialization for IValue via new cbor module, including optional zstd compression and RFC 8746 typed-array tagging (plus a private BF16 tag) so typed numeric arrays round-trip without losing their element type.
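
As background on why BF16 arrays need their own tag, the sketch below shows one conventional way to convert between f32 and bfloat16 bit patterns using only the standard library. It is not taken from the PR: the function names are hypothetical, the private tag number is not shown because the PR text does not give it, and the real code may rely on a crate instead.

```rust
/// Convert an f32 to bfloat16 bits, rounding to nearest, ties to even.
/// bfloat16 keeps the sign, the 8 exponent bits, and the top 7 mantissa bits.
fn f32_to_bf16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Set a mantissa bit that survives truncation so NaN stays NaN.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Add 0x7FFF, plus 1 when the lowest kept bit is set, to get ties-to-even.
    let round_bias = 0x7FFF + ((bits >> 16) & 1);
    (bits.wrapping_add(round_bias) >> 16) as u16
}

/// Widen bfloat16 bits back to f32 by zero-filling the dropped mantissa bits.
fn bf16_bits_to_f32(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}
```

Widening is exact, but narrowing is lossy. A decoder therefore has to know the element type from the tag rather than guess it from the values.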

Extends JSON deserialization with an opt-in IValueDeserSeed/FPHAConfig that forces homogeneous float arrays to a chosen FloatType and rejects out-of-range values; IArray gains FloatType, push_with_fp_type, and a new IJsonError to unify allocation + range errors.
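
The forcing behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `FloatType` here is a two-variant stand-in (the real type also covers half-precision formats), and `TypedArray`, `ForceError`, and `force_float_type` are hypothetical names. The range check is deliberately conservative and rejects anything whose magnitude exceeds `f32::MAX`.

```rust
/// Hypothetical stand-in for the PR's `FloatType`, limited to std float widths.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FloatType {
    F32,
    F64,
}

/// Hypothetical error: an input whose magnitude does not fit the target type.
#[derive(Debug, PartialEq)]
enum ForceError {
    OutOfRange { index: usize, value: f64 },
}

/// Hypothetical homogeneous array storage, one variant per element type.
#[derive(Debug, PartialEq)]
enum TypedArray {
    F32(Vec<f32>),
    F64(Vec<f64>),
}

/// Force every element to `ty`, failing on the first out-of-range value.
fn force_float_type(values: &[f64], ty: FloatType) -> Result<TypedArray, ForceError> {
    match ty {
        FloatType::F64 => Ok(TypedArray::F64(values.to_vec())),
        FloatType::F32 => {
            let mut out = Vec::with_capacity(values.len());
            for (index, &value) in values.iter().enumerate() {
                // Reject rather than silently saturate to infinity.
                if value.abs() > f32::MAX as f64 {
                    return Err(ForceError::OutOfRange { index, value });
                }
                out.push(value as f32);
            }
            Ok(TypedArray::F32(out))
        }
    }
}
```

Returning the index and value of the offending element keeps the error actionable for callers that need to report which part of a document failed.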

CI fuzzing is expanded (longer runtime, OOM/RSS handling), and new fuzz targets cover CBOR decode and CBOR round-trip, alongside added unit tests for the new behaviors.

Written by Cursor Bugbot for commit 043b76c. This will update automatically on new commits.

@AvivDavid23 AvivDavid23 changed the title from "MOD-13577 support Homogenues array floating point forcing(deserializa…" to "MOD-14010 support Homogenues array floating point forcing(deserializa…" Feb 18, 2026
@RedisJSON RedisJSON deleted a comment from cursor bot Feb 22, 2026
@AvivDavid23 AvivDavid23 requested a review from gabsow February 22, 2026 07:17

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

@galcohen-redislabs

Did you look at c2pa_cbor?

JsonValue::Float(n) => {
    if n.is_finite() {
        n.to_string()
    } else {

Why is an infinite value converted to 0?

    }
}
JsonValue::Str(s) => {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))

Use raw string literals when backslashes are involved (instead of \\, \", etc.; here and elsewhere). It makes the code more readable.
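
For illustration, the two review points on this hunk could be addressed along these lines. This is a sketch, not the PR's code: `JsonValue` here is a hypothetical two-variant stand-in for the type in the diff, and `to_json` is an invented name. Emitting `null` for non-finite floats is an assumption (it is what serde_json does); returning an error would be another reasonable choice.

```rust
/// Hypothetical stand-in for the `JsonValue` type shown in the diff.
enum JsonValue {
    Float(f64),
    Str(String),
}

/// Render a value as JSON text.
fn to_json(v: &JsonValue) -> String {
    match v {
        JsonValue::Float(n) if n.is_finite() => n.to_string(),
        // JSON has no literal for NaN or infinity. Assumption: emit `null`
        // rather than a substitute number such as 0.
        JsonValue::Float(_) => "null".to_string(),
        JsonValue::Str(s) => {
            // Raw strings avoid stacked escapes: r"\\" is two backslashes,
            // r#"\""# is a backslash followed by a double quote.
            let escaped = s.replace('\\', r"\\").replace('"', r#"\""#);
            format!(r#""{}""#, escaped)
        }
    }
}
```

The character literal `'\\'` still needs its escape because Rust has no raw character literals. Like the original line, this sketch escapes only backslashes and double quotes.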

@AvivDavid23

> Did you look at c2pa_cbor?

@galcohen-redislabs I can try that, although it doesn't have a major release and has very low usage:
[image]
