
EVCacheValue: opt-in compact binary serialization with backwards-compatible reads#196

Open
joegoogle123 wants to merge 1 commit into sync-getbulk-mixed-keys from evcache-value-binary-serde


@joegoogle123 joegoogle123 commented Jun 4, 2026

Summary

Hashed-key values are wrapped in an EVCacheValue envelope (key, value, flags, ttl, createTime) that is currently serialized with Java ObjectOutputStream, adding ~50–80 bytes of structural overhead per item. This adds a compact, length-prefixed binary format for the envelope while remaining fully backwards-compatible on reads.

What changed

  • New EVCacheValueSerde class (com.netflix.evcache.pool) — public-final-non-instantiable codec, owns the wire format and all error handling:
    • static byte[] serialize(EVCacheValue) — length-prefixed binary layout: [magic 0x0C][reserved 0x00][int keyLen][key UTF-8][int valLen][value][int flags][long ttl][long createTime].
    • static EVCacheValue deserialize(byte[]) — bounds-checks length prefixes before allocating; on any corruption / unexpected exception warn-logs the failing field and a truncated hex dump (via Apache Commons Hex.encodeHexString, capped at 1024 bytes) and returns null. Matches BaseSerializingTranscoder's resilience contract (corruption → cache miss → caller refills from source of truth) so a single corrupt entry never crashes a get / getBulk / async pipeline.
    • static boolean isBinaryFormat(byte[]) — exposed for the dispatcher.
  • EVCacheTranscoder becomes a thin dispatcher (no try/catch):
    • serialize: gates on useBinarySerialization && o instanceof EVCacheValue → EVCacheValueSerde.serialize; else super.serialize (Java).
    • deserialize: dispatches on EVCacheValueSerde.isBinaryFormat → EVCacheValueSerde.deserialize; else super.deserialize.
  • EVCacheValue stays a pure POJO (codec moved out; constructor unchanged from pre-PR).
  • EVCacheImpl reads a Feature Property at client construction and injects it into the (immutable) envelope transcoder.
  • Reads auto-detect format by the leading byte (0xAC 0xED = legacy Java, 0x0C = binary), so a new client decodes existing cache entries unchanged.
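
The layout listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's EVCacheValueSerde: the class name EnvelopeCodecSketch and the nested Envelope holder are hypothetical, logging and hex dumps are omitted, and only the field order and the "return null on malformed input" behaviour are taken from the description.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative stand-in for the envelope codec; not the PR's EVCacheValueSerde. */
final class EnvelopeCodecSketch {
    static final byte MAGIC = 0x0C;
    static final byte RESERVED = 0x00;

    /** Hypothetical holder mirroring the envelope fields named in the summary. */
    static final class Envelope {
        final String key;
        final byte[] value;
        final int flags;
        final long ttl;
        final long createTime;

        Envelope(String key, byte[] value, int flags, long ttl, long createTime) {
            this.key = key;
            this.value = value;
            this.flags = flags;
            this.ttl = ttl;
            this.createTime = createTime;
        }
    }

    static byte[] serialize(Envelope e) {
        byte[] keyBytes = e.key.getBytes(StandardCharsets.UTF_8);
        // 2 header bytes + two int length prefixes + payloads + int flags + two longs.
        ByteBuffer buf = ByteBuffer.allocate(
                2 + 4 + keyBytes.length + 4 + e.value.length + 4 + 8 + 8);
        buf.put(MAGIC).put(RESERVED);
        buf.putInt(keyBytes.length).put(keyBytes);
        buf.putInt(e.value.length).put(e.value);
        buf.putInt(e.flags).putLong(e.ttl).putLong(e.createTime);
        return buf.array();
    }

    static boolean isBinaryFormat(byte[] bytes) {
        return bytes != null && bytes.length > 0 && bytes[0] == MAGIC;
    }

    /** Returns null (treated as a cache miss) instead of throwing on malformed input. */
    static Envelope deserialize(byte[] bytes) {
        if (!isBinaryFormat(bytes)) return null;
        try {
            ByteBuffer buf = ByteBuffer.wrap(bytes);
            buf.get(); // magic, already checked
            buf.get(); // reserved byte: read and ignored
            int keyLen = buf.getInt();
            if (keyLen < 0 || keyLen > buf.remaining()) return null; // check before allocating
            byte[] keyBytes = new byte[keyLen];
            buf.get(keyBytes);
            int valLen = buf.getInt();
            if (valLen < 0 || valLen > buf.remaining()) return null;
            byte[] value = new byte[valLen];
            buf.get(value);
            int flags = buf.getInt();
            long ttl = buf.getLong();
            long createTime = buf.getLong();
            return new Envelope(new String(keyBytes, StandardCharsets.UTF_8),
                    value, flags, ttl, createTime);
        } catch (BufferUnderflowException truncated) {
            return null; // payload ended before all fixed fields were read
        }
    }
}
```

With a 2-byte key and a 4-byte value this layout occupies 36 bytes (2 + 4 + 2 + 4 + 4 + 4 + 8 + 8).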

Format-flag decision (reuse SERIALIZED + magic byte, not a fresh flag)

The binary envelope keeps the existing SERIALIZED flag and is disambiguated from Java by the leading byte, rather than allocating a new CachedData flag. Rationale:

  • SERIALIZED semantically still means "serialized object → deserialize()"; the codec choice (Java vs binary) lives inside deserialize(). No flag constant is reassigned or repurposed, and decode() branch order is untouched.
  • Consumers that route on SERIALIZED (e.g. the admin inspector, cache-warmer) keep working without a new flag constant to propagate.
  • An old reader that hits binary bytes under SERIALIZED throws StreamCorruptedException (fails loud) rather than silently decoding garbage — which a fresh low-byte flag would cause (decodeString) on old readers.
  • Invariant (documented in EVCacheValueSerde Javadoc): SERIALIZED payloads are self-describing by leading byte; a future third format must use a distinct non-colliding magic + the reserved version byte.
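
To make the leading-byte argument concrete, the sketch below detects the two formats and shows that the JDK's ObjectInputStream rejects a payload starting with 0x0C with a StreamCorruptedException. The class and method names are hypothetical, and how a given old reader surfaces or logs that exception is outside this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.StreamCorruptedException;
import java.io.UncheckedIOException;

/** Hypothetical illustration of format detection by leading byte. */
final class LeadingByteDispatchSketch {
    enum Format { JAVA, BINARY, UNKNOWN }

    static Format detect(byte[] b) {
        if (b == null || b.length < 2) return Format.UNKNOWN;
        // Java serialization streams start with the magic 0xAC 0xED.
        if ((b[0] & 0xFF) == 0xAC && (b[1] & 0xFF) == 0xED) return Format.JAVA;
        if (b[0] == 0x0C) return Format.BINARY;
        return Format.UNKNOWN;
    }

    /** Mimics a reader that hands every payload to Java deserialization. */
    static boolean javaReaderRejects(byte[] payload) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload))) {
            in.readObject();
            return false;
        } catch (StreamCorruptedException badHeader) {
            return true; // thrown while the constructor validates the stream header
        } catch (IOException | ClassNotFoundException other) {
            return false; // a different failure mode, not the one of interest here
        }
    }

    static byte[] javaSerialize(Object o) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```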

Reserved version byte

Byte index 1 of the binary payload is reserved (always 0x00 today). Reader read-and-ignores; not validated. Reason: forward-compat without an emergency reader rollout. If today's readers rejected any non-zero version, introducing a v2 in the future would require shipping reader support fleet-wide before any writer could emit a v2 byte, and a single misconfigured writer would crash all readers. By accepting any value today, future readers can branch on this byte to introduce breaking format changes backwards-compatibly.
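
A small sketch of the two reader policies being compared, assuming only the two header bytes matter. All names are hypothetical, and the "v1" branch is an invented placeholder for whatever a future format might be.

```java
/** Hypothetical illustration of how readers might treat the reserved byte. */
final class ReservedByteSketch {
    static final byte MAGIC = 0x0C;

    /** Policy described above: byte 1 is read and ignored. */
    static boolean lenientAccepts(byte[] p) {
        return p != null && p.length >= 2 && p[0] == MAGIC;
    }

    /** The rejected alternative: any non-zero byte 1 is treated as corruption. */
    static boolean strictAccepts(byte[] p) {
        return lenientAccepts(p) && p[1] == 0x00;
    }

    /** A possible future reader that routes on byte 1 once a second layout exists. */
    static String layoutFor(byte[] p) {
        if (!lenientAccepts(p)) return "not-binary";
        return (p[1] & 0xFF) == 1 ? "v1" : "v0";
    }
}
```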

Feature Property (rollout gate)

  • <appName>.envelope.binary.serialization.enabled (global fallback evcache.envelope.binary.serialization.enabled), default false.
  • "Envelope" matches the codebase's existing term for the hashed-key EVCacheValue wrapping (envelopeTranscoder in EVCacheMemcachedClient).
  • Read once at client construction and injected into the immutable transcoder ⇒ deploy/restart required to take effect; this is NOT a live runtime toggle. Flip the property, then redeploy the consuming app.
  • Default-off means production keeps writing Java; reads auto-detect both formats. Roll out reader-first: ship this change everywhere — including the admin inspector and cache-warmer — before enabling the FP for any writer.
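
The lookup order and the read-once behaviour can be sketched as below. A plain Map stands in for the real property source, and the class and method names are hypothetical; only the two property names and the default of false come from the description.

```java
import java.util.Map;

/** Hypothetical illustration of the rollout gate; a Map stands in for the property source. */
final class EnvelopeFlagSketch {
    static final String SUFFIX = ".envelope.binary.serialization.enabled";

    /** App-specific property first, then the global fallback, then false. */
    static boolean resolve(Map<String, String> props, String appName) {
        String v = props.get(appName + SUFFIX);
        if (v == null) v = props.get("evcache" + SUFFIX);
        return Boolean.parseBoolean(v); // parseBoolean(null) is false
    }

    /** Immutable holder: the flag is captured once, at construction. */
    static final class Transcoder {
        private final boolean useBinarySerialization;

        Transcoder(boolean useBinarySerialization) {
            this.useBinarySerialization = useBinarySerialization;
        }

        boolean usesBinary() {
            return useBinarySerialization;
        }
    }
}
```

Because the value is copied into a final field, changing the property after construction has no effect on an existing transcoder, which is why a redeploy is needed.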

Compatibility

  • A client with this change decodes existing Java-serialized values unchanged (dual-format read).
  • With the FP off (default), wire output is byte-identical to today.
  • Corrupt binary payloads degrade to cache miss (null), matching the existing Java path. A single corrupt entry never crashes the caller.

Testing

EVCacheValueSerdeTest, 17 cases via the public EVCacheTranscoder.encode/decode API:

  • Binary round-trip across edge cases (empty / unicode key / large 2 MB value / negative flags / zero ttl / negative & MAX createTime / MIN flags)
  • Transcoder wire shape (binary flag on): SERIALIZED flag set, leading byte 0x0C, reserved byte 0x00
  • Default transcoder (flag OFF) writes Java (0xAC 0xED) but reads both formats
  • Backwards-compat: legacy ObjectOutputStream-serialized envelope still decodes
  • Non-EVCacheValue passthrough (ArrayList stays on the Java path even with binary flag on)
  • Size win (binary is ~4.2× smaller for typical small items)
  • Malformed binary: truncated, bogus oversize keyLen, negative keyLen all decode to null (logged with field + hex dump)

Full evcache-core suite (./gradlew :evcache-core:test): 28/28 green (EVCacheValueSerdeTest 17, NodeLocatorLookupTest 3, MockEVCacheTest 7, plus runtime tests in other modules).

Chunked-payload integration is not covered by an automated test in this PR — chunking lives in EVCacheClient.createChunks/assembleChunks, which are content-opaque (byte copy + CRC + manifest) and require a live client to exercise. The binary format introduces no new chunking risk by construction: assembleChunks reassembles bytes byte-for-byte and CRC-checks them against the manifest before handing the result to the transcoder.
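
As a rough illustration of what "content-opaque" means here, the sketch below splits an arbitrary byte array, reassembles it, and verifies a CRC32 before returning it. It is not the EVCacheClient.createChunks/assembleChunks code: the names, the chunk size, and the use of java.util.zip.CRC32 are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical stand-in for a content-opaque chunker; not the EVCacheClient code. */
final class ChunkOpacitySketch {
    static List<byte[]> split(byte[] payload, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < payload.length; off += chunkSize) {
            int end = Math.min(payload.length, off + chunkSize);
            chunks.add(Arrays.copyOfRange(payload, off, end));
        }
        return chunks;
    }

    static long crc(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }

    /** Returns the reassembled bytes, or null if length or checksum does not match. */
    static byte[] assemble(List<byte[]> chunks, int totalLength, long expectedCrc) {
        byte[] out = new byte[totalLength];
        int pos = 0;
        for (byte[] chunk : chunks) {
            if (pos + chunk.length > totalLength) return null;
            System.arraycopy(chunk, 0, out, pos, chunk.length);
            pos += chunk.length;
        }
        if (pos != totalLength || crc(out) != expectedCrc) return null;
        return out;
    }
}
```

Nothing in the split or assemble step inspects the payload, so under these assumptions the leading byte (0x0C or 0xAC) makes no difference to it.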

🤖 Generated with Claude Code

@joegoogle123 joegoogle123 force-pushed the evcache-value-binary-serde branch 15 times, most recently from d443e30 to 3892873 Compare June 4, 2026 22:22
@joegoogle123 joegoogle123 force-pushed the evcache-value-binary-serde branch 10 times, most recently from d46bf97 to 3444f24 Compare June 9, 2026 14:47
@joegoogle123 joegoogle123 requested a review from bihaoxwork June 9, 2026 14:48
Comment on lines +126 to +138
    if (keyLength < 0 || keyLength > buffer.remaining()) {
        logCorruption(bytes,
                "Invalid keyLength: " + keyLength + ", remaining=" + buffer.remaining());
        return null;
    }
    field = "key";
    final byte[] keyBytes = new byte[keyLength];
    buffer.get(keyBytes);
    final String key = new String(keyBytes, StandardCharsets.UTF_8);

    field = "valueLength";
    final int valueLength = buffer.getInt();
    if (valueLength < 0 || valueLength > buffer.remaining()) {

How do we handle the case where keyLength or valueLength is corrupted to a smaller value? It seems we would end up returning incorrect data.
Should we check if (buffer.hasRemaining()) return null; after reading the buffer?

valueLength corrupted smaller → corrupt data returned as a hit, not null.

The check valueLength > buffer.remaining() still passes, so we read flags/ttl/createTime out of value bytes and return a non-null EVCacheValue with no exception. key is read before valueLength, so it stays intact and the collision check passes — the caller gets corrupt data as a cache hit instead of the documented null.

Example — EVCacheValue(key="ab", value="WXYZ", flags=1, ttl=2, createTime=3):

serialized (36B):
0C 00 | 00 00 00 02 61 62 | 00 00 00 04 57 58 59 5A | 00 00 00 01 | <ttl=2, 8B> | <createTime=3, 8B>
       | keyLen=2  "ab"   | valLen=4    "WXYZ"      |  flags=1     |

flip one byte, valLen 4 -> 0:
0C 00 | 00 00 00 02 61 62 | 00 00 00 00 57 58 59 5A | 00 00 00 01 | ...
                            ^^^^^^^^^^^ valLen=0

deserialize() on the corrupted bytes today returns (no exception):

EVCacheValue{key="ab", value=[], flags=0x5758595A, ttl=4294967296, createTime=8589934592}

key is still "ab" so the collision check passes; value is empty, and the real value bytes WXYZ (0x57 0x58 0x59 0x5A) got reinterpreted as flags. Returned as a hit.

Rejecting leftover bytes after the last field fixes it:

final long createTime = buffer.getLong();
if (buffer.hasRemaining()) return null;   // a corrupted length left bytes unconsumed
return new EVCacheValue(key, valueBytes, flags, ttl, createTime);

On the example this trips hasRemaining() (4 leftover bytes) → null (cache miss). A larger valueLength is already safe (it over-reads → BufferUnderflowException → null); only the smaller case slips through.
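
The example can be reproduced with a small standalone decoder. This is a sketch under the layout shown above, not the PR's deserialize(): the class name is hypothetical, and the result is returned as a plain long[] of {valueLength, flags, ttl, createTime} to keep it short.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

/** Hypothetical reproduction of the shrunken-valueLength case discussed above. */
final class TrailingBytesCheckSketch {
    /** Builds the 36-byte example: key="ab", value="WXYZ", flags=1, ttl=2, createTime=3. */
    static byte[] example() {
        return ByteBuffer.allocate(36)
                .put((byte) 0x0C).put((byte) 0x00)
                .putInt(2).put(new byte[] {'a', 'b'})
                .putInt(4).put(new byte[] {'W', 'X', 'Y', 'Z'})
                .putInt(1).putLong(2L).putLong(3L)
                .array();
    }

    /** Returns {valueLength, flags, ttl, createTime}, or null when the payload is rejected. */
    static long[] decode(byte[] bytes, boolean rejectLeftover) {
        try {
            ByteBuffer buf = ByteBuffer.wrap(bytes);
            buf.get(); // magic
            buf.get(); // reserved
            int keyLen = buf.getInt();
            if (keyLen < 0 || keyLen > buf.remaining()) return null;
            buf.position(buf.position() + keyLen); // skip key bytes
            int valLen = buf.getInt();
            if (valLen < 0 || valLen > buf.remaining()) return null;
            buf.position(buf.position() + valLen); // skip value bytes
            int flags = buf.getInt();
            long ttl = buf.getLong();
            long createTime = buf.getLong();
            // The proposed fix: a shrunken length prefix leaves bytes unconsumed.
            if (rejectLeftover && buf.hasRemaining()) return null;
            return new long[] {valLen, flags, ttl, createTime};
        } catch (BufferUnderflowException truncated) {
            return null;
        }
    }
}
```

Setting byte index 11 (the low byte of valLen) to 0 and decoding with rejectLeftover=false yields flags=0x5758595A, ttl=4294967296 and createTime=8589934592; with rejectLeftover=true the same bytes decode to null.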

@joegoogle123 joegoogle123 Jun 16, 2026

Author

If we want to support adding new fields, we can't do a buffer.hasRemaining() check. Instead, I added a totalLength field to the wire format, which lets us check that we read exactly the number of bytes we expected to read. This guards against corruption when keyLength or valueLength is wrong.

Author

After trying out the bodyLength change, I realized it can't support both adding new optional fields and guarding against bit corruption in keyLength or bodyLength. Given that Java object serialization, which we were using before, suffers from the same issue, I think it's OK to ship this change without addressing the bit-corruption concern.

Comment thread evcache-core/src/main/java/com/netflix/evcache/EVCacheImpl.java Outdated
Introduces a length-prefixed binary envelope for EVCacheValue, which EVCache
wraps around values when the canonical key is hashed (see
EVCacheImpl.getEVCacheKey). Compared to the default Java ObjectOutputStream
encoding it is materially smaller on the wire and avoids the reflective
decode path.

Wire format:
  [magic 0x0C][reserved 0x00]
  [int keyLen][key UTF-8 bytes]
  [int valLen][value bytes]
  [int flags][long ttl][long createTime]
  [...optional extension fields appended by newer writers...]

- Opt-in per app via FastProperty <app>.envelope.binary.serialization.enabled
  (default false). Existing Java-serialized items still decode -- the reader
  is dual-format, so there is no wire break for clusters with in-flight
  cached items.
- Forward-compat for additive optional fields: append at the end, gate with
  buffer.hasRemaining() in the reader, supply a graceful default when absent.
- Breaking changes route through the reserved/version byte at byte 1 with
  reader-before-writer rollout (see class javadoc).
- Bounds-checked length prefixes return null on bogus input, matching
  BaseSerializingTranscoder's resilience contract.

Tests cover binary round-trip across empty/large/unicode/extremes,
dual-format read, transcoder routing, malformed-input handling, and a
pinned v0 byte array trip-wire so future required-field adds can't be
missed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
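
The additive-field rule from the commit message can be illustrated with a deliberately simplified payload of one required long followed by one optional int. The field name "priority", its default, and the class name are invented for the example, and the reader checks remaining() >= 4 rather than hasRemaining() so that a partial trailing field is not read.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration of an optional field appended after the required ones. */
final class OptionalTrailingFieldSketch {
    static final int DEFAULT_PRIORITY = 0; // invented default for an absent field

    /** Older reader: consumes the required field and ignores anything after it. */
    static long readCreateTime(byte[] payload) {
        return ByteBuffer.wrap(payload).getLong();
    }

    /** Newer reader: reads the optional int only if enough bytes remain. */
    static int readPriority(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        buf.getLong(); // required field
        return buf.remaining() >= Integer.BYTES ? buf.getInt() : DEFAULT_PRIORITY;
    }
}
```

An older reader tolerating trailing bytes is what makes this additive, and it is also why a blanket "reject leftover bytes" check does not combine with it.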
@joegoogle123 joegoogle123 force-pushed the evcache-value-binary-serde branch from 75d8249 to 40446ae Compare June 23, 2026 19:57
@joegoogle123 joegoogle123 changed the base branch from master to sync-getbulk-mixed-keys June 23, 2026 19:57
2 participants