EVCacheValue: opt-in compact binary serialization with backwards-compatible reads#196
EVCacheValue: opt-in compact binary serialization with backwards-compatible reads#196joegoogle123 wants to merge 1 commit into
Conversation
d443e30 to
3892873
Compare
d46bf97 to
3444f24
Compare
| if (keyLength < 0 || keyLength > buffer.remaining()) { | ||
| logCorruption(bytes, | ||
| "Invalid keyLength: " + keyLength + ", remaining=" + buffer.remaining()); | ||
| return null; | ||
| } | ||
| field = "key"; | ||
| final byte[] keyBytes = new byte[keyLength]; | ||
| buffer.get(keyBytes); | ||
| final String key = new String(keyBytes, StandardCharsets.UTF_8); | ||
|
|
||
| field = "valueLength"; | ||
| final int valueLength = buffer.getInt(); | ||
| if (valueLength < 0 || valueLength > buffer.remaining()) { |
There was a problem hiding this comment.
How do we handle the case the keyLength, valueLength are corrupted less? It seems like we will finish with a incorrect data.
Should we check if (buffer.hasRemaining()) return null; after read the buffer?
There was a problem hiding this comment.
valueLength corrupted smaller → corrupt data returned as a hit, not null.
The check valueLength > buffer.remaining() still passes, so we read flags/ttl/createTime out of value bytes and return a non-null EVCacheValue with no exception. key is read before valueLength, so it stays intact and the collision check passes — the caller gets corrupt data as a cache hit instead of the documented null.
Example — EVCacheValue(key="ab", value="WXYZ", flags=1, ttl=2, createTime=3):
serialized (36B):
0C 00 | 00 00 00 02 61 62 | 00 00 00 04 57 58 59 5A | 00 00 00 01 | <ttl=2, 8B> | <createTime=3, 8B>
| keyLen=2 "ab" | valLen=4 "WXYZ" | flags=1 |
flip one byte, valLen 4 -> 0:
0C 00 | 00 00 00 02 61 62 | 00 00 00 00 57 58 59 5A | 00 00 00 01 | ...
^^^^^^^^^^^ valLen=0
deserialize() on the corrupted bytes today returns (no exception):
EVCacheValue{key="ab", value=[], flags=0x5758595A, ttl=4294967296, createTime=8589934592}
key is still "ab" so the collision check passes; value is empty, and the real value bytes WXYZ (0x57 0x58 0x59 0x5A) got reinterpreted as flags. Returned as a hit.
Rejecting leftover bytes after the last field fixes it:
final long createTime = buffer.getLong();
if (buffer.hasRemaining()) return null; // a corrupted length left bytes unconsumed
return new EVCacheValue(key, valueBytes, flags, ttl, createTime);On the example this trips hasRemaining() (4 leftover bytes) → null (cache miss). A larger valueLength is already safe (it over-reads → BufferUnderflowException → null); only the smaller case slips through.
There was a problem hiding this comment.
If we want to support adding new fields we will can't do a buffer.hasRemaining() check. Instead I added totalLength field into wire format. Which will let us check that we read all the bytes that are expected to be read. This will be able to guard against corruption if the keyLength or valueLength is wrong.
There was a problem hiding this comment.
After trying out the bodyLength change I realized it can't support both adding new optional fields and guarding against bit corruption in keyLength or bodyLength. Given Java object serialization, what we were doing before, suffers from the same issue, I think it's ok to ship this change without addressing the concern around bit corruption.
Introduces a length-prefixed binary envelope for EVCacheValue, which EVCache wraps around values when the canonical key is hashed (see EVCacheImpl.getEVCacheKey). Compared to the default Java ObjectOutputStream encoding it is materially smaller on the wire and avoids the reflective decode path. Wire format: [magic 0x0C][reserved 0x00] [int keyLen][key UTF-8 bytes] [int valLen][value bytes] [int flags][long ttl][long createTime] [...optional extension fields appended by newer writers...] - Opt-in per app via FastProperty <app>.envelope.binary.serialization.enabled (default false). Existing Java-serialized items still decode -- the reader is dual-format, so there is no wire break for clusters with in-flight cached items. - Forward-compat for additive optional fields: append at the end, gate with buffer.hasRemaining() in the reader, supply a graceful default when absent. - Breaking changes route through the reserved/version byte at byte 1 with reader-before-writer rollout (see class javadoc). - Bounds-checked length prefixes return null on bogus input, matching BaseSerializingTranscoder's resilience contract. Tests cover binary round-trip across empty/large/unicode/extremes, dual-format read, transcoder routing, malformed-input handling, and a pinned v0 byte array trip-wire so future required-field adds can't be missed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
75d8249 to
40446ae
Compare
Summary
Hashed-key values are wrapped in an
EVCacheValueenvelope(key, value, flags, ttl, createTime)that is currently serialized with JavaObjectOutputStream, adding ~50–80 bytes of structural overhead per item. This adds a compact, length-prefixed binary format for the envelope while remaining fully backwards-compatible on reads.What changed
EVCacheValueSerdeclass (com.netflix.evcache.pool) — public-final-non-instantiable codec, owns the wire format and all error handling:static byte[] serialize(EVCacheValue)— length-prefixed binary layout:[magic 0x0C][reserved 0x00][int keyLen][key UTF-8][int valLen][value][int flags][long ttl][long createTime].static EVCacheValue deserialize(byte[])— bounds-checks length prefixes before allocating; on any corruption / unexpected exception warn-logs the failing field and a truncated hex dump (via Apache CommonsHex.encodeHexString, capped at 1024 bytes) and returnsnull. MatchesBaseSerializingTranscoder's resilience contract (corruption → cache miss → caller refills from source of truth) so a single corrupt entry never crashes a get / getBulk / async pipeline.static boolean isBinaryFormat(byte[])— exposed for the dispatcher.EVCacheTranscoderbecomes a thin dispatcher (no try/catch):serialize: gates onuseBinarySerialization && o instanceof EVCacheValue→EVCacheValueSerde.serialize; elsesuper.serialize(Java).deserialize: dispatches onEVCacheValueSerde.isBinaryFormat→EVCacheValueSerde.deserialize; elsesuper.deserialize.EVCacheValuestays a pure POJO (codec moved out; constructor unchanged from pre-PR).EVCacheImplreads a Feature Property at client construction and injects it into the (immutable) envelope transcoder.0xAC 0xED= legacy Java,0x0C= binary), so a new client decodes existing cache entries unchanged.Format-flag decision (reuse
SERIALIZED+ magic byte, not a fresh flag)The binary envelope keeps the existing
SERIALIZEDflag and is disambiguated from Java by the leading byte, rather than allocating a newCachedDataflag. Rationale:SERIALIZEDsemantically still means "serialized object →deserialize()"; the codec choice (Java vs binary) lives insidedeserialize(). No flag constant is reassigned or repurposed, anddecode()branch order is untouched.SERIALIZED(e.g. the admin inspector, cache-warmer) keep working without a new flag constant to propagate.SERIALIZEDthrowsStreamCorruptedException(fails loud) rather than silently decoding garbage — which a fresh low-byte flag would cause (decodeString) on old readers.EVCacheValueSerdeJavadoc):SERIALIZEDpayloads are self-describing by leading byte; a future third format must use a distinct non-colliding magic + the reserved version byte.Reserved version byte
Byte index 1 of the binary payload is reserved (always
0x00today). Reader read-and-ignores; not validated. Reason: forward-compat without an emergency reader rollout. If today's readers rejected any non-zero version, introducing a v2 in the future would require shipping reader support fleet-wide before any writer could emit a v2 byte, and a single misconfigured writer would crash all readers. By accepting any value today, future readers can branch on this byte to introduce breaking format changes backwards-compatibly.Feature Property (rollout gate)
<appName>.envelope.binary.serialization.enabled(global fallbackevcache.envelope.binary.serialization.enabled), defaultfalse.envelopeTranscoderinEVCacheMemcachedClient).Compatibility
Testing
EVCacheValueSerdeTest— 17 cases via the publicEVCacheTranscoder.encode/decodeAPI:0x0C, reserved byte0x000xAC 0xED) but reads both formatsObjectOutputStream-serialized envelope still decodesEVCacheValuepassthrough (ArrayList stays on the Java path even with binary flag on)Full evcache-core suite (
./gradlew :evcache-core:test): 28/28 green (EVCacheValueSerdeTest 17, NodeLocatorLookupTest 3, MockEVCacheTest 7, plus runtime tests in other modules).Chunked-payload integration is not covered by an automated test in this PR — chunking lives in
EVCacheClient.createChunks/assembleChunks, which are content-opaque (byte copy + CRC + manifest) and require a live client to exercise. The binary format introduces no new chunking risk by construction:assembleChunksreassembles bytes byte-for-byte and CRC-checks them against the manifest before handing the result to the transcoder.🤖 Generated with Claude Code