Open
prometheus/proposals
#40
Description
Context
Recently we have been experimenting with, or discussing, various improvements and optimizations to the WAL/WBL: the NHCB work, the CT work, some planned fixes [1], and many optimizations we wanted to try (not writing unchanged samples, storing histogram bucketing separately, storing multiple similar samples more efficiently, adding details to segments to allow faster replays/sharding, etc.).
This is all beautiful and amazing, but during the NHCB work we noticed that:
- The WAL is not versioned in general. There are some "extensibility" mechanisms, such as record.Type, which has about 247 options left. But we have ideas for changes that go beyond a single record. Plus, it's not really efficient to end up with 10 Histogram record types just because we iterated 10 times. Versioning also does not immediately help with the rollout/migration of data.
- Even if the WAL were versioned, the migration options, risks and patterns are not well documented. How could a contributor optimize or improve a WAL record and understand that it might require a two-step migration rollout? How can we automate this migration, or allow users to explicitly skip it because they are using the agent or are happy with non-revertibility? What if we double-write instead of doing a two-step migration?
- The WAL does not have "unknown fields" or schema mechanisms like other protocols have, e.g. protobuf or capnproto. One exception is the Metadata record, which supports unknown labels on decoding. Unknown fields make no-migration scenarios possible if you only add things to the schema. They also increase the overhead of encoding/decoding/storing, but wouldn't that overhead be a good trade-off for the amount of optimizations and saved SWE/Ops time it unlocks?
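To make the "unknown fields" idea concrete, here is a minimal sketch in Go of a tag-length-value record where an older decoder skips fields it does not recognize. Everything in it (the `sample` type, the tag constants, `encodeSample`, `decodeSample`) is hypothetical and invented for illustration; it is not the current Prometheus WAL encoding and not a proposed format.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Hypothetical field tags, made up for this sketch.
const (
	tagRef   = 1
	tagValue = 2
	tagExtra = 3 // Only written by a "newer" encoder; unknown to decodeSample.
)

type sample struct {
	Ref   uint64
	Value uint64
}

// appendField writes one field as: uvarint tag, uvarint payload length, payload.
func appendField(b []byte, tag uint64, payload []byte) []byte {
	b = binary.AppendUvarint(b, tag)
	b = binary.AppendUvarint(b, uint64(len(payload)))
	return append(b, payload...)
}

// encodeSample encodes s. With withExtra set it behaves like a newer writer
// that appends a field the decoder below has never heard of.
func encodeSample(s sample, withExtra bool) []byte {
	b := appendField(nil, tagRef, binary.AppendUvarint(nil, s.Ref))
	b = appendField(b, tagValue, binary.AppendUvarint(nil, s.Value))
	if withExtra {
		b = appendField(b, tagExtra, []byte("future"))
	}
	return b
}

// decodeSample plays the role of an older reader: it decodes the fields it
// knows and counts (rather than rejects) the ones it does not.
func decodeSample(b []byte) (sample, int, error) {
	var s sample
	skipped := 0
	for len(b) > 0 {
		tag, n := binary.Uvarint(b)
		if n <= 0 {
			return s, skipped, errors.New("malformed tag")
		}
		b = b[n:]
		size, n := binary.Uvarint(b)
		if n <= 0 || uint64(len(b)-n) < size {
			return s, skipped, errors.New("malformed field length")
		}
		payload := b[n : n+int(size)]
		b = b[n+int(size):]

		switch tag {
		case tagRef, tagValue:
			v, m := binary.Uvarint(payload)
			if m <= 0 {
				return s, skipped, errors.New("malformed field value")
			}
			if tag == tagRef {
				s.Ref = v
			} else {
				s.Value = v
			}
		default:
			skipped++ // Unknown field: the length prefix lets us step over it.
		}
	}
	return s, skipped, nil
}

func main() {
	s, skipped, err := decodeSample(encodeSample(sample{Ref: 7, Value: 42}, true))
	fmt.Println(s, skipped, err)
}
```

The per-field tag and length are exactly the encoding/decoding/storage overhead mentioned above; in exchange, additive schema changes need no migration step for readers.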
The main motivation here is the development velocity. We need to be able to experiment with different optimizations and features to effectively maintain Prometheus across old and new use cases.
Proposal
- Add better schema / unknown fields support, perhaps consider https://capnproto.org/ or https://flatbuffers.dev/. We can start slow by experimenting with capnproto on specific records to see the efficiency impact.
- The alternative is some basic size-based logic (e.g. a size for every record) that lets a reader skip certain stuff at the end of the record. It limits some options, e.g. the ability to deprecate certain fields in the future (maybe fine, since we have the record type). This requires reinventing the wheel a bit, but it may be easier to change now and cheaper (although it will still likely require buffering a lot more when encoding, in order to know the size).
- Document the WAL migration strategies a contributor has to think through when proposing schema changes.
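The size-based alternative from the list above can be sketched roughly as follows. This is a hypothetical layout invented for illustration (a uvarint body size followed by two fixed-width fields), not the existing WAL record format; `rec`, `encodeRecord` and `decodeRecords` are made-up names. An older reader uses the size to step over any trailing bytes a newer writer appended.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// knownLen is the number of body bytes this (older) reader understands:
// an 8-byte ref followed by an 8-byte timestamp.
const knownLen = 16

type rec struct {
	Ref uint64
	T   uint64
}

// encodeRecord buffers the whole body first, because the size prefix can
// only be written once the body length is known. trailing stands in for
// fields added by a newer writer.
func encodeRecord(r rec, trailing []byte) []byte {
	body := make([]byte, 0, knownLen+len(trailing))
	body = binary.BigEndian.AppendUint64(body, r.Ref)
	body = binary.BigEndian.AppendUint64(body, r.T)
	body = append(body, trailing...)

	out := binary.AppendUvarint(nil, uint64(len(body)))
	return append(out, body...)
}

// decodeRecords reads back-to-back records, decoding the known prefix of
// each body and skipping whatever follows it within the declared size.
func decodeRecords(b []byte) ([]rec, error) {
	var out []rec
	for len(b) > 0 {
		size, n := binary.Uvarint(b)
		if n <= 0 || size < knownLen || uint64(len(b)-n) < size {
			return nil, errors.New("malformed record")
		}
		body := b[n : n+int(size)]
		out = append(out, rec{
			Ref: binary.BigEndian.Uint64(body[0:8]),
			T:   binary.BigEndian.Uint64(body[8:16]),
		})
		b = b[n+int(size):] // Jump past known fields and any trailing bytes.
	}
	return out, nil
}

func main() {
	// One old-style record followed by one with extra trailing bytes.
	stream := encodeRecord(rec{Ref: 1, T: 1000}, nil)
	stream = append(stream, encodeRecord(rec{Ref: 2, T: 2000}, []byte{0xAA, 0xBB})...)

	recs, err := decodeRecords(stream)
	fmt.Println(recs, err)
}
```

Because fields are positional here, new data can only be appended at the end and existing fields cannot be dropped without a new record type, which is the limitation noted in the proposal.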
WDYT? Thanks @krajorama @bboreham for the initial discussions around this already!