Feature/disk usage optimization #523
Conversation
Please fix linter errors in your code. You can use
Force-pushed from dacc042 to 9b96d64
Please rebase your branch onto the latest develop and fix the failing CI build. You can track the build status for your changes on your fork page (status of the latest commit).
Force-pushed from a56227d to a64b4dc
@Tumas Everything looks good now

@Tumas changes resolved

Hi, just checking in on this. Happy to address any feedback.
) -> RandaoChange {
    let diff_len = (target_slot - base_slot) / P::SlotsPerEpoch::U64;
    let end_idx = (target_slot / P::SlotsPerEpoch::U64) % P::EpochsPerHistoricalVector::U64;
    let start_idx = end_idx - diff_len - 1;
This will panic when calculating the delta between the genesis state and the state right after it. Use u64::saturating_sub.
Actually, it wraps around. My logic doesn't cater for that when creating the delta (my error). But u64::saturating_sub can't work here because it results in 0. So more like:

let start_idx = (end_idx + P::EpochsPerHistoricalVector::U64 - diff_len - 1) % P::EpochsPerHistoricalVector::U64;

and then use mod_index, as opposed to get, when creating the delta.
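The wrapped-index reasoning can be sketched with plain u64 arithmetic. The constants below are illustrative stand-ins for P::SlotsPerEpoch and P::EpochsPerHistoricalVector (not the real preset values), and randao_start_idx is a hypothetical helper name, not a function in the PR:

```rust
/// Illustrative stand-ins for the preset constants (not the real values).
const SLOTS_PER_EPOCH: u64 = 32;
const EPOCHS_PER_HISTORICAL_VECTOR: u64 = 64;

/// Start index into the RANDAO mixes vector, wrapping around instead of
/// underflowing when the delta window reaches back past the genesis entry.
fn randao_start_idx(base_slot: u64, target_slot: u64) -> u64 {
    let modulus = EPOCHS_PER_HISTORICAL_VECTOR;
    let diff_len = (target_slot - base_slot) / SLOTS_PER_EPOCH;
    let end_idx = (target_slot / SLOTS_PER_EPOCH) % modulus;
    // Adding `modulus` before subtracting keeps the intermediate value
    // non-negative, so this cannot underflow the way `end_idx - diff_len - 1`
    // can when `diff_len + 1 > end_idx`.
    (end_idx + modulus - diff_len - 1) % modulus
}

fn main() {
    // Genesis (slot 0) to slot 32: end_idx = 1, diff_len = 1; the naive
    // `1 - 1 - 1` would underflow, while the wrapped form yields 63.
    println!("{}", randao_start_idx(0, 32));
    // Deeper into the chain the wrapped and naive forms agree:
    // base 320 -> target 640: diff_len = 10, end_idx = 20, start = 9.
    println!("{}", randao_start_idx(320, 640));
}
```

This also shows why saturating_sub gives the wrong answer near genesis: clamping to 0 points at the wrong vector slot, whereas the modular form lands on the entry the circular RANDAO mixes vector actually holds.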
Force-pushed from 462a6c8 to ccfe2ab
let diff_len = (target_slot - base_slot) / P::SlotsPerEpoch::U64;
let end_idx = (target_slot / P::SlotsPerEpoch::U64) % modulus;
let start_idx = (end_idx + modulus - diff_len - 1) % modulus;
This commit seems to have introduced a regression. You can find it by adding and running some sanity tests for your code:
// fork_choice_control/src/storage.rs:

#[cfg(feature = "eth2-cache")]
#[test_case(
    Config::mainnet().into(),
    eth2_cache_utils::mainnet::GENESIS_BEACON_STATE.force(),
    eth2_cache_utils::mainnet::BEACON_BLOCKS_UP_TO_SLOT_128.force()
)]
// #[test_case(
//     Config::mainnet().into(),
//     eth2_cache_utils::mainnet::ALTAIR_BEACON_STATE.force(),
//     eth2_cache_utils::mainnet::ALTAIR_BEACON_BLOCKS_FROM_128_SLOTS.force()
// )]
// #[test_case(
//     Config::mainnet().into(),
//     eth2_cache_utils::mainnet::CAPELLA_BEACON_STATE.force(),
//     eth2_cache_utils::mainnet::CAPELLA_BEACON_BLOCKS_FROM_244816_SLOTS.force()
// )]
fn test_deltas_roundtrip<P: Preset>(
    config: Arc<Config>,
    base_state: &Arc<BeaconState<P>>,
    blocks: &[Arc<SignedBeaconBlock<P>>],
) -> Result<()> {
    let pubkey_cache = Arc::new(PubkeyCache::default());
    let mut state = base_state.clone_arc();
    let state = state.make_mut();

    for block in blocks.iter().skip(1) {
        combined::untrusted_state_transition(&config, &pubkey_cache, state, block)?;

        let delta = delta(&base_state, &Arc::new(state.clone()))?;
        let recovered = apply_delta(&base_state, delta)?;

        assert_eq!(*state, recovered);
    }

    Ok(())
}
To run these tests, modify the tests command in the Makefile:

cargo test --release --features stub-grandine-version $(FEATURES) --features eth2-cache $(EXCLUDES) -p fork_choice_control delta -- --nocapture

and clone https://github.com/grandinetech/eth2-cache into your local Grandine project root.
Store a full state every 256 epochs, and a delta every 32 epochs. Add a base state cache for fast state transitions.
- more intuitive wording
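The storage schedule in the commit message can be sketched as a small classifier. The 256- and 32-epoch intervals are from the message above; the enum and function names are hypothetical, not the PR's actual API:

```rust
/// Intervals from the commit message: a full state every 256 epochs,
/// a delta every 32 epochs.
const FULL_STATE_INTERVAL_EPOCHS: u64 = 256;
const DELTA_INTERVAL_EPOCHS: u64 = 32;

#[derive(Debug, PartialEq)]
enum StateStorageKind {
    /// Persist the full serialized BeaconState.
    Full,
    /// Persist a delta against the most recent full state.
    Delta,
    /// Persist nothing at this epoch.
    Skip,
}

/// Decide what to persist for a given epoch. Full-state epochs take
/// priority, since every multiple of 256 is also a multiple of 32.
fn storage_kind(epoch: u64) -> StateStorageKind {
    if epoch % FULL_STATE_INTERVAL_EPOCHS == 0 {
        StateStorageKind::Full
    } else if epoch % DELTA_INTERVAL_EPOCHS == 0 {
        StateStorageKind::Delta
    } else {
        StateStorageKind::Skip
    }
}

fn main() {
    println!("{:?}", storage_kind(0)); // Full
    println!("{:?}", storage_kind(32)); // Delta
    println!("{:?}", storage_kind(33)); // Skip
    println!("{:?}", storage_kind(256)); // Full
}
```

With this schedule, any state in a 256-epoch window is reconstructible from one full state plus at most seven deltas, which is where the base state cache for fast state transitions comes in.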
Force-pushed from ccfe2ab to bc23e9d
Ping for review when ready
@Tumas, I was thinking of using a derive macro. This might make the code more maintainable during the development phase of a new fork. The way it would work:

use delta_derive::DeltaEncode;

#[derive(Clone, Debug, Default, Deserialize, Serialize, Ssz, DeltaEncode)] // Added DeltaEncode
#[derivative(PartialEq, Eq)]
#[serde(bound = "", deny_unknown_fields)]
pub struct BeaconState<P: Preset> {

This way, during the development phase, the compiler can easily point out missing fields. Thoughts?
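A minimal sketch of what such a derive could expand to, using a hypothetical DeltaEncode trait and a toy two-field state. None of these names are in the PR; they only illustrate why per-field expansion makes a missing field a compile error:

```rust
/// Hypothetical trait that a #[derive(DeltaEncode)] macro would implement.
trait DeltaEncode {
    type Delta;
    fn delta(&self, newer: &Self) -> Self::Delta;
}

/// Toy stand-in for BeaconState with only two fields.
struct ToyState {
    slot: u64,
    balances: Vec<u64>,
}

/// Per-field optional changes: what the derive would generate.
struct ToyStateDelta {
    slot: Option<u64>,
    balances: Option<Vec<u64>>,
}

impl DeltaEncode for ToyState {
    type Delta = ToyStateDelta;

    fn delta(&self, newer: &Self) -> ToyStateDelta {
        // Destructuring `newer` means that adding a field to ToyState
        // without handling it here fails to compile. A derive macro would
        // give the same guarantee automatically for every new fork field.
        let ToyState { slot, balances } = newer;
        ToyStateDelta {
            slot: (self.slot != *slot).then_some(*slot),
            balances: (self.balances != *balances).then(|| balances.clone()),
        }
    }
}

fn main() {
    let base = ToyState { slot: 0, balances: vec![32, 32] };
    let next = ToyState { slot: 64, balances: vec![32, 32] };
    let d = base.delta(&next);
    // Only the changed field is populated.
    println!("{:?} {:?}", d.slot, d.balances.is_some());
}
```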
Hi, right now we are experimenting with an alternative solution that has the potential to reduce disk usage much more. For example, instead of repeatedly storing validator indexes (it's an append-only list), we can extract those validator indexes to external storage and never store them in states on disk. We will get back to your PR after we finish the experiments on our side. BTW, have you tested how much storage space your commit actually saves?
Hi, regarding the storage savings, I have gathered concrete benchmarks using the project's existing libmdbx storage layer and compression settings. I tested this against Electra-phase states over a 256-epoch window (8 checkpoints).

Benchmark data (Electra phase):

Comparison over 256 epochs:

This represents an 80% reduction in disk usage for state storage. While extracting the validator registry (as you suggested) would reduce the base state size, the delta approach still provides roughly 12x compression for the high-frequency changes (like balances) that occur every 32 epochs. I believe a hybrid approach, extracting the static validator set and applying my delta logic to the mutable fields, would yield the maximum possible savings.
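The "delta for high-frequency fields" point can be illustrated on the balances list: instead of re-serializing every balance, store only the (index, new_value) pairs that changed. This is a sketch of the idea under that assumption, not the PR's actual encoding:

```rust
/// Sparse delta for a balances list: only the entries that differ.
/// Assumes the list is append-only, so `target` is at least as long as
/// `base`; appended entries always count as changed.
fn balance_delta(base: &[u64], target: &[u64]) -> Vec<(usize, u64)> {
    target
        .iter()
        .enumerate()
        .filter(|&(i, &v)| base.get(i) != Some(&v))
        .map(|(i, &v)| (i, v))
        .collect()
}

/// Reconstruct the target list from the base list plus the sparse delta.
fn apply_balance_delta(base: &[u64], delta: &[(usize, u64)]) -> Vec<u64> {
    let len = delta
        .iter()
        .map(|&(i, _)| i + 1)
        .max()
        .unwrap_or(0)
        .max(base.len());
    let mut out = base.to_vec();
    out.resize(len, 0);
    for &(i, v) in delta {
        out[i] = v;
    }
    out
}

fn main() {
    let base = vec![32_000, 32_000, 32_000, 32_000];
    // Two balances changed, one validator appended.
    let target = vec![32_000, 32_017, 32_000, 31_998, 32_000];
    let delta = balance_delta(&base, &target);
    let recovered = apply_balance_delta(&base, &delta);
    assert_eq!(recovered, target);
    println!("{} of {} entries changed", delta.len(), target.len());
}
```

Since only a small fraction of balances move between 32-epoch checkpoints, the sparse form is far smaller than the full list even before compression, which is the intuition behind the reported per-field savings.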
This PR implements delta encoding for beacon state storage, reducing
disk usage by ~80% while significantly improving state I/O performance.