Conversation

@vicsn (Collaborator) commented Jul 31, 2025

Motivation

These metrics will be instrumental in improving snarkOS node performance.

Closes #2367

  • Add histogram metrics for database operations with timing distributions
  • Instrument RocksDB and in-memory map operations with timing measurements
  • Add map type labels to distinguish different storage maps (block, transaction, etc.)
  • Track cumulative time for read, write, and delete operations
  • Add metrics registration for database histogram metrics
  • Provide inline timing measurements for consistent metric collection

Test Plan

  • Should run a full devnet to assess performance regressions.

Related PRs

ProvableHQ/snarkOS#3769

@raychu86 (Collaborator):
Might be a good opportunity to expand our metrics macro usage as well.

We are currently bound to a single style of histogram and gauge usage, despite the underlying metrics crate offering richer functionality (see ProvableHQ/snarkOS#3064).

{
Ok(self.map.read().get(&bincode::serialize(key)?).cloned().map(Cow::Owned))
let serialized_key = bincode::serialize(key)?;
let start = std::time::Instant::now();

nit: with this many uses I'd import `Instant` directly rather than spelling out `std::time::Instant` each time

false => {
self.map.write().insert(bincode::serialize(&key)?, value);
let serialized_key = bincode::serialize(&key)?;
let start = std::time::Instant::now();

shouldn't we feature-gate it behind the `metrics` feature? obtaining the time is a performance penalty in itself


I think it would work best with a simple macro taking a block containing the section to be measured

pub mod database {
pub const READ_DURATION: &str = "snarkvm_database_read_duration_seconds";
pub const WRITE_DURATION: &str = "snarkvm_database_write_duration_seconds";
pub const DELETE_DURATION: &str = "snarkvm_database_delete_duration_seconds";

are the numbers for deletion any different from the "plain" writes? we might just need READ and WRITE

}

/// Times a database operation and records the duration with the given map type label.
pub fn time_database_operation<F, R>(metric_name: &'static str, map_type: &str, operation: F) -> R

doesn't seem like it's used at the moment, but this (or a macro) would be preferable for readability and performance (feature-gating)

false => {
self.map.write().insert(bincode::serialize(&key)?, value);
let serialized_key = bincode::serialize(&key)?;
let start = std::time::Instant::now();

Does `Instant` have high enough precision for this? These calls might take less than a millisecond.


yes, but those values are likely to be minuscule when running in memory only

@kaimast (Collaborator) commented Aug 22, 2025

For the block sync work, I have been using tracing directly (see this PR). That might be another option if these calls happen too frequently to report them to Prometheus.


Development

Successfully merging this pull request may close these issues.

[Feature] Adding DB usage metrics to SnarkVM