-
Notifications
You must be signed in to change notification settings - Fork 1.5k
WIP: Add database read/write/delete timing metrics #2839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: staging
Are you sure you want to change the base?
Conversation
- Add histogram metrics for database operations with timing distributions - Instrument RocksDB and Memory map operations with timing measurements - Add map type labels to distinguish different storage maps (block, transaction, etc.) - Track cumulative time for read, write, and delete operations - Add metrics registration for database histogram metrics - Provide inline timing measurements for consistent metric collection
|
Might be a good opportunity to expand our metrics macro usage as well. We are currently bound to a single type of |
| { | ||
| Ok(self.map.read().get(&bincode::serialize(key)?).cloned().map(Cow::Owned)) | ||
| let serialized_key = bincode::serialize(key)?; | ||
| let start = std::time::Instant::now(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: with this many uses I'd import the Instant
| false => { | ||
| self.map.write().insert(bincode::serialize(&key)?, value); | ||
| let serialized_key = bincode::serialize(&key)?; | ||
| let start = std::time::Instant::now(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we feature-gate it (metrics)? obtaining the time is a perf penalty in itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would work best with a simple macro taking a block containing the section to be measured
| pub mod database { | ||
| pub const READ_DURATION: &str = "snarkvm_database_read_duration_seconds"; | ||
| pub const WRITE_DURATION: &str = "snarkvm_database_write_duration_seconds"; | ||
| pub const DELETE_DURATION: &str = "snarkvm_database_delete_duration_seconds"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are the numbers for deletion any different for the "plain" writes? we might just need READ and WRITE
| } | ||
|
|
||
| /// Times a database operation and records the duration with the given map type label. | ||
| pub fn time_database_operation<F, R>(metric_name: &'static str, map_type: &str, operation: F) -> R |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't seem like it's used at the moment, but this (or a macro) would be preferable for readability and performance (feature-gating)
| false => { | ||
| self.map.write().insert(bincode::serialize(&key)?, value); | ||
| let serialized_key = bincode::serialize(&key)?; | ||
| let start = std::time::Instant::now(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does Instant have high precision enough for this? These calls might take less than a millisecond.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but those values are likely to be miniscule when running in memory only
|
For the block sync work, I have been using tracing directly (see this PR). That might be another option if these calls happen too frequently to report them to Prometheus. |
Motivation
These will be instrumental in improving snarkOS node performance.
Closes #2367
Test Plan
Related PRs
ProvableHQ/snarkOS#3769