Feat/511 Implement Data Collection and Visualization for Web3.Storage Measurement Batch #560
Conversation
Hi @pyropy, can you review this draft and let me know if I'm heading in the right direction?
@pyropy, I am also facing this issue in parallel: the server runs fine but suddenly outputs an error stating that the InfluxDB write failed. I have attached a screenshot below. I created the InfluxDB keys from this platform: https://eu-central-1-1.aws.cloud2.influxdata.com/orgs/e0c31560a2dfff87/load-data/tokens. I tried deleting the keys and checking the RPC nodes, but the issue persists.
@Winter-Soren could you please link to the issue this addresses in both the PR title and description?
bajtos left a comment
I am confused.
What problems is this pull request trying to solve?
If the goal is to start collecting telemetry about batch sizes, I would expect the pull request to simply add more fields to the existing publish point.
```js
point.tag('cid', cid.toString())
point.tag('round_index', roundIndex.toString())
```
Storing CIDs and round indexes in tags will cause high cardinality that will degrade performance and/or increase our bill.
Quoting from https://docs.influxdata.com/influxdb/v2/write-data/best-practices/resolve-high-cardinality/
Tags containing highly variable information like unique IDs, hashes, and random strings lead to a large number of series, also known as high series cardinality. High series cardinality is a primary driver of high memory usage for many database workloads.
(...)
Review your tags to ensure each tag does not contain unique values for most entries.
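To make the cardinality concern concrete, here is a small self-contained sketch (plain JavaScript, no InfluxDB required) of why tagging each point with a unique CID explodes the series count: a series is keyed by the measurement name plus the unique tag set, so one unique tag value per point means one series per point. The data shapes here are illustrative, not the real telemetry code.

```javascript
// Simulate 1000 points tagged with a unique CID and round index each.
const points = []
for (let i = 0; i < 1000; i++) {
  points.push({
    measurement: 'publish',
    tags: { cid: `bafy-unique-${i}`, round_index: String(i) }, // unique per point
    fields: { measurement_count: 10 }
  })
}

// Distinct series = distinct (measurement, tag-set) combinations.
const series = new Set(
  points.map(p => `${p.measurement},cid=${p.tags.cid},round_index=${p.tags.round_index}`)
)
console.log(series.size) // one series per point: 1000

// Storing the same values as fields keeps everything in a single series.
const asFields = points.map(p => ({
  measurement: p.measurement,
  tags: {},
  fields: { ...p.fields, cid: p.tags.cid, round_index: Number(p.tags.round_index) }
}))
const fieldSeries = new Set(asFields.map(p => p.measurement))
console.log(fieldSeries.size) // 1
```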
Thanks for the suggestion, I'll rectify the commit!
@juliangruber, I have edited the name of the PR and added a link to the issue it resolves.
pyropy left a comment
Good job and thank you for your contribution @Winter-Soren! 🎉
I think the PR is going in the right direction, but we need to fix a few things before continuing. Namely, let's stick to the established pattern in the common/telemetry.js file (which I've noted in the comments). We also need to improve the naming a bit to make it more obvious what metrics we're storing.
Keep up the good work! 🚀
```js
logger.log(`Publishing ${measurements.length} measurements. Total unpublished: ${totalCount}. Batch size: ${maxMeasurements}.`)

// Calculate batch size in bytes
const batchSizeBytes = Buffer.byteLength(
```
Have you found any other way to calculate the batch size without serialising the objects to JSON? Depending on the batch size, that might consume a lot of memory.
+1
We have the following code a few lines below:

```js
const file = new File(
  [measurements.map(m => JSON.stringify(m)).join('\n')],
  'measurements.ndjson',
  { type: 'application/json' }
)
```

Please refactor it so that we create only one copy of `measurements.map(m => JSON.stringify(m)).join('\n')`.
```js
point.tag('cid', cid.toString())
point.tag('round_index', roundIndex.toString())
```
Thank you for adding a link to #511, @Winter-Soren. This pull request makes sense to me now! Thank you for the contribution ❤️ Before we move forward, I think it's best for me and @pyropy to agree on the higher-level design. @pyropy Why is it necessary to create a new bucket and a new telemetry writer? As I wrote before:
If we add more fields to the existing publish point.
Sorry, I had just quickly glanced over your comment. Looking at the code, I have to say you're right: we could simply extend the existing publish point.
@Winter-Soren I published a bad task description without looking deeper into the existing codebase. I have updated the task description according to the original comment published by @bajtos.
```js
  's' // precision
)

// Add new write client for batch metrics
```
Let's not add a new bucket and write client; rather, let's just extend the existing publish metric.
```js
setInterval(() => {
  publishWriteClient.flush().catch(console.error)
  networkInfoWriteClient.flush().catch(console.error)
  batchMetricsWriteClient.flush().catch(console.error)
```
Suggested change: remove the line `batchMetricsWriteClient.flush().catch(console.error)`. We won't need it if we extend the existing publish metric.
```js
recordNetworkInfoTelemetry,
batchMetricsWriteClient
```
Suggested change: export only `recordNetworkInfoTelemetry` and drop `batchMetricsWriteClient`. We won't need it if we extend the existing publish metric.
```js
// Enhanced telemetry recording with separate batch metrics
recordTelemetry('publish', point => {
  // Existing metrics
```
Let's extend this metric with a new point that collects the batch size.
Let me correct that: we should add new fields to the existing point.
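A hedged sketch of the design the reviewers converge on: attach the batch statistics as extra fields on the existing `publish` point instead of recording a separate `batch_metrics` point. Here `recordTelemetry` and `point` are minimal stand-ins for the real helpers in common/telemetry.js, and the sample measurements are made up; only the shape of the call is the point.

```javascript
// Minimal stand-in for the telemetry helper: builds a point, runs the
// callback against it, and returns it (the real helper would enqueue a
// write to InfluxDB instead).
const recordTelemetry = (name, fn) => {
  const point = {
    fields: {},
    intField (key, value) { this.fields[key] = Math.round(value) }
  }
  fn(point)
  return { name, ...point }
}

// Hypothetical batch of measurements.
const measurements = [{ value: 1 }, { value: 2 }, { value: 3 }]
const batchSizeBytes = Buffer.byteLength(
  measurements.map(m => JSON.stringify(m)).join('\n')
)

// New fields ride along on the existing 'publish' point; no new bucket,
// write client, or measurement name is needed.
const result = recordTelemetry('publish', point => {
  point.intField('measurement_count', measurements.length)
  point.intField('batch_size_bytes', batchSizeBytes)
  point.intField('avg_measurement_size_bytes', batchSizeBytes / measurements.length)
})

console.log(result.fields)
```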
```js
})

// Separate batch metrics recording for better organization
recordTelemetry('batch_metrics', point => {
```
Force-pushed from 9bc327d to 3a83c09.

This PR implements collection and visualization of Web3.Storage measurement batch metrics.
Closes #511
Changes

Code Changes
- `common/telemetry.js`: add dedicated batch metrics write client
- `publish/index.js`: separate batch metrics recording

Metrics Collected
- `batch_size_bytes`: Total size of the measurement batch
- `avg_measurement_size_bytes`: Average size per measurement
- `measurement_count`: Number of measurements in batch
- `cid`: Content ID of the batch
- `round_index`: Associated round number

Testing
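For reference, a rough sketch of how the numeric metrics listed above would render in InfluxDB line protocol once attached as integer fields on the existing `publish` measurement (per the review feedback). The field values are made-up examples, and this hand-rolls the line format only for illustration; the real code would go through the client library's Point API.

```javascript
// Hypothetical field values for one publish batch.
const fields = {
  batch_size_bytes: 35,
  avg_measurement_size_bytes: 12,
  measurement_count: 3
}

// Line protocol: <measurement> <field>=<int>i,... ('i' marks integers).
const line = 'publish ' +
  Object.entries(fields).map(([k, v]) => `${k}=${v}i`).join(',')
console.log(line)
// → publish batch_size_bytes=35i,avg_measurement_size_bytes=12i,measurement_count=3i
```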