Instance Metrics using OxQL #2654

Merged: 82 commits, merged Feb 27, 2025

@charliepark (Contributor) commented Jan 17, 2025

This PR expands our current disk metrics charts to incorporate other instance metrics, including CPU utilization and networking. It builds on data stored in ClickHouse, accessed via OxQL.

(Screenshot: 2025-02-27 at 11:26:23 AM)

vercel bot commented Jan 17, 2025

The latest updates on your projects:

console: ✅ Ready, preview updated Feb 27, 2025 10:22pm (UTC)

@david-crespo (Collaborator) commented Jan 18, 2025

Once you have something basically working (you might already, I didn’t check), it might be worth looking at getting better mock data to make iteration and testing easier. One way to start would be to take a real response for a decent-sized time range and hard-code it as a mock response, filtered by the time range extracted from the OxQL query with regexes. It’ll be janky, but the goal is to be just complex enough to support testing the UI.
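A minimal sketch of the regex-filtered mock described above, assuming the query contains a filter clause of the form `timestamp >= @<ISO time> && timestamp < @<ISO time>` and that the hard-coded points carry ISO timestamps. `MockPoint`, `extractTimeRange`, and `filterMockPoints` are hypothetical names, and the example query (timeseries name and table ops) is an illustrative assumption, not the console's actual OxQL.

```typescript
// Hypothetical, simplified point shape for a hard-coded mock response.
// This is NOT the real OxQL response type.
type MockPoint = { timestamp: string; value: number }

// Assumed filter clause shape: `timestamp >= @<ISO> && timestamp < @<ISO>`.
// The character class stops at whitespace or `|`, so trailing table ops are not captured.
const TIME_RANGE_RE =
  /timestamp\s*>=\s*@([\dT:.Z-]+)\s*&&\s*timestamp\s*<\s*@([\dT:.Z-]+)/

// Treat timestamps without a trailing `Z` as UTC (JS would otherwise parse them as local time).
function parseUtc(iso: string): Date {
  return new Date(iso.endsWith('Z') ? iso : `${iso}Z`)
}

// Pull [start, end) out of the query text, or null if the clause isn't there.
function extractTimeRange(query: string): { start: Date; end: Date } | null {
  const match = TIME_RANGE_RE.exec(query)
  if (!match) return null
  const start = parseUtc(match[1])
  const end = parseUtc(match[2])
  if (Number.isNaN(start.getTime()) || Number.isNaN(end.getTime())) return null
  return { start, end }
}

// Return only the mock points inside [start, end); if no range is found, return everything.
function filterMockPoints(query: string, points: MockPoint[]): MockPoint[] {
  const range = extractTimeRange(query)
  if (!range) return points
  const startMs = range.start.getTime()
  const endMs = range.end.getTime()
  return points.filter((p) => {
    const t = parseUtc(p.timestamp).getTime()
    return t >= startMs && t < endMs
  })
}

// Example query; the timeseries name and table ops are illustrative assumptions.
const query =
  'get virtual_disk:bytes_read | filter timestamp >= @2025-01-17T00:00:00 && timestamp < @2025-01-17T00:02:00 | align mean_within(60s)'
const mockPoints: MockPoint[] = [
  { timestamp: '2025-01-16T23:59:00Z', value: 1 },
  { timestamp: '2025-01-17T00:00:00Z', value: 2 },
  { timestamp: '2025-01-17T00:01:00Z', value: 3 },
  { timestamp: '2025-01-17T00:02:00Z', value: 4 },
]
console.log(filterMockPoints(query, mockPoints).map((p) => p.value)) // logs [ 2, 3 ]
```

Falling back to the full response when no range is found keeps the mock usable even if the query shape drifts, at the cost of silently ignoring the time range.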

@benjaminleonard (Contributor) commented Jan 20, 2025

Quoting @david-crespo: "Once you have something basically working (you might already, I didn’t check) it might be worth looking at getting better mock data to make iteration and testing easier."

That's probably the better approach, since it would give us something more accurate.

I have a silly alternative that could work: modify the generateUtilization function so it walks deterministically based on the time. We interpolate between the start and end times (maybe snapping to the nearest interval of n) and use that as the seed for the randomness, so the values stay consistent even as the relative time window moves.
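A minimal sketch of this idea, using a hypothetical `generateUtilizationSeeded` rather than the console's actual `generateUtilization`. Each timestamp is snapped to a fixed step, the snapped bucket index seeds a small PRNG (mulberry32, a well-known public-domain generator swapped in here as an assumption), and neighboring buckets are averaged so the series wanders instead of jumping independently. Because every value depends only on its own time bucket, sliding the window leaves values at shared timestamps unchanged.

```typescript
type UtilizationPoint = { timestamp: number; value: number }

// mulberry32: a small, well-known 32-bit seeded PRNG. Same seed => same sequence.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0
  return () => {
    a = (a + 0x6d2b79f5) >>> 0
    let t = a
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Deterministic noise in [0, 1) for a given time bucket index.
function noiseForBucket(bucket: number): number {
  return mulberry32(bucket)()
}

// Utilization (0 to 100) at a timestamp: snap to the step, then average the
// bucket with its neighbors so adjacent points move smoothly.
function utilizationAt(timestampMs: number, stepMs: number, radius = 2): number {
  const bucket = Math.floor(timestampMs / stepMs)
  let sum = 0
  for (let k = bucket - radius; k <= bucket + radius; k++) sum += noiseForBucket(k)
  const mean = sum / (2 * radius + 1)
  return Math.round(mean * 1000) / 10 // percentage with one decimal place
}

// Hypothetical mock generator: points on a grid aligned to multiples of stepMs,
// so overlapping windows share timestamps (and therefore values).
function generateUtilizationSeeded(startMs: number, endMs: number, stepMs: number): UtilizationPoint[] {
  const points: UtilizationPoint[] = []
  for (let t = Math.ceil(startMs / stepMs) * stepMs; t < endMs; t += stepMs) {
    points.push({ timestamp: t, value: utilizationAt(t, stepMs) })
  }
  return points
}

// Two one-hour windows that overlap by 30 minutes.
const step = 60000
const windowA = generateUtilizationSeeded(Date.UTC(2025, 0, 20, 0, 0), Date.UTC(2025, 0, 20, 1, 0), step)
const windowB = generateUtilizationSeeded(Date.UTC(2025, 0, 20, 0, 30), Date.UTC(2025, 0, 20, 1, 30), step)
const at = Date.UTC(2025, 0, 20, 0, 45)
console.log(windowA.find((p) => p.timestamp === at)?.value === windowB.find((p) => p.timestamp === at)?.value) // true
```

This is smoothed noise rather than a true cumulative random walk: a real walk's value at time t would depend on where the walk started, which is exactly what breaks consistency when the window moves.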

@charliepark (Contributor, Author) commented Jan 27, 2025

It's coming along; I'll push the latest shortly. I had originally been trying to mirror the existing charts, but I chatted with @bnaecker on Friday, and he noted that showing the data as it comes through from OxQL would be a bit more useful to people.

Regarding the existing (top) and new (bottom) charts:

(Screenshot: 2025-01-27 at 11:33:31 AM)

Ben noted:

Folks generally want to know "when was my disk busy" or "how many ops were in this period". To answer that with the bottom chart, you just look at the spikes. To answer it with the top chart, you have to look for periods where the slope is high.
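To make the slope-versus-spike point concrete, here is a minimal sketch assuming the top chart plots a cumulative counter (total ops since some start) and the bottom chart plots per-interval values. Differencing consecutive cumulative samples turns "periods of high slope" into spikes. `Sample` and `toDeltas` are hypothetical names, the reset handling (treating a decrease as a counter restart) is an assumption, and this is not necessarily how OxQL or the console produces the bottom chart.

```typescript
type Sample = { timestamp: number; value: number }

// Convert a cumulative counter series into per-interval deltas.
// Each output point is attributed to the end of its interval.
function toDeltas(cumulative: Sample[]): Sample[] {
  const deltas: Sample[] = []
  for (let i = 1; i < cumulative.length; i++) {
    const prev = cumulative[i - 1]
    const curr = cumulative[i]
    // Assumption: a decrease means the counter was reset, so count from zero.
    const delta = curr.value >= prev.value ? curr.value - prev.value : curr.value
    deltas.push({ timestamp: curr.timestamp, value: delta })
  }
  return deltas
}

// Cumulative read ops sampled once a minute: the busy minute is a steep slope
// in this series but a single spike in the deltas.
const cumulativeReads: Sample[] = [
  { timestamp: 0, value: 100 },
  { timestamp: 60000, value: 110 },
  { timestamp: 120000, value: 610 },
  { timestamp: 180000, value: 620 },
]
console.log(toDeltas(cumulativeReads).map((s) => s.value)) // logs [ 10, 500, 10 ]
```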

Two instances to check out from dogfood to see read data:

Alan noted that Crucible is having some issues reporting write data, so the write charts aren't useful right now.

@david-crespo (Collaborator)

Looks fantastic. Glad the data is more useful!

@david-crespo linked an issue on Jan 31, 2025 that may be closed by this pull request
@david-crespo (Collaborator) commented Feb 27, 2025

2f38adc cuts the biggest chunk of the bundle down from 397 kB to 338 kB by splitting out MetricsTab!

<Route
  lazy={() =>
    import(
      './pages/project/instances/instance/tabs/MetricsTab/NetworkMetricsTab'
Collaborator commented:

These paths are ridiculous. Once this is merged, I'm going to fix them.

Collaborator replied:

Flattened the directory tree a bit here: #2713

      endTime={endTime}
      unit={unit !== 'count' ? unit : undefined}
    />
  </div>
Collaborator commented:

I got rid of the Suspense here in 2f38adc because we're relying on React Router's lazy loading instead of doing it manually.

@david-crespo marked this pull request as ready for review on February 27, 2025 at 17:55
@david-crespo (Collaborator) left a comment:

let's goooooo

@david-crespo enabled auto-merge (squash) on February 27, 2025 at 22:21
@david-crespo merged commit 3263678 into main on Feb 27, 2025
7 checks passed
@david-crespo deleted the oxql_disk_metrics branch on February 27, 2025 at 22:33

Development

Successfully merging this pull request may close these issues.

Instance CPU metrics
4 participants