Update 2-2 Measuring Performance In Procution.md
dendibakh authored Sep 13, 2024
1 parent 4a85ad8 commit ba5338a
Showing 1 changed file with 1 addition and 1 deletion.
@@ -8,6 +8,6 @@ Analyzing production workloads by recreating a specific scenario in a lab can be

It's becoming a trend for large service providers to implement telemetry systems that monitor performance on user devices. One such example is the Netflix Icarus[^1] telemetry service, which runs on thousands of different devices spread around the world. Such a telemetry system helps Netflix understand how users perceive Netflix's app performance. It enables Netflix engineers to analyze data collected from many devices and to find issues that otherwise would be impossible to find. This kind of data enables making better-informed decisions on where to focus optimization efforts.

-One important caveat of monitoring production deployments is measurement overhead. Because any kind of monitoring affects the performance of a running service, we recommended to use lightweight profiling methods. According to [@GoogleWideProfiling]: "To conduct continuous profiling on datacenter machines serving real traffic, extremely low overhead is paramount". Usually, acceptable aggregated overhead is considered below 1%. Performance monitoring overhead can be reduced by limiting the set of profiled machines as well as capturing data samples less frequently.
+One important caveat of monitoring production deployments is measurement overhead. Because any kind of monitoring affects the performance of a running service, we recommended using lightweight profiling methods. According to [@GoogleWideProfiling]: "To conduct continuous profiling on datacenter machines serving real traffic, extremely low overhead is paramount". Usually, acceptable aggregated overhead is considered below 1%. Performance monitoring overhead can be reduced by limiting the set of profiled machines as well as capturing data samples less frequently.

[^1]: Presented at CMG 2019, [https://www.youtube.com/watch?v=4RG2DUK03_0](https://www.youtube.com/watch?v=4RG2DUK03_0).
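
The changed paragraph names two knobs for staying under the roughly 1% aggregated overhead budget: profiling fewer machines and sampling less often. A minimal sketch of that arithmetic follows, assuming a simple linear cost model that is not taken from the chapter. The function names and the per-sample cost of 20 microseconds are hypothetical and chosen only for illustration; real per-sample costs depend on the profiler and the workload.

```python
def aggregate_overhead(fleet_fraction, samples_per_sec, cost_per_sample_us):
    """Estimate the fraction of total fleet CPU time spent on profiling.

    Assumes (hypothetically) that overhead grows linearly with both the
    share of machines being profiled and the sampling rate.
    """
    if not 0.0 < fleet_fraction <= 1.0:
        raise ValueError("fleet_fraction must be in (0, 1]")
    # CPU time fraction lost on one profiled machine.
    per_machine = samples_per_sec * cost_per_sample_us / 1_000_000
    # Averaged over the whole fleet, including machines that are not profiled.
    return fleet_fraction * per_machine


def max_sampling_rate(fleet_fraction, cost_per_sample_us, budget=0.01):
    """Highest sampling rate (samples/sec) that stays within the budget."""
    if not 0.0 < fleet_fraction <= 1.0:
        raise ValueError("fleet_fraction must be in (0, 1]")
    return budget * 1_000_000 / (fleet_fraction * cost_per_sample_us)


if __name__ == "__main__":
    COST_US = 20  # hypothetical cost of taking one sample, in microseconds

    # Every machine, 1000 samples/sec: over the 1% budget under this model.
    print(f"{aggregate_overhead(1.0, 1000, COST_US):.2%}")
    # Knob 1: profile only 10% of the machines.
    print(f"{aggregate_overhead(0.1, 1000, COST_US):.2%}")
    # Knob 2: keep every machine but sample 10x less often.
    print(f"{aggregate_overhead(1.0, 100, COST_US):.2%}")
```

Under this model both knobs reduce the aggregate figure by the same factor, so the choice between them is about data quality: fewer machines narrows coverage, while a lower sampling rate coarsens the profile on each machine.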
