Add Sum of Squares calculation #428
LGTM, thanks!
I'm going to wait to merge this until after we finish getting a release working on OTP 26. We've just merged that and will release soon.
Hello there! Saw this go by and wanted to offer a few considerations:
Thank you @binaryseed. I will check in with them.
Thanks for the feedback, @binaryseed! I'll try to quantify the performance hit in the meantime. I'll also try adding the count and sum (maybe mean) in the … As for if the …

A little context on the "why" of having this: we test a canary deploy in prod against the currently running version. If the reply time of the top N endpoints is 1/X standard deviations more than the existing code's average, the deployment is stopped. Hence the use of (and need for) the NRQL …

Also, I checked some Java apps' metrics, and they seem to have this value set to something, and the same deployment checks that fail for Elixir pass for Java. 😄 I guess setting the value to 1 instead of 0 would also do that... but yeah, I'm trying to do it right. 🤣
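A minimal sketch of the canary gate described above, assuming it compares the canary's mean response time against the baseline mean plus `k` baseline standard deviations. The module `CanaryGateSketch`, the function `decide/4`, and the parameter `k` are hypothetical; the actual check in this thread is driven by NRQL and is not shown here.

```elixir
defmodule CanaryGateSketch do
  @moduledoc """
  Hypothetical canary gate (illustrative assumption, not real tooling):
  stop the deploy if the canary's mean response time exceeds the
  baseline mean by more than `k` baseline standard deviations.
  """

  # Returns :stop when the canary looks like a regression, :continue otherwise.
  def decide(canary_mean, baseline_mean, baseline_stddev, k)
      when is_number(k) and k >= 0 do
    if canary_mean > baseline_mean + k * baseline_stddev do
      :stop
    else
      :continue
    end
  end
end
```

In this sketch, a baseline standard deviation of 0 collapses the threshold to the baseline mean itself, so any canary even slightly slower than average is stopped.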
I finally found some time to benchmark this change.

```elixir
# Destructure all of the metric's fields in a single pattern match
%{
  name: name,
  scope: scope,
  call_count: call_count,
  total_call_time: total_call_time,
  total_exclusive_time: total_exclusive_time,
  min_call_time: min_call_time,
  max_call_time: max_call_time
} = metric
```

I did this run with 1, 10, 100, 1,000, and 10,000 random metric values. It seems this change does add some latency (roughly 4:7), but this code is super fast! I had to add … If the latency seems within reason, I'll add the match-out to this change to get the faster version.
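A rough timing harness in the spirit of the run described above, assuming one simply folds an update function over `n` random values and measures wall-clock time with Erlang's `:timer.tc/1` (which returns `{microseconds, result}`). `BenchSketch`, `time_over/3`, and `sweep/3` are hypothetical names; this is not the benchmark actually used in the thread.

```elixir
defmodule BenchSketch do
  @moduledoc """
  Hypothetical timing harness: fold an update function over `n` random
  values and measure elapsed wall-clock time with :timer.tc/1.
  """

  # Returns {microseconds, final_accumulator}.
  def time_over(n, update_fun, init) when is_integer(n) and n >= 1 do
    # Generate the inputs before timing so only the updates are measured.
    values = for _ <- 1..n, do: :rand.uniform() * 1000.0
    :timer.tc(fn -> Enum.reduce(values, init, update_fun) end)
  end

  # Time the same update at each input size; returns [{n, microseconds}].
  def sweep(update_fun, init, sizes \\ [1, 10, 100, 1_000, 10_000]) do
    for n <- sizes do
      {us, _acc} = time_over(n, update_fun, init)
      {n, us}
    end
  end
end
```

For example, `BenchSketch.sweep(fn x, {count, total} -> {count + 1, total + x} end, {0, 0.0})` times a count-and-sum update at each size. Single-shot timings of microsecond-scale work are noisy; a dedicated library such as Benchee, which repeats runs and reports statistics, gives more trustworthy ratios.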
Calculate the "Total Sum of Squares" value for metrics.
This uses Welford's online algorithm. It is relatively accurate with 64-bit floating-point values, losing accuracy only to float rounding. Here, a bit more is lost because the floating-point values are limited in size and held as integers. (Still better than 0, IMHO 😄)
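For readers unfamiliar with Welford's algorithm, here is a minimal sketch of the textbook single-pass update. It keeps a running count, mean, and `m2` (the sum of squared deviations from the mean, i.e. the total sum of squares). `WelfordSketch` and its functions are hypothetical names; this is not the `NewRelic.Metric` code from this PR, and it uses plain floats rather than the integer representation mentioned above.

```elixir
defmodule WelfordSketch do
  @moduledoc """
  Illustrative sketch of Welford's online algorithm (not the PR's code).
  State is {count, mean, m2}, where m2 is the running sum of squared
  deviations from the mean.
  """

  def new, do: {0, 0.0, 0.0}

  # Fold one observation into the running state.
  def update({count, mean, m2}, x) do
    count = count + 1
    delta = x - mean
    mean = mean + delta / count
    # Multiply the deviation from the old mean by the deviation from the new mean.
    {count, mean, m2 + delta * (x - mean)}
  end

  # Population standard deviation derived from m2.
  def stddev({0, _mean, _m2}), do: 0.0
  def stddev({count, _mean, m2}), do: :math.sqrt(m2 / count)
end
```

Folding `[2, 4, 4, 4, 5, 5, 7, 9]` with `Enum.reduce(values, WelfordSketch.new(), &WelfordSketch.update(&2, &1))` should give a mean of 5.0, an `m2` of 32.0, and a population standard deviation of 2.0, up to float rounding.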
This also removes the `sum_of_squares` field from the `NewRelic.Metric` struct, since it isn't needed for the calculation and doesn't need an extra `Access` call on it. Also, counters default to 0, so I removed the initial add of 0 to 0.

Adding this value to the reported metrics allows the `stddev` function to be used on them. This can be helpful when looking for performance regressions, for example.

Note: this could add a bit of overhead in the extremely hot function it's in! Understandably, this could hurt the overall performance of the agent and its ability to report metrics on time.
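For reference, the standard identities relating these quantities are below. This is general statistics, not a statement of how NRQL's `stddev` is computed; the thread doesn't say whether it uses the population ($n$) or sample ($n-1$) form.

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\mathrm{SS} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2, \qquad
\sigma = \sqrt{\frac{\mathrm{SS}}{n}}, \qquad
s = \sqrt{\frac{\mathrm{SS}}{n-1}}
$$

The right-hand form of $\mathrm{SS}$ needs only a running sum of raw squares, but subtracting two large, nearly equal terms can lose precision; Welford's update accumulates $\mathrm{SS}$ directly and avoids that cancellation.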
I also noticed that most of the agents don't set this value, or only set it to 0 or 1, so I assume that's due to the calculation overhead.
My feelings won't be hurt if this doesn't end up in the default branch 💟, but I wanted to put this out there for consideration and discussion.