POC: Compare profiles within a series, to find "novelty" #4087
This is a proof-of-concept attempt at grading the similarity of profiles in order to detect anomalies in a particular workload.
The implementation uses a fairly naive approach: take the top N contributing stack traces or function names (based on their self contribution; configurable using `--dimension`) and record their proportional sizing. Then build a novelty score for each profile seen and try to match it against those proportions; when they match above a particular threshold, they get merged. (I used 0.1, i.e. a 10% match.)
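For illustration, a minimal sketch in Go of what that matching could look like; the `fingerprint` type, the histogram-intersection match, and the running-average merge are assumptions for the sake of the example, not the actual PoC code:

```go
package novelty

import "math"

// fingerprint maps a stack trace (or function name, depending on --dimension)
// to its proportional self contribution within one profile, e.g. 0.12 == 12%.
type fingerprint map[string]float64

// intersection returns the shared proportion mass of two fingerprints
// (histogram intersection): 1 means identical proportions, 0 means disjoint.
func intersection(a, b fingerprint) float64 {
	var match float64
	for k, va := range a {
		if vb, ok := b[k]; ok {
			match += math.Min(va, vb)
		}
	}
	return match
}

// observe compares a new profile's fingerprint against the fingerprints seen
// so far for this series. If it matches one above the threshold (0.1 in the
// PoC, i.e. a 10% match) the two are merged; otherwise it is kept as a new,
// "novel" fingerprint. The second return value is the best match found.
func observe(series []fingerprint, p fingerprint, threshold float64) ([]fingerprint, float64) {
	best, bestIdx := 0.0, -1
	for i, fp := range series {
		if m := intersection(fp, p); m > best {
			best, bestIdx = m, i
		}
	}
	if bestIdx >= 0 && best > threshold {
		// merge: naive running average of the proportions
		for k, v := range p {
			series[bestIdx][k] = (series[bestIdx][k] + v) / 2
		}
		return series, best
	}
	return append(series, p), best
}
```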
As sample data I queried various profiles from Pyroscope, simulating how a single target would send them to us (so essentially going over 15s results).
For services with fairly large codebases that are very dependent on query load (pyroscope-querier, hosted-grafana), I struggled to find a novelty score over 0.05.
For more stable workloads, like the v1 ingester, I managed to get novelty scores of up to 0.25.

I do think this approach already feels like it might get very expensive, and it also has a fairly localised (per-series) stateful component to it. I think this will be costly and its results might not be of the quality that we want to base sampling decisions on.
I think we need to take a different approach if we want to continue with this:
We need to investigate an approach where we don't have to hold individual stack traces/functions in memory in order to compare them.
Potentially this could be a great match for using an appropriate model to create embeddings, which would simplify the data stored per profile to a vector.
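As a rough sketch of that idea (assuming some model, not chosen here, maps each profile to a fixed-size embedding vector), the per-profile state shrinks to that vector and novelty could be judged by distance to previously seen embeddings; the function names below are hypothetical:

```go
package novelty

import "math"

// cosineSimilarity returns the cosine of the angle between two embedding
// vectors; 1 means identical direction, 0 means orthogonal (or empty input).
func cosineSimilarity(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// noveltyScore is the distance of the new profile's embedding to the closest
// embedding already seen for this series; higher means more novel.
func noveltyScore(seen [][]float64, embedding []float64) float64 {
	best := 0.0
	for _, e := range seen {
		if s := cosineSimilarity(e, embedding); s > best {
			best = s
		}
	}
	return 1 - best
}
```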
This was significantly beyond the time box that I set for this PoC, and I think we might need to look at it as part of a later effort (hackathon, implementation phase) with more time.
A good summary of models that can help compare stack trace similarity (not exactly what we want, but going in the right direction) is: https://arxiv.org/pdf/2412.14802