One file per process instead of per metric/process for DirectFileStore [DOESNT WORK] #161
This was a quick experiment on having all metrics for each process in the same file, because it seemed like it would be easy (just do the things in this commit). However, even though tests pass, this doesn't actually work. See the next commit for why. If the change were just this, I'd say we should do it. However, as you'll see in the next commit, it's more involved than that, and I'm not sure it's worth doing, at least not with this approach... I'm just pushing this up so it doesn't get lost.
So, again, I don't propose we actually do this; it's just illustrative of what we need to do, and I didn't want it to get lost.

The problem with the previous commit is that for each metric, we open a new `FileMappedDict`. All of them sharing the same file is "fine", actually. The problem comes from the fact that each instance of `FileMappedDict` has its own copy of `@used`. When one metric adds a new entry to the file, it moves its own `@used` pointer, but the other instances don't notice this. As soon as a second instance that had already been loaded adds a new labelset, it corrupts the file.

This only shows up when running the benchmarks, with histograms. Counters (in our benchmark) don't reproduce it, because each metric in our benchmark only has one labelset: they all get added to the file initially, and an already loaded Dict doesn't add new ones. Histograms reproduce it because new observations may report a bucket that wasn't seen before, which is effectively a new labelset.

The solution is for all `MetricStore` instances to share one single `FileMappedDict`. This is not particularly hard, but the current `MetricStore` is not written in a way that allows that without doing this crap. We should find a better way. I'm just leaving this here as documentation of this problem, for the next brave soul attempting this.
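The stale-cursor corruption described above can be simulated without the real `FileMappedDict`. The `ToyDict` class below is purely illustrative (not the actual implementation): each instance caches its own used-space cursor when it opens the shared "file", so a write through one instance silently clobbers an entry added through another.

```ruby
# Minimal simulation of the bug: two dict instances share one backing
# buffer (an Array standing in for the mmap'ed file), but each caches
# its own cursor (like @used) at open time.
class ToyDict
  def initialize(buffer)
    @buffer = buffer      # shared "file"
    @used = @buffer.size  # per-instance cursor, read once when opened
  end

  # Append a new entry at what THIS instance believes is the end of the file.
  def add_entry(entry)
    @buffer[@used] = entry
    @used += 1
  end
end

file = []
dict_a = ToyDict.new(file)  # both instances open while the file is empty,
dict_b = ToyDict.new(file)  # so both start with a cursor of 0

dict_a.add_entry("metric_a|labelset_1")  # written at position 0
dict_b.add_entry("metric_b|bucket_0.5")  # ALSO written at position 0: clobbers dict_a's entry
```

After this runs, the buffer holds only `dict_b`'s entry; `dict_a`'s is gone, which is exactly the corruption a second already-loaded instance causes when a histogram observes a new bucket.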
@dmagliola Do you know if there's a working model of this proposed approach somewhere? I'd like to give it a try if possible. We are currently running into this issue due to the use of
Hi @saviogl, sadly there isn't one. The way to try to solve this is to declare your own class that exposes the FileStore interface, and point to that when declaring metrics. The key thing to keep in mind is that, while you'll have a lot of `MetricStore`s, they all need to point to the same, shared instance of `FileMappedDict`. It's a non-trivial change, but that's basically what needs to happen. The challenge for me is that I wanted to offer the option: you can have one file per process, or one file per process-metric. I'm sure there's a better way I'm missing, but anyway, that's my thinking on this.
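The "one shared dict per file" idea from the comment above can be sketched in plain Ruby. Everything here is hypothetical (the class names and a `Hash` standing in for the real `FileMappedDict`; the actual data-store interface in the gem has more methods), but it shows the key invariant: every store for a given path delegates to the same dict object, so there is only one cursor to move.

```ruby
# Hypothetical registry: hands out the SAME dict object for a given path,
# creating it on first use, guarded by a Mutex for thread safety.
class SharedDictRegistry
  @dicts = {}
  @mutex = Mutex.new

  def self.for_path(path)
    # A plain Hash stands in for FileMappedDict; no real file IO here.
    @mutex.synchronize { @dicts[path] ||= {} }
  end
end

# Each metric still gets its own store object, but all stores for one
# process delegate to the one shared dict, so appends by one store are
# immediately visible to every other store.
class SharedFileMetricStore
  def initialize(path:, metric_name:)
    @dict = SharedDictRegistry.for_path(path)
    @metric_name = metric_name
  end

  def set(labels:, val:)
    @dict[[@metric_name, labels]] = val
  end

  def get(labels:)
    @dict[[@metric_name, labels]]
  end
end

a = SharedFileMetricStore.new(path: "/tmp/proc_1234.bin", metric_name: :http_requests)
b = SharedFileMetricStore.new(path: "/tmp/proc_1234.bin", metric_name: :queue_size)
a.set(labels: { code: "200" }, val: 1.0)
b.set(labels: {}, val: 5.0)
```

Both writes land in the single shared dict for `/tmp/proc_1234.bin`, which is what prevents the stale-`@used` corruption: there is no second cursor to fall out of date.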
@dmagliola Just trying to understand better whether or not this will actually solve our issue. While the number of files is a problem to be had (maybe soon), the biggest issue right now is memory consumption: we are "constantly" reaping unicorn worker processes for graceful memory management, and therefore creating new "metrics" per process. Moving to "one file per process instead of per metric/process" would tame the number of files created, but the amount of data (metrics per process) would remain the same, correct? And therefore we would still have unbounded growth of metrics, and of the memory needed to export them? Is my assumption correct? If I'm understanding the current model correctly, the solution for the unbounded memory growth would be to have processes write to the same file/metric, with some concurrency control system in place, right?
If I've understood you right, this comment in #214 is the situation you're running into: you're generating a tonne of metric series by churning through unicorn workers. Whatever we come up with in there won't help you when scraping metrics becomes slow because of a large number of files, but it might help you reduce the number of metrics (and ultimately timeseries in your Prometheus instance) produced.
FYI: GitLab uses a similar implementation: https://gitlab.com/gitlab-org/prometheus-client-mmap
This doesn't work!!
I'm just leaving it here as documentation in case someone tries to do this to solve #143
Read the commit comments for more.
NOTE: I still think this is the right thing to do (one file per process, rather than per metric/process)
It'll have more contention in multi-threaded scenarios, but it'll drastically reduce the number of files
NOTE 2: We also need to add a Mutex around that shared Dict, if we're having the shared Dict! The `@lock` in `in_process_sync` needs to be shared by all `MetricStore`s.

Anyway, I'm back to thinking we should either make a second `DirectFileStore` that does this (and have two separate files, instead of a param), or considerably refactor the `DirectFileStore`. My quick stab definitely doesn't work.
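The shared-lock point in NOTE 2 can be sketched as follows. This is a hypothetical simplification (the constant names and `Hash` stand-in for the shared dict are mine, not the gem's): if every store created its own `Mutex.new`, two stores could mutate the shared dict concurrently; sharing one lock across all stores serializes those writes.

```ruby
# Sketch: if all MetricStores share one dict, they must also share one
# Mutex. A per-store @lock would only serialize calls within one store,
# not accesses to the shared data.
SHARED_LOCK = Mutex.new
SHARED_DICT = Hash.new(0.0)  # stand-in for the shared FileMappedDict

class StoreWithSharedLock
  def initialize(metric_name)
    @metric_name = metric_name
    @lock = SHARED_LOCK  # shared across ALL stores, NOT Mutex.new per instance
  end

  def in_process_sync(&block)
    @lock.synchronize(&block)
  end

  def increment(by: 1.0)
    in_process_sync { SHARED_DICT[@metric_name] += by }
  end
end

# Hammer the shared dict from several threads across two stores.
stores = [StoreWithSharedLock.new(:a), StoreWithSharedLock.new(:b)]
threads = stores.flat_map do |store|
  Array.new(4) { Thread.new { 1000.times { store.increment } } }
end
threads.each(&:join)
```

With the shared lock, each metric's counter deterministically ends at 4000.0 (4 threads x 1000 increments); with per-store locks, the read-modify-write on `SHARED_DICT` could interleave and lose updates. This is the extra contention cost the earlier NOTE mentions for multi-threaded scenarios.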