Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OM 2.0: Consider using complex values instead of suffixes #283

Open
dashpole opened this issue Dec 2, 2024 · 9 comments
Open

OM 2.0: Consider using complex values instead of suffixes #283

dashpole opened this issue Dec 2, 2024 · 9 comments

Comments

@dashpole
Copy link
Contributor

dashpole commented Dec 2, 2024

This deserves its own proposal, but i'll outline the broad idea here start the discussion and gather high-level feedback.

Idea

The idea is that we could use complex values for fixed bucket histograms, summaries, and counters, similar to what we plan to do for native histograms. In OM 1.0 and in the Prometheus text format, those types are represented using multiple series with suffixes and special labels (e.g. _bucket suffix, or the le label for histograms). Counters are included here because they have a "total" and a "start time".

Advantages

Disadvantages

  • Breaks existing user expectations and queries. Suffixes are very deeply embedded in the Prometheus ecosystem, and this would be a large change for many users.
  • PromQL queries become more complicated because accessing fields requires functions.
    • E.g. sum(request_duration_seconds_count) -> histogram_count(sum(request_duration_seconds)).
    • This would happen anyways if the user migrates to native histograms.
  • Little benefit for summaries and histograms, as we expect/recommend users adopt native histograms anyways.
  • Is there a text format representation that is readable AND easy to generate AND efficient enough to parse, for such a model?
  • It would make OM 2.0 text significantly different to 1.0 and Prometheus text, so some education and big change in parsers/generators would be needed. Not a blocker, but something to keep in mind as a con.

Alternatives

We could only use a complex value for counters, to support the start time in addition to the value. For fixed-bucket histograms and summaries, keep the existing suffixes and labels, but use the "complex counter" value for cumulative series. For users that have migrated from summaries and fixed-bucket histograms to native histograms, this has most of the advantages of the above, without many of the disadvantages for users using summaries or fixed-bucket histograms.

@ArthurSens
Copy link
Member

With Native Histograms stabilizing, is there a scenario where a user would prefer classic histograms over native histograms? If not, should we invest effort in this? 😅

Solves prometheus/prometheus#6541 by supporting the created timestamp in the value.

We also planned to move _created to something similar to the way we handle metadata. Would that be enough for classic histograms as well?

In our first call, we mentioned that one of our intentions with 2.0 is not to break the Prometheus user base to make the spec more favorable to other specs, so I'm not sure how we could commit to this one 😬

@bwplotka
Copy link
Member

bwplotka commented Dec 4, 2024

I think this would be great to consider, thanks! Essentially it removes metric family notion. We kind of do similar in protobuf format already.

Also parses could generate non-complex types for classic histograms and summaries if needed.

Two extra downsides to think about:

  • Is there a text format representation that is readable AND easy to generate AND efficient enough to parse, for such a model?
  • It would make OM 2.0 text significantly different to 1.0 and Prometheus text, so some education and big change in parsers/generators would be needed. Not a blocker, but something to keep in mind as a con.

@bwplotka
Copy link
Member

bwplotka commented Dec 4, 2024

With Native Histograms stabilizing, is there a scenario where a user would prefer classic histograms over native histograms? If not, should we invest effort in this? 😅

Yes and we could simply do the NHCB (custom buckets) straight in the text too.

@bwplotka
Copy link
Member

bwplotka commented Dec 4, 2024

In our first call, we mentioned that one of our intentions with 2.0 is not to break the Prometheus user base to make the spec more favorable to other specs, so I'm not sure how we could commit to this one 😬

How it is breaking? While it would cause parser redesign, you can represent the current Prometheus model just fine with this idea, no?

@dashpole
Copy link
Contributor Author

dashpole commented Dec 4, 2024

Added to the list of cons.

@dashpole
Copy link
Contributor Author

dashpole commented Dec 5, 2024

Unsurprisingly, i'm not the first one to suggest this. @bwplotka pointed me to https://github.com/prometheus/OpenMetrics/blob/main/legacy/markdown/protobuf_vs_text.md#implied-data-model, by @beorn7 which contains a much more thorough description of the tradeoffs involved with using single-line representations of complex types.

@dashpole
Copy link
Contributor Author

dashpole commented Dec 5, 2024

Regarding PromQL queries become more complicated because accessing fields requires functions., I learned that this migration may be happening independent of this proposal based on https://github.com/prometheus/proposals/blob/main/proposals/2024-01-26_classic-histograms-stored-as-native-histograms.md#reading-via-promql.

@bwplotka
Copy link
Member

Relevant thoughts from @beorn7 on the should PromQL use dots thread around complex types.

  1. Prometheus has been plagued by magic suffixes from the very
    beginning. In my understanding of Prometheus history, suffixing
    components of a summary or a histogram with _count, _sum,
    _bucket was a means to get an MVP running. I think it was a
    mistake to reify this concept (originally constrained to TSDB and
    PromQL) by letting it leak into the exposition format. OpenMetrics
    made things worse by introducing more magic suffixes (_info got
    introduced, and _total got a promotion from a mere recommendation
    to another magic suffix, and arguably the same happened to the
    unit). The problem with magic suffixes is namespace pollution: It
    prevents usage of any of the magic suffixes in metric names. Or to
    be precise: It's even worse, the usage is not really technically
    forbidde. You can still do it, but then you might run into
    surprising and confusing namespace collisions that might very well
    show up in the worst of moments. A way out of this is to use a
    separator character for the magic suffixes that is not a valid
    character in names otherwise. And now guess which character comes
    to mind for that... [This is another point in the category "future
    feature has a hard time blocking a feature proposal for the
    present". This was concretely considered when OpenMetrics was
    designed, but it got rejected by the OpenMetrics team. So there is
    a non-zero chance that it will be on the table again when we try to
    "fix" OpenMetrics. A counter point is again that another character
    could be used, at the price of using something that is much less
    intuitive to learn and read.]
  1. Native histograms introduced the first instance of a "structured
    metric", but more could happen in the future. Accessing "fields" in
    this structure is currently done by bespoke functions in PromQL
    (histogram_count(request_latency_seconds),
    histogram_sum(request_latency_seconds)), but it would be a quite
    obvious alternative to allow something like
    request_latency_seconds.count and request_latency_seconds.sum,
    which is actually more than just syntactic sugar because we could
    implement "field access" as a different thing from "function
    call". The latter is an evaluation, it changes the timestamp, and
    it cannot be used in a range selector
    (histogram_count(request_latency_seconds)[5m] is invalid syntax),
    while we could make request_latency_seconds.count[5m] valid
    without changing the language fundamentals. [This is much more
    tangible than (4) and (5), but not decided yet, so it still asks us
    to reject a concrete feature request in the name of a possible
    future feature. And again, there are other ways of implementing
    this, avoiding the usage of a . operator, at the price of not
    doing the most obvious.]

@bwplotka
Copy link
Member

bwplotka commented Jan 15, 2025

Just adding some more arguments for this. Parsing text is getting quite problematic if you try to support all cases (discussion)

For example parsing (broken?) OM text like this:

# TYPE test_metric3 gauge
# HELP test_metric3 help for test_metric3
test_metric3_metric4{foo="bar"} 2

..and trying to notice the test_metric3_metric4 is series violating the spec is quite hard without repeated suffix cutting like in https://github.com/grafana/mimir/blob/4255f00722f0a0e50ca81e50025937a7c9cb692e/pkg/mimirpb/timeseries.go#L217

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Todo
Development

No branches or pull requests

3 participants