Skip to content

Conversation

@roidelapluie
Copy link
Member

No description provided.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I like this, will review in details later on

@roidelapluie
Copy link
Member Author

Here's what I get so far with PromQL:

{
  "status": "success",
  "data": {
    "promql": {
      "at_modifier": true,
      "delayed_name_removal": false,
      "experimental_duration_expr": false,
      "extended_range_selectors": false,
      "negative_offset": true,
      "per_query_lookback_delta": true,
      "per_step_stats": false,
      "subqueries": true,
      "type_and_unit_labels": false
    },
    "promql_functions": {
      "abs": true,
      "absent": true,
      "absent_over_time": true,
      "acos": true,
      "acosh": true,
      "asin": true,
      "asinh": true,
      "atan": true,
      "atanh": true,
      "avg_over_time": true,
      "ceil": true,
      "changes": true,
      "clamp": true,
      "clamp_max": true,
      "clamp_min": true,
      "cos": true,
      "cosh": true,
      "count_over_time": true,
      "day_of_month": true,
      "day_of_week": true,
      "day_of_year": true,
      "days_in_month": true,
      "deg": true,
      "delta": true,
      "deriv": true,
      "double_exponential_smoothing": false,
      "exp": true,
      "first_over_time": false,
      "floor": true,
      "histogram_avg": true,
      "histogram_count": true,
      "histogram_fraction": true,
      "histogram_quantile": true,
      "histogram_stddev": true,
      "histogram_stdvar": true,
      "histogram_sum": true,
      "hour": true,
      "idelta": true,
      "increase": true,
      "info": false,
      "irate": true,
      "label_join": true,
      "label_replace": true,
      "last_over_time": true,
      "ln": true,
      "log10": true,
      "log2": true,
      "mad_over_time": false,
      "max_over_time": true,
      "min_over_time": true,
      "minute": true,
      "month": true,
      "pi": true,
      "predict_linear": true,
      "present_over_time": true,
      "quantile_over_time": true,
      "rad": true,
      "rate": true,
      "resets": true,
      "round": true,
      "scalar": true,
      "sgn": true,
      "sin": true,
      "sinh": true,
      "sort": true,
      "sort_by_label": false,
      "sort_by_label_desc": false,
      "sort_desc": true,
      "sqrt": true,
      "stddev_over_time": true,
      "stdvar_over_time": true,
      "sum_over_time": true,
      "tan": true,
      "tanh": true,
      "time": true,
      "timestamp": true,
      "ts_of_first_over_time": false,
      "ts_of_last_over_time": false,
      "ts_of_max_over_time": false,
      "ts_of_min_over_time": false,
      "vector": true,
      "year": true
    },
    "promql_operators": {
      "!=": true,
      "!~": true,
      "%": true,
      "*": true,
      "+": true,
      "-": true,
      "/": true,
      "<": true,
      "<=": true,
      "==": true,
      "=~": true,
      ">": true,
      ">=": true,
      "@": true,
      "^": true,
      "and": true,
      "atan2": true,
      "avg": true,
      "bottomk": true,
      "count": true,
      "count_values": true,
      "group": true,
      "limit_ratio": false,
      "limitk": false,
      "max": true,
      "min": true,
      "or": true,
      "quantile": true,
      "stddev": true,
      "stdvar": true,
      "sum": true,
      "topk": true,
      "unless": true
    }
  }
}

@bboreham
Copy link
Member

Here's what I get so far with PromQL:

I like the proposal, but I feel that list is too granular. I don't think it's useful to list features and functions which have existed in PromQL for years. So limitk is useful but topk is not.

@roidelapluie
Copy link
Member Author

Here's what I get so far with PromQL:

I like the proposal, but I feel that list is too granular. I don't think it's useful to list features and functions which have existed in PromQL for years. So limitk is useful but topk is not.

To avoid drifting from the actual code, the list is auto-generated. Note that I do not expect users of the API to check if "rate" is supported before using it.

3. New Prometheus-compatible backends require explicit code changes in Grafana, slowing adoption.
4. Type and version checks are coarse; they do not reflect actual enabled features, which may depend on flags or configuration.

There is no real alternative to this for now.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what we use in mimir is the buildinfo endpoint by adding a features field to the response.

Grafana already uses that for some alertmanager features.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I added that to the design doc.

- The response follows standard Prometheus API conventions with `status` and `data` fields
- The endpoint returns HTTP 200 OK, like other Prometheus APIs

Some items might exist in multiple categories.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

won't this make it harder for clients to integrate with the endpoint? They'd need to search for the same feature in multiple categories and handle cases where the feature appears in one category but not the other

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is that I did not want to restrict some items to some categories. But I can remove this sentence from the design doc, not to encourage duplication of elements.

@roidelapluie roidelapluie force-pushed the roidelapluie/ffapi branch 2 times, most recently from 9ebb830 to 964e418 Compare October 30, 2025 15:50
@roidelapluie
Copy link
Member Author

```json
{
"status": "success",
"data": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would the requirement "Allow for potential future inclusion of Alertmanager, even though it is currently out of scope." look like in this API?

I mean I can imagine alertmanager simply returning a similar JSON like this, so I'm not sure what inclusion means?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the requirement. I expect that if Alertmanager introduces this they will follow the same principles naturally.

- `rule` - Rule evaluation features
- `ui` - Web UI capabilities
- `promql` - PromQL language features (syntax, modifiers, operators)
- `promql_functions` - Individual PromQL functions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTLP is first class, so I'd already add otlp_receiver

- Clients MUST ignore unknown feature names and categories
- The response follows standard Prometheus API conventions with `status` and `data` fields
- The endpoint returns HTTP 200 OK, like other Prometheus APIs
- Vendors MAY add vendor-specific categories (e.g., `prometheus`, `mimir`, `cortex`) to expose implementation-specific features such as build tags or vendor-unique capabilities
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where would a vendor put a feature like a vendor specific PromQL function? Under "promql_functions" or "vendor" ? Probably saying explicitly that vendors MUST put vendor specific flags into their own section would clarify that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated: Vendors MAY add vendor-specific categories (e.g., prometheus, mimir, cortex, or other categories such as clustering) to expose vendor's unique abilities. Vendors implementing custom PromQL functions SHOULD register them under a vendor-specific category (e.g. vendor_functions, metricsql_functions), in case a future prometheus function gets implemented with a different signature.

- Clients MUST ignore unknown feature names and categories
- The response follows standard Prometheus API conventions with `status` and `data` fields
- The endpoint returns HTTP 200 OK, like other Prometheus APIs
- Vendors MAY add vendor-specific categories (e.g., `prometheus`, `mimir`, `cortex`) to expose implementation-specific features such as build tags or vendor-unique capabilities
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you allow sub categories for vendors? I don't know if vendors will have a tonne of features?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated wording: Vendors MAY add vendor-specific categories (e.g., prometheus, mimir, cortex, or other categories such as clustering) to expose vendor's unique abilities. Vendors implementing custom PromQL functions SHOULD register them under a vendor-specific category (e.g. vendor_functions, metricsql_functions), in case a future prometheus function gets implemented with a different signature.

Ideally no subcategories so this is always map[string]map[string]bool

Signed-off-by: Julien Pivotto <[email protected]>
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thanks. LGTM from my side.

For consistency it would be fun to enable some ff to be dynamically settable (via runtime config/HTTP), but that's a separate discussion. I'd love feature flags to be a separate package in Go code so we can consistently used it everywhere too, likely an after effect of the implementation of this PROM-63.

I found categorization very odd at the first glance, similar to @bboreham. Like why we have a mix in this API feature flags, build tags (stringlabels), PromQL functions, PromQL operators and even mode (agent). Then we already have endpoints that give some of this info like build info, flags.

What got me sold is the simplified way of accessing the info of what's the current Prometheus process is "capable of" which is a mix of those flags, Prometheus exact version, the tag it was used with and even build options like what SD providers were build with. Then it's also great for 4.x and 5.x where we might kill some features (improving feature lifecycle maintenance). So this is super useful 👍🏽

}
```

Naming conventions:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Naming conventions:
Semantics:

Naming conventions:
- All names MUST use `snake_case`
- Each category value is a map from unique feature name to a boolean
- Clients MUST ignore unknown feature names and categories
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we mention proposed stability of those categories and features? (e.g. can we rename feature in same major version? Can we remove a feature or category?)


Over time, the Prometheus APIs have undergone numerous optimizations, such as supporting POST in addition to GET requests and allowing filtering on certain API endpoints. Additionally, new APIs, PromQL functions, and capabilities are regularly introduced. Some of these features are optional and can be enabled or disabled by users.

Without a "features API," new advancements are often underutilized because API clients are hesitant to adopt them before widespread support exists among users. By creating an API that clearly communicates available and enabled features, clients can take advantage of new capabilities as soon as they are released. For instance, HTTP POST support was added to Prometheus in version 2.1.0 (2018) but was not adopted as the default in Grafana until version 8.0 (2021), illustrating a three-year delay caused by limited visibility of feature availability.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without a "features API," new advancements are often underutilized because API clients are hesitant to adopt them before widespread support exists among users. By creating an API that clearly communicates available and enabled features, clients can take advantage of new capabilities as soon as they are released.

Do we have some evidences supporting this statement? I find this hard to believe that this is what blocks user adoption for advanced feature. Also we don't even really measure adoption, but that's a separate problem (:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For instance, HTTP POST support was added to Prometheus in version 2.1.0 (2018) but was not adopted as the default in Grafana until version 8.0 (2021) => that is the evidence :)


The package will be located in the prometheus/prometheus repository.

Instead of actively collecting features from other packages, this package will allow other components to register their supported features with it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏽 I'd do this without this proposal to, great to see

@roidelapluie
Copy link
Member Author

This is great, thanks. LGTM from my side.

I found categorization very odd at the first glance, similar to @bboreham. Like why we have a mix in this API feature flags, build tags (stringlabels), PromQL functions, PromQL operators and even mode (agent). Then we already have endpoints that give some of this info like build info, flags.

I can review this and filter down to only have the layer of "useful" things that can be used externally (e.g. remove agent and TSDB features). Keeping PromQL and API-related things.

What got me sold is the simplified way of accessing the info of what's the current Prometheus process is "capable of" which is a mix of those flags, Prometheus exact version, the tag it was used with and even build options like what SD providers were build with. Then it's also great for 4.x and 5.x where we might kill some features (improving feature lifecycle maintenance). So this is super useful 👍🏽

The goal is NOT to expose Prometheus exact version, so that it does not matter if you query Prometheus, Thanos, Cortex, ...

@bwplotka
Copy link
Member

bwplotka commented Nov 4, 2025

I can review this and filter down to only have the layer of "useful" things that can be used externally (e.g. remove agent and TSDB features). Keeping PromQL and API-related things.

Sorry for confusion, I was explaining my train of thoughts 🙈 While initially odd, I think I like the low level details here.

The goal is NOT to expose Prometheus exact version, so that it does not matter if you query Prometheus, Thanos, Cortex, ...

Yup yes! That's why I approved (:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants