[WIP] feat(bigTent slo): Custom Validate function for big tent SLO schema #2016

Leo-DiCara · 2025-01-31T22:52:52Z

Please don't merge this until all work in https://github.com/grafana/slo/issues/2697 is complete. We want to ensure that the SLO Plugin API is merged and running in prod before we merge the terraform-provider changes.

This adds a custom validate function for queries that checks that the basic form of big tent SLO JSON is being followed before we allow the user to round-trip it to the API. It checks for:

Basic JSON structure
refID field
datasource field
uid and type subfields in datasource
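
For illustration, here is a minimal sketch of the checks listed above. It uses only the standard library and is not the provider's actual ValidateBigTent implementation. The helper name checkBigTentQuery, the key spelling "refId", and the idea that a query is a single JSON object are all assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkBigTentQuery reports structural problems found in a big tent query string.
// Hypothetical helper: the key names ("refId", "datasource", "uid", "type") and
// the single-object shape are assumptions, not confirmed by the provider code.
func checkBigTentQuery(raw string) []string {
	// Basic JSON structure: the query must decode as a JSON object.
	var query map[string]json.RawMessage
	if err := json.Unmarshal([]byte(raw), &query); err != nil {
		return []string{"not a JSON object: " + err.Error()}
	}

	var problems []string
	if _, ok := query["refId"]; !ok {
		problems = append(problems, "missing refId")
	}

	dsRaw, ok := query["datasource"]
	if !ok {
		return append(problems, "missing datasource")
	}

	// The datasource field must itself be an object with uid and type subfields.
	var ds map[string]json.RawMessage
	if err := json.Unmarshal(dsRaw, &ds); err != nil {
		return append(problems, "datasource is not an object")
	}
	for _, field := range []string{"uid", "type"} {
		if _, ok := ds[field]; !ok {
			problems = append(problems, "missing datasource."+field)
		}
	}
	return problems
}

func main() {
	fmt.Println(checkBigTentQuery(`{"refId":"A","datasource":{"uid":"abc","type":"graphite"}}`))
	fmt.Println(checkBigTentQuery(`{"refId":"A","datasource":{"uid":"abc"}}`))
}
```

In the provider, each reported problem would presumably become a diag.Diagnostic carrying the attribute path, rather than a plain string.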

Closes https://github.com/grafana/slo/issues/2702

github-actions · 2025-01-31T22:53:05Z

In order to lower resource usage and have a faster runtime, PRs will not run Cloud tests automatically.
To do so, a Grafana Labs employee must trigger the cloud acceptance tests workflow manually.

elainevuong · 2025-02-05T20:09:34Z

internal/resources/slo/resource_slo.go

+										Type:             schema.TypeString,
+										Description:      `Metric for total events (denominator)`,
+										Required:         true,
+										ValidateDiagFunc: ValidateBigTent(),


clarification: all of our tests seem to be modifying the freeform query field.
do we also still need to validate the success_metric and total_metric fields on the schema with the ValidateBigTent function? or do we expect that usage of Terraform and Big Tent will always be via the query field?

@elainevuong, we decided a few weeks ago not to support the ratio type, so we don't need to be checking for big-tent JSON in the success_metric or total_metric (generated/models/slo/v0_0/slo_gen.go#L139)

We should assume that all ratio SLOs are still promQL. Is there a place this assumption gets tricky? I know it's kind of weird to treat the freeform as "json or promQL" - but in the MTO code the weirdness didn't bubble up to the level of us modifying the schema to clarify

I did add the Validate function to ALL metrics but only tested it on the query field. Since these three string fields should functionally be exactly the same in terms of what we want to verify, I didn't think there would be much value in rewriting the same tests for the other fields.

If we're not supporting Ratio type for big-tent json, I think we ought to remove the ValidateBigTent() call from the success_metric and total_metric and only keep it on the query

elainevuong · 2025-02-05T20:10:32Z

internal/resources/slo/resource_slo.go

@@ -661,3 +666,79 @@ func apiError(action string, err error) diag.Diagnostics {
 		},
 	}
 }
+
+func ValidateBigTent() schema.SchemaValidateDiagFunc {
+	return func(i interface{}, path cty.Path) diag.Diagnostics {


what's this cty package?

This cty package is a HashiCorp-provided package for managing config values. cty.Path handles managing paths. It is required to write tests for terraform going forward.

Functionally, here it just tells us at what depth our error occurred, which is helpful since we are diving through JSON.

elainevuong · 2025-02-05T20:44:19Z

internal/resources/slo/resource_slo.go

+				Severity:      diag.Warning,
+				Summary:       "Bad JSON format",
+				Detail:        "If this is a big tent query, this should be valid JSON. If this is a prometheus query, ignore this.",
+				AttributePath: path,


disclaimer - I'm not sure if this is possible, and I don't know too much about it.

Basically - we always perform this ValidateBigTent function, which validates the query field.
We first:
1 - check if it's a string
2 - check if it's valid JSON, if it isn't, we return a warning, even if it's a valid Prometheus query. Since it just returns a warning, it's non-blocking for the provider and it'll continue to process. However, this might be a bit off-putting for existing users. They've never gotten these warnings before, but now they get them but it's not really actionable? Might not be a great experience.
3 - we then check for the presence of a refId, datasource, a type, and a uid.

--
Is there a way we can avoid returning the warning for valid Prometheus queries? I was looking around, and maybe we could incorporate a DiffSuppressFunc?

SchemaDiffSuppressFunc is a function which can be used to determine whether a detected diff on a schema element is "valid" or not, and suppress it from the plan if necessary.

https://pkg.go.dev/github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema#SchemaDiffSuppressFunc

Pretty much - if we identify that it ISN'T JSON, we assume it's a Prom query, and we allow it to pass through validation.
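
As a rough sketch of that pass-through rule (standard library only, not wired into the provider's schema): the helper names isJSONObject and queryKind are hypothetical. One assumption worth flagging is that the check looks for a JSON object rather than any valid JSON, because a PromQL expression such as `1` is also a valid JSON scalar.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isJSONObject reports whether s decodes as a JSON object.
// A JSON null decodes without error but leaves the map nil, so it is rejected.
func isJSONObject(s string) bool {
	var obj map[string]json.RawMessage
	return json.Unmarshal([]byte(s), &obj) == nil && obj != nil
}

// queryKind is a hypothetical classifier: anything that is not a JSON object
// is assumed to be a Prometheus query and would skip big tent validation.
func queryKind(query string) string {
	if isJSONObject(query) {
		return "bigtent"
	}
	return "promql"
}

func main() {
	for _, q := range []string{
		`sum(rate(http_requests_total[5m]))`,
		`1`,
		`{"refId":"A","datasource":{"uid":"abc","type":"graphite"}}`,
		`{"refId":"A",`, // malformed JSON is indistinguishable from PromQL here
	} {
		fmt.Printf("%-8s %s\n", queryKind(q), q)
	}
}
```

Note the last case: a malformed big tent query is classified as PromQL and passes silently, which is the trade-off of this approach.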

There are some examples of these DiffSuppressFunc within the terraform provider already that you could reference.

What are your thoughts?

We could return with no warnings. My thoughts below:

If you are writing a big tent query, you need to provide valid JSON. We can't distinguish between "this is bad JSON and you are writing a big tent query" and "this is just a prom query".

Big tent users who see this message have queries that will always fail to validate, which could be confusing if you think you've written proper JSON.

I guess it boils down to: Who is it better to confuse?
Big Tent users who will silently pass a tf plan only to fail on tf apply?
-OR-
Existing Prom users who see a new warning and may get spooked?

IMO setting up new SLOs for Big Tent is more painful because of the potentially very complex JSON blobs, so I opted for a warning, but it could be that many more users will never even touch big tent SLOs and therefore more users will be annoyed by unactionable warnings when they edit/create SLOs.

My last thought is I'm not even sure validate runs if a tf plan doesn't generate a diff for that field. This would mean unchanged SLOs wouldn't generate warnings. tf validate would generate these warnings every time.

I think we should assume that big tent SLOs are the minority and we shouldn't raise warnings about totally valid promQL SLOs.

This does complicate our ability to raise useful tf-plan warnings for invalid JSON. One idea would be to use a simple strings.Contains("datasource"...) as a way to test whether a given SLO was "trying to include a source-datasource" that is required for big-tent and not part of promQL.
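
A minimal sketch of that heuristic, assuming a hypothetical helper named shouldWarnBadJSON (standard library only, not the provider's code): warn only when the query fails to parse as JSON and also mentions "datasource".

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// shouldWarnBadJSON is a hypothetical heuristic: a query that is not valid JSON
// but mentions "datasource" was probably meant to be a big tent query.
func shouldWarnBadJSON(query string) bool {
	if json.Valid([]byte(query)) {
		return false // well-formed JSON goes on to the structural checks instead
	}
	return strings.Contains(query, "datasource")
}

func main() {
	// Plain PromQL: no warning.
	fmt.Println(shouldWarnBadJSON(`sum(rate(http_requests_total[5m]))`))
	// Truncated big tent JSON: warning.
	fmt.Println(shouldWarnBadJSON(`{"refId":"A","datasource":{"uid":"abc"`))
}
```

One caveat with a plain substring match: a PromQL query that happens to use a label named datasource would still trigger the warning.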

elainevuong · 2025-02-05T20:47:23Z

I'd like to see if we can remove the warnings for valid Prometheus queries when we validate the query field. It might not be a fantastic customer experience for existing users, to suddenly get warnings from the Provider for queries that they haven't changed.

Perhaps we can investigate if this DiffSuppressFunc is an option to pursue?

elainevuong · 2025-02-06T20:15:33Z

Just want to circle back - I verified that if we use the Terraform Provider as is, it DOES give warnings that show up, even if there isn't a diff in the field.

My last thought is I'm not even sure validate runs if a tf plan doesn't generate a diff for that field. This would mean unchanged SLOs wouldn't generate warnings. tf validate would generate these warnings every time.

With the way this PR is set up currently, we see this warning in both Ratio SLOs (due to the ValidateBigTent on the success and total metrics) AND in the Freeform PromQL SLOs. These warnings appear on both "CREATE" of a new SLO and "UPDATE" of an existing SLO.

I think we need two changes:

drop the ValidateBigTent() on both the success_metric and total_metric
remove the warning if we can't parse as JSON

screenshots if anyone is interested in viewing what the user experience is like.

CREATE - Ratio SLO

CREATE - Adv Freeform Prom SLO

UPDATE - Adv Freeform Prom SLO

feat(bigTent slo): Custom Validate function for big tent SLO schema

97b18b9

Leo-DiCara requested review from a team as code owners January 31, 2025 22:52

Leo-DiCara added 3 commits January 31, 2025 14:53

Merge branch 'main' into ld/big_tent_validate

ab44728

feat(bigTent slo): Tests for custom validate functions

4231bd0

Merge branch 'ld/big_tent_validate' of github.com:grafana/terraform-provider-grafana into ld/big_tent_validate

fa7583a

Leo-DiCara changed the title ~~feat(bigTent slo): Custom Validate function for big tent SLO schema~~ [WIP] feat(bigTent slo): Custom Validate function for big tent SLO schema Feb 1, 2025

Leo-DiCara added 7 commits February 3, 2025 09:26

feat(bigTent slo): fmt

f514eb0

feat(bigTent slo): removing derpecated reliance

1efd39b

feat(bigTent slo): fmt

2c6b7c9

feat(bigTent slo): fmt

7da58a2

Merge branch 'main' into ld/big_tent_validate

b16d7d6

feat(bigTent slo): fix overzealous id refactor

9697dc6

Merge branch 'ld/big_tent_validate' of github.com:grafana/terraform-provider-grafana into ld/big_tent_validate

73ebdf2

Leo-DiCara requested review from ellisda and cesararmada February 3, 2025 21:39

Leo-DiCara added 2 commits February 4, 2025 07:50

feat(bigTent slo): ignore http redirect for link checker

6b77ed8

Merge branch 'main' into ld/big_tent_validate

964ddd4

Leo-DiCara requested a review from elainevuong February 5, 2025 18:20

elainevuong reviewed Feb 5, 2025

Leo-DiCara added 3 commits February 7, 2025 14:05

feat(bigTent slo): Drop warning on Prometheus queries

9dcb2f6

Merge branch 'ld/big_tent_validate' of github.com:grafana/terraform-provider-grafana into ld/big_tent_validate

ffecf7e

Merge branch 'main' into ld/big_tent_validate

6df417a


Leo-DiCara commented Jan 31, 2025 (edited by ellisda)

github-actions bot commented Jan 31, 2025

elainevuong Feb 5, 2025 (edited)

ellisda Feb 5, 2025

Leo-DiCara Feb 5, 2025

elainevuong Feb 6, 2025 (edited)

elainevuong Feb 5, 2025

Leo-DiCara Feb 5, 2025

elainevuong Feb 5, 2025 (edited)

Leo-DiCara Feb 5, 2025

ellisda Feb 6, 2025

elainevuong commented Feb 5, 2025 (edited)

elainevuong commented Feb 6, 2025
