[WIP] feat(bigTent slo): Custom Validate function for big tent SLO schema #2016
base: main
Conversation
In order to lower resource usage and have a faster runtime, PRs will not run Cloud tests automatically.
Type:             schema.TypeString,
Description:      `Metric for total events (denominator)`,
Required:         true,
ValidateDiagFunc: ValidateBigTent(),
clarification: all of our tests seem to be modifying the freeform query field.
do we also still need to validate the success_metric and total_metric fields on the schema with the ValidateBigTent function? or do we expect that usage of Terraform and Big Tent will always be via the query field?
@elainevuong , we decided a few weeks ago to not support the ratio type, so we don't need to be checking for big-tent JSON in the success_metric or total_metric (generated/models/slo/v0_0/slo_gen.go#L139).
We should assume that all ratio SLOs are still promQL. Is there a place this assumption gets tricky? I know it's kind of weird to treat the freeform query as "JSON or promQL" - but in the MTO code the weirdness didn't bubble up to the level of us modifying the schema to clarify it.
I did add the Validate function to ALL metrics but only tested it on the query field. Since these three string fields should be functionally identical in terms of what we want to verify, I didn't think there would be much value in rewriting the same tests for each field.
If we're not supporting the Ratio type for big-tent JSON, I think we ought to remove the ValidateBigTent() call from success_metric and total_metric and only keep it on the query field.
@@ -661,3 +666,79 @@ func apiError(action string, err error) diag.Diagnostics {
		},
	}
}

func ValidateBigTent() schema.SchemaValidateDiagFunc {
	return func(i interface{}, path cty.Path) diag.Diagnostics {
what's this cty package?
The cty package is a Hashicorp-provided package for managing config values; cty.Path handles managing paths. It is required for writing Terraform tests going forward.
Functionally, here it just tells us at what depth our error occurred, which is helpful since we are diving through JSON.
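For reference, assembling the fragments visible in this diff, the validator presumably looks roughly like the sketch below. This is only a reconstruction from the snippets shown in this thread, not the actual PR code; the type-assertion error message and the omitted field checks are assumptions.

import (
	"encoding/json"

	"github.com/hashicorp/go-cty/cty"
	"github.com/hashicorp/terraform-plugin-sdk/v2/diag"
	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// Sketch reconstructed from the diff fragments in this PR: the cty.Path the
// SDK passes in is attached to the diagnostic as AttributePath, so the
// warning points at the exact attribute that failed the JSON check.
func ValidateBigTent() schema.SchemaValidateDiagFunc {
	return func(i interface{}, path cty.Path) diag.Diagnostics {
		v, ok := i.(string)
		if !ok {
			return diag.Errorf("expected a string value") // assumed wording
		}
		if !json.Valid([]byte(v)) {
			return diag.Diagnostics{{
				Severity:      diag.Warning,
				Summary:       "Bad JSON format",
				Detail:        "If this is a big tent query, this should be valid JSON. If this is a prometheus query, ignore this.",
				AttributePath: path,
			}}
		}
		// The PR also checks the decoded JSON for required fields (refId,
		// datasource, type, uid); that part is omitted from this sketch.
		return nil
	}
}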
Severity:      diag.Warning,
Summary:       "Bad JSON format",
Detail:        "If this is a big tent query, this should be valid JSON. If this is a prometheus query, ignore this.",
AttributePath: path,
disclaimer - I'm not sure if this is possible, and I don't know too much about it.
Basically - we always perform this ValidateBigTent function, which validates the query field.
We first:
1 - check if it's a string
2 - check if it's valid JSON; if it isn't, we return a warning, even if it's a valid Prometheus query. Since it just returns a warning, it's non-blocking for the provider and it'll continue to process. However, this might be a bit off-putting for existing users. They've never gotten these warnings before, and now they get them even though they aren't really actionable? Might not be a great experience.
3 - we then check for the presence of a refId, a datasource, a type, and a uid.
--
Is there a way we can get around returning the warning for valid Prometheus queries? I was looking around, and maybe we could incorporate a DiffSuppressFunc?
SchemaDiffSuppressFunc is a function which can be used to determine whether a detected diff on a schema element is "valid" or not, and suppress it from the plan if necessary.
https://pkg.go.dev/github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema#SchemaDiffSuppressFunc
Pretty much - if we identify that it ISN'T JSON, we assume it's a Prom query, and we allow it to pass through validation.
There are some examples of these DiffSuppressFunc usages within the terraform provider already that you could reference.
What are your thoughts?
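For reference, the SchemaDiffSuppressFunc in the linked docs has the signature func(k, oldValue, newValue string, d *schema.ResourceData) bool. A hypothetical illustration (not code from this PR) of the common pattern: suppress diffs between semantically equal JSON values and leave anything that isn't JSON, such as a Prometheus query, to the normal diff logic.

import (
	"encoding/json"
	"reflect"

	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// Hypothetical example of a SchemaDiffSuppressFunc: suppress the diff when
// both the old and new values decode to equivalent JSON; return false (do
// not suppress) for values that are not JSON at all.
func suppressEquivalentJSON(k, oldValue, newValue string, d *schema.ResourceData) bool {
	var oldJSON, newJSON interface{}
	if err := json.Unmarshal([]byte(oldValue), &oldJSON); err != nil {
		return false
	}
	if err := json.Unmarshal([]byte(newValue), &newJSON); err != nil {
		return false
	}
	return reflect.DeepEqual(oldJSON, newJSON)
}

Note that a DiffSuppressFunc only affects how plan diffs are rendered, not validation, so on its own it wouldn't silence a validation warning.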
We could return with no warnings. My thoughts below:
If you are writing a Big Tent query you need to provide valid JSON. We can't distinguish between "this is bad JSON for a big tent query" and "this is just a prom query".
Big tent users who see this message have queries that will always fail to validate, which could be confusing if you think you've written proper JSON.
I guess it boils down to: Who is it better to confuse?
Big Tent users who will silently pass a tf plan only to fail on tf apply?
-OR-
Existing Prom users who see a new warning and may get spooked?
IMO setting up new SLOs for Big Tent is more painful because of the potentially very complex JSON blobs, so I opted for a warning. But it could be that many more users will never even touch big tent SLOs, and therefore more users will be annoyed by unactionable warnings when they edit/create SLOs.
My last thought is that I'm not even sure validate runs if a tf plan doesn't generate a diff for that field. This would mean unchanged SLOs wouldn't generate warnings; tf validate would generate these warnings every time.
I think we should assume that big tent SLOs are the minority and we shouldn't raise warnings about totally valid promQL SLOs.
This does complicate our ability to raise useful tf-plan warnings for invalid JSON. One idea would be to use a simple strings.Contains("datasource"...) as a way to test whether a given SLO was trying to include a datasource, which is required for big-tent and not part of promQL...
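A rough sketch of that heuristic (hypothetical, not part of this PR): only warn when the value mentions "datasource", suggesting an attempted big-tent query, but fails to parse as JSON, so plain promQL passes through silently.

import (
	"encoding/json"
	"strings"

	"github.com/hashicorp/go-cty/cty"
	"github.com/hashicorp/terraform-plugin-sdk/v2/diag"
)

// Hypothetical sketch of the strings.Contains idea: plain promQL passes
// through silently; only values that reference a "datasource" but are not
// valid JSON raise the warning.
func validateBigTentHeuristic(i interface{}, path cty.Path) diag.Diagnostics {
	v, ok := i.(string)
	if !ok {
		return diag.Errorf("expected a string value")
	}
	if strings.Contains(v, "datasource") && !json.Valid([]byte(v)) {
		return diag.Diagnostics{{
			Severity:      diag.Warning,
			Summary:       "Bad JSON format",
			Detail:        "This query references a datasource but is not valid JSON; big tent queries must be valid JSON.",
			AttributePath: path,
		}}
	}
	return nil
}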
I'd like to see if we can remove the warnings for valid Prometheus queries when we validate the query field. Perhaps we can investigate if this is possible.
Please Don't merge this until all work in https://github.com/grafana/slo/issues/2697 is complete. We want to ensure that the SLO Plugin API is merged and running in prod before we merge terraform-provider changes.
This adds a custom validate function for queries that checks to ensure the basic form of big tent SLO JSON is being followed before we allow the user to roundtrip it to the API. This checks for valid JSON and the presence of a refId, a datasource, a type, and a uid.
Closes https://github.com/grafana/slo/issues/2702
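A minimal sketch of what those field-presence checks might look like; the exact JSON layout (for example, whether type and uid live under datasource) is an assumption for illustration, not taken from this PR.

import "encoding/json"

// Hypothetical sketch of the presence checks described above; the nesting of
// type and uid under datasource is assumed.
func missingBigTentFields(raw string) []string {
	var q struct {
		RefID      string `json:"refId"`
		Datasource struct {
			Type string `json:"type"`
			UID  string `json:"uid"`
		} `json:"datasource"`
	}
	if err := json.Unmarshal([]byte(raw), &q); err != nil {
		return []string{"valid JSON"}
	}
	var missing []string
	if q.RefID == "" {
		missing = append(missing, "refId")
	}
	if q.Datasource.Type == "" {
		missing = append(missing, "datasource.type")
	}
	if q.Datasource.UID == "" {
		missing = append(missing, "datasource.uid")
	}
	return missing
}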