-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add LLMBackendTrafficPolicy #35
base: main
Are you sure you want to change the base?
Conversation
wengyao04
commented
Dec 6, 2024
- add LLMBackendTrafficPolicy, which controls the flow of traffic to the backend.
- add ratelimit in LLMBackendTrafficPolicy
6b4acf3
to
27094a5
Compare
Signed-off-by: yweng14 <[email protected]>
27094a5
to
309eaab
Compare
Signed-off-by: yweng14 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- do you want to add tests in https://github.com/envoyproxy/ai-gateway/tree/main/tests/cel-validation
- could you edit the PR description so that we can have more context and summary. e.g. how they are used and how do you envision them to be used to add other features. That ways reviewers can get to know the big picture before deciphering the code
// LLMBackendTrafficPolicy controls the flow of traffic to the backend. | ||
type LLMBackendTrafficPolicy struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add a bit more documentation here like for example this is used to setup rate limit etc.
Signed-off-by: yweng14 <[email protected]>
46d1d38
to
3deaf80
Compare
} | ||
|
||
// LLMPolicyRateLimitHeaderMatch defines the match attributes within the HTTP Headers of the request. | ||
type LLMPolicyRateLimitHeaderMatch struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reuse the generic envoy gateway headerMatch type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I would like to reuse EG native type as much as possible
sorry the CI has a bug #36 this will fix it 🙏 |
type LLMTrafficPolicyRateLimitRule struct { | ||
// Headers is a list of request headers to match. Multiple header values are ANDed together, | ||
// meaning, a request MUST match all the specified headers. | ||
// At least one of headers or sourceCIDR condition must be specified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is not matching by sourceCIDR here, we can also document the canonical header such as x-ai-gateway-llm-model-name
used to apply the rate limiting.
// BackendRefs lists the LLMBackends that this traffic policy will apply | ||
// The namespace is "local", i.e. the same namespace as the LLMRoute. | ||
// | ||
BackendRef LLMBackendLocalRef `json:"backendRef,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description states "backendrefs lists the llmbackends" which implies that this variable should be updated to:
BackendRefs []LLMBackendLocalRef
Do we want a one (traffic policy) to many (backends) relationship? I think it makes sense to have that in the case where we have very similar models that we want to have the same rules for
marked this as a draft since I landed the upstream generic rate limit feature based on the response content in Envoy upstream: envoyproxy/envoy#37548 - and the corresponding feature in EG is being worked on envoyproxy/gateway#4957. Once they are all done, then the token rate limit API is not needed at ai-gateway at all |