add LLMBackendTrafficPolicy #35

Draft: wants to merge 4 commits into main

wengyao04

  • Add LLMBackendTrafficPolicy, which controls the flow of traffic to the backend.
  • Add rate limiting to LLMBackendTrafficPolicy.

@wengyao04 wengyao04 force-pushed the trafficpolicy-ratelimit branch 2 times, most recently from 6b4acf3 to 27094a5 Compare December 6, 2024 18:57
@wengyao04 wengyao04 force-pushed the trafficpolicy-ratelimit branch from 27094a5 to 309eaab Compare December 6, 2024 18:58
Signed-off-by: yweng14 <[email protected]>
@wengyao04 wengyao04 marked this pull request as ready for review December 6, 2024 19:04

@mathetake mathetake left a comment

api/v1alpha1/api.go
Comment on lines +129 to +130
// LLMBackendTrafficPolicy controls the flow of traffic to the backend.
type LLMBackendTrafficPolicy struct {

Could you add a bit more documentation here? For example, that this is used to set up rate limits, etc.

@mathetake mathetake requested a review from a team December 6, 2024 19:13
@mathetake mathetake mentioned this pull request Dec 6, 2024
@wengyao04 wengyao04 force-pushed the trafficpolicy-ratelimit branch from 46d1d38 to 3deaf80 Compare December 6, 2024 20:44
}

// LLMPolicyRateLimitHeaderMatch defines the match attributes within the HTTP Headers of the request.
type LLMPolicyRateLimitHeaderMatch struct {

yeah, I would like to reuse EG native type as much as possible

@mathetake

Sorry, the CI has a bug; #36 will fix it 🙏

type LLMTrafficPolicyRateLimitRule struct {
// Headers is a list of request headers to match. Multiple header values are ANDed together,
// meaning, a request MUST match all the specified headers.
// At least one of headers or sourceCIDR condition must be specified.

It is not matching by sourceCIDR here. We can also document the canonical header, such as x-ai-gateway-llm-model-name, that is used to apply the rate limiting.
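
To make the AND semantics in the field comment concrete, here is a minimal Go sketch; it is not the PR's implementation. `HeaderMatch`, `newHeaders`, and `matchesAll` are hypothetical names, only exact-value matching is assumed, and `x-user-id` with its value is an invented example header. Only `x-ai-gateway-llm-model-name` comes from the discussion above.

```go
package main

import (
	"fmt"
	"net/http"
)

// HeaderMatch is a hypothetical, simplified stand-in for the header match
// type discussed in this PR. It supports exact value matching only.
type HeaderMatch struct {
	Name  string
	Value string
}

// newHeaders builds an http.Header from a plain map (helper for the example).
func newHeaders(kv map[string]string) http.Header {
	h := http.Header{}
	for k, v := range kv {
		h.Set(k, v)
	}
	return h
}

// matchesAll reports whether the request headers satisfy every match in the
// list (AND semantics). An empty list never matches, mirroring the rule that
// at least one header condition must be specified.
func matchesAll(h http.Header, matches []HeaderMatch) bool {
	if len(matches) == 0 {
		return false
	}
	for _, m := range matches {
		// Values canonicalizes the header name, so lookups are case-insensitive.
		vals := h.Values(m.Name)
		if len(vals) == 0 || vals[0] != m.Value {
			return false
		}
	}
	return true
}

func main() {
	// "model-a", "x-user-id", and "alice" are made-up example values.
	h := newHeaders(map[string]string{
		"x-ai-gateway-llm-model-name": "model-a",
		"x-user-id":                   "alice",
	})
	rule := []HeaderMatch{
		{Name: "x-ai-gateway-llm-model-name", Value: "model-a"},
		{Name: "x-user-id", Value: "alice"},
	}
	fmt.Println(matchesAll(h, rule))
}
```

With this shape, a rule that lists only the model-name header would apply to every request for that model, while adding a second header narrows it further.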

// BackendRefs lists the LLMBackends that this traffic policy will apply
// The namespace is "local", i.e. the same namespace as the LLMRoute.
//
BackendRef LLMBackendLocalRef `json:"backendRef,omitempty"`

@aabchoo aabchoo Dec 10, 2024

The description states "BackendRefs lists the LLMBackends", which implies that this field should be updated to:

BackendRefs []LLMBackendLocalRef

Do we want a one (traffic policy) to many (backends) relationship? I think that makes sense when we have very similar models that should share the same rules.
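
As a rough illustration of the one-to-many shape suggested here, the following Go sketch shows a policy spec that lists several backends. It is an assumption-laden example, not the PR's API: `policySpecSketch` and `appliesTo` are hypothetical, the `Name` field on `LLMBackendLocalRef` is assumed, and the backend names are invented.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LLMBackendLocalRef is a simplified stand-in for the type of the same name
// in this PR. The Name field is an assumption made for this sketch.
type LLMBackendLocalRef struct {
	Name string `json:"name"`
}

// policySpecSketch is a hypothetical spec in which one traffic policy
// references many backends in the same namespace.
type policySpecSketch struct {
	BackendRefs []LLMBackendLocalRef `json:"backendRefs,omitempty"`
}

// appliesTo reports whether the policy references the named backend.
func (s policySpecSketch) appliesTo(backend string) bool {
	for _, ref := range s.BackendRefs {
		if ref.Name == backend {
			return true
		}
	}
	return false
}

func main() {
	// Two similar backends sharing the same rules (names are made up).
	spec := policySpecSketch{
		BackendRefs: []LLMBackendLocalRef{
			{Name: "model-a-backend"},
			{Name: "model-b-backend"},
		},
	}
	out, err := json.Marshal(spec)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	fmt.Println(spec.appliesTo("model-a-backend"))
}
```

The trade-off is that a slice lets similar backends share one set of rules, at the cost of having to decide what happens when two policies reference the same backend.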

@mathetake mathetake requested a review from arkodg December 11, 2024 18:18
@mathetake mathetake marked this pull request as draft December 23, 2024 01:26
@mathetake

Marked this as a draft since I landed the generic rate limit feature based on response content in upstream Envoy (envoyproxy/envoy#37548), and the corresponding feature in EG is being worked on in envoyproxy/gateway#4957. Once both are done, the token rate limit API will not be needed in ai-gateway at all.
