add LLMBackendTrafficPolicy #35

Closed
wants to merge 4 commits into from
Changes from 3 commits
2 changes: 1 addition & 1 deletion Makefile
@@ -88,7 +88,7 @@ test-cel: envtest apigen format
# To build for multiple platforms, set the GOOS_LIST and GOARCH_LIST variables.
#
# Example:
-# - `make build.controler GOOS_LIST="linux darwin" GOARCH_LIST="amd64 arm64"`
+# - `make build.controller GOOS_LIST="linux darwin" GOARCH_LIST="amd64 arm64"`
GOOS_LIST ?= $(shell go env GOOS)
GOARCH_LIST ?= $(shell go env GOARCH)
.PHONY: build.%
135 changes: 135 additions & 0 deletions api/v1alpha1/api.go
@@ -123,3 +123,138 @@ const (
// https://docs.aws.amazon.com/bedrock/latest/APIReference/API_Operations_Amazon_Bedrock_Runtime.html
APISchemaAWSBedrock APISchema = "AWSBedrock"
)

// +kubebuilder:object:root=true

// LLMBackendTrafficPolicy controls the flow of traffic to the backend.
type LLMBackendTrafficPolicy struct {
Comment on lines +129 to +130

Could you add a bit more documentation here? For example, that this is used to set up rate limits, etc.

metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
// Spec defines the details of the LLMBackend traffic policy.
Spec LLMBackendTrafficPolicySpec `json:"spec,omitempty"`
}

// +kubebuilder:object:root=true

// LLMBackendTrafficPolicyList contains a list of LLMBackendTrafficPolicy
type LLMBackendTrafficPolicyList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []LLMBackendTrafficPolicy `json:"items"`
}

// LLMBackendTrafficPolicySpec defines the details of llm backend traffic policy
// like rateLimit, timeout etc.
type LLMBackendTrafficPolicySpec struct {
// BackendRefs lists the LLMBackends that this traffic policy will apply
// The namespace is "local", i.e. the same namespace as the LLMRoute.
//
BackendRef LLMBackendLocalRef `json:"backendRef,omitempty"`
@aabchoo commented on Dec 10, 2024

The description says "BackendRefs lists the LLMBackends", which implies that this field should be changed to:

BackendRefs []LLMBackendLocalRef

Do we want a one (traffic policy) to many (backends) relationship? I think that makes sense when we have very similar models that should share the same rules.

// RateLimit defines the rate limit policy.
RateLimit *LLMTrafficPolicyRateLimit `json:"rateLimit,omitempty"`
}
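
The one-to-many relationship raised in the review comment above could look roughly like the following. This is only a sketch of the suggestion, not what the PR implements: the `Name` field on `LLMBackendLocalRef`, the `backendRefs` JSON key, and the `appliesTo` helper are all assumptions made for illustration.

```go
package main

import "fmt"

// LLMBackendLocalRef stands in for the type of the same name in the PR.
// Assumption: it identifies a backend in the local namespace by name only.
type LLMBackendLocalRef struct {
	Name string `json:"name"`
}

// LLMBackendTrafficPolicySpec shows the one-to-many variant suggested in
// review: a single policy that targets several backends.
type LLMBackendTrafficPolicySpec struct {
	// BackendRefs lists the LLMBackends this traffic policy applies to.
	BackendRefs []LLMBackendLocalRef `json:"backendRefs,omitempty"`
}

// appliesTo is a hypothetical helper that reports whether the policy
// targets the backend with the given name.
func appliesTo(spec LLMBackendTrafficPolicySpec, backend string) bool {
	for _, ref := range spec.BackendRefs {
		if ref.Name == backend {
			return true
		}
	}
	return false
}

func main() {
	// Two similar backends sharing the same rules (names are made up).
	spec := LLMBackendTrafficPolicySpec{
		BackendRefs: []LLMBackendLocalRef{{Name: "model-a"}, {Name: "model-b"}},
	}
	fmt.Println(appliesTo(spec, "model-a"), appliesTo(spec, "model-c"))
}
```

With a slice, two similar backends can share one set of rules without duplicating the policy object; the trade-off is that a backend could then be targeted by more than one policy, which would need a conflict rule.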

type LLMTrafficPolicyRateLimit struct {
// Rules defines the rate limit rules.
Rules []LLMTrafficPolicyRateLimitRule `json:"rules,omitempty"`
}

// LLMTrafficPolicyRateLimitRule defines the details of the rate limit policy.
type LLMTrafficPolicyRateLimitRule struct {
// Headers is a list of request headers to match. Multiple header values are ANDed together,
// meaning, a request MUST match all the specified headers.
// At least one of headers or sourceCIDR condition must be specified.

This rule does not match by sourceCIDR. We could also document the canonical header, such as x-ai-gateway-llm-model-name, that is used to apply the rate limiting.

Headers []LLMPolicyRateLimitHeaderMatch `json:"headers,omitempty"`
// +kubebuilder:validation:MinItems=1
Limits []LLMPolicyRateLimitValue `json:"limits"`
}

// LLMPolicyRateLimitHeaderMatch defines the match attributes within the HTTP Headers of the request.
type LLMPolicyRateLimitHeaderMatch struct {

Yeah, I would like to reuse the Envoy Gateway (EG) native types as much as possible.

// Type specifies how to match against the value of the header.
Type LLMPolicyRateLimitStringMatchType `json:"type"`
mathetake marked this conversation as resolved.

// Name of the HTTP header.
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=256
Name string `json:"name"`

// Value within the HTTP header. Due to the
// case-insensitivity of header names, "foo" and "Foo" are considered equivalent.
// Do not set this field when Type="Distinct", implying matching on any/all unique
// values within the header.
//
// +optional
// +kubebuilder:validation:MaxLength=1024
Value *string `json:"value,omitempty"`
}

// LLMPolicyRateLimitStringMatchType specifies the semantics of how string values should be compared.
// Valid LLMPolicyRateLimitStringMatchType values are "Exact", "RegularExpression", and "Distinct".
//
// +kubebuilder:validation:Enum=Exact;RegularExpression;Distinct
type LLMPolicyRateLimitStringMatchType string

// HeaderMatchType constants.
const (
// LLMPolicyRateLimitStringMatchHeaderMatchExact matches the exact value of the Value field against the value of
// the specified HTTP Header.
LLMPolicyRateLimitStringMatchHeaderMatchExact LLMPolicyRateLimitStringMatchType = "Exact"
// HeaderMatchRegularExpression matches a regular expression against the value of the
// specified HTTP Header. The regex string must adhere to the syntax documented in
// https://github.com/google/re2/wiki/Syntax.
HeaderMatchRegularExpression LLMPolicyRateLimitStringMatchType = "RegularExpression"
// LLMPolicyRateLimitStringMatchHeaderMatchDistinct matches any and all possible unique values encountered in the
// specified HTTP Header. Note that each unique value will receive its own rate limit
// bucket.
// Note: This is only supported for Global Rate Limits.
LLMPolicyRateLimitStringMatchHeaderMatchDistinct LLMPolicyRateLimitStringMatchType = "Distinct"
)

// LLMPolicyRateLimitValue defines the limits for rate limiting.
type LLMPolicyRateLimitValue struct {
// Type specifies the type of rate limit.
//
// +kubebuilder:default=Token
Type LLMPolicyRateLimitType `json:"type,omitempty"`
// Quantity specifies the number of requests or tokens allowed in the given interval.
Quantity uint `json:"quantity"`
// Unit specifies the interval for the rate limit.
//
// +kubebuilder:default=Minute
Unit LLMPolicyRateLimitUnit `json:"unit,omitempty"`
}

// LLMPolicyRateLimitType specifies the type of rate limit.
// Valid RateLimitType values are "Request" and "Token".
//
// +kubebuilder:validation:Enum=Request;Token
type LLMPolicyRateLimitType string

const (
// LLMPolicyRateLimitTypeRequest specifies the rate limit to be based on the number of requests.
LLMPolicyRateLimitTypeRequest LLMPolicyRateLimitType = "Request"
// LLMPolicyRateLimitTypeToken specifies the rate limit to be based on the number of tokens.
LLMPolicyRateLimitTypeToken LLMPolicyRateLimitType = "Token"
)

// LLMPolicyRateLimitUnit specifies the intervals for setting rate limits.
// Valid RateLimitUnit values are "Second", "Minute", "Hour", and "Day".
//
// +kubebuilder:validation:Enum=Second;Minute;Hour;Day
type LLMPolicyRateLimitUnit string

// RateLimitUnit constants.
const (
// LLMPolicyRateLimitUnitSecond specifies the rate limit interval to be 1 second.
LLMPolicyRateLimitUnitSecond LLMPolicyRateLimitUnit = "Second"

// LLMPolicyRateLimitUnitMinute specifies the rate limit interval to be 1 minute.
LLMPolicyRateLimitUnitMinute LLMPolicyRateLimitUnit = "Minute"

// LLMPolicyRateLimitUnitHour specifies the rate limit interval to be 1 hour.
LLMPolicyRateLimitUnitHour LLMPolicyRateLimitUnit = "Hour"

// LLMPolicyRateLimitUnitDay specifies the rate limit interval to be 1 day.
LLMPolicyRateLimitUnitDay LLMPolicyRateLimitUnit = "Day"
)
1 change: 1 addition & 0 deletions api/v1alpha1/registry.go
@@ -8,6 +8,7 @@ import (
func init() {
SchemeBuilder.Register(&LLMRoute{}, &LLMRouteList{})
SchemeBuilder.Register(&LLMBackend{}, &LLMBackendList{})
SchemeBuilder.Register(&LLMBackendTrafficPolicy{}, &LLMBackendTrafficPolicyList{})
}

const GroupName = "aigateway.envoyproxy.io"
163 changes: 163 additions & 0 deletions api/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
