Allow the sidecar to sample from a list of prefill host ports
In some benchmarking and test environments, dynamic prefill selection
may be difficult, and random selection among a set of hosts is
sufficient.
Add a new `--enable-prefiller-sampling` flag that instructs the
sidecar to select a random prefill host from the provided list
instead of the first one. Make the behavior opt-in to prevent
users from accidentally depending on the new behavior, and
keep the existing default behavior (first header value) consistent.
E.g.:
curl -H 'x-prefiller-host-port: server1:8000' -H 'x-prefiller-host-port: server2:8000'
will randomly choose one of the two values.
Signed-off-by: Clayton Coleman <[email protected]>
cmd/llm-d-routing-sidecar/main.go
3 additions & 0 deletions
@@ -20,6 +20,7 @@ import (
 	"flag"
 	"net/url"
 	"os"
+	"strconv"

 	"k8s.io/klog/v2"
@@ -43,6 +44,7 @@ func main() {
 	enableSSRFProtection := flag.Bool("enable-ssrf-protection", false, "enable SSRF protection using InferencePool allowlisting")
 	inferencePoolNamespace := flag.String("inference-pool-namespace", os.Getenv("INFERENCE_POOL_NAMESPACE"), "the Kubernetes namespace to watch for InferencePool resources (defaults to INFERENCE_POOL_NAMESPACE env var)")
 	inferencePoolName := flag.String("inference-pool-name", os.Getenv("INFERENCE_POOL_NAME"), "the specific InferencePool name to watch (defaults to INFERENCE_POOL_NAME env var)")
+	enablePrefillerSampling := flag.Bool("enable-prefiller-sampling", func() bool { b, _ := strconv.ParseBool(os.Getenv("ENABLE_PREFILLER_SAMPLING")); return b }(), "if true, the target prefill instance will be selected randomly from among the provided prefill host values")