Commit d3c3257

Merge pull request #91500 from rh-tokeefe/OLS-1622
OLS-1622: Document token quota feature
2 parents ad8bd04 + 8a5e26f commit d3c3257

3 files changed

+75
-0
lines changed

configure/ols-configuring-openshift-lightspeed.adoc

Lines changed: 2 additions & 0 deletions
@@ -28,3 +28,5 @@ include::modules/ols-about-cluster-interaction.adoc[leveloffset=+1]
28 28
include::modules/ols-enabling-cluster-interaction.adoc[leveloffset=+2]
29 29
include::modules/ols-about-the-byo-knowledge-tool.adoc[leveloffset=+1]
30 30
include::modules/ols-providing-custom-knowledge-to-the-llm.adoc[leveloffset=+2]
31+
include::modules/ols-tokens-and-token-quota-limits.adoc[leveloffset=+1]
32+
include::modules/ols-activating-token-quota-limits.adoc[leveloffset=+2]
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
1+
// Module included in the following assemblies:
2+
// * lightspeed-docs-main/configure/ols-configuring-openshift-lightspeed.adoc
3+
4+
:_mod-docs-content-type: PROCEDURE
5+
[id="ols-activating-token-quota-limits_{context}"]
6+
= Activating token quota limits
7+
8+
Activate token quota limits for the {ols-long} service by defining key-value pairs in the `ConfigMap` resource. The {ols-long} pod mounts the `ConfigMap` resource as a volume, enabling access to the file stored within it. The `OLSConfig` custom resource (CR) references the `ConfigMap` resource to obtain the quota limit information.
9+
10+
.Prerequisites
11+
12+
* You have installed the {ols-long} Operator.
13+
14+
* You have configured a large language model provider.
15+
16+
* You have configured a PostgreSQL database that the {ols-long} service can access.
17+
18+
.Procedure
19+
20+
. Open the {ols-long} `OLSConfig` CR file by running the following command:
21+
+
22+
[source,terminal]
23+
----
24+
$ oc edit olsconfig cluster
25+
----
26+
27+
. Modify the `spec.ols.quotaHandlersConfig` specification to include token quota limit information.
28+
+
29+
.Example {ols-long} `OLSConfig` CR
30+
[source,yaml]
31+
----
apiVersion: ols.openshift.io/v1alpha1
kind: OLSConfig
metadata:
  name: cluster
spec:
  ols:
    quotaHandlersConfig:
      limitersConfig:
      - name: user_limits # <1>
        type: user_limiter
        initialQuota: 100000 # <2>
        quotaIncrease: 1000 # <3>
        period: 30 days
      - name: cluster_limits # <4>
        type: cluster_limiter
        initialQuota: 1000000 # <5>
        quotaIncrease: 100000 # <6>
        period: 30 days # <7>
50+
----
51+
<1> Specifies the token limit for a user account.
52+
<2> Specifies a token quota limit of 100,000 for each user over the time period specified in the `period` field.
53+
<3> Increases the token quota limit for the user by 1,000 at the end of the time period specified in the `period` field.
54+
<4> Specifies the token limit for a cluster.
55+
<5> Specifies a token quota limit of 1,000,000 for each cluster over the time period specified in the `period` field.
56+
<6> Increases the token quota limit for the cluster by 100,000 at the end of the time period specified in the `period` field.
57+
<7> Defines the amount of time that the scheduler waits before the period resets or the quota limit increases.
58+
59+
. Save the file and exit the editor.
60+
+
61+
Saving the file applies the changes and activates the token quota limits.
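
The interplay of `initialQuota` and `quotaIncrease` described in the callouts can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the Operator's implementation: the `ToyLimiter` class and its `consume` and `end_of_period` methods are hypothetical names, and the sketch assumes that the increase is simply added to the remaining tokens when the period elapses.

```python
from dataclasses import dataclass


@dataclass
class ToyLimiter:
    """Toy model of one limitersConfig entry (hypothetical, for illustration)."""
    initial_quota: int    # mirrors initialQuota
    quota_increase: int   # mirrors quotaIncrease
    available: int = -1   # tokens left; filled in from initial_quota

    def __post_init__(self) -> None:
        if self.available < 0:
            self.available = self.initial_quota

    def consume(self, tokens: int) -> bool:
        """Deduct tokens if enough remain; otherwise reject the request."""
        if tokens > self.available:
            return False
        self.available -= tokens
        return True

    def end_of_period(self) -> None:
        """Assumed behavior: add quota_increase when the period elapses."""
        self.available += self.quota_increase


# Values taken from the user_limits entry in the example CR.
user = ToyLimiter(initial_quota=100_000, quota_increase=1_000)
print(user.consume(99_500))  # True: 500 tokens remain
print(user.consume(800))     # False: request exceeds the remaining quota
user.end_of_period()
print(user.available)        # 1500
```

The same model applies to the `cluster_limits` entry with its own values; how the service combines user and cluster limits is not covered by this sketch.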
Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
// Module included in the following assemblies:
2+
// * lightspeed-docs-main/configure/ols-configuring-openshift-lightspeed.adoc
3+
4+
:_mod-docs-content-type: CONCEPT
5+
[id="ols-tokens-and-token-quota-limits_{context}"]
6+
= Tokens and token quota limits
7+
8+
Tokens are small chunks of text, which can be as small as one character or as large as one word. Tokens are the units of measurement used to quantify the amount of text that the {ols-long} service sends to, or receives from, a large language model (LLM). Every interaction with the service and the LLM is counted in tokens.
9+
10+
Token quota limits define the number of tokens that can be used in a certain timeframe. Implementing token quota limits helps control costs, encourage more efficient use of queries, and regulate system demands. In a multi-user configuration, token quota limits help provide equal access to all users, ensuring that everyone has an opportunity to submit queries.
11+
12+
You can define token quota limits for {ocp-short-name} clusters or {ocp-short-name} user accounts.
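
To make the idea of counting text in tokens concrete, the following Python sketch splits text into words and single punctuation marks and adds up the tokens sent and received in one exchange. It is only an approximation for illustration: real LLM tokenizers are model-specific and produce different counts, and the `naive_tokens` function and the sample strings are hypothetical.

```python
import re


def naive_tokens(text: str) -> list[str]:
    """Split text into words and single non-space symbols (illustration only)."""
    return re.findall(r"\w+|[^\w\s]", text)


prompt = "How do I scale a deployment?"  # text sent to the LLM
reply = "Use the oc scale command."      # text received from the LLM

sent = len(naive_tokens(prompt))
received = len(naive_tokens(reply))
print(sent, received, sent + received)   # 7 6 13
```

In this simplified view, the exchange would count 13 tokens against a quota, because both the text sent and the text received are measured.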
