Commit 92e3c78

Improves trained model autoscaling docs. (#2857) (#2858)
(cherry picked from commit 4e34d07) Co-authored-by: István Zoltán Szabó <[email protected]>
1 parent f7433a5 commit 92e3c78

2 files changed: 13 additions, 10 deletions

docs/en/stack/ml/nlp/ml-nlp-autoscaling.asciidoc

Lines changed: 9 additions & 6 deletions
@@ -2,14 +2,14 @@
 = Trained model autoscaling
 
 You can enable autoscaling for each of your trained model deployments.
-Autoscaling allows {es} to automatically adjust the resources the deployment can use based on the workload demand.
+Autoscaling allows {es} to automatically adjust the resources the model deployment can use based on the workload demand.
 
 There are two ways to enable autoscaling:
 
 * through APIs by enabling adaptive allocations
 * in {kib} by enabling adaptive resources
 
-IMPORTANT: To fully leverage model autoscaling, it is highly recommended to enable {cloud}/ec-autoscaling.html[deployment autoscaling].
+IMPORTANT: To fully leverage model autoscaling, it is highly recommended to enable {cloud}/ec-autoscaling.html[{es} deployment autoscaling].
 
 
 [discrete]
@@ -25,6 +25,7 @@ This can help you to manage performance and cost more easily.
 When adaptive allocations are enabled, the number of allocations of the model is set automatically based on the current load.
 When the load is high, a new model allocation is automatically created.
 When the load is low, a model allocation is automatically removed.
+You must explicitly set the minimum and maximum number of allocations; autoscaling will occur within these limits.
 
 You can enable adaptive allocations by using:
 
@@ -35,7 +36,7 @@ If the new allocations fit on the current {ml} nodes, they are immediately start
 If more resource capacity is needed for creating new model allocations, then your {ml} node will be scaled up if {ml} autoscaling is enabled to provide enough resources for the new allocation.
 The number of model allocations can be scaled down to 0.
 They cannot be scaled up to more than 32 allocations, unless you explicitly set the maximum number of allocations to more.
-Adaptive allocations must be set up independently for each deployment and {infer} endpoint.
+Adaptive allocations must be set up independently for each deployment and {ref}/put-inference-api.html[{infer} endpoint].
 
 
 [discrete]
@@ -62,7 +63,8 @@ When adaptive resources are enabled, the number of vCPUs that the model deployme
 When the load is high, the number of vCPUs that the process can use is automatically increased.
 When the load is low, the number of vCPUs that the process can use is automatically decreased.
 
-You can choose from three levels of resource usage for your trained model deployment.
+You can choose from three levels of resource usage for your trained model deployment; autoscaling will occur within the selected level's range.
+
 Refer to the tables in the <<auto-scaling-matrix>> section to find out the settings for the level you selected.
 
 
@@ -78,13 +80,14 @@ The used resources for trained model deployments depend on three factors:
 
 * your cluster environment (Serverless, Cloud, or on-premises)
 * the use case you optimize the model deployment for (ingest or search)
-* whether adaptive resources are enabled or disabled (dynamic or static resources)
+* whether model autoscaling is enabled with adaptive allocations/resources to have dynamic resources, or disabled for static resources
 
 If you use {es} on-premises, vCPUs level ranges are derived from the `total_ml_processors` and `max_single_ml_node_processors` values.
 Use the {ref}/get-ml-info.html[get {ml} info API] to check these values.
 The following tables show you the number of allocations, threads, and vCPUs available in Cloud when adaptive resources are enabled or disabled.
 
-NOTE: For Observability and Security projects on Serverless, adaptive allocations are automatically enabled, and the "Adaptive resources" control is not displayed in {kib}.
+NOTE: On Serverless, adaptive allocations are automatically enabled for all project types.
+However, the "Adaptive resources" control is not displayed in {kib} for Observability and Security projects.
 
 
 [discrete]
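
Reviewer's illustration (not part of the commit): the sentence added in this file says the minimum and maximum number of allocations must be set explicitly and that autoscaling stays within those limits. A minimal Python sketch of a settings object that enforces this might look as follows. The helper name `adaptive_allocations_settings` is hypothetical, and the field names `enabled`, `min_number_of_allocations`, and `max_number_of_allocations` are assumed from the Elasticsearch trained model deployment and inference APIs; verify them against the reference for your version before relying on them.

```python
from typing import Any, Dict


def adaptive_allocations_settings(min_allocations: int, max_allocations: int) -> Dict[str, Any]:
    """Build an adaptive allocations settings object with explicit limits.

    Hypothetical helper: the field names below are assumed from the
    Elasticsearch APIs and are not shown in this commit.
    """
    # The docs state that allocations can be scaled down to 0, so 0 is allowed.
    if min_allocations < 0:
        raise ValueError("min_allocations must be 0 or greater")
    # The maximum must leave room for at least one allocation and
    # must not be below the minimum.
    if max_allocations < max(min_allocations, 1):
        raise ValueError("max_allocations must be at least 1 and not below min_allocations")
    return {
        "adaptive_allocations": {
            "enabled": True,
            "min_number_of_allocations": min_allocations,
            "max_number_of_allocations": max_allocations,
        }
    }
```

The returned dictionary is only a request-body fragment; where it goes (a deployment start request or the service settings of an inference endpoint) depends on which API you use, and it has to be sent separately for each deployment or endpoint.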

docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc

Lines changed: 4 additions & 4 deletions
@@ -459,10 +459,10 @@ To gain the biggest value out of ELSER trained models, consider to follow this l
 * Setting `min_allocations` to `0` can save on costs for non-critical use cases or testing environments.
 * Enabling <<ml-nlp-auto-scale,autoscaling>> through adaptive allocations or adaptive resources makes it possible for {es} to scale up or down the available resources of your ELSER deployment based on the load on the process.
 
-* Use two ELSER {infer} endpoints: one optimized for ingest and one optimized for search.
-** In {kib}, you can select for which case you want to optimize your ELSER deployment.
-** If you use the {infer} API and want to optimize your ELSER endpoint for ingest, set the number of threads to `1` (`"num_threads": 1`).
-** If you use the {infer} API and want to optimize your ELSER endpoint for search, set the number of threads to greater than `1`.
+* Use dedicated, optimized ELSER {infer} endpoints for ingest and search use cases.
+** When deploying a trained model in {kib}, you can select for which case you want to optimize your ELSER deployment.
+** If you use the trained model or {infer} APIs and want to optimize your ELSER trained model deployment or {infer} endpoint for ingest, set the number of threads to `1` (`"num_threads": 1`).
+** If you use the trained model or {infer} APIs and want to optimize your ELSER trained model deployment or {infer} endpoint for search, set the number of threads to greater than `1`.
 
 
 [discrete]
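
Reviewer's illustration (not part of the commit): the thread guidance in the changed bullets can be sketched in Python as below. The helper name `elser_thread_settings` and the default of 4 search threads are assumptions made for this example; the only setting taken from the diff is `"num_threads"`, with `1` for ingest and a value greater than `1` for search.

```python
from typing import Dict


def elser_thread_settings(use_case: str, search_threads: int = 4) -> Dict[str, int]:
    """Return the thread setting for an ingest- or search-optimized ELSER endpoint.

    Hypothetical helper: only the "num_threads" key comes from the docs;
    the default of 4 search threads is an arbitrary illustrative value.
    """
    if use_case == "ingest":
        # Ingest-optimized: one thread per allocation.
        return {"num_threads": 1}
    if use_case == "search":
        # Search-optimized: more than one thread per allocation.
        if search_threads <= 1:
            raise ValueError("search_threads must be greater than 1")
        return {"num_threads": search_threads}
    raise ValueError("use_case must be 'ingest' or 'search'")


# One dedicated settings object per use case, as the docs recommend.
ingest_settings = elser_thread_settings("ingest")
search_settings = elser_thread_settings("search")
```

Keeping the two settings objects separate mirrors the recommendation to run dedicated endpoints for ingest and search rather than sharing one deployment between both workloads.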
