armory · AdanUrbanReyesArmory · Dec 6, 2023
@@ -0,0 +1,123 @@
+---
+title: Horizontal Scaling Architecture and Features
+linkTitle: Horizontal Scaling
+description: >
+  Learn how the Horizontal Scaling feature distributes operations across Armory Scale Agent replicas in your Armory Continuous Deployment or Spinnaker environment.
+aliases:
+  - /scale-agent/tasks/horizontal-scaling/
+---
+
+## Overview of Horizontal Scaling
+
+Rather than sending each operation to the first Scale Agent instance that can handle it, Horizontal Scaling distributes operations across all the Scale Agent replicas that can handle them.
+
+### How to enable and use Horizontal Scaling
+
+First, familiarize yourself with the architecture and features in this guide. Then you can:
+
+1. {{< linkWithTitle "plugins/scale-agent/tasks/horizontal-scaling/operations-enable.md" >}}
+
+## Horizontal Scaling glossary
+
+- **K8s Operation**: an abstraction of a K8s operation, such as Get, List, Add, Delete, or Patch.
+- **Dynamic account Operation**: an abstraction of a dynamic account operation: adding or unregistering accounts.
+- **Endpoint**: the URL segment after the Clouddriver root.
+- **Request**: an instruction that isn’t fulfilled immediately and can have different outcomes; a request can be made over HTTP by the admin or internally by one of the services.
+
+## Architecture
+
+First, it is important to understand the main differences between K8s operations and dynamic account operations.
+
+|K8s operations                                                                                         |Dynamic account operations                                                |
+|-------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------|
+|Are handled by a single Scale Agent instance                                                           |Can be handled by more than one Scale Agent instance                      |
+|Are processed on every polling cycle, configured by the `kubesvc.operations.database.scan` properties |Are processed on demand                                                   |
+|Are assigned in the `clouddriver.kubesvc_operation_single_assign` table                                |Are assigned in the `clouddriver.kubesvc_operation_multiple_assign` table |
+
+
+The Scale Agent stores K8s and dynamic account operation data in dedicated tables that act like a queue:
+- `clouddriver.kubesvc_operation`: holds newly received operations
+- `clouddriver.kubesvc_operation_single_assign`: holds K8s operations, each of which can be assigned to only a single Scale Agent instance
+- `clouddriver.kubesvc_operation_multiple_assign`: holds dynamic account operations, which can be assigned to multiple Scale Agent instances
+- `clouddriver.kubesvc_operation_history`: holds the responses to K8s and dynamic account operations
+
+### K8s Operations
+
+The Scale Agent Plugin creates a job for each Scale Agent instance registration. This job is in charge of:
+1. Fetching pending K8s operations from the `clouddriver.kubesvc_operation` table
+2. Assigning pending K8s operations in the `clouddriver.kubesvc_operation_single_assign` table
+3. Fetching assigned K8s operations from the `clouddriver.kubesvc_operation_single_assign` table and sending them to the Scale Agent
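As a rough illustration of the three steps above, the following Python sketch models one polling cycle of a registration job against in-memory stand-ins for the two tables. The `OperationStore` class, the `run_cycle` function, and the instance IDs are hypothetical names chosen for illustration; the actual plugin works against the database and its internals may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OperationStore:
    """Hypothetical in-memory stand-in for the plugin's database tables."""
    # Mimics clouddriver.kubesvc_operation: IDs of received operations.
    operations: List[str] = field(default_factory=list)
    # Mimics clouddriver.kubesvc_operation_single_assign: operation ID -> instance ID.
    single_assign: Dict[str, str] = field(default_factory=dict)


def run_cycle(store: OperationStore, instance_id: str, batch_size: int = 5) -> List[str]:
    """Run one polling cycle for the job created for `instance_id`."""
    # Step 1: fetch pending operations, i.e. those not yet assigned to anyone.
    pending = [op for op in store.operations if op not in store.single_assign]
    # Step 2: assign at most `batch_size` of them to this instance.
    for op in pending[:batch_size]:
        store.single_assign[op] = instance_id
    # Step 3: fetch the operations assigned to this instance; a real job
    # would send each of these to the Scale Agent replica.
    return [op for op, owner in store.single_assign.items() if owner == instance_id]


store = OperationStore(operations=[f"op-{n}" for n in range(7)])
first = run_cycle(store, "replica-0")   # claims up to batch_size operations
second = run_cycle(store, "replica-1")  # claims what is still pending
print(first)
print(second)
```

In this sketch each operation has at most one entry in the single-assign stand-in, so two jobs never send the same K8s operation. A real implementation would also need the assignment to be atomic at the database level, which is not modeled here.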
+
+When an operation returns a bad response and there is still time to retry (based on the `kubesvc.cache.operationWaitMs` property), the Scale Agent Plugin does the following:
+1. Stores the response in the `clouddriver.kubesvc_operation_history` table
+2. Unassigns the operation from the `clouddriver.kubesvc_operation_single_assign` table, so that another or the same Scale Agent instance can take it again
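The retry behavior can be sketched in Python as follows. This is a minimal model under stated assumptions: `handle_bad_response`, `elapsed_ms`, and the in-memory `single_assign` and `history` structures are hypothetical, and only the default of 30000 ms for `kubesvc.cache.operationWaitMs` comes from the plugin settings.

```python
from typing import Dict, List, Tuple


def handle_bad_response(
    op_id: str,
    response: str,
    elapsed_ms: int,
    single_assign: Dict[str, str],
    history: List[Tuple[str, str]],
    operation_wait_ms: int = 30000,
) -> bool:
    """Return True if the operation was released so it can be retried."""
    if elapsed_ms >= operation_wait_ms:
        # No time left to retry. The documentation does not describe this
        # branch, so the sketch leaves the state untouched.
        return False
    # 1. Record the bad response (stand-in for kubesvc_operation_history).
    history.append((op_id, response))
    # 2. Remove the assignment so any instance's job can claim it again.
    single_assign.pop(op_id, None)
    return True


single_assign = {"op-1": "replica-0"}
history: List[Tuple[str, str]] = []
retried = handle_bad_response("op-1", "timeout", 12000, single_assign, history)
print(retried, single_assign, history)
```

Once the assignment is removed, the operation looks pending again, so the next polling cycle of any registration job can pick it up.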
+
+```mermaid
+C4Deployment
+    title Scale Agent Horizontal Scaling Registration Jobs
+    Boundary(spin, "Armory Continuous Deployment or Spinnaker", "Instance", $borderColor="#0FC2C0") {
+        Boundary(cd, "Clouddriver", "Service", $borderColor="orange") {
+            System(sap, "Scale Agent Plugin<br/>", "For each registration creates a job to assign and send<br/>every N milliseconds the maximum number of K8s operations.<br/><br/>N = kubesvc.operations.database.scan.initialDelay | maxDelay<br/>maximum number = kubesvc.operations.database.scan.batchSize")
+            System(saj0, "Scale Agent Job 0", "")
+            System(saj1, "Scale Agent Job 1", "")
+            System(saj2, "Scale Agent Job 2", "")
+            UpdateElementStyle(saj0, $bgColor="#04AA6D", $borderColor="none")
+            UpdateElementStyle(saj1, $bgColor="#f44336", $borderColor="none")
+            UpdateElementStyle(saj2, $bgColor="#555555", $borderColor="none")
+        }
+        Boundary(sa, "Armory Scale Agent", "Service", $borderColor="purple") {
+            System(sar0, "Replica 0", "")
+            System(sar1, "Replica 1", "")
+            System(sar2, "Replica 2", "")
+            UpdateElementStyle(sar0, $bgColor="#04AA6D", $borderColor="none")
+            UpdateElementStyle(sar1, $bgColor="#f44336", $borderColor="none")
+            UpdateElementStyle(sar2, $bgColor="#555555", $borderColor="none")
+        }
+        Rel(sar0, sap, "Registration", "")
+        UpdateRelStyle(sar0, sap, $textColor="black", $lineColor="#04AA6D")
+        Rel(sar1, sap, "Registration", "")
+        UpdateRelStyle(sar1, sap, $textColor="black", $lineColor="#f44336")
+        Rel(sar2, sap, "Registration", "")
+        UpdateRelStyle(sar2, sap, $textColor="black", $lineColor="#555555")
+        Rel(sap, saj0, "Create")
+        UpdateRelStyle(sap, saj0, $textColor="black", $lineColor="#04AA6D")
+        Rel(sap, saj1, "Create")
+        UpdateRelStyle(sap, saj1, $textColor="black", $lineColor="#f44336", $offsetX="-30", $offsetY="55")
+        Rel(sap, saj2, "Create")
+        UpdateRelStyle(sap, saj2, $textColor="black", $lineColor="#555555", $offsetX="-60", $offsetY="155")
+        BiRel(sar0, saj0, "HandleOp", "request/response")
+        UpdateRelStyle(sar0, saj0, $textColor="black", $lineColor="#04AA6D", $offsetX="-100", $offsetY="30")
+        BiRel(sar1, saj1, "HandleOp", "request/response")
+        UpdateRelStyle(sar1, saj1, $textColor="black", $lineColor="#f44336")
+        BiRel(sar2, saj2, "HandleOp", "request/response")
+        UpdateRelStyle(sar2, saj2, $textColor="black", $lineColor="#555555")        
+    }
+    UpdateLayoutConfig($c4ShapeInRow="1", $c4BoundaryInRow="2")
+```
+
+### Dynamic account Operations
+
+Because dynamic account operation requests are less frequent, the Scale Agent Plugin handles them on demand as follows:
+
+1. Receive and store the new dynamic account operation in the `clouddriver.kubesvc_operation` table
+2. Assign the dynamic account operation in the `clouddriver.kubesvc_operation_multiple_assign` table; it can be assigned to all connected Scale Agent instances or only to instances with the received zoneId
+3. Notify all instances to fetch pending dynamic account operations from the `clouddriver.kubesvc_operation_multiple_assign` table
+4. Each instance reads the pending dynamic account operations and sends them to the Scale Agent
+5. Wait for the response and send it back
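The assignment step of this flow can be illustrated with a short Python sketch. The function names, the `connected` mapping of instance ID to zone ID, and the list standing in for `clouddriver.kubesvc_operation_multiple_assign` are all assumptions made for illustration, not the plugin's actual data model.

```python
from typing import Dict, List, Optional, Tuple


def select_targets(connected: Dict[str, str], zone_id: Optional[str] = None) -> List[str]:
    """Pick the instances a dynamic account operation is assigned to.

    `connected` maps a hypothetical instance ID to its zone ID.
    """
    if zone_id is None:
        # No zone requested: assign to every connected instance.
        return sorted(connected)
    # Zone requested: assign only to instances registered with that zone.
    return sorted(inst for inst, zone in connected.items() if zone == zone_id)


def assign_operation(
    op_id: str,
    connected: Dict[str, str],
    multiple_assign: List[Tuple[str, str]],
    zone_id: Optional[str] = None,
) -> List[str]:
    """Add one (operation, instance) row per target instance."""
    targets = select_targets(connected, zone_id)
    for inst in targets:
        multiple_assign.append((op_id, inst))
    return targets


connected = {"replica-0": "zone-a", "replica-1": "zone-b", "replica-2": "zone-a"}
rows: List[Tuple[str, str]] = []
everyone = assign_operation("add-account-1", connected, rows)
zone_only = assign_operation("add-account-2", connected, rows, zone_id="zone-a")
print(everyone)
print(zone_only)
```

Unlike the K8s case, one operation produces several assignment rows here, which matches the description of dynamic account operations being handled by more than one instance.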
+
+```mermaid
+sequenceDiagram
+    actor User
+    participant Plugin
+    participant Service
+
+    User->>Plugin: Send dynamic account operation
+    Plugin->>Plugin: Store in clouddriver.kubesvc_operation
+    Plugin->>Plugin: Assign on clouddriver.kubesvc_operation_multiple_assign
+    Plugin->>Plugin: Notify all to read and send pending dynamic account operations
+    Plugin->>Service: gRPC HandleOp
+    Service-->>Plugin: return
+    Plugin->>Plugin: Store response in clouddriver.kubesvc_operation_history
+    Plugin-->>User: return
+```
@@ -0,0 +1,51 @@
+---
+title: Enable and Configure Operations Horizontal Scaling in the Armory Scale Agent
+linkTitle: Enable Operations Horizontal Scaling
+description: >
+  Learn how to enable and configure the Operations Horizontal Scaling feature in Armory Scale Agent for Spinnaker and Kubernetes.
+---
+
+## {{% heading "prereq" %}}
+
+* You are familiar with {{< linkWithTitle "plugins/scale-agent/concepts/horizontal-scaling" >}}.
+
+## Scale Agent plugin
+
+> Operations Horizontal Scaling was introduced in plugin versions v0.13.20/0.12.21/0.11.56.
+
+Enable Operations Horizontal Scaling by setting `kubesvc.cluster: database` in your plugin configuration. For example:
+
+{{< highlight yaml "linenos=table,hl_lines=19-26">}}
+spec:
+  spinnakerConfig:
+    profiles:
+      clouddriver:
+        spinnaker:
+          extensibility:
+            repositories:
+              armory-agent-k8s-spinplug-releases:
+                enabled: true
+                url: https://raw.githubusercontent.com/armory-io/agent-k8s-spinplug-releases/master/repositories.json
+            plugins:
+              Armory.Kubesvc:
+                enabled: true
+                version: 0.13.20  # Replace with a version compatible with your Armory CD version
+                extensions:
+                  armory.kubesvc:
+                    enabled: true
+        # Plugin config
+        kubesvc:
+          cluster: database
+          operations:
+            database:
+              scan:
+                batchSize: <int> # (Optional) Requires kubesvc.cluster: database
+                initialDelay: <int> # (Optional) Requires kubesvc.cluster: database
+                maxDelay: <int> # (Optional) Requires kubesvc.cluster: database
+{{< /highlight >}}
+
+`operations.database.scan`:
+
+* **batchSize**: (Optional) default: 5; the maximum number of operations that can be assigned to a Scale Agent instance per cycle
+* **initialDelay**: (Optional) default: 250; milliseconds to wait per cycle when there are pending operations
+* **maxDelay**: (Optional) default: 2000; milliseconds to wait per cycle when there are no pending operations
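To get a feel for how these three settings interact, the Python sketch below picks the wait time for the next cycle and computes a rough upper bound on assignments per second for one instance. It assumes the delay is simply `initialDelay` when operations are pending and `maxDelay` otherwise, as described above, and it ignores the time a cycle itself takes; the function names are hypothetical.

```python
def next_delay_ms(has_pending: bool, initial_delay: int = 250, max_delay: int = 2000) -> int:
    """Choose how long to wait before the next polling cycle."""
    # Poll quickly while there is work, slowly while the queue is empty.
    return initial_delay if has_pending else max_delay


def max_assignments_per_second(batch_size: int = 5, initial_delay: int = 250) -> float:
    """Upper bound on operations assigned per second to one instance.

    Assumes back-to-back cycles with pending work and zero processing time.
    """
    return batch_size * 1000 / initial_delay


print(next_delay_ms(True))           # 250
print(next_delay_ms(False))          # 2000
print(max_assignments_per_second())  # 20.0
```

With the defaults, the bound works out to 5 operations every 250 ms, or 20 per second per instance; raising `batchSize` or lowering `initialDelay` raises that ceiling at the cost of more database polling.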
@@ -8,7 +8,7 @@ Setting|Type|Default|Description
 <code>kubesvc.cache.namespaceExpiryMinutes</code>|integer|0|Disabled by default, set it to a value greater than 0 to enable. Specifies minutes to keep namespace definitions in memory to reduce calls to the database.
 <code>kubesvc.cache.onDemandQuickWaitMs</code>|integer|10000|How long to wait for a recache operation.
 <code>kubesvc.cache.operationWaitMs</code>|integer|30000|How long to wait for a Kubernetes operation like deploy, scale, delete, or others
-<code>kubesvc.cluster</code>|string|none|Type of clustering.<br><code>local</code>: for development only; don’t try to coordinate with other Clouddriver instances<br><code>redis</code>: use Redis to coordinate via pubsub. Redis will be deprecated in a future release.<br><span class='badge badge-primary'>0.10.24+</span><span class='badge badge-primary'>0.9.40</span><span class='badge badge-primary'>0.8.48</span> <code>kubernetes</code>:(Recommended) Requires additional <code>cluster-kubernetes</connected> configuration.
+<code>kubesvc.cluster</code>|string|none|Type of clustering.<br><code>local</code>: for development only; don’t try to coordinate with other Clouddriver instances<br><code>redis</code>: use Redis to coordinate via pubsub. Redis will be deprecated in a future release.<br><span class='badge badge-primary'>0.10.24+</span><span class='badge badge-primary'>0.9.40</span><span class='badge badge-primary'>0.8.48</span> <code>kubernetes</code>: (Recommended) Requires additional <code>cluster-kubernetes</code> configuration.<br><span class='badge badge-primary'>0.13.19+</span><span class='badge badge-primary'>0.12.20+</span><span class='badge badge-primary'>0.11.56+</span> <code>database</code>: Uses the database as a queue to coordinate and to improve the distribution of operations. Supports optional <code>operations.database.scan</code> configuration.
 <code>kubesvc.cluster-kubernetes.kubeconfigFile</code><br><code>kubesvc.cluster-kubernetes.verifySsl</code><br><code>kubesvc.cluster-kubernetes.namespace</code><br><code>kubesvc.cluster-kubernetes.httpPortName</code><br><code>kubesvc.cluster-kubernetes.clouddriverServiceNamePrefix</code>|string<br>boolean<br>string<br>string<br>string<br>|null<br>true<br>null<br>http<br>spin-clouddriver|(Optional) If configured, the plugin uses this file to discover Endpoints. If not configured, it will use the service account mounted to the pod.<br>(Optional) Whether to verify the Kubernetes API cert or not.<br>(Optional) If configured, the plugin watches Endpoints in this namespace. If null, it watches endpoints in the namespace indicated in the file <code>/var/run/secrets/kubernetes.io/serviceaccount/namespace</code><br>(Optional) Name of the port configured in clouddriver Service that forwards traffic to clouddriver http port for REST requests.<br>(Optional) Name prefix of the Kubernetes Service pointing to the Clouddriver standard HTTP port.
 <code>kubesvc.credentials.poller.reloadFrequencyMs</code>|long|30000|<span class='badge badge-primary'>2.23.0+</span> <span class='badge badge-primary'>1.23.0+</span> How often the plugin will refresh account credentials to clouddriver in case <code>credentials.poller.enabled</code> is disabled. Otherwise the standard properties of <code>credentials.poller.enabled</code> and <code>credentials.poller.types.kubernetes.reloadFrequencyMs</code> are respected
 <code>kubesvc.disableV2Provider</code>|boolean|false|If you don’t need the V2 provider account, set that to true to speed up caching deserialization.
@@ -41,6 +41,6 @@ Setting|Type|Default|Description
 <code>kubesvc.v2-cache-eviction.batch-size</code>|integer|5|<span class='badge badge-primary'>0.10.3+</span> How many Kubernetes kinds to evict for each eviction event.
 <code>kubesvc.v2-cache-eviction.millis</code>|integer|200|<span class='badge badge-primary'>0.10.3+</span> The time between evictions in milliseconds. Using a low value can lead to a spike in resource usage when migration occurs.
 <code>kubesvc.ops.processTime.metric.result.maxLength</code>|integer|255|How many characters as a maximum could have the <code>kubesvc.ops.processTime.result</code> attribute metric
-
-
-
+<code>kubesvc.operations.database.scan.batchSize</code>|integer|5|The maximum number of operations that can be assigned to a Scale Agent instance per cycle
+<code>kubesvc.operations.database.scan.initialDelay</code>|integer|250|Milliseconds to wait per cycle when there are pending operations
+<code>kubesvc.operations.database.scan.maxDelay</code>|integer|2000|Milliseconds to wait per cycle when there are no pending operations