Auto scaling
Auto scaling is implemented through a Kubernetes concept called the Horizontal Pod Autoscaler (HPA). The Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics). GAP currently supports auto scaling based on pod CPU utilization. It is possible to base it on a custom metric instead, but you need to implement that manually.
The Horizontal Pod Autoscaler is implemented as a Kubernetes API resource and a controller. The resource determines the behavior of the controller. The controller periodically adjusts the number of replicas in a replication controller or deployment to match the observed average CPU utilization to the target specified by the user.
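The controller's adjustment step can be sketched roughly as follows. This is a simplified illustration of the replica formula given in the upstream Kubernetes documentation, not the controller's actual code: it ignores missing metrics, pod readiness and stabilization windows, and the function name and the sample numbers are made up for this example. The 0.1 tolerance is the upstream default and may be configured differently on a given cluster.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified sketch of the HPA replica calculation (hypothetical helper)."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        # desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
        desired = math.ceil(current_replicas * ratio)
    # The result is always kept between minReplicas and maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


# 2 pods averaging 90% CPU against a 50% target: ceil(2 * 1.8) = 4
print(desired_replicas(2, 90, 50, min_replicas=2, max_replicas=4))
# 4 pods averaging 20% CPU: ceil(4 * 0.4) = 2
print(desired_replicas(4, 20, 50, min_replicas=2, max_replicas=4))
# 52% is within the tolerance of the 50% target, so nothing changes
print(desired_replicas(2, 52, 50, min_replicas=2, max_replicas=4))
```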
To understand the inner workings of the Horizontal Pod Autoscaler, please read the official documentation.
Please note that autoscaling is disabled by default in GAP. You have to manually enable it using the autoscaling object in your gap.yaml to utilize it.
You can define separate autoscaling behaviour for each of your deployments.
Please make sure your resource requests (resources.requests) are properly defined for your deployment, because the HPA uses these requests as the base for calculating the average CPU utilization.
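To see why the requests matter, here is a minimal sketch of how a utilization percentage can be derived from per-pod usage and the CPU request. The helper names and the usage figures are hypothetical, and the real controller handles more cases (for example pods without metrics), so treat this only as an illustration of the arithmetic.

```python
def parse_cpu(quantity):
    """Convert a Kubernetes CPU quantity such as "500m" or "1" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def average_cpu_utilization(usage_per_pod, request):
    """Average CPU usage across pods, as a percentage of the per-pod request."""
    request_millicores = parse_cpu(request)
    total_usage = sum(parse_cpu(usage) for usage in usage_per_pod)
    return 100.0 * total_usage / (request_millicores * len(usage_per_pod))


# Two pods using 400m and 500m against a 500m request: 90% utilization
print(average_cpu_utilization(["400m", "500m"], "500m"))
# A pod running at a 1000m limit with a 500m request reports 200%
print(average_cpu_utilization(["1000m"], "500m"))
```

Because the percentage is relative to the request and not the limit, a request that is set too low makes the autoscaler scale up early, and a request that is set too high delays scaling.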
You can easily set it up with a few extra lines in your gap.yaml. Let’s see an example:
deployments:
  web:
    command: ["command", "to", "run", "in", "web"]
    ingress:
      enabled: true
    autoscaling:
      enabled: true
      minReplicas: 2
      maxReplicas: 4
      metrics:
        - type: Resource
          name: cpu
          targetAverageUtilization: 50
    resources:
      requests:
        cpu: 500m
        memory: 500Mi
      limits:
        cpu: 1000m
        memory: 1000Mi
The following settings should be set in the autoscaling object:
- enabled - Autoscaling enable/disable switch. Disabled by default.
- minReplicas - Lower limit for the number of pods that can be set by the autoscaler. Default: 2.
- maxReplicas - Upper limit for the number of pods that can be set by the autoscaler. It cannot be smaller than minReplicas. Default: 4.
- targetCPUUtilizationPercentage - Target average CPU utilization (represented as a percentage of requested CPU) over all the pods. Default: 50.
For autoscaling based on custom metrics, Google Kubernetes Engine recommends deploying the Custom Metrics Stackdriver Adapter on the cluster. The Custom Metrics Stackdriver Adapter is an implementation of the Custom Metrics API and the External Metrics API using Stackdriver as a backend. Its purpose is to enable pod autoscaling based on Stackdriver custom metrics.
There are multiple ways to get custom metrics into Stackdriver. You can either export your custom metrics directly from your application, or expose them in Prometheus format and use Prometheus-to-Stackdriver to push the exposed metrics into Stackdriver.
The system has some pre-defined custom metrics, including request rates from the ingress controller. The following example sets up autoscaling based on the incoming request rate to a deployment:
deployments:
  web:
    ...
    autoscaling:
      enabled: true
      minReplicas: 2
      maxReplicas: 4
      metrics:
        - type: Object
          describedObject:
            kind: Ingress
            name: ingress-name
            apiVersion: networking.k8s.io/v1
          name: nginx_ingress_controller_requests_rate
          targetAverageValue: "100"
Or as a custom resource:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-name
  namespace: namespace-name  # optional
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deployment-name
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Object
      object:
        describedObject:
          kind: Ingress
          name: ingress-name
          apiVersion: networking.k8s.io/v1
        metric:
          name: nginx_ingress_controller_requests_rate
        target:
          type: AverageValue
          averageValue: "100"
          # TODO remove once this is fixed https://github.com/argoproj/argo-cd/discussions/6349
          value: "100"
Some important notes:
- averageValue is the autoscaler’s target per pod. In this example it aims to run one pod for every 100 requests per second arriving at the deployment.
- value needs to be specified even when averageValue is used, but it is not used by the HPA in this case. The difference is that, when using averageValue, the metric is first divided by the number of pods. The metric needs to be something that reacts to scaling: the incoming request rate normally does not, but the per-pod request rate does.
- value is set in this case to satisfy ArgoCD, which receives this property from Kubernetes and cannot sync without it. See the GitHub discussion linked in the example comment.
- If a custom resource is used (not defined in gap.yaml), the replicas field needs to be disabled explicitly in the deployment section of gap.yaml to avoid resetting the replica count on deployment. This can be achieved by setting deployments.<name>.replicas: "null" (the quotes are necessary). If the autoscaling is defined in gap.yaml, this is not necessary and is handled by the manifest generation.
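The difference between the two target types can be illustrated with a small sketch. It applies the simplified upstream replica formula without the tolerance check, and the function names, the 450 requests per second figure and the replica limits are assumptions chosen for the example.

```python
import math


def replicas_for_average_value(metric_total, average_value_target):
    """AverageValue target: the metric is divided by the pod count before it is
    compared with the target, so the result depends only on the total."""
    return math.ceil(metric_total / average_value_target)


def replicas_for_value(current_replicas, metric_total, value_target):
    """Value target: the raw metric is compared with the target directly."""
    return math.ceil(current_replicas * metric_total / value_target)


incoming_rate = 450.0  # hypothetical total requests per second on the ingress

# With averageValue: "100", the autoscaler settles on ceil(450 / 100) = 5 pods.
print(replicas_for_average_value(incoming_rate, 100))

# With value: "100", the total rate does not drop when pods are added, so every
# evaluation asks for more pods until maxReplicas (10 here) is reached.
replicas = 2
for _ in range(3):
    replicas = min(10, replicas_for_value(replicas, incoming_rate, 100))
print(replicas)
```

This is why a total request rate should be paired with averageValue: the per-pod share falls as pods are added, which gives the autoscaler a point at which to stop.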
After verifying that your metrics are arriving properly in Stackdriver, you can set up your HorizontalPodAutoscaler object for your application. You can find examples for this here: Autoscaling based on metrics from all Pods, Autoscaling based on metrics from a single Pod. Since this is not supported in GAP by default, you have to manually define your own HorizontalPodAutoscaler object as a patch YAML.