Auto scaling

Auto scaling is implemented through a Kubernetes concept called the Horizontal Pod Autoscaler (HPA). The Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization (or, with custom metrics support, on other application-provided metrics). GAP currently supports auto scaling based on pod CPU utilization. Scaling on a custom metric is also possible, but you have to set it up manually.

The Horizontal Pod Autoscaler is implemented as a Kubernetes API resource and a controller. The resource determines the behavior of the controller. The controller periodically adjusts the number of replicas in a replication controller or deployment so that the observed average CPU utilization matches the target specified by the user.
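The adjustment the controller makes can be sketched in a few lines of Python. This is a simplified illustration of the formula given in the official Kubernetes documentation (desired replicas = ceil(current replicas × current metric / target metric)), not the controller's actual code. The function name and parameters are made up for this example, the 0.1 tolerance is the Kubernetes default, and details such as missing metrics or pods that are not ready yet are left out.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """Replica count the autoscaler would aim for (simplified sketch)."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # The result never leaves the configured minReplicas..maxReplicas range.
    return max(min_replicas, min(max_replicas, desired))


# 2 pods at 80% average CPU utilization with a 50% target, limits 2..4
print(desired_replicas(2, 80, 50, 2, 4))  # 4
# 52% is within the tolerance of the 50% target, so nothing changes
print(desired_replicas(2, 52, 50, 2, 4))  # 2
```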

Inner workings

To understand the inner workings of the Horizontal Pod Autoscaler, please read the official documentation.

Default behaviour in GAP

Please note that autoscaling is disabled by default in GAP. To use it, you have to enable it manually through the autoscaling object in your gap.yaml. You can define separate autoscaling behaviour for each of your deployments.

Things to look out for

Please make sure the resource requests (resources.requests) of your deployment are properly defined, because the HPA uses these requests as the base for calculating the average CPU utilization.
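To illustrate why the requests matter, here is a minimal Python sketch of how utilization relates to them. It assumes utilization is total CPU usage divided by total requested CPU across the pods, which equals the per-pod average when all pods have the same request. The function name is hypothetical and the usage numbers are invented.

```python
def cpu_utilization_percent(usage_millicores, request_millicores):
    """CPU utilization across pods as a percentage of the requested CPU."""
    total_request = sum(request_millicores)
    if total_request == 0:
        # Without a CPU request there is nothing to compare the usage against.
        raise ValueError("resources.requests.cpu must be set for every pod")
    return 100 * sum(usage_millicores) / total_request


# Two pods, each requesting 500m CPU, currently using 400m and 300m
print(cpu_utilization_percent([400, 300], [500, 500]))  # 70.0
```

Because the percentage is relative to the request and not to the limit, a pod with a 500m request and a 1000m limit can report up to 200% utilization.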

CPU utilization based scaling

You can set up CPU based scaling with a few extra lines in your gap.yaml. Let’s see an example:


deployments:
  web:
    command: ["command", "to", "run", "in", "web"]
    ingress:
      enabled: true
    autoscaling:
      enabled: true
      minReplicas: 2
      maxReplicas: 4
      metrics:
      - type: Resource
        name: cpu
        targetAverageUtilization: 50
    resources:
      requests:
        cpu: 500m
        memory: 500Mi
      limits:
        cpu: 1000m
        memory: 1000Mi

The following settings can be set in the autoscaling object:

enabled - Switch that enables or disables autoscaling. Disabled by default.
minReplicas - Lower limit for the number of pods that can be set by the autoscaler. Default: 2.
maxReplicas - Upper limit for the number of pods that can be set by the autoscaler. It cannot be smaller than minReplicas. Default: 4.
targetCPUUtilizationPercentage - Target average CPU utilization (represented as a percentage of requested CPU) over all the pods. Default: 50.

Preparing the cluster

The approach recommended for Google Kubernetes Engine is to deploy the Custom Metrics Stackdriver Adapter on the cluster. Custom Metrics - Stackdriver Adapter is an implementation of the Custom Metrics API and the External Metrics API that uses Stackdriver as a backend. Its purpose is to enable pod autoscaling based on Stackdriver custom metrics.

Pushing custom metrics

There are multiple ways to get custom metrics into Stackdriver. You can either export your custom metrics directly from your application, or expose them in Prometheus format and use Prometheus-to-Stackdriver to push the exposed metrics into Stackdriver.
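For reference, the Prometheus text exposition format is simple enough to show with a small standard-library sketch. The helper below and the queue_depth metric are hypothetical, and label values are not escaped. A real application would normally use a Prometheus client library instead of rendering the text by hand.

```python
def render_prometheus_metric(name, help_text, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        # Labels are rendered as key="value" pairs inside curly braces.
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


# queue_depth is an invented metric name, used only for illustration
print(render_prometheus_metric(
    "queue_depth", "Jobs waiting in the queue.", "gauge",
    [({"queue": "default"}, 12)],
))
```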

Scaling on incoming request rate (UNRELIABLE)

The system has some pre-defined custom metrics, including request rates from the ingress controller. The following example sets up autoscaling based on the incoming request rate to a deployment:

deployments:
  web:
    ...
    autoscaling:
      enabled: true
      minReplicas: 2
      maxReplicas: 4
      metrics:
      - type: Object
        describedObject:
          kind: Ingress
          name: ingress-name
          apiVersion: networking.k8s.io/v1
        name: nginx_ingress_controller_requests_rate
        targetAverageValue: "100"

Or as a custom resource:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-name
  namespace: namespace-name  # optional
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deployment-name
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      describedObject:
        kind: Ingress
        name: ingress-name
        apiVersion: networking.k8s.io/v1
      metric:
        name: nginx_ingress_controller_requests_rate
      target:
        type: AverageValue
        averageValue: "100"
        #TODO remove once this is fixed https://github.com/argoproj/argo-cd/discussions/6349
        value: "100"

Some important info here:

averageValue is the autoscaler’s per-pod target. In this example the autoscaler aims for one pod per 100 requests per second arriving at the deployment.
value has to be specified even when averageValue is used, although the HPA does not use it in that case. The difference between the two is that with averageValue the metric is first divided by the number of pods. The metric has to be something that reacts to scaling: the incoming request rate normally does not, but the per-pod request rate does. Here value is set only to satisfy ArgoCD, which receives this property from Kubernetes and cannot sync without it; see the GitHub discussion linked in the example comment.
If a custom resource is used (that is, the autoscaler is not defined in the gap.yaml), the replicas field has to be disabled explicitly in the deployment section of the gap.yaml so that the replica count is not reset on every deployment. This can be achieved by setting deployments.<name>.replicas: "null" (the quotes are necessary). If the autoscaling is defined in the gap.yaml, this is not necessary, because the manifest generation handles it.
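The difference between averageValue and value can be made concrete with a short Python sketch. It is a simplification of the autoscaler's behaviour, assuming the default tolerance of 0.1. Both function names are hypothetical and the request rates are invented.

```python
import math


def replicas_for_average_value(total_metric, average_value,
                               current_replicas, tolerance=0.1):
    """AverageValue target: the metric is divided by the pod count first."""
    per_pod = total_metric / current_replicas
    if abs(per_pod / average_value - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(total_metric / average_value)


def replicas_for_value(total_metric, value, current_replicas, tolerance=0.1):
    """Value target: the raw metric is compared to the target directly."""
    ratio = total_metric / value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# 450 requests per second in total, target of 100 per pod
print(replicas_for_average_value(450, 100, 2))  # 5
print(replicas_for_average_value(450, 100, 5))  # 5, the per-pod rate settled

# With a Value target the total rate does not drop as pods are added,
# so the result keeps growing until maxReplicas stops it.
print(replicas_for_value(450, 100, 2))  # 9
print(replicas_for_value(450, 100, 9))  # 41
```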

Setting up scaling

After verifying that your metrics arrive in Stackdriver properly, you can set up the HorizontalPodAutoscaler object for your application. You can find examples for this here: Autoscaling based on metrics from all Pods, Autoscaling based on metrics from a single Pod. Since GAP does not support this by default, you have to define your own HorizontalPodAutoscaler object manually as a patch YAML.