Horizontal Pod Autoscaler (HPA)
Introduction
HPA (Horizontal Pod Autoscaler) refers to the horizontal auto-scaling of Kubernetes Pods, and it is also an API object in Kubernetes. Through this component, a Kubernetes cluster can use monitoring metrics (such as CPU utilization) to automatically scale the number of Pods in a service in or out. When business demand increases, the HPA automatically increases the number of Pods to improve system stability; when demand decreases, it automatically reduces the number of Pods to lower the amount of cluster resources requested (Request). Combined with the Cluster Autoscaler, it can also automatically scale the size of the cluster itself and save IT costs.
Note that by default the HPA only supports scaling based on CPU and memory thresholds, but it can also work with Prometheus through the custom metrics API to scale on more flexible, user-defined monitoring metrics. The HPA cannot be used on controllers that do not support scaling, such as a DaemonSet.
Working Principle
The HPA is implemented as a controller in Kubernetes and can be created simply with the kubectl autoscale command. By default, the HPA controller polls every 30 seconds, querying the resource utilization of the target resource (Deployment, RC), comparing it with the metrics set when the HPA was created, and scaling accordingly.
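As an illustration of the command form, the following is a minimal sketch; the Deployment name `my-app` and the threshold values are assumptions, not taken from a real cluster:

```shell
# Create an HPA for an existing Deployment named "my-app" (hypothetical name):
# keep average CPU utilization near 50%, with between 1 and 5 replicas.
kubectl autoscale deployment my-app --cpu-percent=50 --min=1 --max=5
```

This produces the same kind of HPA object as the YAML manifest shown later in this document.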
After an HPA is created, it obtains the average utilization of each Pod in the target Deployment from the Metrics Server (Heapster is not used in UK8S), compares it with the metrics defined in the HPA, calculates the number of replicas required, and performs the scaling operation. Its algorithm is roughly as follows:
```
desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
```
For example, if the current average CPU usage of all Pods is 200m and the desired value is 100m, the number of replicas will be doubled; if the current value is 50m, the number of replicas will be halved.
Note that the HPA controller has a concept of "tolerance": when the ratio currentMetricValue / desiredMetricValue is close to 1.0, no scaling is triggered. The default tolerance is 0.1, mainly to keep the system stable and avoid cluster oscillation. For example, if the HPA policy is to trigger scaling when CPU usage exceeds 50%, scaling is only triggered when usage actually exceeds 55%; by scaling Pods, the HPA tries to keep usage within the 45% to 55% range. You can adjust the tolerance through the `--horizontal-pod-autoscaler-tolerance` parameter.
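Putting the formula and the tolerance together, the decision can be sketched as follows; the replica count and metric values are made-up example values, not read from a real cluster:

```shell
# Sketch of the HPA replica calculation with the default 0.1 tolerance.
# currentReplicas=4, currentMetricValue=200, desiredMetricValue=100 are
# assumed example values; the real HPA reads them from the Metrics Server.
awk -v c=4 -v m=200 -v d=100 'BEGIN {
  ratio = m / d
  if (ratio > 0.9 && ratio < 1.1) {   # within tolerance: no change
    print c
    exit
  }
  desired = c * ratio
  if (desired > int(desired))         # ceil()
    desired = int(desired) + 1
  print desired                       # prints 8: replicas double, as in the text
}'
```

With a current metric of 104 against a desired 100, the ratio 1.04 falls inside the tolerance band and the replica count stays unchanged.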
There is also a window after each scaling operation during which no further scaling is performed; it can be understood as a cooldown period, like a skill cooldown in a game. The default scale-up delay is 3 minutes (`--horizontal-pod-autoscaler-upscale-delay`), and the scale-down delay is 5 minutes (`--horizontal-pod-autoscaler-downscale-delay`).
Finally, it is worth noting that the HPA will not work if the Pods do not set resource requests (Request).
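For instance, the Pod template of the scaled workload needs something like the following fragment; the container name, image, and values here are placeholders, not from the example application:

```yaml
# Fragment of a Pod template: without resources.requests, the HPA
# cannot compute a utilization percentage for the Pod.
containers:
  - name: app          # placeholder name
    image: nginx       # placeholder image
    resources:
      requests:
        cpu: 250m      # utilization is measured against this request
        memory: 64Mi
```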
HPA Object Console Management
The addition, viewing, and deletion of HPA objects can be performed on the Elastic Scaling (HPA) subpage of the Cluster Scaling page of the UK8S cluster management console.
Click Add to create an HPA object through the console form, or add one through YAML.
| Configuration Item | Description |
| --- | --- |
| Namespace | The namespace to which the HPA object belongs |
| HPA Object Name | Must start with a lowercase letter and may contain only lowercase letters, numbers, periods (.), and hyphens (-) |
| Application Type | Supports Deployment and StatefulSet controllers |
| Application Name | Select the Deployment or StatefulSet object to be scaled |
| Scaling Threshold | Scale-in and scale-out thresholds; supports CPU and memory utilization rates |
| Scale Interval | Range of the number of Pod replicas |
Detailed Explanation of HPA API Objects
The UK8S console creates HPA objects through the Kubernetes API version autoscaling/v2beta2.
Note: for clusters earlier than version 1.26, use `autoscaling/v2beta2`; for clusters of version 1.26 and later, use `autoscaling/v2`.
```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginxtest
  namespace: default
spec:
  maxReplicas: 5        # maximum number of replicas
  minReplicas: 1        # minimum number of replicas
  metrics:
    # CPU utilization threshold that triggers scaling
    - type: Resource
      resource:
        name: cpu
        target:
          averageUtilization: 50
          type: Utilization
    # memory utilization threshold that triggers scaling
    - type: Resource
      resource:
        name: memory
        target:
          averageUtilization: 50
          type: Utilization
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment    # kind of the resource to be scaled
    name: nginxtest     # name of the resource to be scaled
```
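Assuming the manifest above is saved locally as `hpa-nginxtest.yaml` (the filename is an assumption), it can be applied and inspected as follows:

```shell
# Apply the HPA manifest and check its status; "nginxtest" is the
# object name from the example manifest above.
kubectl apply -f hpa-nginxtest.yaml
kubectl get hpa nginxtest -n default
```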
Case Study
Below we will use a simple example to see how HPA works.
1. Deploy test application
```shell
kubectl apply -f https://docs.genesissai.com/uk8s/yaml/hpa/hpa-example.yaml
```
This is a compute-intensive PHP application, and the code example is as follows:
```php
<?php
$x = 0.0001;
for ($i = 0; $i <= 1000000; $i++) {
  $x += sqrt($x);
}
echo "OK!";
?>
```
2. Enable HPA for test application
```shell
kubectl apply -f https://docs.genesissai.com/uk8s/yaml/hpa/hpa.yaml
```
3. Deploy pressure test tool
```shell
kubectl apply -f https://docs.genesissai.com/uk8s/yaml/hpa/load.yaml
```
The pressure test tool is a busybox container. After it starts, it loops, continuously requesting the test application:
```shell
while true; do wget -q -O- http://hpa-example.default.svc.cluster.local; done
```
4. Check the load situation of the test application
```shell
kubectl top pods | grep hpa-example
```
5. When the average CPU load of the test application exceeds 55% (the 50% target plus the 10% tolerance), we can see the HPA begin to scale out the Pods.
```shell
kubectl get deploy | grep hpa-example
```
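To observe the scaling as it happens, the HPA status and events can also be watched; the HPA object name `hpa-example` below is an assumption based on the names used in this example:

```shell
# Watch replica counts and current/target metrics update as load rises and falls.
kubectl get hpa -w
# describe shows the scaling events and the current vs. target metric values.
kubectl describe hpa hpa-example
```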