Cluster Autoscaler (CA)
Introduction
Cluster Autoscaler (CA) automatically adjusts the number of nodes in the cluster to meet business requirements.
When creating a Pod, we can specify the requested resources (Request) such as CPU, memory, and GPU for each container. The Kubernetes scheduler uses the Request to decide which node to place the Pod on. If no node in the cluster has sufficient free capacity, the Pod cannot be scheduled and remains in the Pending state until new nodes join the cluster or existing Pods are deleted to free up capacity.
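As a minimal illustration (the Pod name and image below are placeholders), requests are declared per container like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: request-demo            # placeholder name
spec:
  containers:
  - name: app
    image: nginx                # placeholder image
    resources:
      requests:
        cpu: "500m"             # half a CPU core
        memory: "256Mi"
```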
The CA component watches for unschedulable Pods, iterates through the scaling groups, and simulates whether a node created from a scaling group's template would satisfy the Pod's requirements. If it determines that such a node would allow the Pod to be scheduled successfully, CA scales up the cluster through that scaling group.
The CA component also scales the cluster down. Scale-down is triggered when a node's resource request rate falls below the scale-down threshold. However, scale-down is not performed immediately; the node must remain underutilized for a waiting period (currently 10 minutes by default), which can be changed with the --scale-down-unneeded-time parameter (see the example in the CA Parameter Description section below).
Unlike HPA, CA is not a built-in component; it runs in the Kubernetes cluster as a Deployment. UK8S already supports CA, which can be configured in the UK8S management interface.
Working Principle
The scale-up trigger condition for CA is that Pods exist that cannot be scheduled due to insufficient cluster resources. These resources include CPU, memory, and GPU. Taking GPU as an example: when a Pod requests the GPU resource nvidia.com/gpu (refer to the GPU node usage document) but stays Pending because the cluster has no GPU nodes, CA automatically scales up nodes in the scaling group whose template is configured with a GPU model.
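A hedged sketch of such a GPU Pod (the name and image are illustrative; for extended resources like nvidia.com/gpu, the amount is typically declared under limits, and the request defaults to the same value):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo                 # illustrative name
spec:
  containers:
  - name: cuda-app
    image: nvidia/cuda:11.0-base # illustrative image
    resources:
      limits:
        nvidia.com/gpu: 1        # one GPU; Pending until a GPU node exists
```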
The trigger condition for CA scale-down is that a node's resource request rate (Request) stays below the scale-down threshold (for example, 50%) for a certain period of time (10 minutes by default), and all Pods on the node can be rescheduled onto other nodes. For example, a 4-core node whose Pods together request 1.5 cores has a CPU request rate of 37.5%, which is below a 50% threshold.
It is worth noting the condition that all Pods on the node can be rescheduled onto other nodes. Many users who have configured CA ask why a node whose resource requests are below the threshold is not scaled down. The reason is simple: if a standalone Pod (one not managed by any controller) is running on the node, it cannot be recreated elsewhere once evicted, so to keep the business running normally, the node is not scaled down.
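If such a standalone Pod is in fact safe to evict, the upstream Cluster Autoscaler honors the cluster-autoscaler.kubernetes.io/safe-to-evict annotation, which removes this obstacle for that Pod. A sketch (the Pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: standalone-worker        # illustrative name
  annotations:
    # Tells CA this Pod may be evicted when its node is scaled down
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
spec:
  containers:
  - name: app
    image: nginx                 # illustrative image
```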
Using Cluster Scaling in UK8S
1. Create Scaling Configuration
2. Fill in Configuration Parameters
Usually, the default values are sufficient.
3. Create Scaling Groups
The scaling group defines the node configuration used when cluster expansion is triggered. The scaling range is the minimum and maximum number of nodes the group can scale to; the maximum mainly guards against unlimited expansion caused by, for example, DDoS attacks.
4. Turn on Cluster Scaling
After creating the scaling group, you need to enable it. After you click the “Enable” operation, a Cluster Autoscaler Deployment appears in your UK8S cluster. If you delete this Deployment manually, cluster scaling stops working; to have it re-created, disable and then re-enable scaling on the cluster scaling page.
CA Parameter Description
CA itself has many command-line parameters that adjust its scaling behavior; they can be configured by modifying the args field of the CA Deployment (see the sketch after the table below).
Below are some common CA parameters and their descriptions:
Parameter | Type | Default Value | Explanation |
---|---|---|---|
scale-down-delay-after-add | Duration | 10 minutes | How long after a scale-up before scale-down evaluation resumes. |
scale-down-delay-after-delete | Duration | Same as scan-interval | How long after a node deletion before scale-down evaluation resumes. |
scale-down-unneeded-time | Duration | 10 minutes | How long a node must be unneeded before it becomes eligible for scale-down. |
node-deletion-delay-timeout | Duration | 2 minutes | How long CA waits for a node deletion to complete before timing out. |
scan-interval | Duration | 10 seconds | How often the cluster is re-evaluated for scale-up or scale-down. |
max-nodes-total | int | 0 | Maximum total number of nodes in the cluster (0 means no limit). |
cores-total | String | [0:320000] | Allowed range of total CPU cores in the cluster. |
memory-total | String | [0:6400000] | Allowed range of total memory (GB) in the cluster. |
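For reference, a minimal sketch of how these flags might be set in the CA Deployment's args (the container name, image, and values shown are illustrative; the manifest UK8S generates may differ):

```yaml
# Fragment of a cluster-autoscaler Deployment spec (illustrative values)
spec:
  template:
    spec:
      containers:
      - name: cluster-autoscaler
        image: cluster-autoscaler:latest   # placeholder image
        args:
        - --scan-interval=10s
        - --scale-down-unneeded-time=10m
        - --scale-down-delay-after-add=10m
        - --max-nodes-total=50             # example cap
```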