UK8S - Genesis Cloud

Elastic Scaling of Containers Based on Custom Metrics

Preface

HPA (Horizontal Pod Autoscaling) refers to the horizontal auto-scaling of Kubernetes Pods, and it is also an API object in Kubernetes. Through this scaling component, the Kubernetes cluster can use monitoring metrics (such as CPU usage) to automatically scale in or out the number of Pods in services.

When business demand increases, HPA automatically increases the number of Pods to enhance system stability.
When business demand decreases, HPA automatically reduces the number of Pods to decrease resource requests (Request). Combined with Cluster Autoscaler, it can realize automatic cluster scaling and save IT costs.

It should be noted that the default HPA only supports scaling based on CPU and memory thresholds. However, Kubernetes can also call Prometheus via the custom metric API to implement custom metrics, enabling elastic scaling based on more flexible monitoring indicators.

Note: HPA cannot scale controllers that do not support scaling, such as DaemonSet.

Principle

The working principle of HPA is briefly introduced as follows: By default, it obtains CPU and Memory metrics of Pods through the local service metrics.k8s.io. CPU and Memory are core metrics, and the backend service for metrics.k8s.io is typically the Metrics Server, which is pre-installed in UK8S.

To scale containers based on non-CPU/memory metrics, a monitoring system like Prometheus must be deployed to collect various metrics. However, Prometheus metrics cannot be directly used by K8S due to incompatible data formats. Therefore, a component called prometheus-adapter is required to convert Prometheus metrics into the Kubernetes Custom Metrics API format.

Deployment

Before start, ensure that:

Prometheus monitoring is enabled in the UK8S console (Details > Monitoring Center).
Helm 3.x is installed locally.
The Metrics Server service is deployed in the UK8S cluster.

Install Prometheus Adapter

Prometheus Adapter is a Kubernetes component that converts Prometheus metrics into the Kubernetes Custom Metrics API format. Deploy it using the following commands on a host with Helm 3.x installed and cluster access via kubectl:


helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  -n uk8s-monitor \
  --set prometheus.url=http://uk8s-prometheus.uk8s-monitor.svc \
  --set image.repository=uhub.service.ucloud.cn/uk8s/prometheus-adapter \
  --set image.tag=v0.12.0

Enable custom.metrics.k8s.io service

After deploying Prometheus Adapter, you need to register the Custom Metrics API with the API Aggregator (part of the Kubernetes main API server). Create an APIService custom resource to allow Kubernetes controllers (e.g., HPA) to access custom metrics provided by Prometheus Adapter via standard API endpoints.

Declare an APIService for custom.metrics.k8s.io and apply it to the cluster:


apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  group: custom.metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: prometheus-adapter
    namespace: monitoring
    port: 443
  version: v1beta1
  versionPriority: 100

Test

This exercise will introduce the basics of setting up a Prometheus adapter on a cluster and how to configure an autoscaler to use application metrics from the adapter. For more detailed information, please refer to the Prometheus Adapter Walkthrough.

Deploy a Test Service

Deploy your application to the cluster and expose it through a service so that you can send traffic to it and obtain metrics from it:


apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
  labels:
    app: sample-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
      - image: uhub.service.ucloud.cn/uk8s/autoscale-demo:v0.1.2
        name: metrics-provider
        ports:
        - name: http
          containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: sample-app
  name: sample-app
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: sample-app
  type: ClusterIP

Now, check your application to ensure it exposes metrics and that the http_requests_total metric correctly reflects the number of request accesses. You can test this on a host with access to the Pod (such as the master node) using the following command:


curl http://$(kubectl get pod -l app=sample-app -o jsonpath='{.items[0].status.podIP}'):8080

Note: The counter increases each time the page is accessed.

Configure HPA

Now, you need to ensure that the application can be automatically scaled based on this metric to prepare for release. You can use a HorizontalPodAutoscaler as shown below to achieve automatic scaling:


kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2
metadata:
  name: sample-app
spec:
  scaleTargetRef:
    # Specify the target resource for automatic scaling, here it is a Deployment named sample-app
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
  # Set the range of replicas: at least 1, at most 10
  minReplicas: 1
  maxReplicas: 10
  metrics:
  # Use custom metrics of type Pods to calculate the average of this metric for each Pod
  - type: Pods
    pods:
      # Specify the metric name as http_requests, which is a custom metric (custom metrics)
      # Currently not effective, requiring Prometheus Adapter configuration to support this metric
      metric:
        name: http_requests
      # Specify the scaling trigger threshold as an average of 500m (500 millirequests/second) per Pod
      # That is, when each Pod processes 1 request every 2 seconds, HPA will maintain the current number of replicas
      # If this rate is exceeded, HPA will automatically scale out; otherwise, it will scale in
      target:
        type: Value
        averageValue: 500m

Monitor Configuration

To monitor your application, create a ServiceMonitor pointing to the application. Assume your Prometheus instance is configured to discover services with the app: sample-app label, create the following ServiceMonitor to collect metrics from the service:

Preconditions: Your application service must meet the following requirements for successful metric collection by Prometheus:

The service has the label app: sample-app (or matches the selector in ServiceMonitor).
The service exposes the metrics interface via a named port (e.g., http).
The application exposes standard Prometheus metrics on this port, with the default scraping path /metrics (or a custom path via the endpoints.path field).


kind: ServiceMonitor
apiVersion: monitoring.coreos.com/v1
metadata:
  name: sample-app
  labels:
    app: sample-app
  namespace: default
spec:
  selector:
    matchLabels:
      app: sample-app
  endpoints:
  - port: http

After deployment, the http_requests_total metric should appear in your Prometheus instance. Locate them via the dashboard and ensure they include namespace and pod labels. If mismatches occur, verify that labels on the ServiceMonitor match those on the Prometheus CRD.

Configure Prometheus Adapter

Now that you have a running Prometheus instance monitoring your application, you need to deploy the adapter, which acts as a translator between Kubernetes and Prometheus, knowing how to communicate with both.

To enable custom metrics to be displayed in Kubernetes, you must configure the adapter’s rules to tell it how to extract metrics from Prometheus and convert them into a format supported by Kubernetes:


apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: uk8s-monitor
data:
  config.yaml: |-
    # Prometheus Adapter custom metric rules configuration
    "rules":
    - "seriesQuery": |
         {namespace!="",__name__!~"^container_.*"}  # Query all metrics with namespace label that do not start with container_
      "resources":
        "template": "<<.Resource>>"  # Map to K8s resources like Pod, Deployment, etc.
      "name":
        "matches": "^(.*)_total"     # Match all metrics ending with _total (e.g., http_requests_total)
        "as": ""                     # Keep the original metric name
      "metricsQuery": |
        sum by (<<.GroupBy>>) (      # Aggregate by specified labels
          irate (
            <<.Series>>{<<.LabelMatchers>>}[1m]  # Use irate function to calculate requests per second over 1-minute window
          )
        )

Restart Prometheus Adapter to apply configuration


kubectl rollout restart deployment prometheus-adapter -n uk8s-monitor

Use kubectl get —raw to check metric values, which sends a raw GET request to the Kubernetes API server with automatic authentication:


# Query current value of custom metric http_requests for all Pods with label app=sample-app in default namespace
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta2/namespaces/default/pods/*/http_requests?selector=app%3Dsample-app" | jq .

Due to the adapter configuration, the cumulative metric http_requests_total is converted to a rate metric pods/http_requests, measuring requests per second per Pod over a 1-minute window. Currently, the value should be close to zero since the application has no actual traffic except for Prometheus scraping.

If everything works, the command will return output similar to:


{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta2",
  "metadata": {},
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "default",
        "name": "sample-app-85d5996dc6-q4s74",
        "apiVersion": "/v1"
      },
      "metric": {
        "name": "http_requests",
        "selector": null
      },
      "timestamp": "2025-04-29T09:59:22Z",
      "value": "52m"
    }
  ]
}

Test

Generate traffic using curl:


timeout 1m bash -c '
  while sleep 0.1; do
    curl http://$(kubectl get pod -l app=sample-app -o jsonpath="{.items[0].status.podIP}"):8080
  done
'

Check the HPA again—you should see the last observed metric value roughly match your request rate, and the HPA should have recently scaled your application.

Now your application auto-scales based on HTTP requests and is ready for formal release! If you leave the application idle for a while, the HPA should scale it down, helping you save valuable budget for the official launch.