GPU Plugins
1. Introduction
UK8S implements GPU sharing using the open-source component HAMi, including:
- Video memory partitioning
- Computing power partitioning
- Error isolation
2. Deployment
⚠️ Check that the system meets the following requirements before installation:
- NVIDIA drivers: Version ≥ 440
- Kubernetes version: ≥ 1.16
- glibc: Version ≥ 2.17 and < 2.30
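One way to confirm these requirements on a GPU node before installing (standard commands, shown here only as a convenience):
nvidia-smi --query-gpu=driver_version --format=csv,noheader   # NVIDIA driver version
kubectl version                                               # Kubernetes version
ldd --version | head -n 1                                     # glibc version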
2.1 Label GPU nodes to be scheduled by HAMi:
kubectl label nodes xxx.xxx.xxx.xxx gpu=on
2.2 Helm installation
⚠️ Helm version requirement: ≥ 3.0. Check the Helm version after installation:
helm version
2.3 Obtain the chart
Download and extract the chart package:
wget https://docs.ucloud.cn/uk8s/yaml/gpu-share/hami.tar.gz
tar -xzf hami.tar.gz
rm hami.tar.gz
2.4 Install HAMi
helm install hami ./hami -n kube-system
Check the installation result:
kubectl get po -A
When the installation is successful, the HAMi pods are shown in the Running state (output abridged):
hami-device-plugin-l4jj4 2/2 Running 0 45s
hami-scheduler-59c7f4b6ff-7g565 2/2 Running 0 3m54s
2.5 Usage
Create a Pod via a YAML file. In resources.limits, in addition to the traditional nvidia.com/gpu, add nvidia.com/gpumem and nvidia.com/gpucores to specify the video memory size and the share of GPU computing power:
- nvidia.com/gpu: Number of vGPUs requested (e.g., 1).
- nvidia.com/gpumem: Requested video memory size in MiB (e.g., 3000 for 3000 MiB).
- nvidia.com/gpumem-percentage: Video memory request percentage (e.g., 50 for 50% of video memory).
- nvidia.com/gpucores: Computing power percentage of each vGPU relative to the actual GPU.
- nvidia.com/priority: Priority level (0 for high, 1 for low; default is 1).
High-priority tasks: When a GPU is shared only by high-priority tasks, their utilization is not limited by the nvidia.com/gpucores setting. In other words, when only high-priority tasks exist on a GPU node, they can use all of the node's available resources.
Low-priority tasks: When a low-priority task is the only task occupying the GPU, its utilization is likewise not limited by nvidia.com/gpucores. Low-priority tasks can therefore use all available node resources as long as no other tasks share the GPU.
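For illustration, a minimal Pod sketch that requests memory by percentage via nvidia.com/gpumem-percentage instead of a fixed nvidia.com/gpumem (the Pod name is arbitrary; the image is the Ubuntu test image used in the examples below):
apiVersion: v1
kind: Pod
metadata:
  name: vgpu-percentage-demo   # example name
spec:
  containers:
  - name: cuda-container
    image: uhub.service.ucloud.cn/library/ubuntu:trusty-20160412
    command: ["bash", "-c", "sleep 86400"]
    resources:
      limits:
        nvidia.com/gpu: 1                  # 1 vGPU
        nvidia.com/gpumem-percentage: 50   # 50% of the card's video memory
        nvidia.com/gpucores: 30            # 30% of the card's computing power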
3. Monitoring
3.1 If the monitoring center is not enabled
Enable the monitoring center.
3.2 If the monitoring center is already enabled
⚠️ If the monitoring center version is ≥ 1.0.5-3 and < 1.0.6, or > 1.0.6, the following deployment files are installed by default; skip the deployment in 3.2.1.
3.2.1 Deploy dcgm-exporter
After deploying dcgm-exporter, confirm that its Pod is running:
kubectl get po -A
3.2.2 Add monitoring targets in UK8S
In the UK8S monitoring center, add the monitoring targets as shown in the diagram below.
3.3 Grafana monitoring
After adding monitoring targets in UK8S and logging into Grafana:
- Download the JSON file.
- Select the ’+’ icon in the left navigation bar.
- Click Import.
- Paste the downloaded JSON content into the second input box.
- Click Load.
The following is a schematic diagram of Grafana monitoring for HAMi:
3.4 Monitoring indicators
In addition to the indicators from the DCGM plugin (specifically described in the GPU monitoring documentation), HAMi also supports the following indicators:
- Device_memory_desc_of_container: Real-time device memory usage in the container, used to monitor the memory consumption of devices (e.g., GPU) for each container.
- Device_utilization_desc_of_container: Real-time device utilization rate of the container, used to monitor the usage of internal devices (e.g., GPU workload).
- HostCoreUtilization: Real-time GPU core utilization on the host, used to monitor how heavily the host's GPU cores are used across all workloads (containers or virtual machines).
- HostGPUMemoryUsage: Real-time memory usage of GPU devices on the host, used to monitor memory consumption by containers or tasks using GPUs on the host.
- vGPU_device_memory_limit_in_bytes: Memory limit for a container’s vGPU (virtual GPU) device, in bytes. This is the maximum GPU memory the container can use.
- vGPU_device_memory_usage_in_bytes: Actual memory usage of a container’s vGPU device, in bytes, used to monitor vGPU memory consumption of the container.
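As an illustration of how these indicators can be combined, a hedged sketch of a Prometheus recording rule that derives vGPU memory utilization from the two vGPU metrics above (the group and record names are arbitrary; adjust label matching to your environment):
groups:
- name: hami-vgpu-example   # example group name
  rules:
  - record: hami:vgpu_memory_utilization:ratio
    # fraction of the vGPU memory limit currently in use, per container/device
    expr: vGPU_device_memory_usage_in_bytes / vGPU_device_memory_limit_in_bytes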
4. Testing
4.1 Node GPU Resource Verification
In the test environment there is 1 physical GPU. Because HAMi's default configuration sets the expansion (split) ratio to 10, the node theoretically reports 1 × 10 = 10 schedulable GPUs.
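The expansion ratio is a chart value. As a sketch (the parameter name below follows the upstream HAMi chart and may differ between chart versions):
# values override for the HAMi chart (assumed parameter name)
devicePlugin:
  deviceSplitCount: 10   # number of vGPUs each physical GPU is split into
Such an override can be applied when installing or upgrading the chart, e.g. via a values file passed to helm install/upgrade.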
Verification Command
Execute the following command to get the Node’s GPU resource quantity:
kubectl get node xxx -oyaml | grep capacity -A 8
Sample Output:
capacity:
  cpu: "16"
  ephemeral-storage: 102687672Ki
  hugepages-1Gi: "0"
  hugepages-2Mi: "0"
  memory: 32689308Ki
  nvidia.com/gpu: "10"
  pods: "110"
  ucloud.cn/uni: "16"
4.2 GPU Memory Test
The configuration of gpu-mem-test.yaml is as follows:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: ubuntu-container
    image: uhub.service.ucloud.cn/library/ubuntu:trusty-20160412
    command: ["bash", "-c", "sleep 86400"]
    resources:
      limits:
        nvidia.com/gpu: 1 # Request 1 vGPU
        nvidia.com/gpumem: 3000 # Allocate 3000 MiB of video memory per vGPU (optional, integer)
        nvidia.com/gpucores: 30 # Allocate 30% of computing power per vGPU (optional, integer)
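Create the Pod from this file, for example:
kubectl apply -f gpu-mem-test.yaml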
The Pod should start normally. Verification steps:
kubectl get po
Expected output:
NAME READY STATUS RESTARTS AGE
gpu-pod 1/1 Running 0 48s
Enter the Pod and run nvidia-smi to view the GPU information, and verify that the displayed video memory limit matches the 3000 MiB requested in resources.limits.
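For example (gpu-pod is the Pod created above):
kubectl exec -it gpu-pod -- nvidia-smi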
4.3 Multiple Pods Single GPU Utilization Test
Test Command
First, create hami-npod-1gpu.yml with the following content. Replace the GPU node IP to specify the target GPU node.
This configuration creates n Pods (controlled by the replicas count in the YAML). Delete them manually after testing; a cleanup command is given at the end of this section.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hami-npod-1gpu
spec:
  replicas: 3 # Create three identical Pods (adjust as needed)
  selector:
    matchLabels:
      app: pytorch
  template:
    metadata:
      labels:
        app: pytorch
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - xxx.xxx.xxx.xxx # Replace with the actual GPU node IP
      containers:
      - name: pytorch-container
        image: uhub.service.ucloud.cn/gpu-share/gpu_pytorch_test:latest
        command: ["/bin/sh", "-c"]
        args: ["cd /app/pytorch_code && python3 2.py"]
        resources:
          limits:
            nvidia.com/gpu: 1
            nvidia.com/gpumem: 3000
            nvidia.com/gpucores: 25
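Apply the file to create the Deployment, for example:
kubectl apply -f hami-npod-1gpu.yml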
Test Results
Monitoring shows that the long-term average computing power consumption of the three Pods aligns with the configured limits, though short-term fluctuations may occur.
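When the test is finished, delete the Deployment (and with it all three Pods), for example:
kubectl delete -f hami-npod-1gpu.yml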