Node Common Fault Handling
Nodes, as the entities that carry workloads, are important objects in Kubernetes. In actual operation, nodes may encounter various problems. This article briefly describes the abnormal states a node can be in and the corresponding troubleshooting ideas.
1. Node Status Description
Node Condition | Description
---|---
Ready | `True` indicates that the node is healthy. `False` indicates that it is unhealthy. `Unknown` indicates that the node controller has lost contact with the node.
DiskPressure | `True` indicates that the node is low on disk capacity. `False` indicates the opposite.
MemoryPressure | `True` indicates that the node memory usage is too high. `False` indicates the opposite.
PIDPressure | `True` indicates that too many processes are running on the node. `False` indicates the opposite.
NetworkUnavailable | `True` indicates that the node network is incorrectly configured. `False` indicates the opposite.
2. Common Node Commands
- Check node status: `kubectl get nodes`
- View node events: `kubectl describe node ${NODE_NAME}`
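The two commands above can be combined into a small pipeline that lists only the problem nodes. This is a minimal sketch: the sample `kubectl get nodes` output is inlined in a function so the filter can be demonstrated without a live cluster; in practice, replace the sample function with the real `kubectl get nodes` call.

```shell
# Sample of the standard `kubectl get nodes` column layout
# (NAME STATUS ROLES AGE VERSION); node names are placeholders.
kubectl_get_nodes_sample() {
cat <<'EOF'
NAME            STATUS     ROLES    AGE   VERSION
uk8s-master-1   Ready      master   90d   v1.22.5
uk8s-node-1     NotReady   <none>   30d   v1.22.5
uk8s-node-2     Ready      <none>   30d   v1.22.5
EOF
}

# Skip the header row, then print the name of every node whose
# STATUS column is anything other than "Ready".
kubectl_get_nodes_sample | awk 'NR > 1 && $2 != "Ready" { print $1 }'
```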
When these two commands are inconclusive, we can use Linux commands to assist the diagnosis. Log in to the node and check its status directly.

- Check node connectivity
  - Network check: from the cluster's Master node, use the `ping` command to verify the node's network connectivity.
  - Health check: log in to the Genesis Cloud console and confirm on the cloud host page that the node is in the Running state; also review its CPU and memory usage to determine whether the node is under high load.
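The network check can be sketched as a small loop run from the Master node. The function and node IPs below are illustrative placeholders, not part of any UK8S tooling:

```shell
# Ping each node once (2-second timeout) and report reachability.
# Pass the node IPs of your own cluster as arguments.
check_node_connectivity() {
  local ip
  for ip in "$@"; do
    if ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
      echo "$ip reachable"
    else
      echo "$ip UNREACHABLE"
    fi
  done
}

# Example invocation with placeholder IPs:
check_node_connectivity 10.9.0.11 10.9.0.12
```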
3. K8S Component Fault Check
By default, a UK8S cluster has three Master nodes. The K8S core components are deployed on all three Master nodes and provide external services through load balancing. If you detect a component abnormality, log in to the corresponding Master node (or each Master node in turn if the issue cannot be localized) and use the following commands to check component status, identify the cause of the error, and restart the abnormal component:
```shell
systemctl status ${PLUGIN_NAME}
journalctl -u ${PLUGIN_NAME}
systemctl restart ${PLUGIN_NAME}
```
UK8S Core Components and Their Names:
Component | Component Name |
---|---|
Kubelet | kubelet |
API Server | kube-apiserver |
Controller Manager | kube-controller-manager |
Etcd | etcd |
Scheduler | kube-scheduler |
KubeProxy | kube-proxy |
For instance, to check the API Server component status, execute `systemctl status kube-apiserver`.
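To walk the whole table at once, the check can be sketched as a loop over the unit names. This is a minimal sketch to run on a master node; the pluggable `checker` argument is an assumption added here only so the loop can be exercised without systemd:

```shell
# Report the systemd state of each UK8S core component listed in
# the table above. By default each unit is probed with
# `systemctl is-active`; a different checker command can be passed
# as $1 (useful for dry runs on machines without these units).
check_core_components() {
  local checker="${1:-systemctl is-active}"
  local unit
  for unit in kubelet kube-apiserver kube-controller-manager etcd kube-scheduler kube-proxy; do
    printf '%-24s %s\n' "$unit" "$($checker "$unit" 2>&1)"
  done
}

# On a master node you would simply run:
#   check_core_components
```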
4. UK8S Home Page Keeps Refreshing?
- Has the ULB4 instance fronting the api-server (uk8s-xxxxxx-master-ulb4) been deleted?
- Have the three master hosts of the UK8S cluster been deleted or shut down?
- Log in to the three master nodes of UK8S and check whether the etcd and kube-apiserver services are normal; if a service is abnormal, try restarting it:
  - `systemctl status etcd` / `systemctl restart etcd` (if restarting etcd on a single node fails, try restarting etcd on all three nodes at the same time)
  - `systemctl status kube-apiserver` / `systemctl restart kube-apiserver`
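Before restarting etcd, it helps to see which members are actually unhealthy. As a sketch, the per-endpoint output of `etcdctl endpoint health` can be filtered down to the endpoints worth restarting; the sample lines and endpoint URLs below are illustrative, not taken from a real cluster:

```shell
# Keep only the endpoints that are NOT reported healthy.
# In practice, pipe the real `etcdctl endpoint health` output in.
unhealthy_endpoints() {
  grep -v 'is healthy' | awk '{ print $1 }'
}

# Illustrative sample of etcdctl-style health output:
sample_health='https://10.9.0.1:2379 is healthy: successfully committed proposal
https://10.9.0.2:2379 is unhealthy: failed to commit proposal'

printf '%s\n' "$sample_health" | unhealthy_endpoints
```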
5. What to Do When a UK8S Node Is NotReady
- Use `kubectl describe node ${NODE_NAME}` to check why the node is NotReady, or view the node details directly on the console page.
- If you can log in to the node, view the kubelet logs with `journalctl -u kubelet` and check whether kubelet is working normally with `systemctl status kubelet`.
- For nodes that cannot be logged in to, if you need a quick recovery, power off and restart the corresponding host via the console.
- View host monitoring, or log in to the host and execute the `sar` command. If disk and CPU usage rise suddenly alongside high memory usage, the cause is generally a memory OOM (Out of Memory): when memory usage is excessively high, little memory remains for the disk cache, which leads to frequent disk I/O, further increasing system load and creating a vicious cycle of high CPU usage.
- For memory OOM cases, check the memory usage of your own processes. K8S recommends that request and limit values not differ too much; a large gap makes a node more prone to crashing.
- If you still have questions about the cause of the node being NotReady, contact manual support as described in UK8S Manual Support.
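When memory OOM is suspected, the kernel log is the quickest confirmation: the kernel records each OOM kill, so on the node you can run something like `dmesg -T | grep -i 'killed process'`. As a sketch, the victim process names can be extracted from such lines; the sample log lines below are illustrative, and the exact wording of OOM-kill messages varies by kernel version:

```shell
# Extract process names from kernel OOM-kill log lines of the form
# "Killed process <pid> (<name>) ...". In practice, pipe in
# `dmesg -T` output instead of the sample below.
oom_victims() {
  grep -oE 'Killed process [0-9]+ \([^)]+\)' | sed -E 's/.*\(([^)]+)\)/\1/'
}

# Illustrative sample of kernel OOM-kill messages:
sample_dmesg='[Mon Jan  1 00:00:00] Out of memory: Killed process 12345 (java) total-vm:8000000kB
[Mon Jan  1 00:00:01] Killed process 12399 (node) total-vm:4000000kB'

printf '%s\n' "$sample_dmesg" | oom_victims
```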