Kubernetes troubleshooting
Source: UCMDB 2021.08 documentation.
Summary of issues
| Issue type | Details |
|---|---|
| Non-native language mistakes | Including: “Massive” error messages, “a known issue of Kubernetes”, “see details from”, and “then run below command.” |
| Formatting issues | The link to the Kubernetes documentation has no anchor text. The “Terminating” state is formatted inconsistently. There is no need to format “Kubelet” with code tags. |
| Technical accuracy | “Kubelet” should be “the kubelet”, as defined by the Kubernetes glossary. |
| Structure | The first sentence of the Cause section is actually part of the symptoms. |
| Redundancies | “On a node with CDF installed” is redundant: this is documentation about the product CDF (therefore, all nodes have CDF installed on them). |
ORIGINAL TEXT: Kubelet gets stuck with massive “runtime service failed: rpc error: code = Unknown” error messages
You receive massive error messages on a node with CDF installed that resemble the following:
b83ace3b5e0870284e554502e8922563a4d9587b800b6b699dc2c2acfcc9b7cc" from runtime service failed: rpc error: code = Unknown desc = unable to inspect docker image "sha256:a950dd441cee8f60ce4ee325799c62e5fe444fa8e851b5c96b9172da0ced8d28" while inspecting docker container "b83ace3b5e0870284e554502e8922563a4d9587b800b6b699dc2c2acfcc9b7cc": no such image: "sha256:a950dd441cee8f60ce4ee325799c62e5fe444fa8e851b5c96b9172da0ced8d28"
When you check the pod status, some pods are stuck in Terminating state.
Cause
This issue occurs because Kubelet loops when trying to inspect a Docker container for a pod whose image has been deleted or cleaned up. This is a known issue of Kubernetes. See details from https://github.com/kubernetes/kubernetes/issues/84214.
Solution
- Log on to the node where you receive these error messages.
- Run the following command to check the pod status:
kubectl get pods -n core -o wide
- Identify the pods that are stuck in the “Terminating” state on this node. Then run below command to delete the pods. You need to replace the <pod name> placeholder with the name of the pod that is in the “Terminating” state. Run the following command for all the “terminating” pods:
kubectl delete pod <pod name> -n core --force --grace-period=0
- Run the following command to restart CDF:
K8S_HOME/bin/kube-restart.sh
EDITED TEXT: “runtime service failed: rpc error: code = Unknown” error messages and the kubelet enters a restart loop
When the kubelet tries to inspect the Docker container of a pod whose image was deleted or cleaned up, the kubelet enters a restart loop. When this issue occurs, the affected pods become stuck in the “Terminating” state, and you receive error messages that resemble the following:
b83ace3b5e0870284e554502e8922563a4d9587b800b6b699dc2c2acfcc9b7cc" from runtime service failed: rpc error: code = Unknown desc = unable to inspect docker image "sha256:a950dd441cee8f60ce4ee325799c62e5fe444fa8e851b5c96b9172da0ced8d28" while inspecting docker container "b83ace3b5e0870284e554502e8922563a4d9587b800b6b699dc2c2acfcc9b7cc": no such image: "sha256:a950dd441cee8f60ce4ee325799c62e5fe444fa8e851b5c96b9172da0ced8d28"
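To check whether a node is affected before changing anything, you can count how often this error appears in the kubelet log. The following is a minimal sketch, not a CDF-provided tool. It assumes that the kubelet runs as a systemd unit named kubelet, so that journalctl can read its log; this may differ in your CDF installation. The function name count_missing_image_errors is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. Assumption: the kubelet log is available on stdin, for example
# from journalctl if the kubelet runs as a systemd unit named "kubelet".

# Reads log text on stdin and prints how many lines contain both the
# runtime-service error and the "no such image" detail shown above.
count_missing_image_errors() {
  # grep exits 1 when nothing matches; "|| true" keeps that from being
  # treated as a failure, while grep -c still prints 0.
  grep -F 'from runtime service failed: rpc error: code = Unknown' \
    | grep -cF 'no such image' || true
}
```

For example: journalctl -u kubelet --since "1 hour ago" --no-pager | count_missing_image_errors. A count that keeps growing between runs is consistent with the loop described above.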
Cause
This is a known issue in Kubernetes. For more information, see Kubelet gets stuck trying to inspect a container whose image has been cleaned up (Kubernetes GitHub issue #84214).
Solution
- Log on to the node where you receive the error messages.
- Run the following command to check the pod status, and then identify the pods that are stuck in the “Terminating” state:
kubectl get pods -n core -o wide
- Run the following command to delete each pod that is stuck in the “Terminating” state. Replace the <pod name> placeholder with the name of the pod:
kubectl delete pod <pod name> -n core --force --grace-period=0
- Run the following command to restart CDF:
$K8S_HOME/bin/kube-restart.sh
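If several pods are stuck, the steps above can be scripted. The following is a minimal sketch, not a CDF-provided tool. It assumes that kubectl on the node is configured for the cluster, that the Kubernetes node name matches the output of hostname (override with the hypothetical NODE_NAME variable), and that K8S_HOME is set. The function names terminating_pods and clear_terminating_pods are hypothetical. Instead of reading the NODE column, it uses kubectl's --field-selector spec.nodeName=... to list only the pods scheduled on this node.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions: kubectl is configured on this node, the node
# name equals `hostname` unless NODE_NAME is set, and K8S_HOME is set.

# Reads `kubectl get pods --no-headers` output on stdin (NAME READY STATUS ...)
# and prints the name of every pod whose STATUS column is "Terminating".
terminating_pods() {
  awk '$3 == "Terminating" { print $1 }'
}

# Force-deletes the Terminating pods on this node in the given namespace
# (default: core), then restarts CDF. The body runs in a subshell so that
# "set -euo pipefail" does not leak into the caller's shell.
clear_terminating_pods() (
  set -euo pipefail
  namespace="${1:-core}"
  node="${NODE_NAME:-$(hostname)}"

  # Check the pod status, limited to pods scheduled on this node.
  pods=$(kubectl get pods -n "$namespace" -o wide --no-headers \
           --field-selector "spec.nodeName=${node}" | terminating_pods)

  if [[ -z "$pods" ]]; then
    echo "No pods in Terminating state on ${node}; nothing to delete."
  else
    # Force-delete each stuck pod.
    for pod in $pods; do
      echo "Force-deleting ${pod}"
      kubectl delete pod "$pod" -n "$namespace" --force --grace-period=0
    done
    # Restart CDF.
    "${K8S_HOME:?K8S_HOME must be set}/bin/kube-restart.sh"
  fi
)
```

For example, save it as clear-terminating.sh on the affected node and run: source clear-terminating.sh && clear_terminating_pods core. Matching only the STATUS column keeps the filter independent of later columns, such as RESTARTS, which can contain spaces in some kubectl versions. Unlike the documented procedure, this sketch skips the CDF restart when no pods were deleted; adjust that if you want to restart unconditionally.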