This topic describes the diagnostic procedure for pods and how to troubleshoot pod errors. This topic also provides answers to some frequently asked questions about pods.
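Table of contents
Item | Content |
Diagnostic procedure | Diagnostic procedure; Abnormal states of pods and troubleshooting |
Common troubleshooting methods | Check the status of a pod; Check the details of a pod; Check the configurations of a pod; Check the events of a pod; Check the logs of a pod; Check the monitoring information about a pod; Log on to a container by using the terminal; Pod diagnostics |
FAQ and solutions | Pods remain in the Pending state; Pods remain in the Init:N/M state, Init:Error state, or Init:CrashLoopBackOff state; Pods remain in the ImagePullBackOff state; Pods remain in the CrashLoopBackOff state; Pods remain in the Completed state; Pods remain in the Running state but do not run as expected; Pods remain in the Terminating state; Troubleshoot OOM errors in pods |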
Diagnostic procedure
Check whether the pod runs as expected. For more information, see Check the status of a pod.
If the pod does not run as expected, you can identify the cause by checking the events, logs, and configurations of the pod. For more information, see Common troubleshooting methods. For more information about the abnormal states of pods and how to troubleshoot pod errors, see Abnormal states of pods and troubleshooting.
If the pod is in the Running state but does not run as expected, see Pods remain in the Running state but do not run as expected.
If an out of memory (OOM) error occurs in the pod, see Troubleshoot OOM errors in pods.
If the issue persists, submit a ticket.
Abnormal states of pods and troubleshooting
Pod status | Description | Solution |
Pending | The pod is not scheduled. | Pods remain in the Pending state |
Init:N/M | The pod contains M init containers, and N of them have completed. | Pods remain in the Init:N/M state, Init:Error state, or Init:CrashLoopBackOff state |
Init:Error | Init containers fail to start up. | Pods remain in the Init:N/M state, Init:Error state, or Init:CrashLoopBackOff state |
Init:CrashLoopBackOff | Init containers are stuck in a startup loop. | Pods remain in the Init:N/M state, Init:Error state, or Init:CrashLoopBackOff state |
Completed | The pod has completed the startup command. | Pods remain in the Completed state |
CrashLoopBackOff | The pod is stuck in a startup loop. | Pods remain in the CrashLoopBackOff state |
ImagePullBackOff | The pod fails to pull the container image. | Pods remain in the ImagePullBackOff state |
Running | The pod is running. If the pod does not run as expected, troubleshoot the issue. | Pods remain in the Running state but do not run as expected |
Terminating | The pod is being terminated. | Pods remain in the Terminating state |
Common troubleshooting methods
Check the status of a pod
Log on to the ACS console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
In the left-side navigation pane of the details page, open the Pods page.
In the upper-left corner of the Pods page, select the namespace to which the pod belongs. Then, find the pod and check the status of the pod.
If the pod is in the Running state, the pod runs as expected.
If the pod is not in the Running state, the pod is abnormal. To troubleshoot the issue, refer to Abnormal states of pods and troubleshooting.
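The same check can be sketched on the command line, assuming that kubectl is configured to access the cluster. The helper name list_abnormal_pods is hypothetical. It reads the STATUS column that kubectl get pods prints and lists every pod whose status is not Running.

```shell
# Hypothetical helper: list pods whose STATUS column is not Running.
# Usage: list_abnormal_pods <namespace>
list_abnormal_pods() {
  # With --no-headers, the columns are NAME READY STATUS RESTARTS AGE.
  kubectl get pods -n "$1" --no-headers | awk '$3 != "Running" { print $1, $3 }'
}
```

For example, list_abnormal_pods default prints one line for each abnormal pod in the default namespace, in the form of the pod name followed by the status.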
Check the details of a pod
Log on to the ACS console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
In the left-side navigation pane of the details page, open the Pods page.
In the upper-left corner of the Pods page, select the namespace to which the pod belongs. In the list of pods, find the pod and click the name of the pod or click View Details in the Actions column to view information about the pod. You can view the name, image, and IP address of the pod.
Check the configurations of a pod
Log on to the ACS console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
In the left-side navigation pane of the details page, open the Pods page.
In the upper-left corner of the Pods page, select the namespace to which the pod belongs. In the list of pods, find the pod and click the name of the pod or click View Details in the Actions column.
In the upper-right corner of the pod details page, click Edit to view the YAML file and configurations of the pod.
Check the events of a pod
Log on to the ACS console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
In the left-side navigation pane of the details page, open the Pods page.
In the upper-left corner of the Pods page, select the namespace to which the pod belongs. In the list of pods, find the pod and click the name of the pod or click View Details in the Actions column.
In the lower part of the pod details page, click the Events tab to view the events of the pod.
Note: By default, Kubernetes retains the events that occurred within the previous hour. If you want to retain events that occurred within a longer period of time, see Create and use an event center.
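If you prefer the command line, the events of a pod can also be queried with kubectl. The following is a minimal sketch, assuming that kubectl is configured to access the cluster. The helper name pod_events is hypothetical.

```shell
# Hypothetical helper: print the events of one pod, sorted by last timestamp.
# Usage: pod_events <pod-name> <namespace>
pod_events() {
  kubectl get events -n "$2" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}
```

The kubectl describe pod command prints the same events at the end of its output, together with the pod configuration.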
Check the logs of a pod
Log on to the ACS console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
In the left-side navigation pane of the details page, open the Pods page.
In the upper-left corner of the Pods page, select the namespace to which the pod belongs. In the list of pods, find the pod and click the name of the pod or click View Details in the Actions column.
In the lower part of the pod details page, click the Logs tab to view the logs of the pod.
Note: Alibaba Cloud Container Compute Service (ACS) is integrated with Simple Log Service. When you create a cluster, you can enable Simple Log Service to collect log data from the containers of the cluster, including data that is written to the standard output and to text files. For more information, see Collect application logs by using the environment variables of pods.
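Logs that are written to the standard output can also be read with kubectl. The following is a minimal sketch, assuming that kubectl is configured to access the cluster. The helper name pod_logs and the default of 100 lines are illustrative choices.

```shell
# Hypothetical helper: print the most recent log lines of every container in a pod.
# Usage: pod_logs <pod-name> <namespace> [line-count]
pod_logs() {
  kubectl logs "$1" -n "$2" --all-containers=true --tail="${3:-100}"
}
```

For example, pod_logs web-1 default 20 prints the last 20 log lines of each container in the pod named web-1.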
Check the monitoring information about a pod
Log on to the ACS console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
In the left-side navigation pane of the cluster details page, open the Prometheus Monitoring page.
On the Prometheus Monitoring page, click the Cluster Overview tab to view the following monitoring information about pods: CPU usage, memory usage, and network I/O.
Log on to a container by using the terminal
Log on to the ACS console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
In the left-side navigation pane of the details page, open the Pods page.
On the Pods page, find the pod that you want to manage and click Terminal in the Actions column.
You can log on to a container of the pod by using the terminal and view local files in the container.
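A terminal session can also be opened with kubectl exec. The following is a minimal sketch, assuming that kubectl is configured to access the cluster and that the container image includes the sh shell. The helper name pod_shell is hypothetical.

```shell
# Hypothetical helper: open an interactive shell in a container of a pod.
# Usage: pod_shell <pod-name> <namespace> [container-name]
pod_shell() {
  if [ -n "${3:-}" ]; then
    # Target a specific container in a multi-container pod.
    kubectl exec -it "$1" -n "$2" -c "$3" -- sh
  else
    # Let kubectl pick the default container of the pod.
    kubectl exec -it "$1" -n "$2" -- sh
  fi
}
```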
Pod diagnostics
Log on to the ACS console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
In the left-side navigation pane of the details page, open the Pods page.
In the upper-left corner of the Pods page, select the namespace to which the pod belongs. Then, find the pod that you want to diagnose and click Diagnose in the Actions column.
After the pod diagnostic is complete, you can view the diagnostic result and troubleshoot the issue. For more information, see Work with cluster diagnostics.
Pods remain in the Pending state
Cause
If a pod remains in the Pending state, the pod cannot be scheduled to a specific node. This issue occurs if the pod lacks required resources or quota configurations are invalid.
Problem description
The pod remains in the Pending state.
Solution
Check the events of the pod and identify the reason why the pod cannot be scheduled to a node based on the events. Possible causes:
Resource dependency
Some pods cannot be created without specific cluster resources, such as ConfigMaps and persistent volume claims (PVCs). For example, before you specify a PVC for a pod, you must associate the PVC with a persistent volume (PV).
Invalid quota configurations
Check the events and audit logs of the pod.
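To check the resource dependency described above, you can list the PVCs that are not yet bound to a PV. The following is a minimal sketch, assuming that kubectl is configured to access the cluster. The helper name unbound_pvcs is hypothetical.

```shell
# Hypothetical helper: list PVCs in a namespace that are not in the Bound state.
# Usage: unbound_pvcs <namespace>
unbound_pvcs() {
  # With --no-headers, the first two columns are NAME and STATUS.
  kubectl get pvc -n "$1" --no-headers | awk '$2 != "Bound" { print $1, $2 }'
}
```

If a pod in the Pending state references one of the listed PVCs, resolve the PVC binding before you investigate other causes.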
Pods remain in the Init:N/M state, Init:Error state, or Init:CrashLoopBackOff state
Cause
If a pod remains in the Init:N/M state, the pod contains M init containers, N of them have completed, and the remaining M-N init containers have not completed.
If a pod remains in the Init:Error state, the init containers in the pod fail to start up.
If a pod remains in the Init:CrashLoopBackOff state, the init containers in the pod are stuck in a startup loop.
Problem description
Pods remain in the Init:N/M state.
Pods remain in the Init:Error state.
Pods remain in the Init:CrashLoopBackOff state.
Solution
View the events of the pod and check whether errors occur in the init containers that fail to start up in the pod. For more information, see Check the events of a pod.
Check the logs of the init containers that fail to start up in the pod and troubleshoot the issue based on the log data. For more information, see Check the logs of a pod.
Check the configurations of the pod and make sure that the configurations of the init containers that fail to start up are valid. For more information, see Check the configurations of a pod. For more information about init containers, see Debug init containers.
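The steps above can be sketched with kubectl, assuming that kubectl is configured to access the cluster. The helper names init_container_status and init_container_logs are hypothetical. Init containers are addressed by name with the -c flag.

```shell
# Hypothetical helper: print each init container of a pod and whether it is ready.
# Usage: init_container_status <pod-name> <namespace>
init_container_status() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"\t"}{.ready}{"\n"}{end}'
}

# Hypothetical helper: print the logs of one init container.
# Usage: init_container_logs <pod-name> <namespace> <init-container-name>
init_container_logs() {
  kubectl logs "$1" -n "$2" -c "$3"
}
```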
Pods remain in the ImagePullBackOff state
Cause
If a pod remains in the ImagePullBackOff state, the pod is scheduled in the background, but the container image fails to be pulled.
Problem description
Pods remain in the ImagePullBackOff state.
Solution
Check the description of the corresponding pod event and check the name of the container image that fails to be pulled.
Check whether the name of the container image is valid.
If the image that you use is stored in a private image repository, troubleshoot the issue based on the instructions in Create an application by using a private image repository.
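To confirm which image names the pod actually references, you can print them with kubectl. The following is a minimal sketch, assuming that kubectl is configured to access the cluster. The helper name pod_images is hypothetical.

```shell
# Hypothetical helper: print the name and image of each container in a pod.
# Usage: pod_images <pod-name> <namespace>
pod_images() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
}
```

Compare each printed image name and tag with the name in the image repository to spot typos or missing tags.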
Pods remain in the CrashLoopBackOff state
Cause
If a pod remains in the CrashLoopBackOff state, the application in the pod encounters an error.
Problem description
Pods remain in the CrashLoopBackOff state.
Solution
View the events of the pod and check whether errors occur in the pod. For more information, see Check the events of a pod.
Check the logs of the pod and troubleshoot the issue based on the log data. For more information, see Check the logs of a pod.
View the configurations of the pod and check whether the health check configurations are valid. For more information, see Check the configurations of a pod. For more information about health checks for pods, see Configure Liveness, Readiness and Startup Probes.
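For a pod that restarts repeatedly, the logs of the current container instance are often empty, so the logs of the previous instance are usually more useful. The following is a minimal sketch, assuming that kubectl is configured to access the cluster. The helper name previous_logs is hypothetical.

```shell
# Hypothetical helper: print the logs of the previous (terminated) container instance.
# Usage: previous_logs <pod-name> <namespace> [line-count]
previous_logs() {
  kubectl logs "$1" -n "$2" --previous --tail="${3:-100}"
}
```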
Pods remain in the Completed state
Cause
If a pod is in the Completed state, the containers in the pod have completed the startup command and all the processes in the containers have exited.
Problem description
Pods remain in the Completed state.
Solution
View the configurations of the pod and check the startup command that is executed by the containers in the pod. For more information, see Check the configurations of a pod.
Check the logs of the pod and troubleshoot the issue based on the log data. For more information, see Check the logs of a pod.
Pods remain in the Running state but do not run as expected
Cause
The YAML file that is used to deploy the pod contains errors.
Problem description
Pods remain in the Running state but do not run as expected.
Solution
View the configurations of the pod and check whether the containers in the pod are configured as expected. For more information, see Check the configurations of a pod.
Use the following method to check whether the keys in the YAML file contain spelling errors.
The following example describes how to identify the error if command is misspelled as commnd.
Note: When you create a pod, the system ignores misspelled keys in the YAML file. For example, if you spell command as commnd, you can still use the YAML file to create the pod. However, the pod does not run the command that is defined under the misspelled key. Instead, the pod runs the default command in the image.
Add the --validate flag to the kubectl apply -f command and run the kubectl apply --validate -f XXX.yaml command.
If you spell command as commnd, the following error occurs:
XXX] unknown field: commnd XXX] this may be a false alarm, see https://gXXXb.XXX/6842pods/test
Run the following command and compare the pod.yaml file that is generated with the original file that is used to create the pod:
kubectl get pods [$Pod] -o yaml > pod.yaml
Note: [$Pod] is the name of the abnormal pod. You can run the kubectl get pods command to view the name.
If the pod.yaml file contains more lines than the original file that is used to create the pod, the pod is created as expected.
If the pod.yaml file does not contain the command lines in the original file, the original file may contain spelling errors.
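Instead of comparing the two files by eye, you can query the command field of the live pod directly. The following is a minimal sketch, assuming that kubectl is configured to access the cluster. The helper name pod_has_command is hypothetical. An empty result suggests that the command key was dropped and that the containers run the default command of the image.

```shell
# Hypothetical helper: succeed only if at least one container in the live pod
# has an explicit command.
# Usage: pod_has_command <pod-name> <namespace>
pod_has_command() {
  cmd=$(kubectl get pod "$1" -n "$2" -o jsonpath='{.spec.containers[*].command}')
  [ -n "$cmd" ]
}
```

For example, pod_has_command web-1 default || echo "no explicit command found" prints the message when none of the containers in the pod defines a command.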
Check the logs of the pod and troubleshoot the issue based on the log data. For more information, see Check the logs of a pod.
You can log on to a container in the pod by using the terminal and check the local files in the container. For more information, see Log on to a container by using the terminal.
Pods remain in the Terminating state
Cause
If a pod is in the Terminating state, the pod is being terminated.
Problem description
Pods remain in the Terminating state.
Solution
Pods that remain in the Terminating state are deleted after a period of time. If a pod remains in the Terminating state for a long period of time, you can run the following command to forcefully delete the pod:
kubectl delete pod [$Pod] -n [$namespace] --grace-period=0 --force
Troubleshoot OOM errors in pods
Cause
If the memory usage of a container in the cluster exceeds the specified memory limit, the container may be terminated, which triggers an OOM event and causes the container to exit. For more information about OOM events, see Allocate memory resources to containers and pods.
Problem description
If terminating the process causes the container to become stuck, the container may restart.
To check whether an OOM event occurred, log on to the ACS console and view the events of the pod. For more information, see Check the events of a pod.
If you configure alert rules for pod exceptions in the cluster, you can receive alert notifications when an OOM event occurs. For more information about how to configure alert rules, see Alert management.
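You can also check the pod status for containers that were terminated because of an OOM error. Kubernetes records OOMKilled as the termination reason in this case. The following is a minimal sketch, assuming that kubectl is configured to access the cluster. The helper name oom_killed_containers is hypothetical.

```shell
# Hypothetical helper: print containers whose last termination reason is OOMKilled.
# Usage: oom_killed_containers <pod-name> <namespace>
oom_killed_containers() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\n"}{end}' |
    awk -F '\t' '$2 == "OOMKilled" { print $1 }'
}
```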
Solution
Check the time when the error occurs based on the memory usage graph of the pod. For more information, see Check the monitoring information about a pod.
Check whether memory leaks occur in the processes of the pod based on the following monitoring information: the points in time when spikes occur in memory usage, log data, and process names.
If the OOM error is caused by memory leaks, we recommend that you troubleshoot the issue based on your business scenario.
If the processes run as expected, increase the memory limit of the pod. Make sure that the actual memory usage of the pod does not exceed 80% of the memory limit of the pod. For more information, see the Modify the upper and lower limits of CPU and memory resources for a pod section of the "Manage pods" topic.
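The 80% guideline can be turned into a quick calculation: the memory limit must be at least the peak memory usage divided by 0.8. The following is a minimal sketch that uses integer arithmetic. The helper name min_memory_limit is hypothetical, and the input is assumed to be a whole number in the unit that you use for the limit, such as MiB.

```shell
# Hypothetical helper: smallest memory limit at which the given peak usage
# stays at or below 80% of the limit. Input and output use the same unit.
# Usage: min_memory_limit <peak-usage>
min_memory_limit() {
  # Integer ceiling of usage / 0.8, that is, usage * 100 / 80 rounded up.
  echo $(( ($1 * 100 + 79) / 80 ))
}
```

For example, min_memory_limit 800 prints 1000, which means that a pod with a peak usage of 800 MiB needs a memory limit of at least 1000 MiB.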