When you use Logtail to collect logs from standard Docker containers or Kubernetes containers, errors may occur. This topic describes the operations that you can perform to troubleshoot the errors and check the running status of Logtail.
Check the heartbeat status of a machine group
You can check the heartbeat status of a machine group to determine whether Logtail is successfully installed.
Check the heartbeat status of the machine group.
Log on to the Simple Log Service console.
In the Projects section, click the project that you want to manage.
In the left-side navigation pane, choose Resources > Machine Groups.
In the Machine Groups list, click the machine group whose heartbeat status you want to check.
On the Machine Group Configurations page, check the machine group status and record the number of nodes whose heartbeat status is OK.
Count the number of worker nodes in the cluster to which your container belongs.
Run the following command to view the number of worker nodes in the cluster:
kubectl get node | grep -v master
The system returns information that is similar to the following code:
NAME                                 STATUS    ROLES     AGE       VERSION
cn-hangzhou.i-bp17enxc2us3624wexh2   Ready     <none>    238d      v1.10.4
cn-hangzhou.i-bp1ad2b02jtqd1shi2ut   Ready     <none>    220d      v1.10.4
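To count the worker nodes directly, you can append wc -l to the preceding command. The --no-headers flag suppresses the header line so that only node rows are counted:
kubectl get node --no-headers | grep -v master | wc -l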
Check whether the number of nodes whose heartbeat status is OK is equal to the number of worker nodes in the cluster. Then, select a troubleshooting method based on the check result.
The heartbeat status of all nodes in the machine group is Failed.
If you collect logs from standard Docker containers, check whether the values of the ${your_region_name}, ${your_aliyun_user_id}, and ${your_machine_group_user_defined_id} parameters are valid. For more information, see Collect logs from standard Docker containers.
If you use a Container Service for Kubernetes (ACK) cluster, submit a ticket. For more information, see Install Logtail.
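For the standard Docker case, one way to verify the startup parameters is to inspect the environment variables of the running Logtail container. The following is a minimal sketch that assumes the container name contains logtail:
docker inspect $(docker ps -qf "name=logtail") --format '{{range .Config.Env}}{{println .}}{{end}}'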
If you use a self-managed Kubernetes cluster, check whether the values of the {your-project-suffix}, {regionId}, {aliuid}, {access-key-id}, and {access-key-secret} parameters are valid. For more information, see Collect text logs from Kubernetes containers in Sidecar mode. If the values are invalid, run the helm del --purge alibaba-log-controller command to delete the installation package and then re-install the package.
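Before you re-install the package, you can confirm that the release is removed. The following is a sketch that uses Helm v2 commands to match the helm del --purge syntax above; if no output is returned, the release is removed:
helm ls --all | grep alibaba-log-controller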
The number of nodes whose heartbeat status is OK in the machine group is less than the number of worker nodes in the cluster.
Check whether a YAML file is used to deploy the required DaemonSet.
Run the following command. If a response is returned, the DaemonSet is deployed by using the YAML file.
kubectl get po -n kube-system -l k8s-app=logtail
Download the latest version of the Logtail DaemonSet template.
Configure the ${your_region_name}, ${your_aliyun_user_id}, and ${your_machine_group_name} parameters based on your business requirements.
Run the following command to update the YAML file:
kubectl apply -f ./logtail-daemonset.yaml
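Optionally, you can wait for the update to finish rolling out. The following sketch assumes that the DaemonSet is named logtail-ds, as shown in the sections below:
kubectl rollout status ds/logtail-ds -n kube-system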
In other cases, submit a ticket.
Check the collection status of container logs
If no logs appear in the Consumption Preview section or on the query and analysis page of the related Logstore when you query data in the Simple Log Service console, Simple Log Service has not collected logs from your container. In this case, check the status of your container and perform the following operations.
Take note of the following items when you collect logs from container files:
Logtail collects only incremental logs. If a log file on your server is not updated after a Logtail configuration is delivered and applied to the server, Logtail does not collect logs from the file (see the example after this list). For more information, see Read log files.
Logtail collects logs only from files in the default storage of containers or in the file systems that are mounted on containers. Other storage methods are not supported.
After logs are collected to a Logstore, you must create indexes. Then, you can query and analyze the logs in the Logstore. For more information, see Create indexes.
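For example, to trigger collection after a Logtail configuration is applied, you can append a test line to the monitored file. This is a minimal sketch; replace /path/to/your/app.log with the actual file path that is configured for collection:
echo "test log $(date)" >> /path/to/your/app.log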
Check whether the heartbeat status of your machine group is normal. For more information, see Troubleshoot an error that occurs due to the abnormal heartbeat status of a machine group.
Check whether the Logtail configuration is valid.
Check whether the settings of the following parameters in the Logtail configuration meet your business requirements: IncludeLabel, ExcludeLabel, IncludeEnv, and ExcludeEnv.
Note: Container labels are retrieved by running the docker inspect command. Container labels are different from Kubernetes labels.
To check whether logs can be collected as expected, you can temporarily remove the settings of the IncludeLabel, ExcludeLabel, IncludeEnv, and ExcludeEnv parameters from the Logtail configuration. If logs can be collected after the settings are removed, the original settings of the preceding parameters are invalid and must be corrected.
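As described in the preceding note, you can run the docker inspect command to view the labels of a container. For example, replace <container-id> with an actual container ID in the following command:
docker inspect <container-id> --format '{{json .Config.Labels}}'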
Related O&M operations
Log on to a Logtail container
Standard Docker container
Run the following command on your host to query the Logtail container:
docker ps | grep logtail
The system returns information that is similar to the following code:
223****6e registry.cn-hangzhou.aliyuncs.com/log-service/logtail "/usr/local/ilogta..." 8 days ago Up 8 days logtail-iba
Run the following command to start a Bash shell in the Logtail container:
docker exec -it 223****6e bash
In the preceding command, 223****6e indicates the ID of the Logtail container. Replace it with the actual container ID.
Kubernetes container
Run the following command to query the pods related to Logtail:
kubectl get po -n kube-system | grep logtail
The system returns information that is similar to the following code:
logtail-ds-****d   1/1   Running   0   8d
logtail-ds-****8   1/1   Running   0   8d
Run the following command to log on to the required pod:
kubectl exec -it -n kube-system logtail-ds-****d -- bash
In the preceding command, logtail-ds-****d indicates the name of the pod. Replace it with the actual pod name.
View the operational logs of Logtail
The operational logs of Logtail are stored in the ilogtail.LOG and logtail_plugin.LOG files in the /usr/local/ilogtail/ directory of a Logtail container.
Log on to the Logtail container. For more information, see Log on to a Logtail container.
Go to the /usr/local/ilogtail/ directory.
cd /usr/local/ilogtail
View the ilogtail.LOG and logtail_plugin.LOG files.
cat ilogtail.LOG
cat logtail_plugin.LOG
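If the files are large, you can filter for recent errors first. A minimal sketch:
grep -i error ilogtail.LOG | tail -n 20
grep -i error logtail_plugin.LOG | tail -n 20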
Ignore the stdout of a Logtail container
The stdout of a Logtail container does not provide useful information for troubleshooting. You can ignore the following stdout:
start umount useless mount points, /shm$|/merged$|/mqueue$
umount: /logtail_host/var/lib/docker/overlay2/3fd0043af174cb0273c3c7869500fbe2bdb95d13b1e110172ef57fe840c82155/merged: must be superuser to unmount
umount: /logtail_host/var/lib/docker/overlay2/d5b10aa19399992755de1f85d25009528daa749c1bf8c16edff44beab6e69718/merged: must be superuser to unmount
umount: /logtail_host/var/lib/docker/overlay2/5c3125daddacedec29df72ad0c52fac800cd56c6e880dc4e8a640b1e16c22dbe/merged: must be superuser to unmount
......
xargs: umount: exited with status 255; aborting
umount done
start logtail
ilogtail is running
logtail status:
ilogtail is running
View the status of Logtail components in a Kubernetes cluster
Run the following command to view the status and information of the alibaba-log-controller Deployment:
kubectl get deploy alibaba-log-controller -n kube-system
Result:
NAME READY UP-TO-DATE AVAILABLE AGE
alibaba-log-controller 1/1 1 1 11d
Run the following command to view the status and information of the logtail-ds DaemonSet:
kubectl get ds logtail-ds -n kube-system
Result:
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
logtail-ds 2 2 2 2 2 **ux 11d
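If the READY value is less than the DESIRED value, you can describe the DaemonSet to view scheduling and pod events:
kubectl describe ds logtail-ds -n kube-system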
View the version number, IP address, and startup time of Logtail
Run the following command on your host to view the version number, IP address, and startup time of Logtail. The related information is stored in the /usr/local/ilogtail/app_info.json file of your Logtail container.
kubectl exec logtail-ds-****k -n kube-system -- cat /usr/local/ilogtail/app_info.json
The system returns information that is similar to the following code:
{ "UUID" : "", "hostname" : "logtail-****k", "instance_id" : "0EB****_172.20.4.2_1517810940", "ip" : "172.20.4.2", "logtail_version" : "0.16.2", "os" : "Linux; 3.10.0-693.2.2.el7.x86_64; #1 SMP Tue Sep 12 22:26:13 UTC 2017; x86_64", "update_time" : "2018-02-05 06:09:01" }
Handle the issue that a CRD-specified Logstore is accidentally deleted
You can use a Custom Resource Definition (CRD) to create a Logtail configuration. If you delete the Logstore that is specified in the CRD, the collected data cannot be restored, and the Logtail configuration becomes invalid. To handle this issue, use one of the following methods:
Change the Logstore specified in the CRD.
Restart the alibaba-log-controller-related pod.
Run the following command to query the pod:
kubectl get po -n kube-system | grep alibaba-log-controller
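Then, delete the pod that is returned by the preceding command. Because the pod is managed by the alibaba-log-controller Deployment, Kubernetes automatically recreates it. Replace <alibaba-log-controller-pod-name> with the actual pod name:
kubectl delete po <alibaba-log-controller-pod-name> -n kube-system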