When you use Logtail to collect logs from standard Docker containers or Kubernetes containers, errors may occur. This topic describes the operations that you can perform to troubleshoot the errors and check the running status of Logtail.
Check the heartbeat status of a machine group
You can check the heartbeat status of a machine group to determine whether Logtail is successfully installed.
Check the heartbeat status of the machine group.
Log on to the Simple Log Service console.
In the Projects section, click the project that you want to manage.
In the left-side navigation pane, choose Resources > Machine Groups.
In the Machine Groups list, click the machine group whose heartbeat status you want to check.
On the Machine Group Configurations page, check the machine group status and record the number of nodes whose heartbeat status is OK.
Count the number of worker nodes in the cluster to which your container belongs.
Run the following command to view the number of worker nodes in the cluster:
kubectl get node | grep -v master
The system returns information that is similar to the following code:
NAME                                 STATUS    ROLES     AGE       VERSION
cn-hangzhou.i-bp17enxc2us3624wexh2   Ready     <none>    238d      v1.10.4
cn-hangzhou.i-bp1ad2b02jtqd1shi2ut   Ready     <none>    220d      v1.10.4
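To count the worker nodes directly, you can append wc -l to the preceding command. The --no-headers flag suppresses the header line so that only node rows are counted:
kubectl get node --no-headers | grep -v master | wc -l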
Check whether the number of nodes whose heartbeat status is OK is equal to the number of worker nodes in the cluster. Then, select a troubleshooting method based on the check result.
The heartbeat status of all nodes in the machine group is Failed.
If you collect logs from standard Docker containers, check whether the values of the ${your_region_name}, ${your_aliyun_user_id}, and ${your_machine_group_user_defined_id} parameters are valid. For more information, see Collect logs from standard Docker containers.
If you use a Container Service for Kubernetes (ACK) cluster, submit a ticket. For more information, see Install Logtail.
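For the standard Docker case, one way to verify the startup parameters is to inspect the environment variables of the running Logtail container. The following is a minimal sketch that assumes the container name contains logtail:
docker inspect $(docker ps -qf "name=logtail") --format '{{range .Config.Env}}{{println .}}{{end}}'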
If you use a self-managed Kubernetes cluster, check whether the values of the {your-project-suffix}, {regionId}, {aliuid}, {access-key-id}, and {access-key-secret} parameters are valid. For more information, see Collect text logs from Kubernetes containers in Sidecar mode. If the values are invalid, run the helm del --purge alibaba-log-controller command to delete the installation package and then re-install the package.
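Before you re-install the package, you can confirm that the release is removed. The following is a sketch that uses Helm v2 commands to match the helm del --purge syntax above; if no output is returned, the release is removed:
helm ls --all | grep alibaba-log-controller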
The number of nodes whose heartbeat status is OK in the machine group is less than the number of worker nodes in the cluster.
Check whether a YAML file is used to deploy the required DaemonSet.
Run the following command. If a response is returned, the DaemonSet is deployed by using the YAML file.
kubectl get po -n kube-system -l k8s-app=logtail
Download the latest version of the Logtail DaemonSet template.
Configure the ${your_region_name}, ${your_aliyun_user_id}, and ${your_machine_group_name} parameters based on your business requirements.
Run the following command to update the YAML file:
kubectl apply -f ./logtail-daemonset.yaml
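Optionally, you can wait for the update to finish rolling out. The following sketch assumes that the DaemonSet is named logtail-ds, as shown in the sections below:
kubectl rollout status ds/logtail-ds -n kube-system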
In other cases, submit a ticket.
Check the collection status of container logs
If no logs appear in the Consumption Preview section or on the query and analysis page of the related Logstore when you query data in the Simple Log Service console, Simple Log Service has not collected logs from your container. In this case, check the status of your container and perform the following operations.
Take note of the following items when you collect logs from container files:
Logtail collects only incremental logs. If a log file on your server is not updated after a Logtail configuration is delivered and applied to the server, Logtail does not collect logs from the file (see the example after this list). For more information, see Read log files.
Logtail collects logs only from files in the default storage of containers or in the file systems that are mounted on containers. Other storage methods are not supported.
After logs are collected to a Logstore, you must create indexes. Then, you can query and analyze the logs in the Logstore. For more information, see Create indexes.
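For example, to trigger collection after a Logtail configuration is applied, you can append a test line to the monitored file. This is a minimal sketch; replace /path/to/your/app.log with the actual file path that is configured for collection:
echo "test log $(date)" >> /path/to/your/app.log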
Check whether the heartbeat status of your machine group is normal. For more information, see Troubleshoot an error that occurs due to the abnormal heartbeat status of a machine group.
Check whether the Logtail configuration is valid.
Check whether the settings of the following parameters in the Logtail configuration meet your business requirements: IncludeLabel, ExcludeLabel, IncludeEnv, and ExcludeEnv.
Note: Container labels are retrieved by running the docker inspect command. Container labels are different from Kubernetes labels.
To check whether logs can be collected as expected, you can temporarily remove the settings of the IncludeLabel, ExcludeLabel, IncludeEnv, and ExcludeEnv parameters from the Logtail configuration. If logs can be collected after the settings are removed, the original settings of the preceding parameters are invalid and must be corrected.
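As described in the preceding note, you can run the docker inspect command to view the labels of a container. For example, replace <container-id> with an actual container ID in the following command:
docker inspect <container-id> --format '{{json .Config.Labels}}'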
Related O&M operations
Log on to a Logtail container
Standard Docker container
Run the following command on your host to query the Logtail container:
docker ps | grep logtail
The system returns information that is similar to the following code:
223****6e registry.cn-hangzhou.aliyuncs.com/log-service/logtail "/usr/local/ilogta..." 8 days ago Up 8 days logtail-iba
Run the following command to start a Bash shell in the Logtail container:
docker exec -it 223****6e bash
In the preceding command, 223****6e indicates the ID of the Logtail container. Replace it with the actual container ID.
Kubernetes container
Run the following command to query the pods related to Logtail:
kubectl get po -n kube-system | grep logtail
The system returns information that is similar to the following code:
logtail-ds-****d   1/1   Running   0   8d
logtail-ds-****8   1/1   Running   0   8d
Run the following command to log on to the required pod:
kubectl exec -it -n kube-system logtail-ds-****d -- bash
In the preceding command, logtail-ds-****d indicates the name of the pod. Replace it with the actual pod name.
View the operational logs of Logtail
The operational logs of Logtail are stored in the ilogtail.LOG and logtail_plugin.LOG files in the /usr/local/ilogtail/ directory of a Logtail container.
Log on to the Logtail container. For more information, see Log on to a Logtail container.
Go to the /usr/local/ilogtail/ directory.
cd /usr/local/ilogtail
View the ilogtail.LOG and logtail_plugin.LOG files.
cat ilogtail.LOG
cat logtail_plugin.LOG
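If the files are large, you can filter for recent errors first. A minimal sketch:
grep -i error ilogtail.LOG | tail -n 20
grep -i error logtail_plugin.LOG | tail -n 20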
Ignore the stdout of a Logtail container
The stdout of a Logtail container does not provide useful information for troubleshooting. You can ignore the following stdout:
start umount useless mount points, /shm$|/merged$|/mqueue$
umount: /logtail_host/var/lib/docker/overlay2/3fd0043af174cb0273c3c7869500fbe2bdb95d13b1e110172ef57fe840c82155/merged: must be superuser to unmount
umount: /logtail_host/var/lib/docker/overlay2/d5b10aa19399992755de1f85d25009528daa749c1bf8c16edff44beab6e69718/merged: must be superuser to unmount
umount: /logtail_host/var/lib/docker/overlay2/5c3125daddacedec29df72ad0c52fac800cd56c6e880dc4e8a640b1e16c22dbe/merged: must be superuser to unmount
......
xargs: umount: exited with status 255; aborting
umount done
start logtail
ilogtail is running
logtail status:
ilogtail is running
View the status of Logtail components in a Kubernetes cluster
Run the following command to view the status and information of the alibaba-log-controller Deployment:
kubectl get deploy alibaba-log-controller -n kube-system
Result:
NAME READY UP-TO-DATE AVAILABLE AGE
alibaba-log-controller 1/1 1 1 11d
Run the following command to view the status and information of the logtail-ds DaemonSet:
kubectl get ds logtail-ds -n kube-system
Result:
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
logtail-ds 2 2 2 2 2 **ux 11d
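If the READY value is less than the DESIRED value, you can describe the DaemonSet to view scheduling and pod events:
kubectl describe ds logtail-ds -n kube-system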
View the version number, IP address, and startup time of Logtail
Run the following command on your host to view the version number, IP address, and startup time of Logtail. The related information is stored in the /usr/local/ilogtail/app_info.json file of your Logtail container.
kubectl exec logtail-ds-****k -n kube-system -- cat /usr/local/ilogtail/app_info.json
The system returns information that is similar to the following code:
{ "UUID" : "", "hostname" : "logtail-****k", "instance_id" : "0EB****_172.20.4.2_1517810940", "ip" : "172.20.4.2", "logtail_version" : "0.16.2", "os" : "Linux; 3.10.0-693.2.2.el7.x86_64; #1 SMP Tue Sep 12 22:26:13 UTC 2017; x86_64", "update_time" : "2018-02-05 06:09:01" }
Handle the issue that a CRD-specified Logstore is accidentally deleted
You can use a Custom Resource Definition (CRD) to create a Logtail configuration. If you delete the Logstore that is specified in the CRD, the collected data cannot be restored, and the Logtail configuration becomes invalid. To handle this issue, use one of the following methods:
Change the Logstore specified in the CRD.
Restart the alibaba-log-controller-related pod.
Run the following command to query the pod:
kubectl get po -n kube-system | grep alibaba-log-controller
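Then, delete the pod that is returned by the preceding command. Because the pod is managed by the alibaba-log-controller Deployment, Kubernetes automatically recreates it. Replace <alibaba-log-controller-pod-name> with the actual pod name:
kubectl delete po <alibaba-log-controller-pod-name> -n kube-system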