
E-MapReduce:Use Log Service to collect the logs of Spark jobs

Last Updated: Nov 14, 2024

This topic describes how to use Log Service to collect the logs of Spark jobs.

Prerequisites

Procedure

  1. Install Logtail. For more information, see Step 1: Install Logtail.

    Note

    If Logtail is already installed, skip this step and go to Step 2.

  2. Access the project that you want to manage in the Log Service console.

    1. Log on to the ACK console.

    2. On the Clusters page, find the cluster that you want to manage and click the name of the cluster in the Cluster Name/ID column or click Details in the Actions column.

    3. In the Cluster Resources area on the Basic Information tab, click the link to the right of Log Service Project.

      The details page of the project appears.

  3. On the Logstores tab, create two Logstores.

    In this example, the two Logstores are named spark-driver-log and spark-executor-log. For more information about how to create a Logstore, see Step 1: Create a project and a Logstore.

  4. In the spark-driver-log Logstore, perform the following operations:

    1. Click the arrow to the left of the spark-driver-log Logstore. In the left-side navigation pane, choose Data Import > Logtail Configurations.

    2. Configure Logtail: select Kubernetes - Standard Output for Data Import and select an existing Kubernetes machine group.

    3. In the Plug-in Config field, enter the following code:

      {
          "inputs": [
              {
                  "detail": {
                      "IncludeEnv": {
                          "SPARKLOGENV": "spark-driver"
                      },
                      "Stderr": true,
                      "Stdout": true,
                      "BeginLineCheckLength": 10,
                      "BeginLineRegex": "\\d+/\\d+/\\d+.*"
                  },
                  "type": "service_docker_stdout"
              }
          ]
      }
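    The BeginLineCheckLength and BeginLineRegex settings control how Logtail aggregates multi-line records: a line whose beginning (the first 10 characters are checked) matches the date-prefix pattern starts a new log record, and non-matching lines, such as stack-trace continuations, are appended to the previous record. The following Python sketch illustrates the grouping behavior of that regex; the sample log lines are hypothetical:

      import re

      # The regex from the Plug-in Config; a matching line begins a new record.
      begin_line = re.compile(r"\d+/\d+/\d+.*")

      lines = [
          "24/11/14 10:05:01 INFO SparkContext: Running Spark 3.2.1",  # new record
          "java.lang.RuntimeException: boom",                          # continuation
          "    at org.example.Job.main(Job.java:10)",                  # continuation
          "24/11/14 10:05:02 ERROR Executor: Exception in task 0.0",   # new record
      ]

      # Group lines into records the way the line-start check would.
      records = []
      for line in lines:
          if begin_line.match(line) or not records:
              records.append([line])
          else:
              records[-1].append(line)

      print(len(records))  # 2 multi-line records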
  5. Configure Logtail for the spark-executor-log Logstore by repeating the operations in Step 4, but enter the following code in the Plug-in Config field:

    {
        "inputs": [
            {
                "detail": {
                    "IncludeEnv": {
                        "SPARKLOGENV": "spark-executor"
                    },
                    "Stderr": true,
                    "Stdout": true,
                    "BeginLineCheckLength": 10,
                    "BeginLineRegex": "\\d+/\\d+/\\d+.*"
                },
                "type": "service_docker_stdout"
            }
        ]
    }
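    Both configurations match containers by the SPARKLOGENV environment variable, so that variable must be set to spark-driver on the driver pod and spark-executor on the executor pods. If your environment does not already set it, you can pass it when you submit the job. The following spark-submit invocation is a hypothetical sketch (the API server address, class, and JAR path are placeholders) that uses the standard Spark on Kubernetes properties for driver and executor environment variables:

      spark-submit \
        --master k8s://https://<kubernetes-api-server>:6443 \
        --deploy-mode cluster \
        --class org.example.SparkApp \
        --conf spark.kubernetes.driverEnv.SPARKLOGENV=spark-driver \
        --conf spark.executorEnv.SPARKLOGENV=spark-executor \
        local:///opt/app/spark-app.jar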
  6. Enable the indexing feature for the Logstores. For more information, see Create indexes.

    After you complete the preceding steps, you can use Log Service to query the logs of Spark jobs.