Simple Log Service: How do I check the completeness of data that is shipped from Simple Log Service to MaxCompute?

Last Updated: Aug 23, 2024

After Simple Log Service ships data to a MaxCompute table, you must check the completeness of data by partition. The following sections describe the methods that you can use to check whether the data in a partition is complete.

Method 1: Use the reserved field __partition_time__ as a partition key column

__partition_time__ is derived from the time field of a log, which holds the timestamp of the log. To generate the value of __partition_time__, the timestamp is rounded down to the hour and rendered in the time format that you specify. The timestamp of a log is neither the time at which the log is shipped nor the time at which the log is written to Simple Log Service.

For example, suppose that the timestamp of a log is 2017-05-19 10:43:00, the format that you specify for __partition_time__ is yyyy_MM_dd_HH_mm, and logs are shipped at 1-hour intervals. After the log is shipped to MaxCompute, it is stored in the 2017_05_19_10_00 partition, regardless of the time at which the log was written to Simple Log Service. For details about the calculation, see Ship logs to MaxCompute (old version).
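
The rounding in this example can be sketched in Python. This is a minimal illustration and not the service's actual implementation: the function name partition_time is hypothetical, the strftime pattern %Y_%m_%d_%H_%M is assumed to be the Python equivalent of yyyy_MM_dd_HH_mm, and time zone handling is left out.

```python
from datetime import datetime


def partition_time(log_time: datetime, fmt: str = "%Y_%m_%d_%H_%M") -> str:
    """Round a log timestamp down to the hour and format it as a partition value.

    Hypothetical helper for illustration only.
    """
    # Drop minutes, seconds, and microseconds to round down to the hour.
    floored = log_time.replace(minute=0, second=0, microsecond=0)
    return floored.strftime(fmt)


# The log from the example above, with timestamp 2017-05-19 10:43:00.
print(partition_time(datetime(2017, 5, 19, 10, 43, 0)))
```

The result depends only on the timestamp carried by the log, not on when the log was written or shipped.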

If logs are written to Simple Log Service in real time and no historical logs are written to Simple Log Service, you can use one of the following methods to check whether partition data is complete:

  • (Recommended) Use API operations, SDKs, or the Simple Log Service console to check data completeness

    Use the GetShipperStatus API operation, an SDK, or the Simple Log Service console to obtain the data shipping tasks of a project or Logstore that you specify. The following example shows the tasks that are returned by the API operation. The Simple Log Service console can visualize the returned tasks.

    {
      "count" : 10,
      "total" : 20,
      "statistics" : {
          "running" : 0,
          "success" : 20,
          "fail" : 0 
      },
      "tasks" : [
          ...
          {
              "id" : "abcdefghijk",
              "taskStatus" : "success",
              "taskMessage" : "",
              "taskCreateTime" : 1448925013,
              "taskLastDataReceiveTime" : 1448915013,
              "taskFinishTime" : 1448926013
          },
          {
              "id" : "xfegeagege",
              "taskStatus" : "success",
              "taskMessage" : "",
              "taskCreateTime" : 1448926813,
              "taskLastDataReceiveTime" : 1448930000,
              "taskFinishTime" : 1448936910
          }
      ]
    }

    taskLastDataReceiveTime indicates the time at which Simple Log Service received the data that is handled by a task. You can use this field to check whether the data that Simple Log Service received before a given time T has been shipped to the MaxCompute table.

    • If every shipping task whose taskLastDataReceiveTime is earlier than T plus 300 seconds is in the success state, the data that Simple Log Service received before Time T has been shipped to the MaxCompute table. The 300-second offset is used for fault tolerance, for example, when an error occurs in sending data to Simple Log Service and the send is retried.

    • If a shipping task is in the ready or running state, the data in the MaxCompute table is not yet complete. In this case, wait until the task is finished.

    • If a shipping task is in the failed state, you need to troubleshoot the failure and retry the task. If the failure is related to shipping configurations, you can change the configurations.

  • Roughly evaluate data completeness based on MaxCompute partitions

    For example, the MaxCompute table is partitioned at 30-minute intervals, and data is shipped from Simple Log Service to MaxCompute at 30-minute intervals. The MaxCompute table contains the following partitions:

    2017_05_19_10_00
    2017_05_19_10_30

    If the 2017_05_19_11_00 partition is found in the MaxCompute table, the data that is stored in the 2017_05_19_10_00 and 2017_05_19_10_30 partitions is complete.

    This method is easy to use and does not require API calls. However, the evaluation results may be inaccurate.
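
Both checks described above can be sketched in Python. This is a rough illustration under stated assumptions: the function names are hypothetical, the task dictionaries are assumed to have the same shape as the sample response shown earlier, and the list of tasks is assumed to already cover the time range of interest. How the task list is fetched is left out.

```python
TOLERANCE_SECONDS = 300  # fault-tolerance offset described above


def shipped_before(tasks, t):
    """Recommended check: True if every task whose taskLastDataReceiveTime
    is earlier than t + 300 s is in the success state.

    Returns True vacuously when no task matches, so the caller must make
    sure the task list covers the period before t.
    """
    relevant = [
        task for task in tasks
        if task["taskLastDataReceiveTime"] < t + TOLERANCE_SECONDS
    ]
    return all(task["taskStatus"] == "success" for task in relevant)


def presumed_complete(partitions):
    """Rough check: every partition older than the newest one is presumed
    complete. Lexicographic order matches time order for names such as
    2017_05_19_10_30."""
    return sorted(partitions)[:-1]


# The two tasks from the sample response.
tasks = [
    {"id": "abcdefghijk", "taskStatus": "success",
     "taskLastDataReceiveTime": 1448915013},
    {"id": "xfegeagege", "taskStatus": "success",
     "taskLastDataReceiveTime": 1448930000},
]
print(shipped_before(tasks, 1448930000))
print(presumed_complete(
    ["2017_05_19_10_00", "2017_05_19_10_30", "2017_05_19_11_00"]))
```

The first function depends on task status and is therefore the stricter of the two. The second only infers completeness from the existence of a newer partition.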

Method 2: Use a custom log field as a partition key column

For example, a log contains a date field, and the values of the date field are 20170518 and 20170519. When you configure a data shipping rule, the date field is mapped to a partition key column.

In this example, you must take note of the difference between the values of the date field and the time at which data is written to Simple Log Service. Then, check whether partition data is complete based on Method 1 and the time at which data is written to Simple Log Service.

The shipping task is successful but the data in the MaxCompute table is not complete. What do I do?

The issue may have one of the following causes:

  • The Simple Log Service field that is mapped to a partition key column does not exist in the log. In this case, the value of the partition key column is null, which is not allowed in the MaxCompute table.

  • The values of the Simple Log Service field that is mapped to a partition key column contain forward slashes (/) or other special characters. These characters are reserved in MaxCompute and are not allowed in the partition key column.

In either case, the system skips the affected logs and continues to ship the other logs. Logs whose partition key columns are mapped without errors are successfully shipped to their partitions.
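
To find logs that would be skipped for these reasons, a pre-check along the following lines may help. This is a minimal sketch: the function name and the dictionary representation of a log are hypothetical, and only the forward slash is tested because the full set of disallowed special characters is not listed here.

```python
def partition_value_problem(log, field):
    """Return a short reason if the log would likely be skipped, else None.

    Hypothetical helper: covers only a missing field and a forward slash.
    """
    value = log.get(field)
    if value is None:
        # The mapped field does not exist, so the partition value is null.
        return "missing partition field"
    if "/" in str(value):
        # Forward slashes are not allowed in a partition key column.
        return "forward slash in partition value"
    return None


logs = [
    {"date": "20170518", "msg": "ok"},
    {"msg": "no date field"},
    {"date": "2017/05/19", "msg": "slash in date"},
]
for log in logs:
    print(log["msg"], "->", partition_value_problem(log, "date"))
```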

To solve this issue, you can modify the configurations of the partition key column. We recommend that you use the reserved field __partition_time__ as a partition key column.

For more information, see Limits.