×
Community Blog Data Transformation Practices for Enriching Log Data

Data Transformation Practices for Enriching Log Data

This article describes how to use the data transformation feature of Alibaba Cloud Log Service to enrich log fields and provide more useful service information for log analysis.

Overview of Data Transformation

Data transformation is a feature provided by Alibaba Cloud Log Service. It is used to extract, transform, and load (ETL) row data for structured or unstructured logs in real time. Currently, this feature contains more than 200 operators. This article describes how to use an enrichment function during data transformation in a data enrichment scenario.

The enrichment function mentioned in this article corresponds to the join function in SQL ETL scenarios.

1

Data Transformation Entry

Go to the Log Service console and select a Logstore to go to its query page. Then, you can find the Data Transformation switch. After you enable the switch, you can write code for data transformation.

2

Overview of data transformation functions: https://www.alibabacloud.com/help/doc-detail/159702.htm

Scenario

This article uses the scenario of enriching HTTP codes in Nginx logs as an example to describe log enrichment methods in data transformation.

HTTP status codes are common in access logs. If we enrich the HTTP status codes, we can view the status of each request more intuitively and can perform more statistics.

3

The following mapping table lists the meanings of common HTTP codes.

4

Enriching Log Data by Using Data Transformation

Method 1 - Use res_local Advanced Parameters

Assume that the data we want to enrich is a csv file that stores code mappings.

code,alias,category,description
100,1xx,Informational,Continue
101,1xx,Informational,Switching Protocols
...

The code mappings are saved as advanced parameters for data transformation, where the key is http_code and the value is the content of the csv file.

e_table_map(tab_parse_csv("code,alias,category,description\n100,1xx,Informational,Continue\n101,1xx,Informational,Switching Protocols\n102,1xx,Informational,Processing (WebDAV)\n200,2xx,Success,OK\n201,2xx,Success,Created\n202,2xx,Success,Accepted\n203,2xx,Success,Non-Authoritative Information\n204,2xx,Success,No Content\n205,2xx,Success,Reset Content\n206,2xx,Success,Partial Content\n207,2xx,Success,Multi-Status (WebDAV)\n208,2xx,Success,Already Reported (WebDAV)\n226,2xx,Success,IM Used\n300,3xx,Redirection,Multiple Choices\n301,3xx,Redirection,Moved Permanently\n302,3xx,Redirection,Found\n303,3xx,Redirection,See Other\n304,3xx,Redirection,Not Modified\n305,3xx,Redirection,Use Proxy\n306,3xx,Redirection,(Unused)\n307,3xx,Redirection,Temporary Redirect\n308,3xx,Redirection,Permanent Redirect (experimental)\n400,4xx,Client Error,Bad Request\n401,4xx,Client Error,Unauthorized\n402,4xx,Client Error,Payment Required\n403,4xx,Client Error,Forbidden\n404,4xx,Client Error,Not Found\n405,4xx,Client Error,Method Not Allowed\n406,4xx,Client Error,Not Acceptable\n407,4xx,Client Error,Proxy Authentication Required\n408,4xx,Client Error,Request Timeout\n409,4xx,Client Error,Conflict\n410,4xx,Client Error,Gone\n411,4xx,Client Error,Length Required\n412,4xx,Client Error,Precondition Failed\n413,4xx,Client Error,Request Entity Too Large\n414,4xx,Client Error,Request-URI Too Long\n415,4xx,Client Error,Unsupported Media Type\n416,4xx,Client Error,Requested Range Not Satisfiable\n417,4xx,Client Error,Expectation Failed\n418,4xx,Client Error,I'm a teapot (RFC 2324)\n420,4xx,Client Error,Enhance Your Calm (Twitter)\n422,4xx,Client Error,Unprocessable Entity (WebDAV)\n423,4xx,Client Error,Locked (WebDAV)\n424,4xx,Client Error,Failed Dependency (WebDAV)\n425,4xx,Client Error,Reserved for WebDAV\n426,4xx,Client Error,Upgrade Required\n428,4xx,Client Error,Precondition Required\n429,4xx,Client Error,Too Many Requests\n431,4xx,Client Error,Request Header Fields Too Large\n444,4xx,Client Error,No Response (Nginx)\n449,4xx,Client Error,Retry With (Microsoft)\n450,4xx,Client Error,Blocked by Windows Parental Controls (Microsoft)\n451,4xx,Client Error,Unavailable For Legal Reasons\n499,4xx,Client Error,Client Closed Request (Nginx)\n500,5xx,Server Error,Internal Server Error\n501,5xx,Server Error,Not Implemented\n502,5xx,Server Error,Bad Gateway\n503,5xx,Server Error,Service Unavailable\n504,5xx,Server Error,Gateway Timeout\n505,5xx,Server Error,HTTP Version Not Supported\n506,5xx,Server Error,Variant Also Negotiates (Experimental)\n507,5xx,Server Error,Insufficient Storage (WebDAV)\n508,5xx,Server Error,Loop Detected (WebDAV)\n509,5xx,Server Error,Bandwidth Limit Exceeded (Apache)\n510,5xx,Server Error,Not Extended\n511,5xx,Server Error,Network Authentication Required\n598,5xx,Server Error,Network read timeout error\n599,5xx,Server Error,Network connect timeout error\n"),
              [("http_code","code")],
              [("alias","http_code_alias"), ("description","http_code_desc"), 
              ("category","http_code_category")])

Result:

5

Method 2: Use an OSS File

Assume that the HTTP code mappings are stored in a file, in the following format:

code,alias,category,description
100,1xx,Informational,Continue
101,1xx,Informational,Switching Protocols
...

We can upload the http_code.csv file to Object Storage Service (OSS).

Go to the OSS console at http://oss.console.aliyun.com

Find an existing bucket or create a new bucket and then upload the file by following the instructions in the OSS console.

6

Enrich the data by using data transformation

e_table_map(
      tab_parse_csv(
           res_oss_file(endpoint="oss-cn-shanghai-internal.aliyuncs.com",
              ak_id=res_local("AK_ID"), ak_key=res_local("AK_KEY"), 
              bucket="ali-sls-etl-test", 
              file="http_code.csv", format='text')),
              [("http_code","code")],
              [("alias","http_code_alias"),
               ("description","http_code_desc"),
               ("category","http_code_category")])

We need to define the values that are referenced by res_local in the advanced parameters.

7

Result:

8

Method 3: Use a MySQL Table

Assume that the http_code mappings are stored in a MySQL table.

9

Enrich the data by using data transformation

e_table_map(res_rds_mysql(address="MySQL host address", 
                  username="Username", password="Password",
                  database="Database name",table="Table name", refresh_interval=300),
              [("http_code","code")],
              [("alias","http_code_alias"), ("description","http_code_desc"), 
              ("category","http_code_category")])

Note: Data transformation supports Relational Database Service (RDS) on a Virtual Private Cloud (VPC) network. For information about how to configure and enable a VPC network, see https://www.alibabacloud.com/help/doc-detail/162753.htm

Result:

10

Conclusion

Overall Process

11

Method Comparison

Method Data Volume Supported Incremental Update Batch Update Applicable Scenarios
Code embedding Small Not supported Not supported Simple mappings
OSS Large Not supported Supported Relatively static and infrequently updated scenarios
MySQL Large Not supported Supported Frequently updated scenarios

Restriction: No dimension table can exceed 2 GB in size.

References

0 0 0
Share on

Teddy.Sun

2 posts | 0 followers

You may also like

Comments

Teddy.Sun

2 posts | 0 followers

Related Products