Data transformation is a feature provided by Alibaba Cloud Log Service. It is used to extract, transform, and load (ETL) row data for structured or unstructured logs in real time. Currently, this feature contains more than 200 operators. This article describes how to use an enrichment function during data transformation in a data enrichment scenario.
The enrichment function mentioned in this article corresponds to the join function in SQL ETL scenarios.
Go to the Log Service console and select a Logstore to go to its query page. Then, you can find the Data Transformation switch. After you enable the switch, you can write code for data transformation.
Overview of data transformation functions: https://www.alibabacloud.com/help/doc-detail/159702.htm
This article uses the scenario of enriching HTTP codes in Nginx logs as an example to describe log enrichment methods in data transformation.
HTTP status codes are common in access logs. If we enrich the HTTP status codes, we can view the status of each request more intuitively and can perform more statistics.
The following mapping table lists the meanings of common HTTP codes.
Assume that the data we want to enrich is a csv file that stores code mappings.
code,alias,category,description
100,1xx,Informational,Continue
101,1xx,Informational,Switching Protocols
...
The code mappings are saved as advanced parameters for data transformation, where the key is http_code and the value is the content of the csv file.
e_table_map(tab_parse_csv("code,alias,category,description\n100,1xx,Informational,Continue\n101,1xx,Informational,Switching Protocols\n102,1xx,Informational,Processing (WebDAV)\n200,2xx,Success,OK\n201,2xx,Success,Created\n202,2xx,Success,Accepted\n203,2xx,Success,Non-Authoritative Information\n204,2xx,Success,No Content\n205,2xx,Success,Reset Content\n206,2xx,Success,Partial Content\n207,2xx,Success,Multi-Status (WebDAV)\n208,2xx,Success,Already Reported (WebDAV)\n226,2xx,Success,IM Used\n300,3xx,Redirection,Multiple Choices\n301,3xx,Redirection,Moved Permanently\n302,3xx,Redirection,Found\n303,3xx,Redirection,See Other\n304,3xx,Redirection,Not Modified\n305,3xx,Redirection,Use Proxy\n306,3xx,Redirection,(Unused)\n307,3xx,Redirection,Temporary Redirect\n308,3xx,Redirection,Permanent Redirect (experimental)\n400,4xx,Client Error,Bad Request\n401,4xx,Client Error,Unauthorized\n402,4xx,Client Error,Payment Required\n403,4xx,Client Error,Forbidden\n404,4xx,Client Error,Not Found\n405,4xx,Client Error,Method Not Allowed\n406,4xx,Client Error,Not Acceptable\n407,4xx,Client Error,Proxy Authentication Required\n408,4xx,Client Error,Request Timeout\n409,4xx,Client Error,Conflict\n410,4xx,Client Error,Gone\n411,4xx,Client Error,Length Required\n412,4xx,Client Error,Precondition Failed\n413,4xx,Client Error,Request Entity Too Large\n414,4xx,Client Error,Request-URI Too Long\n415,4xx,Client Error,Unsupported Media Type\n416,4xx,Client Error,Requested Range Not Satisfiable\n417,4xx,Client Error,Expectation Failed\n418,4xx,Client Error,I'm a teapot (RFC 2324)\n420,4xx,Client Error,Enhance Your Calm (Twitter)\n422,4xx,Client Error,Unprocessable Entity (WebDAV)\n423,4xx,Client Error,Locked (WebDAV)\n424,4xx,Client Error,Failed Dependency (WebDAV)\n425,4xx,Client Error,Reserved for WebDAV\n426,4xx,Client Error,Upgrade Required\n428,4xx,Client Error,Precondition Required\n429,4xx,Client Error,Too Many Requests\n431,4xx,Client Error,Request Header Fields Too Large\n444,4xx,Client Error,No Response (Nginx)\n449,4xx,Client Error,Retry With (Microsoft)\n450,4xx,Client Error,Blocked by Windows Parental Controls (Microsoft)\n451,4xx,Client Error,Unavailable For Legal Reasons\n499,4xx,Client Error,Client Closed Request (Nginx)\n500,5xx,Server Error,Internal Server Error\n501,5xx,Server Error,Not Implemented\n502,5xx,Server Error,Bad Gateway\n503,5xx,Server Error,Service Unavailable\n504,5xx,Server Error,Gateway Timeout\n505,5xx,Server Error,HTTP Version Not Supported\n506,5xx,Server Error,Variant Also Negotiates (Experimental)\n507,5xx,Server Error,Insufficient Storage (WebDAV)\n508,5xx,Server Error,Loop Detected (WebDAV)\n509,5xx,Server Error,Bandwidth Limit Exceeded (Apache)\n510,5xx,Server Error,Not Extended\n511,5xx,Server Error,Network Authentication Required\n598,5xx,Server Error,Network read timeout error\n599,5xx,Server Error,Network connect timeout error\n"),
[("http_code","code")],
[("alias","http_code_alias"), ("description","http_code_desc"),
("category","http_code_category")])
Result:
Assume that the HTTP code mappings are stored in a file, in the following format:
code,alias,category,description
100,1xx,Informational,Continue
101,1xx,Informational,Switching Protocols
...
We can upload the http_code.csv file to Object Storage Service (OSS).
Go to the OSS console at http://oss.console.aliyun.com
Find an existing bucket or create a new bucket and then upload the file by following the instructions in the OSS console.
Enrich the data by using data transformation
e_table_map(
tab_parse_csv(
res_oss_file(endpoint="oss-cn-shanghai-internal.aliyuncs.com",
ak_id=res_local("AK_ID"), ak_key=res_local("AK_KEY"),
bucket="ali-sls-etl-test",
file="http_code.csv", format='text')),
[("http_code","code")],
[("alias","http_code_alias"),
("description","http_code_desc"),
("category","http_code_category")])
We need to define the values that are referenced by res_local in the advanced parameters.
Result:
Assume that the http_code mappings are stored in a MySQL table.
Enrich the data by using data transformation
e_table_map(res_rds_mysql(address="MySQL host address",
username="Username", password="Password",
database="Database name",table="Table name", refresh_interval=300),
[("http_code","code")],
[("alias","http_code_alias"), ("description","http_code_desc"),
("category","http_code_category")])
Note: Data transformation supports Relational Database Service (RDS) on a Virtual Private Cloud (VPC) network. For information about how to configure and enable a VPC network, see https://www.alibabacloud.com/help/doc-detail/162753.htm
Result:
Method | Data Volume Supported | Incremental Update | Batch Update | Applicable Scenarios |
Code embedding | Small | Not supported | Not supported | Simple mappings |
OSS | Large | Not supported | Supported | Relatively static and infrequently updated scenarios |
MySQL | Large | Not supported | Supported | Frequently updated scenarios |
Restriction: No dimension table can exceed 2 GB in size.
Rupal_Click2Cloud - December 5, 2022
Alibaba Cloud Community - November 6, 2023
Alibaba Clouder - July 13, 2021
Alibaba Cloud Community - May 30, 2022
Data Geek - April 29, 2024
Alibaba Cloud Serverless - May 6, 2021
An all-in-one service for log-type data
Learn MoreLog into an artificial intelligence for IT operations (AIOps) environment with an intelligent, all-in-one, and out-of-the-box log management solution
Learn MoreMulti-source metrics are aggregated to monitor the status of your business and services in real time.
Learn MoreBuild business monitoring capabilities with real time response based on frontend monitoring, application monitoring, and custom business monitoring capabilities
Learn MoreMore Posts by Teddy.Sun