The data transformation feature of Simple Log Service allows you to obtain data from ApsaraDB RDS for MySQL databases and enrich the data based on data transformation rules.
Background information
When you analyze data, you may need to obtain data from different storage sources. For example, the data of user operations and user behavior is stored in Simple Log Service, and the data of user properties and registration is stored in an ApsaraDB RDS for MySQL database. In this case, you can use the data transformation feature to obtain data from the database and store the data in a Logstore.
You can use the res_rds_mysql function to obtain data from an ApsaraDB RDS for MySQL database and then use the e_table_map or e_search_table_map function to enrich the data.
The instance on which your ApsaraDB RDS for MySQL database is created must reside in the same region as your Simple Log Service project. Otherwise, you cannot obtain data from the database.
You can access and obtain data from an ApsaraDB RDS for MySQL database by using an internal endpoint of the instance on which the database is created. For more information, see Obtain data from an ApsaraDB RDS for MySQL database over the internal network.
Use the e_table_map function to enrich data
In this example, the e_table_map and res_rds_mysql functions are used to enrich data.
Raw data
Sample data records in a table of an ApsaraDB RDS for MySQL database
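| province | city | population | cid | eid |
| --- | --- | --- | --- | --- |
| Shanghai | Shanghai | 2000 | 1 | 00001 |
| Tianjin | Tianjin | 800 | 1 | 00002 |
| Beijing | Beijing | 4000 | 1 | 00003 |
| Henan | Zhengzhou | 3000 | 2 | 00004 |
| Jiangsu | Nanjing | 1500 | 2 | 00005 |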
Sample logs in a Simple Log Service Logstore
time:"1566379109" data:"test-one" cid:"1" eid:"00001"
time:"1566379111" data:"test_second" cid:"1" eid:"12345"
time:"1566379111" data:"test_three" cid:"2" eid:"12345"
time:"1566379113" data:"test_four" cid:"2" eid:"12345"
Transformation rule
You can configure a transformation rule to match the cid field in the Logstore against the cid field in the table. A log matches a data record when both have the same cid value. For each match, the system returns the province, city, and population fields and their values from the matched data record in the table, and adds them to the matched log in the Logstore to generate a new log.
Note: If multiple data records in the table match a log, the e_table_map function uses only the first matched record. In this example, several records in the table have a cid value of 1, so logs whose cid value is 1 are enriched with the first of these records (Shanghai).
The e_table_map function supports only single-row matching. If you want to implement multi-row matching and combine the matched data into a new log, you can use the e_search_table_map function. For more information, see Use the e_search_table_map function to enrich data.
e_table_map(res_rds_mysql(address="rds-host", username="mysql-username",password="xxx",database="xxx",table="xx",refresh_interval=60),"cid",["province","city","population"])
For more information about how to configure an ApsaraDB RDS for MySQL database in the res_rds_mysql function, see res_rds_mysql.
Transformation result
time:"1566379109" data:"test-one" cid:"1" eid:"00001" province:"Shanghai" city:"Shanghai" population:"2000"
time:"1566379111" data:"test_second" cid:"1" eid:"12345" province:"Shanghai" city:"Shanghai" population:"2000"
time:"1566379111" data:"test_three" cid:"2" eid:"12345" province:"Henan" city:"Zhengzhou" population:"3000"
time:"1566379113" data:"test_four" cid:"2" eid:"12345" province:"Henan" city:"Zhengzhou" population:"3000"
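The first-match behavior described above can be illustrated with a small local simulation in plain Python. This is only a sketch and not the Simple Log Service implementation: the helper name table_map is hypothetical, the TABLE list stands in for the rows that res_rds_mysql would return, and leaving an unmatched log unchanged is an assumption.

```python
# Local stand-in for the rows of the sample ApsaraDB RDS for MySQL table.
TABLE = [
    {"province": "Shanghai", "city": "Shanghai", "population": "2000", "cid": "1", "eid": "00001"},
    {"province": "Tianjin", "city": "Tianjin", "population": "800", "cid": "1", "eid": "00002"},
    {"province": "Beijing", "city": "Beijing", "population": "4000", "cid": "1", "eid": "00003"},
    {"province": "Henan", "city": "Zhengzhou", "population": "3000", "cid": "2", "eid": "00004"},
    {"province": "Jiangsu", "city": "Nanjing", "population": "1500", "cid": "2", "eid": "00005"},
]

# The four sample logs from the Logstore.
LOGS = [
    {"time": "1566379109", "data": "test-one", "cid": "1", "eid": "00001"},
    {"time": "1566379111", "data": "test_second", "cid": "1", "eid": "12345"},
    {"time": "1566379111", "data": "test_three", "cid": "2", "eid": "12345"},
    {"time": "1566379113", "data": "test_four", "cid": "2", "eid": "12345"},
]


def table_map(log, table, key, output_fields):
    """Hypothetical helper: enrich a log from the FIRST row whose key value matches."""
    for row in table:
        if row.get(key) == log.get(key):
            enriched = dict(log)
            # Copy only the requested output fields from the matched row.
            enriched.update({field: row[field] for field in output_fields})
            return enriched
    # Assumption: a log without a matching row is passed through unchanged.
    return dict(log)


ENRICHED = [table_map(log, TABLE, "cid", ["province", "city", "population"]) for log in LOGS]

for log in ENRICHED:
    print(log)
```

Because the loop returns on the first matching row, both logs with a cid value of 1 receive the Shanghai record, and both logs with a cid value of 2 receive the Zhengzhou record, which is consistent with the transformation result shown above.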
Use the e_search_table_map function to enrich data
In this example, the e_search_table_map and res_rds_mysql functions are used to enrich data.
Raw data
Sample data records in a table of an ApsaraDB RDS for MySQL database
| content | name | age |
| --- | --- | --- |
| city~=n* | aliyun | 10 |
| province~=su$ | Maki | 18 |
| city:nanjing | vicky | 20 |
Sample log in a Simple Log Service Logstore
time:1563436326 data:123 city:nanjing province:jiangsu
Transformation rule
You can configure a transformation rule to match the values of the content field in the table against the log in the Logstore. Each value of the content field is a search expression in the form of a key-value pair: the key corresponds to a field name in the log, and the value is matched against the value of that field in the log, for example, as a regular expression. If a data record matches the log, the system adds the specified fields and field values of the record to the log to generate a new log.
Note: For more information about how to configure an ApsaraDB RDS for MySQL database in the res_rds_mysql function, see res_rds_mysql.
The content field is included in the table. When the system matches the values of the field against the log, various matching modes are supported, such as regular expression match, exact match, and fuzzy match. For more information about matching rules, see e_search.
Single-row matching
The system returns the transformation result when one data record in the table matches the log.
e_search_table_map(res_rds_mysql(address="rds-host", username="mysql-username",password="xxx",database="xxx",table="xx",refresh_interval=60),"content","name")
Multi-row matching
The system traverses all data records in the table and adds all matched data to the specified field.
Note: The following parameter settings are required:
multi_match=True: enables multi-row matching.
multi_join=",": concatenates multiple matched values with commas (,).
e_search_table_map(res_rds_mysql(address="rds-host", username="mysql-username",password="xxx",database="xxx",table="xx",refresh_interval=60),"content","name",multi_match=True,multi_join=",")
Transformation result
Single-row matching
In this example, the system checks whether the value of the city field in the log matches the n* expression. If the match is successful, the system returns the name field and field value for the matched data record in the table to generate a new log.
time:1563436326 data:123 city:nanjing province:jiangsu name:aliyun
Multi-row matching
In this example, the system checks whether the value of the city field in the log matches the n* regular expression, whether the value of the province field in the log matches the su$ regular expression, and whether the value of the city field in the log contains nanjing. In a value of the content field, ~= introduces a regular expression, and a colon (:) checks whether the field value contains the string that follows. All three data records match, so the system returns the three values of the name field in the table, joins them with commas (,), and adds the result to the log to generate a new log.
time:1563436326 data:123 city:nanjing province:jiangsu name:aliyun,Maki,vicky
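The difference between single-row and multi-row matching can also be sketched as a local simulation in plain Python. This is not the Simple Log Service implementation: the helpers matches and search_table_map are hypothetical, only the two expression forms used in the sample table (field~=regex and field:text) are handled, and treating ~= as a partial regular expression match is an assumption.

```python
import re

# Local stand-in for the rows of the sample ApsaraDB RDS for MySQL table.
TABLE = [
    {"content": "city~=n*", "name": "aliyun", "age": "10"},
    {"content": "province~=su$", "name": "Maki", "age": "18"},
    {"content": "city:nanjing", "name": "vicky", "age": "20"},
]

LOG = {"time": "1563436326", "data": "123", "city": "nanjing", "province": "jiangsu"}


def matches(expr, log):
    """Hypothetical helper: evaluate one simplified search expression against a log."""
    if "~=" in expr:
        # field~=regex: assumed to be a partial regular expression match.
        field, pattern = expr.split("~=", 1)
        return field in log and re.search(pattern, log[field]) is not None
    # field:text: the field value must contain the text.
    field, text = expr.split(":", 1)
    return field in log and text in log[field]


def search_table_map(log, table, expr_field, output_field, multi_match=False, multi_join=","):
    """Hypothetical helper: add output_field from the matching row or rows to the log."""
    hits = [row[output_field] for row in table if matches(row[expr_field], log)]
    if not hits:
        return dict(log)  # Assumption: no match leaves the log unchanged.
    # Single-row mode keeps the first hit; multi-row mode joins all hits.
    value = multi_join.join(hits) if multi_match else hits[0]
    return {**log, output_field: value}


SINGLE = search_table_map(LOG, TABLE, "content", "name")
MULTI = search_table_map(LOG, TABLE, "content", "name", multi_match=True, multi_join=",")

print(SINGLE["name"])
print(MULTI["name"])
```

With the sample log, the single-row call yields aliyun for the name field and the multi-row call yields aliyun,Maki,vicky, which is consistent with the two transformation results above.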