By Gaoxun
Consider a scenario where we need to convert the self_code_list in Table A into an a_tag_list. Each self_code maps one-to-one to an a_tag, and this mapping is stored in Table B. A mapping like this is usually resolved with a JOIN (if you know a better method, feel free to leave a comment).
Figure 1 Table A
Figure 2 Table B
The current data structure is certainly not easy to handle. What if we convert it to the one shown in Figure 3:
Figure 3
With the structure in Figure 3, we can join directly with Table B on self_code to obtain the value of a_tag, as shown in Figure 4. This is clear and simple to do.
Figure 4
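The join from Figure 3 to Figure 4 can be modeled in plain Python to make the idea concrete. This is only a sketch of the logic, not MaxCompute code. The sample rows, the values in them, and the helper name join_on_self_code are hypothetical and are not taken from the figures.

```python
# Hypothetical rows standing in for Figure 3 (Table A after expansion).
expanded_a = [
    {"id": 1, "name": "alice", "self_code": "c1"},
    {"id": 1, "name": "alice", "self_code": "c2"},
    {"id": 2, "name": "bob", "self_code": "c3"},
]

# Hypothetical Table B: a one-to-one mapping from self_code to a_tag.
table_b = [
    {"self_code": "c1", "a_tag": "t1"},
    {"self_code": "c2", "a_tag": "t2"},
    {"self_code": "c3", "a_tag": "t3"},
]


def join_on_self_code(left_rows, mapping_rows):
    """Inner-join left_rows with mapping_rows on self_code."""
    lookup = {row["self_code"]: row["a_tag"] for row in mapping_rows}
    return [
        {**row, "a_tag": lookup[row["self_code"]]}
        for row in left_rows
        if row["self_code"] in lookup  # rows without a match are dropped
    ]


joined = join_on_self_code(expanded_a, table_b)
for row in joined:
    print(row["id"], row["name"], row["self_code"], row["a_tag"])
```

Because each self_code has exactly one a_tag, the join never multiplies rows: the output has the same number of rows as the expanded input, provided every code has a match.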
The key question is how to convert Table A into the table shown in Figure 3. This operation is essentially transposing columns to rows: one or more columns of a row (usually arrays) are expanded, and one or more other columns serve as the keys, so that a single row becomes multiple rows. In converting Table A from Figure 1 to Figure 3, we use id and name as the keys and expand the self_code_list column into multiple rows.
In ODPS (MaxCompute), two built-in features make it easy to transpose columns to rows: the TRANS_ARRAY function and LATERAL VIEW with EXPLODE.
https://www.alibabacloud.com/help/en/maxcompute/user-guide/trans-array
https://www.alibabacloud.com/help/en/maxcompute/user-guide/lateral-view
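-- Expand self_code_list into one row per code, keyed by (id, name).
SELECT TRANS_ARRAY(2,',',id,name,self_code_list) AS (id,name,self_code)
FROM (
    -- Convert the JSON array into a comma-separated STRING.
    SELECT id,name
    ,ARRAY_JOIN(FROM_JSON(JSON_FORMAT(self_code_list),"array<string>"),',')
    AS self_code_list
    FROM TABLE_A
    ORDER BY id ASC
) t;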
In Table A, the self_code_list field type is JSON, while TRANS_ARRAY requires that the type of the column to be converted into rows must be STRING, so the self_code_list should be converted into STRING type first.
The syntax of the function is as follows. We use the example above to explain its parameters:
trans_array(<num_keys>, <separator>, <key1>, <key2>, ..., <col1>, <col2>, ...) as (<key1>, <key2>, ..., <col1>, <col2>, ...)
The first parameter is the number of columns that are used as transposed keys when being converted to multiple rows. In this example, we use id and name as keys, so the value of this parameter is 2.
The second parameter is the separator used to split a string into multiple elements. Set it to whatever separator the data uses, usually a comma (,) or a semicolon (;).
The remaining parameters are column names of STRING type. The function treats the first N columns (N = num_keys) as keys and the remaining M columns as the arrays to be transposed to rows. In this example, the column name parameters are id, name, and self_code_list. With num_keys = 2, the id and name columns are used as keys in the result set, while self_code_list is the column to be transposed.
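The parameter rules above can be illustrated with a small Python model that processes one input row. This is a sketch of the described behavior, not the MaxCompute implementation. The name trans_array_row and the sample values are hypothetical, and padding shorter arrays with None when several columns are transposed is an assumption of this sketch.

```python
from itertools import zip_longest


def trans_array_row(num_keys, separator, *values):
    """Model TRANS_ARRAY for a single input row.

    The first num_keys values are keys; every remaining value is a
    separator-delimited string that is split and expanded into rows.
    """
    keys = values[:num_keys]
    split_cols = [v.split(separator) for v in values[num_keys:]]
    # Assumption: when split columns differ in length, the shorter ones
    # are padded with None (standing in for NULL).
    return [keys + elems for elems in zip_longest(*split_cols)]


# Hypothetical row: id=1, name="alice", self_code_list="c1,c2,c3"
for out_row in trans_array_row(2, ",", 1, "alice", "c1,c2,c3"):
    print(out_row)
# prints (1, 'alice', 'c1'), (1, 'alice', 'c2'), (1, 'alice', 'c3'), one per line
```

The call mirrors TRANS_ARRAY(2,',',id,name,self_code_list): the first two values are repeated on every output row, and the third is split on the comma.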
SELECT id
,name
,self_code
FROM TABLE_A
LATERAL VIEW EXPLODE(FROM_JSON(JSON_FORMAT(self_code_list),"array<string>")) tmp AS self_code;
Note that the input parameter of the EXPLODE function must be ARRAY.
Both methods can transpose columns to rows, but they differ slightly in how they handle empty values.
Consider the following source rows:
SELECT id, name, self_code_list from TABLE_A
where id IN (291, 112, 116, 252);
Figure 5
Convert these four rows with each of the two methods.
Figure 6
Figure 7
As the results show, LATERAL VIEW EXPLODE drops rows whose list is empty, while TRANS_ARRAY retains them.
Now that we have covered how to transpose columns to rows, let's look at how to transpose rows to columns. Recall our initial goal: to map the self_code_list of Table A to an a_tag_list, as shown in Figure 8. After transposing the column to rows, we can easily join with Table B and obtain the temporary table shown in Figure 4.
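One plausible way to picture this difference is the following Python model of the two pipelines used in this article. It is a sketch under assumptions, not a description of MaxCompute internals. The function names and sample rows are hypothetical, and the empty string in the retained row is only what Python produces; the actual value MaxCompute shows is in the figures.

```python
def explode_rows(rows):
    """Model LATERAL VIEW EXPLODE: one output row per array element."""
    return [(rid, name, code) for rid, name, codes in rows for code in codes]


def join_then_split_rows(rows, separator=","):
    """Model the TRANS_ARRAY path: join the array to a string, then split it."""
    out = []
    for rid, name, codes in rows:
        joined = separator.join(codes)        # an empty array becomes ""
        for code in joined.split(separator):  # "".split(",") gives [""]
            out.append((rid, name, code))
    return out


# Hypothetical source rows; the second one has an empty list.
source = [(1, "alice", ["c1", "c2"]), (2, "bob", [])]

print(explode_rows(source))
# prints [(1, 'alice', 'c1'), (1, 'alice', 'c2')]
print(join_then_split_rows(source))
# prints [(1, 'alice', 'c1'), (1, 'alice', 'c2'), (2, 'bob', '')]
```

In this model an empty array contributes no elements to iterate over, so the exploded output has no row for it, whereas splitting an empty string still yields one (empty) element and therefore one row. If downstream logic needs every id to survive, the choice of method matters.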
Figure 8
The operation from Figure 4 to Figure 8 transposes rows to columns, that is, it converts multiple rows of data into one or several columns. This is not arbitrary: as with transposing columns to rows, the conversion is driven by a key. The two operations are inverses of each other. When transposing columns to rows, we split one column value in each row into multiple values and emit one row per value under the same key. When transposing rows to columns, we concatenate the values of one column across multiple rows into a single value and emit one row per key. In going from Figure 4 to Figure 8, we use id and name as keys and concatenate the a_tag column with commas (,), so that id, name, and a_tag_list form a unique row. The result can also be converted into multiple columns: choose a separator when concatenating, then apply SPLIT to the concatenated value to obtain the individual columns. The concatenation function is WM_CONCAT.
https://www.alibabacloud.com/help/en/maxcompute/user-guide/wm-concat
In the above example, we use the WM_CONCAT function like this:
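-- WM_CONCAT is an aggregate, so the key columns must appear in GROUP BY.
SELECT id
,name
,WM_CONCAT(',',a_tag) AS a_tag_list
FROM T_tmp_4
GROUP BY id
,name;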
This yields the result set shown in Figure 8.
With the preceding operations, we have transposed columns to rows and then rows to columns, and achieved our goal. For more information about transposing rows to columns and columns to rows, refer to the official MaxCompute documentation.
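The grouping-and-concatenation step can be modeled in a few lines of Python. This is a sketch of the idea only. The name wm_concat_by_key and the sample rows are hypothetical, and the sketch keeps values in input order, which is a property of this model rather than a statement about the order MaxCompute produces.

```python
def wm_concat_by_key(rows, separator=","):
    """Group (id, name, a_tag) rows by (id, name) and join the a_tag values."""
    grouped = {}
    for rid, name, a_tag in rows:
        grouped.setdefault((rid, name), []).append(a_tag)
    return [
        (rid, name, separator.join(tags))
        for (rid, name), tags in grouped.items()
    ]


# Hypothetical rows standing in for Figure 4.
figure_4 = [
    (1, "alice", "t1"),
    (1, "alice", "t2"),
    (2, "bob", "t3"),
]

print(wm_concat_by_key(figure_4))
# prints [(1, 'alice', 't1,t2'), (2, 'bob', 't3')]
```

Each distinct (id, name) pair becomes exactly one output row, which is the role the key columns play when aggregating with WM_CONCAT.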
Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.