By Haiqing
MaxCompute (formerly known as ODPS) is a leading distributed big data processing platform developed by Alibaba Cloud. It is widely utilized, especially within the Alibaba Group, and supports the core businesses of multiple business units (BUs). ODPS V2.0, continuously optimizing performance, aims to enhance the user experience and expression capabilities of SQL, as well as improve the productivity of ODPS developers.
Building upon the SQL engine of ODPS V2.0, MaxCompute simplifies SQL compilation and enhances the language's expressiveness. We present a series of articles titled Unleash the Power of MaxCompute to explore the capabilities of MaxCompute (ODPS V2.0).
MaxCompute supports the new syntax - PIVOT and UNPIVOT. PIVOT enables you to transform rows with specified values into columns by aggregating data based on those values, whereas UNPIVOT allows for the conversion of columns into rows. Here are some common use cases:
• Scenario 1
For a business table, you need to treat the values in the table as new columns and aggregate the existing results based on each value, in order to achieve the conversion from rows to columns. Before PIVOT was supported, you had to combine the GROUP BY clause, aggregate functions, and Filter syntax to meet this requirement.
• Scenario 2
For a business table, you need to create a new column and merge the original column names into this column. Another new column is used to place the values of the original columns, achieving the conversion from columns to rows. Before UNPIVOT was supported, you had to combine the CROSS JOIN syntax with the CASE WHEN expression to fulfill this requirement.
PIVOT is used to rotate the specified value of a single column to multiple columns. It also runs aggregations when required on the remaining column values. The PIVOT syntax is part of the FROM clause.
SELECT ...
FROM ...
PIVOT (
<aggregate function> [AS <alias>] [, <aggregate function> [AS <alias>]] ...
FOR (<column> [, <column>] ...)
IN (
(<value> [, <value>] ...) AS <new column>
[, (<value> [, <value>] ...) AS <new column>]
...
)
)
[...]
• <aggregate_function>
It indicates the aggregate function that needs to be calculated during the row-to-column conversion. The outer layer of an aggregate function cannot be nested in another function. It can be an expression that consists of a scalar function and a column. At the same time, the parameters of the aggregate function cannot contain other aggregate functions or window functions, and the columns of the aggregate function can only be columns in the upstream table.
• <alias>
It indicates the alias of the corresponding column of the aggregate function to be calculated during the row-to-column conversion.
• <column>
It indicates the column name of the corresponding row during the row-to-column conversion. It cannot be an expression.
• <value>
It indicates the value of the corresponding row during the row-to-column conversion. It can be an expression but does not contain any aggregate function or window function, and the number of elements in each tuple must be consistent with the number.
• <new_column>
It indicates the alias of the new column after the row-to-column conversion. If no alias is specified, an attempt is made to speculate the alias. If the speculation fails, the system will automatically generate an alias.
For more information about the syntax, see the documentation.
The PIVOT syntax can be equivalent to the combination of GROUP BY, aggregate function, and filter. Take the following syntax as an example:
SELECT ...
FROM ...
PIVOT (
agg1 AS a, agg2 AS b, ...
FOR (axis1, ..., axisN)
IN (
(v11, ..., v1N) AS label1,
(v21, ..., v2N) AS label2,
...)
)
The syntax above is equivalent to:
SELECT
k1, ... kN,
agg1 AS label1_a FILTER (where axis1 = v11 and ... and axisN = v1N),
agg2 AS label1_b FILTER (where axis1 = v21 and ... and axisN = v2N),
...,
agg1 AS label2_a FILTER (where axis1 = v11 and ... and axisN = v1N),
agg2 AS label2_b FILTER (where axis1 = v21 and ... and axisN = v2N),
...,
FROM xxxxxx
GROUP BY k1, ... kN
Among them, the table in FROM is the result of the upstream PIVOT. k1, ... kN is the set of all columns that do not appear in agg1, agg2, ..., axis1, ..., and axisN.
• Data preparation.
The following table represents the sales of the items of several chain shops in 2020 and 2021:
create table shops_table as select * from (select * from values
('pen', 10, 500, 'shop1', 2020),
('pen', 11, 500, 'shop2', 2020),
('pen', 9, 300, 'shop3', 2020),
('pen', 12, 400,'shop4', 2020),
('pen', 15, 200, 'shop1', 2021),
('pen', 16, 300, 'shop2', 2021),
('pen', 16, 400, 'shop3', 2021),
('pen', 15, 300, 'shop4', 2021),
('ruler', 20, 700, 'shop1', 2020),
('ruler', 19, 900, 'shop2', 2020),
('ruler', 22, 800, 'shop3', 2020),
('ruler', 19, 700, 'shop4', 2020),
('ruler', 25, 300, 'shop1', 2021),
('ruler', 20, 500, 'shop2', 2021),
('ruler', 23, 500, 'shop3', 2021),
('ruler', 26, 600, 'shop4', 2021)
shops(item_name, count, sales, shop_name, year))
;
select * from shops_table;
-- The results are as follows:
+-----------+------------+------------+-----------+------------+
| item_name | count | sales | shop_name | year |
+-----------+------------+------------+-----------+------------+
| pen | 10 | 500 | shop1 | 2020 |
| pen | 11 | 500 | shop2 | 2020 |
| pen | 9 | 300 | shop3 | 2020 |
| pen | 12 | 400 | shop4 | 2020 |
| pen | 15 | 200 | shop1 | 2021 |
| pen | 16 | 300 | shop2 | 2021 |
| pen | 16 | 400 | shop3 | 2021 |
| pen | 15 | 300 | shop4 | 2021 |
| ruler | 20 | 700 | shop1 | 2020 |
| ruler | 19 | 900 | shop2 | 2020 |
| ruler | 22 | 800 | shop3 | 2020 |
| ruler | 19 | 700 | shop4 | 2020 |
| ruler | 25 | 300 | shop1 | 2021 |
| ruler | 20 | 500 | shop2 | 2021 |
| ruler | 23 | 500 | shop3 | 2021 |
| ruler | 26 | 600 | shop4 | 2021 |
+-----------+------------+------------+-----------+------------+
• Count the number of items sold by each shop in each year.
• Before the PVIOT syntax is supported, the implementation is as follows:
SELECT item_name
,year
,SUM(CASE shop_name WHEN 'shop1' THEN count END) AS shop1
,SUM(CASE shop_name WHEN 'shop2' THEN count END) AS shop2
,SUM(CASE shop_name WHEN 'shop3' THEN count END) AS shop3
,SUM(CASE shop_name WHEN 'shop4' THEN count END) AS shop4
FROM shops_table
GROUP BY item_name
,year
;
-- The results are as follows:
+-----------+------------+------------+------------+------------+------------+
| item_name | year | 'shop1' | 'shop2' | 'shop3' | 'shop4' |
+-----------+------------+------------+------------+------------+------------+
| pen | 2020 | 10 | 11 | 9 | 12 |
| pen | 2021 | 15 | 16 | 16 | 15 |
| ruler | 2020 | 20 | 19 | 22 | 19 |
| ruler | 2021 | 25 | 20 | 23 | 26 |
+-----------+------------+------------+------------+------------+------------+
• With the PVIOT syntax, the implementation is as follows:
select * from (select item_name, year,count,shop_name from shops_table)
pivot (sum(count) for shop_name in ('shop1', 'shop2', 'shop3', 'shop4'));
-- The results are as follows:
+------------+------------+------------+------------+------------+------------+
| item_name | year | 'shop1' | 'shop2' | 'shop3' | 'shop4' |
+------------+------------+------------+------------+------------+------------+
| pen | 2020 | 10 | 11 | 9 | 12 |
| pen | 2021 | 15 | 16 | 16 | 15 |
| ruler | 2020 | 20 | 19 | 22 | 19 |
| ruler | 2021 | 25 | 20 | 23 | 26 |
+------------+------------+------------+------------+------------+------------+
• You can give the aggregate function and the new column new aliases at this point, and the column names are merged with underscores:
select * from (select item_name, count, shop_name, year from shops_table)
pivot (sum(count) as sum_count for shop_name in ('shop1' as shop_name_1, 'shop2' as shop_name_2, 'shop3' as shop_name_3, 'shop4' as shop_name_4));
-- The results are as follows:
+------------+------------+-----------------------+-----------------------+-----------------------+-----------------------+
| item_name | year | shop_name_1_sum_count | shop_name_2_sum_count | shop_name_3_sum_count | shop_name_4_sum_count |
+------------+------------+-----------------------+-----------------------+-----------------------+-----------------------+
| pen | 2020 | 10 | 11 | 9 | 12 |
| pen | 2021 | 15 | 16 | 16 | 15 |
| ruler | 2020 | 20 | 19 | 22 | 19 |
| ruler | 2021 | 25 | 20 | 23 | 26 |
+------------+------------+-----------------------+-----------------------+-----------------------+-----------------------+
• Calculate the total number of sales and maximum sales per shop per year for each item. With the PIVOT syntax, the implementation is as follows:
select * from shops_table
pivot (sum(count) as sum_count, max(sales) as max_sales for shop_name in ('shop1' as shop_name_1, 'shop2' as shop_name_2, 'shop3' as shop_name_3, 'shop4' as shop_name_4));
-- The results are as follows:
+-----------+------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
| item_name | year | shop_name_1_sum_count | shop_name_2_sum_count | shop_name_3_sum_count | shop_name_4_sum_count | shop_name_1_max_sales | shop_name_2_max_sales | shop_name_3_max_sales | shop_name_4_max_sales |
+-----------+------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
| pen | 2020 | 10 | 11 | 9 | 12 | 500 | 500 | 300 | 400 |
| pen | 2021 | 15 | 16 | 16 | 15 | 200 | 300 | 400 | 300 |
| ruler | 2020 | 20 | 19 | 22 | 19 | 700 | 900 | 800 | 700 |
| ruler | 2021 | 25 | 20 | 23 | 26 | 300 | 500 | 500 | 600 |
+-----------+------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
• Only calculate the total number of sales and maximum sales of shop1 in 2020 and 2021 for each item. With the PIVOT syntax, the implementation is as follows:
select * from shops_table
pivot (sum(count) as sum_count, max(sales) as max_sales for (shop_name, year) in (('shop1', 2021) as shop1_2021, ('shop1', 2020) as shop1_2020));
-- The results are as follows:
+-----------+----------------------+----------------------+----------------------+----------------------+
| item_name | shop1_2021_sum_count | shop1_2020_sum_count | shop1_2021_max_sales | shop1_2020_max_sales |
+-----------+----------------------+----------------------+----------------------+----------------------+
| pen | 15 | 10 | 200 | 500 |
| ruler | 25 | 20 | 300 | 700 |
+-----------+----------------------+----------------------+----------------------+----------------------+
The UNPIVOT syntax rotates a table by converting columns to rows. The UNPIVOT syntax is part of the FROM clause.
SELECT ...
FROM ...
UNPIVOT [EXCLUDE NULLS](
<new_column_of_name> [, <new_column_of_name>] ...
FOR (<new_column_of_value> [, <new_column_of_value>] ...)
IN (
(<column> [, <column>] ...) AS (<column_value> [, <column_value>] ...)
[, (<column> [, <column>] ...) AS (<column_value> [, <column_value>] ...)]
...
)
)
[...]
• [EXCLUDE NULLS]
It means that if this syntax is specified, all rows that are null are filtered out.
• <new_column_of_name>
It indicates the column that is used to store the original column name after the column-to-row conversion. The column name cannot be an expression or be repeated. The number must be the same as the number of elements in each tuple. If you do not specify the tuple, MaxCompute will automatically generate a set of tuples of the string type.
• <new_column_of_value>
It indicates the column that is used to store the corresponding value of the original column after the column-to-row conversion. The column name cannot be an expression or be repeated. The number must be the same as the number of elements in each tuple.
• <column>
It specifies the name of the column in the source table that you want to perform the column-to-row conversion.
• <column_value>
It indicates the alias of the original column used for column-to-row conversion. It can be used to replace the original column name. Column names are not allowed.
For more information about the syntax, see the documentation. The UNPIVOT syntax can be equivalent to a combination of the CROSS JOIN and CASE WHEN expressions. Take the following syntax as an example:
SELECT ...
FROM ...
UNPIVOT [exclude nulls](
(measure1, ..., measureM)
FOR (axis1, ..., axisN)
IN ((c11, ..., c1M) AS (value11, ..., value1N),
(c21, ..., c2M) AS (value21, ..., value2N), ...))
[...]
The syntax above is equivalent to:
SELECT * FROM
(
SELECT
k1, ... kN,
CASE
WHEN axis1 = value11 AND ... AND axisN = value1N THEN c11
WHEN axis1 = value21 AND ... AND axisN = value2N THEN c21
...
ELSE null AS measure1,
...,
CASE
WHEN axis1 = value11 AND ... AND axisN = value1N THEN c1M
WHEN axis1 = value21 AND ... AND axisN = value2N THEN c2M
ELSE null AS measureM,
axis1, ..., axisN
FROM xxxx
JOIN (VALUES (value11, ..., value1N),(value21, ..., value2N), ... AS generated_table_name(axis1, ..., axisN))
)
[WHERE measure1 is not null OR ... OR measureM is not null]
• Data preparation.
The following table represents the sales of the items of several chain shops in 2020 and 2021:
create table shops as select * from (select * from values
('pen', 2020, 100, 200, 300, 400),
('pen', 2021, 100, 200, 200, 100),
('ruler', 2020, 300, 400, 300, 200),
('ruler', 2021, 400, 300, 100, 100)
shops(item_name, year, shop1, shop2, shop3, shop4))
;
SELECT * from shops;
-- Execution result
+-----------+------------+------------+------------+------------+------------+
| item_name | year | shop1 | shop2 | shop3 | shop4 |
+-----------+------------+------------+------------+------------+------------+
| pen | 2020 | 100 | 200 | 300 | 400 |
| pen | 2021 | 100 | 200 | 200 | 100 |
| ruler | 2020 | 300 | 400 | 300 | 200 |
| ruler | 2021 | 400 | 300 | 100 | 100 |
+-----------+------------+------------+------------+------------+------------+
• Rotate the table to get the number of sales for each shop and replace it with the new column name, count.
• Before the UNPVIOT syntax is supported, the implementation is as follows:
select * from(
select item_name,year, 'shop1' as shop_name, shop1 as count from shops
union ALL
select item_name,year, 'shop2' as shop_name, shop2 as count from shops
UNION ALL
select item_name,year, 'shop3' as shop_name, shop3 as count from shops
UNION ALL
select item_name,year, 'shop4' as shop_name, shop4 as count from shops
);
-- Execution result
+------------+------------+------------+------------+
| item_name | year | shop_name | count |
+------------+------------+------------+------------+
| pen | 2020 | shop1 | 100 |
| pen | 2021 | shop1 | 100 |
| ruler | 2020 | shop1 | 300 |
| ruler | 2021 | shop1 | 400 |
| pen | 2020 | shop2 | 200 |
| pen | 2021 | shop2 | 200 |
| ruler | 2020 | shop2 | 400 |
| ruler | 2021 | shop2 | 300 |
| pen | 2020 | shop3 | 300 |
| pen | 2021 | shop3 | 200 |
| ruler | 2020 | shop3 | 300 |
| ruler | 2021 | shop3 | 100 |
| pen | 2020 | shop4 | 400 |
| pen | 2021 | shop4 | 100 |
| ruler | 2020 | shop4 | 200 |
| ruler | 2021 | shop4 | 100 |
+------------+------------+------------+------------+
• With the UNPIVOT syntax, the implementation is as follows:
select * from shops
unpivot (count for shop_name in (shop1, shop2, shop3, shop4));
-- Execution result
+------------+------------+------------+------------+
| item_name | year | count | shop_name |
+------------+------------+------------+------------+
| pen | 2020 | 100 | shop1 |
| pen | 2020 | 200 | shop2 |
| pen | 2020 | 300 | shop3 |
| pen | 2020 | 400 | shop4 |
| pen | 2021 | 100 | shop1 |
| pen | 2021 | 200 | shop2 |
| pen | 2021 | 200 | shop3 |
| pen | 2021 | 100 | shop4 |
| ruler | 2020 | 300 | shop1 |
| ruler | 2020 | 400 | shop2 |
| ruler | 2020 | 300 | shop3 |
| ruler | 2020 | 200 | shop4 |
| ruler | 2021 | 400 | shop1 |
| ruler | 2021 | 300 | shop2 |
| ruler | 2021 | 100 | shop3 |
| ruler | 2021 | 100 | shop4 |
+------------+------------+------------+------------+
• If shop1 and shop2 are east shops and shop3 and shop4 are west shops, a new column is needed to represent east shops and west shops. The count1 and count2 columns respectively store the number of sales of the two shops.
select * from shops
unpivot ((count1, count2) for shop_name in ((shop1, shop2) as 'east_shop', (shop3, shop4) as 'west_shop'))
;
-- Execution result
+------------+------------+------------+------------+------------+
| item_name | year | count1 | count2 | shop_name |
+------------+------------+------------+------------+------------+
| pen | 2020 | 100 | 200 | east_shop |
| pen | 2020 | 300 | 400 | west_shop |
| pen | 2021 | 100 | 200 | east_shop |
| pen | 2021 | 200 | 100 | west_shop |
| ruler | 2020 | 300 | 400 | east_shop |
| ruler | 2020 | 300 | 200 | west_shop |
| ruler | 2021 | 400 | 300 | east_shop |
| ruler | 2021 | 100 | 100 | west_shop |
+------------+------------+------------+------------+------------+
• The alias can be multiple columns, but the corresponding new column names to be generated must be added accordingly:
select * from shops
unpivot ((count1, count2) for (shop_name, location) in ((shop1, shop2) as ('east_shop', 'east'), (shop3, shop4) as ('west_shop', 'west')));
-- Execution result
+------------+------------+------------+------------+------------+------------+
| item_name | year | count1 | count2 | shop_name | location |
+------------+------------+------------+------------+------------+------------+
| pen | 2020 | 100 | 200 | east_shop | east |
| pen | 2020 | 300 | 400 | west_shop | west |
| pen | 2021 | 100 | 200 | east_shop | east |
| pen | 2021 | 200 | 100 | west_shop | west |
| ruler | 2020 | 300 | 400 | east_shop | east |
| ruler | 2020 | 300 | 200 | west_shop | west |
| ruler | 2021 | 400 | 300 | east_shop | east |
| ruler | 2021 | 100 | 100 | west_shop | west |
+------------+------------+------------+------------+------------+------------+
The PIVOT and UNPIVOT syntax fulfills the requirements of converting rows to columns and columns to rows in a more concise and user-friendly manner. It simplifies query statements and enhances the productivity of big data developers.
137 posts | 19 followers
FollowAlibaba Cloud MaxCompute - February 7, 2024
Alibaba Cloud MaxCompute - January 22, 2024
Alibaba Cloud MaxCompute - January 22, 2024
Alibaba Cloud MaxCompute - January 22, 2024
Alibaba Cloud MaxCompute - February 7, 2024
Alibaba Cloud MaxCompute - January 22, 2024
137 posts | 19 followers
FollowAlibaba Cloud provides big data consulting services to help enterprises leverage advanced data technology.
Learn MoreConduct large-scale data warehousing with MaxCompute
Learn MoreAlibaba Cloud experts provide retailers with a lightweight and customized big data consulting service to help you assess your big data maturity and plan your big data journey.
Learn MoreApsaraDB for HBase is a NoSQL database engine that is highly optimized and 100% compatible with the community edition of HBase.
Learn MoreMore Posts by Alibaba Cloud MaxCompute