MaxCompute支援PIVOT和UNPIVOT關鍵字,通過PIVOT關鍵字可以基於彙總將一個或者多個指定值的行轉換為列;通過UNPIVOT關鍵字可以將一個或者多個列轉換為行。本文為您介紹PIVOT和UNPIVOT關鍵字的使用和樣本。
PIVOT關鍵字
PIVOT關鍵字對於指定的每一組行值,都會產生對應的列。PIVOT關鍵字是FROM子句的一部分,可以和JOIN等其他關鍵字一同使用。
當前PIVOT關鍵字處於灰階發布中,部分使用者可能無法使用。
命令格式
SELECT ...
FROM ...
PIVOT (
<aggregate function> [AS <alias>] [, <aggregate function> [AS <alias>]] ...
FOR (<column> [, <column>] ...)
IN (
(<value> [, <value>] ...) AS <new column>
[, (<value> [, <value>] ...) AS <new column>]
...
)
)
[...]
參數說明:
參數 | 是否必選 | 說明 |
aggregate function | 是 | 彙總函式,可使用的彙總函式請參見彙總函式。 |
alias | 否 | 彙總函式的別名,別名和最終PIVOT處理過後產生的列名相關,詳情請參見使用限制。 |
column | 是 | 指定轉換為列的行值在源表中的列名稱。 |
value | 是 | 指定轉換為列的行值。 |
new column | 否 | 轉換後新的列名稱。 |
使用限制
彙總函式相關:
彙總函式的外層不能嵌套任何函數。
彙總函式參數可以是Scalar函數和列組成的運算式。
彙總函式的參數內部不能有其他彙總函式、Window函數。
彙總函式的列只能是上遊表中的列。
alias
只能是列名稱,不能是任何錶達式。value
內部可以是運算式,運算式中的列只能是上遊表中的列,允許包含Scalar函數,但是不允許有任何彙總函式和Window函數。PIVOT內部的所有的別名不可忽略,別名和最終PIVOT處理過後的列名息息相關,存在如下幾種情況:
PIVOT (agg1 for axis1 in ('1', '2', '3', ...))
:如果
value
是常量而不是運算式且彙總函式未指定別名,那麼PIVOT處理後列名為value
,即'1','2','3'...
。PIVOT (agg1 as a for axis1 in ('1', '2', '3', ...))
:如果
value
是常量而不是運算式且彙總函式指定了別名,那麼PIVOT處理後列名為value_彙總函式別名
,即'1'_a, '2'_a, ...
。PIVOT (agg1 as a, agg2 as b for axis1 in ('1', '2', '3', ...))
:如果
value
是常量而不是運算式且彙總函式指定了別名,那麼PIVOT處理後列名為value_彙總函式別名
,即'1'_a, '2'_a, ..., '1'_b, '2'_b, ...
。PIVOT (agg1 as a, agg2 for axis1 in (expr1, expr2, '3', ...))
:如果有
value
是運算式,首先MaxCompute會為運算式(expr1
和expr2
)和未指定別名的彙總函式agg2
產生別名,即PIVOT (agg1 as a, agg2 as generated_alias1 for axis1 in (expr1 as generated_alias2, expr2 as generated_alias3, '3', ...))
。那麼PIVOT處理後列名為value/運算式別名_彙總函式別名
,即generated_alias2_a, generated_alias3_a,'3'_a ..., generated_alias2_generated_alias1, generated_alias3_generated_alias1,'3'_ generated_alias1...
。
使用說明
PIVOT文法可以等效為Group by、Aggregate function和Filter的結合,樣本如下。
SELECT ...
FROM ...
PIVOT (
agg1 AS a, agg2 AS b, ...
FOR (axis1, ..., axisN)
IN (
(v11, ..., v1N) AS label1,
(v21, ..., v2N) AS label2,
...)
)
等效於如下語句:
select
k1, ... kN,
agg1 AS label1_a filter (where axis1 = v11 and ... and axisN = v1N),
agg2 AS label1_b filter (where axis1 = v21 and ... and axisN = v2N),
...,
agg1 AS label2_a filter (where axis1 = v11 and ... and axisN = v1N),
agg2 AS label2_b filter (where axis1 = v21 and ... and axisN = v2N),
...,
from xxxxxx
group by k1, ... kN
其中from
語句內的表是PIVOT上遊的結果,k1, ... kN
是所有未在agg1, agg2, ...
和axis1, ..., axisN
出現的列的集合。
使用樣本
樣本資料為某公司今年各個季節的水果銷售流水,建表DDL如下。
--建立表
create table mf_cop_sales (tran_id bigint,
productID string,
tran_amt decimal,
season string);
insert into table mf_cop_sales values(1,'apple',100,'Q1'),
(2,'orange',200,'Q1'),
(3,'banana',300,'Q1'),
(4,'apple',400,'Q2'),
(5,'orange',500,'Q2'),
(6,'banana',600,'Q2'),
(7,'apple',700,'Q3'),
(8,'orange',800,'Q3'),
(9,'banana',700,'Q3'),
(10,'apple',500,'Q4'),
(11,'orange',400,'Q4'),
(12,'banana',200,'Q4');
--銷售表詳細資料如下
select * from mf_cop_sales;
+------------+------------+------------+------------+
| tran_id | productid | tran_amt | season |
+------------+------------+------------+------------+
| 1 | apple | 100 | Q1 |
| 2 | orange | 200 | Q1 |
| 3 | banana | 300 | Q1 |
| 4 | apple | 400 | Q2 |
| 5 | orange | 500 | Q2 |
| 6 | banana | 600 | Q2 |
| 7 | apple | 700 | Q3 |
| 8 | orange | 800 | Q3 |
| 9 | banana | 700 | Q3 |
| 10 | apple | 500 | Q4 |
| 11 | orange | 400 | Q4 |
| 12 | banana | 200 | Q4 |
+------------+------------+------------+------------+
查詢本年度每個季節的銷售情況。
select * from (select season, tran_amt from mf_cop_sales) pivot (sum(tran_amt) for season in ('Q1' as spring, 'Q2' as summer, 'Q3' as autumn, 'Q4' as winter) ); --返回結果 +--------+--------+--------+--------+ | spring | summer | autumn | winter | +--------+--------+--------+--------+ | 600 | 1500 | 2200 | 1100 | +--------+--------+--------+--------+
查詢本年度各個產品的銷售情況。
select * from (select productid, tran_amt from mf_cop_sales) pivot (sum(tran_amt) as sumbypro for productid in ('apple', 'orange', 'banana')); --返回結果 +------------------+-------------------+-------------------+ | 'apple'_sumbypro | 'orange'_sumbypro | 'banana'_sumbypro | +------------------+-------------------+-------------------+ | 1700 | 1900 | 1800 | +------------------+-------------------+-------------------+
查詢本年度第四季度銷售額最高的產品。
select * from (select season, tran_amt from mf_cop_sales) pivot (max(tran_amt) for season in ('Q4')); --返回結果 +------+ | 'Q4' | +------+ | 500 | +------+
UNPIVOT關鍵字
UNPIVOT關鍵字對於指定的每一組列,都會產生對應的行。其中UNPIVOT關鍵字是FROM子句的一部分,可以和JOIN關鍵字等其他關鍵字一同使用。
命令格式
SELECT ...
FROM ...
UNPIVOT (
<new column of value> [, <new column of value>] ...
FOR (<new column of name> [, <new column of name>] ...)
IN (
(<column> [, <column>] ...) [AS (<column value> [, <column value>] ...)]
[, (<column> [, <column>] ...) [AS (<column value> [, <column value>] ...)]]
...
)
)
[...]
參數說明如下:
參數 | 是否必選 | 說明 |
new column of value | 是 | 轉換後新產生的列名稱,該列的值由指定轉換為行的列的值填充。 |
new column of name | 是 | 轉換後新產生的列名稱,該列的值由指定轉換為行的列名稱填充。 |
column | 是 | 指定轉換為行的列名稱,列的名稱用來填充new column of name;列的值用來填充new column of value。 |
column value | 否 | 指定轉換為行的列的別名 |
使用限制
一個新產生的列(new column of value)需要對應一組指定轉換為行的列,即一個
new column of value
對應一組(<column1> [, <column2>] ...)
,所以new column of value
的個數需要和指定列的組數相同,即new column of value1, ..., new column of valueM
對應:(column11, ..., column1N) AS (column value11, ..., column value1N), (column21, ..., column2N) AS (column value21, ..., column value2N), ... (columnM1, ..., columnMN) AS (column valueM1, ..., column valueMN)
一個新產生的列(new column of name)需要對應一組指定列的別名,即一個
new column of name
對應一組(<column value1> [, <column value2>] ...)
,所以new column of name
的個數需要和指定列別名的組數相同,即new column of name1, ..., new column of nameM
對應:(column value11, ..., column value1N), (column value21, ..., column value2N), ... (column valueM1, ..., column valueMN)
說明實際使用中可以省略
(<column value> [, <column value>] ...)
,MaxCompute會為指定列自動產生別名。如果您需要自訂指定別名,一定要保證數量的一一對應。new column of value
和new column of name
的集合必須是列名稱,不能包含任何的運算式,並且new column of value
和new column of name
的集合不能重名,因為new column of value
和new column of name
集合內部的元素都是需要輸出的列。column只能是上遊表的列名。
column value可以為常數或運算式。如果是運算式,就不允許有任何列,保證可以常量摺疊成常數。
(<column1> [, <column2>] ...)
的組數不能超過100,否則會造成資料過度膨脹。如果省略指定列的別名(
(<column value> [, <column value>] ...)
,MaxCompute自動產生一組String的值來代替,規則如下:UNPIVOT (measure1 for axis in (c1, c2, c3, ...))
:此情況下產生的值是(c1, c2, c3, ...)
,即原文法改寫為:UNPIVOT (measure1 for axis in (c1 as c1, c2 as c2, c3 as c3, ...))
。除上述情況以外的所有未指定別名的情況,全部都由MaxCompute自動產生指定列的別名。
若是有些列省略,有些列設定了別名,需要保證設定的別名是String類型、可以相容自動產生的String類型的別名。若是非String類型,需要設定所有的別名,不能省略。
使用說明
UNPIVOT文法可以等效為Cross Join和Filter(case when expression)的結合,樣本如下。
SELECT ...
FROM ...
UNPIVOT (
(measure1, ..., measureM)
FOR (axis1, ..., axisN)
IN ((c11, ..., c1M) AS (value11, ..., value1N),
(c21, ..., c2M) AS (value21, ..., value2N), ...))
[...]
等效於如下語句:
select
k1, ... kN,
case
when axis1 = value11 and ... and axisN = value1N then c11
when axis1 = value21 and ... and axisN = value2N then c21
...
else null as measure1,
...,
case
when axis1 = value11 and ... and axisN = value1N then c1M
when axis1 = value21 and ... and axisN = value2N then c2M
else null as measureM,
axis1, ..., axisN
from xxxx
join (values (value11, ..., value1N),(value21, ..., value2N), ... as generated_table_name(axis1, ..., axisN))
使用樣本
樣本資料為各個商店對應物品在對應年份的銷售情況,建表DDL如下。
--建立表
create table mf_shops(item_id bigint,
year string,
shop1 decimal,
shop2 decimal,
shop3 decimal,
shop4 decimal);
--插入資料
with shops_table as
(select * from values(1, 2020, 100, 200, 300, 400),
(1, 2021, 100, 200, 200, 100),
(2, 2020, 300, 400, 300, 200),
(2, 2021, 400, 300, 100, 100)
shops(item_id, year, shop1, shop2, shop3, shop4)
)
insert overwrite table mf_shops
select * from shops_table;
--查詢資料
select * from mf_shops;
--返回結果
+------------+------+-------+-------+-------+-------+
| item_id | year | shop1 | shop2 | shop3 | shop4 |
+------------+------+-------+-------+-------+-------+
| 1 | 2020 | 100 | 200 | 300 | 400 |
| 1 | 2021 | 100 | 200 | 200 | 100 |
| 2 | 2020 | 300 | 400 | 300 | 200 |
| 2 | 2021 | 400 | 300 | 100 | 100 |
+------------+------+-------+-------+-------+-------+
合并各個商店的銷售額,並且用新的列名
sales
來展示銷售額。--合并各個商店的銷售額 select * from mf_shops unpivot (sales for shop in (shop1, shop2, shop3, shop4)); --返回結果 +------------+------------+------------+------+ | item_id | year | sales | shop | +------------+------------+------------+------+ | 1 | 2020 | 100 | shop1 | | 1 | 2020 | 200 | shop2 | | 1 | 2020 | 300 | shop3 | | 1 | 2020 | 400 | shop4 | | 1 | 2021 | 100 | shop1 | | 1 | 2021 | 200 | shop2 | | 1 | 2021 | 200 | shop3 | | 1 | 2021 | 100 | shop4 | | 2 | 2020 | 300 | shop1 | | 2 | 2020 | 400 | shop2 | | 2 | 2020 | 300 | shop3 | | 2 | 2020 | 200 | shop4 | | 2 | 2021 | 400 | shop1 | | 2 | 2021 | 300 | shop2 | | 2 | 2021 | 100 | shop3 | | 2 | 2021 | 100 | shop4 | +------------+------------+------------+------+
可以為每一個商店名字起一個別名,別名可以是表內的值,也可以是字串:
select * from mf_shops unpivot (sales for shop in (shop1 as 'shop_name_1', shop2 as 'shop_name_2', shop3 as 'shop_name_3', shop4 as 'shop_name_4')); --返回結果 +------------+------------+------------+------+ | item_id | year | sales | shop | +------------+------------+------------+------+ | 1 | 2020 | 100 | shop_name_1 | | 1 | 2020 | 200 | shop_name_2 | | 1 | 2020 | 300 | shop_name_3 | | 1 | 2020 | 400 | shop_name_4 | | 1 | 2021 | 100 | shop_name_1 | | 1 | 2021 | 200 | shop_name_2 | | 1 | 2021 | 200 | shop_name_3 | | 1 | 2021 | 100 | shop_name_4 | | 2 | 2020 | 300 | shop_name_1 | | 2 | 2020 | 400 | shop_name_2 | | 2 | 2020 | 300 | shop_name_3 | | 2 | 2020 | 200 | shop_name_4 | | 2 | 2021 | 400 | shop_name_1 | | 2 | 2021 | 300 | shop_name_2 | | 2 | 2021 | 100 | shop_name_3 | | 2 | 2021 | 100 | shop_name_4 | +------------+------------+------------+------+
如果shop1和shop2是東區商店,shop3和shop4是西區商店,一個新的表展示東區商店和西區商店。其中sales1和sales2兩列分別儲存了兩店的銷售額。
select * from mf_shops unpivot ((sales1, sales2) for shop in ((shop1, shop2) as 'east_shop', (shop3, shop4) as 'west_shop')); --返回結果 +------------+------------+------------+------------+------+ | item_id | year | sales1 | sales2 | shop | +------------+------------+------------+------------+------+ | 1 | 2020 | 100 | 200 | east_shop | | 1 | 2020 | 300 | 400 | west_shop | | 1 | 2021 | 100 | 200 | east_shop | | 1 | 2021 | 200 | 100 | west_shop | | 2 | 2020 | 300 | 400 | east_shop | | 2 | 2020 | 300 | 200 | west_shop | | 2 | 2021 | 400 | 300 | east_shop | | 2 | 2021 | 100 | 100 | west_shop | +------------+------------+------------+------------+------+
別名可以是多列,對應的列數也要相應增加。
select * from mf_shops unpivot ((sales1, sales2) for (shop_name, location) in ((shop1, shop2) as ('east_shop', 'east'), (shop3, shop4) as ('west_shop', 'west'))); +------------+------------+------------+------------+-----------+----------+ | item_id | year | sales1 | sales2 | shop_name | location | +------------+------------+------------+------------+-----------+----------+ | 1 | 2020 | 100 | 200 | east_shop | east | | 1 | 2020 | 300 | 400 | west_shop | west | | 1 | 2021 | 100 | 200 | east_shop | east | | 1 | 2021 | 200 | 100 | west_shop | west | | 2 | 2020 | 300 | 400 | east_shop | east | | 2 | 2020 | 300 | 200 | west_shop | west | | 2 | 2021 | 400 | 300 | east_shop | east | | 2 | 2021 | 100 | 100 | west_shop | west | +------------+------------+------------+------------+-----------+----------+
對於sales1和sales2為null的無意義行,可以通過exclude nulls過濾。
with shops as (select * from values (1, 2020, 100, 200, 300, 400), (1, 2021, 100, 200, 200, 100), (2, 2020, 300, 400, 300, 200), (2, 2021, 400, 300, 100, 100), (3, 2020, null, null, null, null) shops(item_id, year, shop1, shop2, shop3, shop4)) select * from shops unpivot exclude nulls ((sales1, sales2) for (shop_name, location) in ((shop1, shop2) as ('east_shop', 'east'), (shop3, shop4) as ('west_shop', 'west'))); --返回結果 +------------+------------+------------+------------+-----------+----------+ | item_id | year | sales1 | sales2 | shop_name | location | +------------+------------+------------+------------+-----------+----------+ | 1 | 2020 | 100 | 200 | east_shop | east | | 1 | 2020 | 300 | 400 | west_shop | west | | 1 | 2021 | 100 | 200 | east_shop | east | | 1 | 2021 | 200 | 100 | west_shop | west | | 2 | 2020 | 300 | 400 | east_shop | east | | 2 | 2020 | 300 | 200 | west_shop | west | | 2 | 2021 | 400 | 300 | east_shop | east | | 2 | 2021 | 100 | 100 | west_shop | west | +------------+------------+------------+------------+-----------+----------+