In some cases, you need to execute a UNION ALL
clause multiple times to aggregate and analyze data from multiple dimensions. For
example, if you want to aggregate Column a, aggregate Column b, and then aggregate
Column a and Column b, you can use GROUPING SETS
to perform these operations. This topic describes how to use GROUPING SETS
for multidimensional aggregation.
Description
GROUPING SETS
is an extension of a GROUP BY
clause in a SELECT
statement. The GROUPING SETS clause allows you to group your results in multiple
ways, without the need to execute multiple SELECT
statements and UNION ALL
in sequence. This allows MaxCompute to generate more efficient execution plans with
higher performance.
GROUPING SETS
.
Type | Description |
---|---|
CUBE |
A special form of GROUPING SETS , CUBE lists all the possible combinations of specified columns as grouping sets . You can also use CUBE with GROUPING SETS .
|
ROLLUP |
A special form of GROUPING SETS . ROLLUP aggregates specified columns and generates grouping sets in a hierarchical manner. You can also use ROLLUP with GROUPING SETS .
|
GROUPING |
NULL is used as placeholders in the results of GROUPING SETS . As a result, NULL placeholders cannot be distinguished from NULL values. To address
this issue, MaxCompute provides GROUPING . GROUPING allows you to use the name of a column as a parameter. If specific rows are aggregated
based on a column, 0 is returned. In this case, NULL is used as a value. If specific
rows are not aggregated based on a column, 1 is returned. In this case, NULL is used
as a placeholder.
|
GROUPING_ID |
GROUPING_ID allows you to use the names of one or more columns as parameters. The
grouping results of columns are used to bitmap integer values.
|
GROUPING__ID |
GROUPING__ID does not require parameters. It is used for Hive-compatible queries. GROUPING__ID
is equivalent to GROUPING__ID(GROUP BY Parameter list) in MaxCompute. The parameters of GROUPING__ID are in the same order as GROUP BY .
Note If you use Hive 2.3.0 or later, we recommend that you use this function in MaxCompute.
If you use a Hive version earlier than 2.3.0, we recommend that you do not use this
function in MaxCompute.
|
Examples
Example for using GROUPING SETS
:
- Prepare data.
create table requests lifecycle 20 as select * from values (1, 'windows', 'PC', 'Beijing'), (2, 'windows', 'PC', 'Shijiazhuang'), (3, 'linux', 'Phone', 'Beijing'), (4, 'windows', 'PC', 'Beijing'), (5, 'ios', 'Phone', 'Shijiazhuang'), (6, 'linux', 'PC', 'Beijing'), (7, 'windows', 'Phone', 'Shijiazhuang') as t(id, os, device, city);
- Use one of the following methods to group data:
- Execute multiple
SELECT
statements to group data.select NULL, NULL, NULL, count(*) from requests union all select os, device, NULL, count(*) from requests group by os, device union all select null, null, city, count(*) from requests group by city;
- Use
GROUPING SETS
to group data.
The following result is returned:select os,device, city ,count(*) from requests group by grouping sets((os, device), (city), ());
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | NULL | NULL | NULL | 7 | | NULL | NULL | Beijing | 4 | | NULL | NULL | Shijiazhuang | 3 | | ios | Phone | NULL | 1 | | linux | PC | NULL | 1 | | linux | Phone | NULL | 1 | | windows | PC | NULL | 3 | | windows | Phone | NULL | 1 | +------------+------------+------------+------------+
Note If some expressions are not used in GROUPING SETS, NULL is used as placeholders for these expressions, for example, NULL of the city column in the fourth row to the eighth row. This way, you can perform operations on the result sets. - Execute multiple
Examples for using CUBE or ROLLUP
CUBE
or ROLLUP
based on the syntax of GROUPING SETS
:
- Example 1: Use
CUBE
to list all the possible columns os, device, and city asgrouping sets
. Sample statement:
The following result is returned:select os,device, city, count(*) from requests group by cube (os, device, city); -- The preceding statement is equivalent to the following statement: select os,device, city, count(*) from requests group by grouping sets ((os, device, city),(os, device),(os, city),(device,city),(os),(device),(city),());
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | NULL | NULL | NULL | 7 | | NULL | NULL | Beijing | 4 | | NULL | NULL | Shijiazhuang | 3 | | NULL | PC | NULL | 4 | | NULL | PC | Beijing | 3 | | NULL | PC | Shijiazhuang | 1 | | NULL | Phone | NULL | 3 | | NULL | Phone | Beijing | 1 | | NULL | Phone | Shijiazhuang | 2 | | ios | NULL | NULL | 1 | | ios | NULL | Shijiazhuang | 1 | | ios | Phone | NULL | 1 | | ios | Phone | Shijiazhuang | 1 | | linux | NULL | NULL | 2 | | linux | NULL | Beijing | 2 | | linux | PC | NULL | 1 | | linux | PC | Beijing | 1 | | linux | Phone | NULL | 1 | | linux | Phone | Beijing | 1 | | windows | NULL | NULL | 4 | | windows | NULL | Beijing | 2 | | windows | NULL | Shijiazhuang | 2 | | windows | PC | NULL | 3 | | windows | PC | Beijing | 2 | | windows | PC | Shijiazhuang | 1 | | windows | Phone | NULL | 1 | | windows | Phone | Shijiazhuang | 1 | +------------+------------+------------+------------+
- Example 2: Use
CUBE
to list all the possible combinations of columns,(os, device),(device, city)
, asgrouping sets
. Sample statement:
The following result is returned:select os,device, city, count(*) from requests group by cube ((os, device), (device, city)); -- The preceding statement is equivalent to the following statement: select os,device, city, count(*) from requests group by grouping sets ((os, device, city),(os, device),(device,city),());
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | NULL | NULL | NULL | 7 | | NULL | PC | Beijing | 3 | | NULL | PC | Shijiazhuang | 1 | | NULL | Phone | Beijing | 1 | | NULL | Phone | Shijiazhuang | 2 | | ios | Phone | NULL | 1 | | ios | Phone | Shijiazhuang | 1 | | linux | PC | NULL | 1 | | linux | PC | Beijing | 1 | | linux | Phone | NULL | 1 | | linux | Phone | Beijing | 1 | | windows | PC | NULL | 3 | | windows | PC | Beijing | 2 | | windows | PC | Shijiazhuang | 1 | | windows | Phone | NULL | 1 | | windows | Phone | Shijiazhuang | 1 | +------------+------------+------------+------------+
- Example 3: Use
ROLLUP
to aggregate the os, device, and city columns in a hierarchical manner to generate multiplegrouping sets
. Sample statement:
The following result is returned:select os,device, city, count(*) from requests group by rollup (os, device, city); -- The preceding statement is equivalent to the following statement: select os,device, city, count(*) from requests group by grouping sets ((os, device, city),(os, device),(os),());
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | NULL | NULL | NULL | 7 | | ios | NULL | NULL | 1 | | ios | Phone | NULL | 1 | | ios | Phone | Shijiazhuang | 1 | | linux | NULL | NULL | 2 | | linux | PC | NULL | 1 | | linux | PC | Beijing | 1 | | linux | Phone | NULL | 1 | | linux | Phone | Beijing | 1 | | windows | NULL | NULL | 4 | | windows | PC | NULL | 3 | | windows | PC | Beijing | 2 | | windows | PC | Shijiazhuang | 1 | | windows | Phone | NULL | 1 | | windows | Phone | Shijiazhuang | 1 | +------------+------------+------------+------------+
- Example 4: Use
ROLLUP
to aggregateos, (os,device), city
in a hierarchical manner to generate multiplegrouping sets
. Sample statement:
The following result is returned:select os,device, city, count(*) from requests group by rollup (os, (os,device), city); -- The preceding statement is equivalent to the following statement: select os,device, city, count(*) from requests group by grouping sets ((os, device, city),(os, device),(os),());
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | NULL | NULL | NULL | 7 | | ios | NULL | NULL | 1 | | ios | Phone | NULL | 1 | | ios | Phone | Shijiazhuang | 1 | | linux | NULL | NULL | 2 | | linux | PC | NULL | 1 | | linux | PC | Beijing | 1 | | linux | Phone | NULL | 1 | | linux | Phone | Beijing | 1 | | windows | NULL | NULL | 4 | | windows | PC | NULL | 3 | | windows | PC | Beijing | 2 | | windows | PC | Shijiazhuang | 1 | | windows | Phone | NULL | 1 | | windows | Phone | Shijiazhuang | 1 | +------------+------------+------------+------------+
- Example 5: Use
GROUP BY
,CUBE
, andGROUPING SETS
to generate multiplegrouping sets
. Sample statement:
The following result is returned:select os,device, city, count(*) from requests group by os, cube(os,device), grouping sets(city); -- The preceding statement is equivalent to the following statement: select os,device, city, count(*) from requests group by grouping sets((os,device,city),(os,city),(os,device,city));
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | ios | NULL | Shijiazhuang | 1 | | ios | Phone | Shijiazhuang | 1 | | linux | NULL | Beijing | 2 | | linux | PC | Beijing | 1 | | linux | Phone | Beijing | 1 | | windows | NULL | Beijing | 2 | | windows | NULL | Shijiazhuang | 2 | | windows | PC | Beijing | 2 | | windows | PC | Shijiazhuang | 1 | | windows | Phone | Shijiazhuang | 1 | +------------+------------+------------+------------+
Example for using GROUPING and GROUPING_ID
select a,b,c,count(*),
grouping(a) ga, grouping(b) gb, grouping(c) gc, grouping_id(a,b,c) groupingid
from values (1,2,3) as t(a,b,c)
group by cube(a,b,c);
The following result is returned:+------------+------------+------------+------------+------------+------------+------------+------------+
| a | b | c | _c3 | ga | gb | gc | groupingid |
+------------+------------+------------+------------+------------+------------+------------+------------+
| NULL | NULL | NULL | 1 | 1 | 1 | 1 | 7 |
| NULL | NULL | 3 | 1 | 1 | 1 | 0 | 6 |
| NULL | 2 | NULL | 1 | 1 | 0 | 1 | 5 |
| NULL | 2 | 3 | 1 | 1 | 0 | 0 | 4 |
| 1 | NULL | NULL | 1 | 0 | 1 | 1 | 3 |
| 1 | NULL | 3 | 1 | 0 | 1 | 0 | 2 |
| 1 | 2 | NULL | 1 | 0 | 0 | 1 | 1 |
| 1 | 2 | 3 | 1 | 0 | 0 | 0 | 0 |
+------------+------------+------------+------------+------------+------------+------------+------------+
GROUP BY
are filled with NULL. You can use GROUPING
to specify the values that you require. Sample statement that is based on the syntax
of GROUPING SETS
:select
if(grouping(os) == 0, os, 'ALL') as os,
if(grouping(device) == 0, device, 'ALL') as device,
if(grouping(city) == 0, city, 'ALL') as city,
count(*) as count
from requests
group by os, device, city grouping sets((os, device), (city), ());
The following result is returned:+------------+------------+------------+------------+
| os | device | city | count |
+------------+------------+------------+------------+
| ALL | ALL | ALL | 7 |
| ALL | ALL | Beijing | 4 |
| ALL | ALL | Shijiazhuang | 3 |
| ios | Phone | ALL | 1 |
| linux | PC | ALL | 1 |
| linux | Phone | ALL | 1 |
| windows | PC | ALL | 3 |
| windows | Phone | ALL | 1 |
+------------+------------+------------+------------+
Example for using GROUPING__ID:
GROUPING__ID
without parameters specified:select
a, b, c, count(*), grouping__id
from values (1,2,3) as t(a,b,c)
group by a, b, c grouping sets ((a,b,c), (a));
-- The preceding statement is equivalent to the following statement:
select
a, b, c, count(*), grouping_id(a,b,c)
from values (1,2,3) as t(a,b,c)
group by a, b, c grouping sets ((a,b,c), (a));
The following result is returned:+------------+------------+------------+------------+------------+
| a | b | c | _c3 | _c4 |
+------------+------------+------------+------------+------------+
| 1 | NULL | NULL | 1 | 3 |
| 1 | 2 | 3 | 1 | 0 |
+------------+------------+------------+------------+------------+