GROUPING SETS - MaxCompute - Alibaba Cloud Documentation Center

Description

GROUPING SETS is an extension of a GROUP BY clause in a SELECT statement. The GROUPING SETS clause allows you to group your results in multiple ways, without the need to execute multiple SELECT statements and UNION ALL in sequence. This allows MaxCompute to generate more efficient execution plans with higher performance.

The following table describes the syntax that is associated with GROUPING SETS.

Type	Description


Type	Description
`CUBE`	A special form of `GROUPING SETS`, CUBE lists all the possible combinations of specified columns as `grouping sets`. You can also use CUBE with `GROUPING SETS`. `group by cube (a, b, c) -- Equivalent to the following clauses: grouping sets ((a,b,c),(a,b),(a,c),(b,c),(a),(b),(c),()) group by cube ( (a, b), (c, d) ) -- Equivalent to the following clauses: grouping sets ( ( a, b, c, d ), ( a, b ), ( c, d ), ( ) ) group by a, cube (b, c), grouping sets ((d), (e)) -- Equivalent to the following clause: group by grouping sets ( (a, b, c, d), (a, b, c, e), (a, b, d), (a, b, e), (a, c, d), (a, c, e), (a, d), (a, e) )`
`ROLLUP`	A special form of `GROUPING SETS`. ROLLUP aggregates specified columns and generates `grouping sets` in a hierarchical manner. You can also use ROLLUP with `GROUPING SETS`. `group by rollup (a, b, c) -- Equivalent to the following clauses: grouping sets ((a,b,c),(a,b),(a), ()) group by rollup ( a, (b, c), d ) -- Equivalent to the following clauses: grouping sets ( ( a, b, c, d ), ( a, b, c ), ( a ), ( ) ) group by grouping sets((b), (c), rollup(a,b,c)) -- Equivalent to the following clause: group by grouping sets ( (b), (c), (a,b,c), (a,b), (a), () )`
`GROUPING`	NULL is used as placeholders in the results of `GROUPING SETS`. As a result, NULL placeholders cannot be distinguished from NULL values. To address this issue, MaxCompute provides `GROUPING`. `GROUPING` allows you to use the name of a column as a parameter. If specific rows are aggregated based on a column, 0 is returned. In this case, NULL is used as a value. If specific rows are not aggregated based on a column, 1 is returned. In this case, NULL is used as a placeholder.
`GROUPING_ID`	GROUPING_ID allows you to use the names of one or more columns as parameters. The `grouping` results of columns are used to bitmap integer values.
`GROUPING__ID`	`GROUPING__ID` does not require parameters. It is used for Hive-compatible queries. GROUPING__ID is equivalent to `GROUPING__ID(GROUP BY Parameter list)` in MaxCompute. The parameters of GROUPING__ID are in the same order as `GROUP BY`. Note If you use Hive 2.3.0 or later, we recommend that you use this function in MaxCompute. If you use a Hive version earlier than 2.3.0, we recommend that you do not use this function in MaxCompute.

Examples

Example for using GROUPING SETS:

Prepare data.

create table requests lifecycle 20 as 
select * from values 
    (1, 'windows', 'PC', 'Beijing'),
    (2, 'windows', 'PC', 'Shijiazhuang'),
    (3, 'linux', 'Phone', 'Beijing'),
    (4, 'windows', 'PC', 'Beijing'),
    (5, 'ios', 'Phone', 'Shijiazhuang'),
    (6, 'linux', 'PC', 'Beijing'),
    (7, 'windows', 'Phone', 'Shijiazhuang') 
as t(id, os, device, city);

Use one of the following methods to group data:

Execute multiple SELECT statements to group data.

select NULL, NULL, NULL, count(*)
from requests
union all
select os, device, NULL, count(*)
from requests group by os, device
union all
select null, null, city, count(*)
from requests group by city;

Use GROUPING SETS to group data.

select os,device, city ,count(*)
from requests
group by grouping sets((os, device), (city), ());

The following result is returned:

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 7          |
| NULL       | NULL       | Beijing    | 4          |
| NULL       | NULL       | Shijiazhuang | 3          |
| ios        | Phone      | NULL       | 1          |
| linux      | PC         | NULL       | 1          |
| linux      | Phone      | NULL       | 1          |
| windows    | PC         | NULL       | 3          |
| windows    | Phone      | NULL       | 1          |
+------------+------------+------------+------------+

Note

If some expressions are not used in GROUPING SETS, NULL is used as placeholders for these expressions, for example, NULL of the city column in the fourth row to the eighth row. This way, you can perform operations on the result sets.

Examples for using CUBE or ROLLUP

Examples for using CUBE or ROLLUP based on the syntax of GROUPING SETS:

Example 1: Use CUBE to list all the possible columns os, device, and city as grouping sets. Sample statement:

select os,device, city, count(*)
from requests 
group by cube (os, device, city);
-- The preceding statement is equivalent to the following statement:
select os,device, city, count(*)
from requests 
group by grouping sets ((os, device, city),(os, device),(os, city),(device,city),(os),(device),(city),());

The following result is returned:

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 7          |
| NULL       | NULL       | Beijing    | 4          |
| NULL       | NULL       | Shijiazhuang | 3          |
| NULL       | PC         | NULL       | 4          |
| NULL       | PC         | Beijing    | 3          |
| NULL       | PC         | Shijiazhuang | 1          |
| NULL       | Phone      | NULL       | 3          |
| NULL       | Phone      | Beijing    | 1          |
| NULL       | Phone      | Shijiazhuang | 2          |
| ios        | NULL       | NULL       | 1          |
| ios        | NULL       | Shijiazhuang | 1          |
| ios        | Phone      | NULL       | 1          |
| ios        | Phone      | Shijiazhuang | 1          |
| linux      | NULL       | NULL       | 2          |
| linux      | NULL       | Beijing    | 2          |
| linux      | PC         | NULL       | 1          |
| linux      | PC         | Beijing    | 1          |
| linux      | Phone      | NULL       | 1          |
| linux      | Phone      | Beijing    | 1          |
| windows    | NULL       | NULL       | 4          |
| windows    | NULL       | Beijing    | 2          |
| windows    | NULL       | Shijiazhuang | 2          |
| windows    | PC         | NULL       | 3          |
| windows    | PC         | Beijing    | 2          |
| windows    | PC         | Shijiazhuang | 1          |
| windows    | Phone      | NULL       | 1          |
| windows    | Phone      | Shijiazhuang | 1          |
+------------+------------+------------+------------+

Example 2: Use CUBE to list all the possible combinations of columns, (os, device),(device, city), as grouping sets. Sample statement:

select os,device, city, count(*) 
from requests 
group by cube ((os, device), (device, city));
-- The preceding statement is equivalent to the following statement:
select os,device, city, count(*) 
from requests 
group by grouping sets ((os, device, city),(os, device),(device,city),());

The following result is returned:

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 7          |
| NULL       | PC         | Beijing    | 3          |
| NULL       | PC         | Shijiazhuang | 1          |
| NULL       | Phone      | Beijing    | 1          |
| NULL       | Phone      | Shijiazhuang | 2          |
| ios        | Phone      | NULL       | 1          |
| ios        | Phone      | Shijiazhuang | 1          |
| linux      | PC         | NULL       | 1          |
| linux      | PC         | Beijing    | 1          |
| linux      | Phone      | NULL       | 1          |
| linux      | Phone      | Beijing    | 1          |
| windows    | PC         | NULL       | 3          |
| windows    | PC         | Beijing    | 2          |
| windows    | PC         | Shijiazhuang | 1          |
| windows    | Phone      | NULL       | 1          |
| windows    | Phone      | Shijiazhuang | 1          |
+------------+------------+------------+------------+

Example 3: Use ROLLUP to aggregate the os, device, and city columns in a hierarchical manner to generate multiple grouping sets. Sample statement:

select os,device, city, count(*)
from requests 
group by rollup (os, device, city);
-- The preceding statement is equivalent to the following statement:
select os,device, city, count(*)
from requests 
group by grouping sets ((os, device, city),(os, device),(os),());

The following result is returned:

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 7          |
| ios        | NULL       | NULL       | 1          |
| ios        | Phone      | NULL       | 1          |
| ios        | Phone      | Shijiazhuang | 1          |
| linux      | NULL       | NULL       | 2          |
| linux      | PC         | NULL       | 1          |
| linux      | PC         | Beijing    | 1          |
| linux      | Phone      | NULL       | 1          |
| linux      | Phone      | Beijing    | 1          |
| windows    | NULL       | NULL       | 4          |
| windows    | PC         | NULL       | 3          |
| windows    | PC         | Beijing    | 2          |
| windows    | PC         | Shijiazhuang | 1          |
| windows    | Phone      | NULL       | 1          |
| windows    | Phone      | Shijiazhuang | 1          |
+------------+------------+------------+------------+

Example 4: Use ROLLUP to aggregate os, (os,device), city in a hierarchical manner to generate multiple grouping sets. Sample statement:

select os,device, city, count(*)
from requests 
group by rollup (os, (os,device), city);
-- The preceding statement is equivalent to the following statement:
select os,device, city, count(*)
from requests 
group by grouping sets ((os, device, city),(os, device),(os),());

The following result is returned:

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 7          |
| ios        | NULL       | NULL       | 1          |
| ios        | Phone      | NULL       | 1          |
| ios        | Phone      | Shijiazhuang | 1          |
| linux      | NULL       | NULL       | 2          |
| linux      | PC         | NULL       | 1          |
| linux      | PC         | Beijing    | 1          |
| linux      | Phone      | NULL       | 1          |
| linux      | Phone      | Beijing    | 1          |
| windows    | NULL       | NULL       | 4          |
| windows    | PC         | NULL       | 3          |
| windows    | PC         | Beijing    | 2          |
| windows    | PC         | Shijiazhuang | 1          |
| windows    | Phone      | NULL       | 1          |
| windows    | Phone      | Shijiazhuang | 1          |
+------------+------------+------------+------------+

Example 5: Use GROUP BY, CUBE, and GROUPING SETS to generate multiple grouping sets. Sample statement:

select os,device, city, count(*)
from requests 
group by os, cube(os,device), grouping sets(city);
-- The preceding statement is equivalent to the following statement:
select os,device, city, count(*)
from requests 
group by grouping sets((os,device,city),(os,city),(os,device,city));

The following result is returned:

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| ios        | NULL       | Shijiazhuang | 1          |
| ios        | Phone      | Shijiazhuang | 1          |
| linux      | NULL       | Beijing    | 2          |
| linux      | PC         | Beijing    | 1          |
| linux      | Phone      | Beijing    | 1          |
| windows    | NULL       | Beijing    | 2          |
| windows    | NULL       | Shijiazhuang | 2          |
| windows    | PC         | Beijing    | 2          |
| windows    | PC         | Shijiazhuang | 1          |
| windows    | Phone      | Shijiazhuang | 1          |
+------------+------------+------------+------------+

Example for using GROUPING and GROUPING_ID

Sample statement:

select a,b,c,count(*),
grouping(a) ga, grouping(b) gb, grouping(c) gc, grouping_id(a,b,c) groupingid 
from values (1,2,3) as t(a,b,c)
group by cube(a,b,c);

The following result is returned:

+------------+------------+------------+------------+------------+------------+------------+------------+
| a          | b          | c          | _c3        | ga         | gb         | gc         | groupingid |
+------------+------------+------------+------------+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 1          | 1          | 1          | 1          | 7          |
| NULL       | NULL       | 3          | 1          | 1          | 1          | 0          | 6          |
| NULL       | 2          | NULL       | 1          | 1          | 0          | 1          | 5          |
| NULL       | 2          | 3          | 1          | 1          | 0          | 0          | 4          |
| 1          | NULL       | NULL       | 1          | 0          | 1          | 1          | 3          |
| 1          | NULL       | 3          | 1          | 0          | 1          | 0          | 2          |
| 1          | 2          | NULL       | 1          | 0          | 0          | 1          | 1          |
| 1          | 2          | 3          | 1          | 0          | 0          | 0          | 0          |
+------------+------------+------------+------------+------------+------------+------------+------------+

By default, the columns that are not specified in GROUP BY are filled with NULL. You can use GROUPING to specify the values that you require. Sample statement that is based on the syntax of GROUPING SETS:

select
  if(grouping(os) == 0, os, 'ALL') as os,
  if(grouping(device) == 0, device, 'ALL') as device,
  if(grouping(city) == 0, city, 'ALL') as city, 
  count(*) as count 
from requests 
group by os, device, city grouping sets((os, device), (city), ());

The following result is returned:

+------------+------------+------------+------------+
| os         | device     | city       | count      |
+------------+------------+------------+------------+
| ALL        | ALL        | ALL        | 7          |
| ALL        | ALL        | Beijing    | 4          |
| ALL        | ALL        | Shijiazhuang | 3          |
| ios        | Phone      | ALL        | 1          |
| linux      | PC         | ALL        | 1          |
| linux      | Phone      | ALL        | 1          |
| windows    | PC         | ALL        | 3          |
| windows    | Phone      | ALL        | 1          |
+------------+------------+------------+------------+

Example for using GROUPING__ID:

Example for using GROUPING__ID without parameters specified:

select      
a, b, c, count(*), grouping__id 
from values (1,2,3) as t(a,b,c) 
group by a, b, c grouping sets ((a,b,c), (a));
-- The preceding statement is equivalent to the following statement:
select      
a, b, c, count(*), grouping_id(a,b,c)  
from values (1,2,3) as t(a,b,c) 
group by a, b, c grouping sets ((a,b,c), (a));

The following result is returned:

+------------+------------+------------+------------+------------+
| a          | b          | c          | _c3        | _c4        |
+------------+------------+------------+------------+------------+
| 1          | NULL       | NULL       | 1          | 3          |
| 1          | 2          | 3          | 1          | 0          |
+------------+------------+------------+------------+------------+

About Alibaba Cloud

Our Global Network

Quick Start

Global Offices

Olympic Games Paris 2024 New

Stade Roland Garros – Glitz from the Past New

Place de la Concorde – “Breaking” the Barriers New

Vaires-sur-Marne Nautical Stadium – Sports with Sustainability New

International Broadcast Center – Images, Sounds, and Data that Captivate Billions New

Customer Success Stories New

Trust Center

Security & Compliance Center

Cloud Compliance Resources

Security Compliance FAQs

Product & Feature Update New

Cloud Forward

Press Room

Alibaba Cloud e-Magazine New

Alibaba Cloud in Analyst Research

Notice

Go Global Service New

Go Global Alliance with Alibaba Cloud

Asia Accelerator Hot

Information Compliance

China Gateway - MLPS 2.0 Compliance New

China Gateway - Networking

China Gateway - Global Application Acceleration New

China Gateway - Security

China Gateway - Data Security New

ICP Support Hot

China Gateway - Omnichannel Data Mid-End New

China Gateway - Organizational Data Mid-End New

China Gateway - Business Mid-End New

China Gateway - AI Service for Conversational Chatbots New

China Gateway - Online Education

China Gateway - Domain Registration

Work at Alibaba Cloud

Experienced Professionals

Students and Graduates

Free Trial

Pricing

Promo Center

Price Reduction

Pay Less and Deploy More

FinOps

Elastic Compute Service (ECS)

Simple Application Server (SAS)

Elastic GPU Service

Elastic Desktop Service (EDS)

Object Storage Service (OSS)

Cloud Enterprise Network (CEN)

Web Application Firewall (WAF)

Domain Names

Lingma

Container Compute Service (ACS)

Secure Access Service Edge (SASE)

Intelligent Media Services(IMS)

Edge Security Acceleration (ESA)(Original DCDN)

Intelligent Media Management

DingTalk Enterprise

YiDA

Alibaba Cloud Model Studio

Apsara Prime - For Easy Cloud Product Selection

Alibaba Cloud ECS - Cater All Your Cloud Hosting Needs

1TB CDN—Get Free 1 TB Outbound Traffic Plan Now

Security—Under Attack? Get Free Security Support

Short Message Service - Free Testing is Available

Elastic Compute Service (ECS) Hot

CloudBox

Compute Nest

Dedicated Host Hot

ECS Bare Metal Instance

Elastic GPU Service Featured

Simple Application Server (SAS) Hot

Auto Scaling

Cloud Phone Beta

Elastic Desktop Service (EDS) Featured

Batch Compute

Elastic High Performance Computing (E-HPC)

Super Computing Cluster (SCC)