Calculates the cumulative distribution of data in a partition.
Limits
Before you use window functions, take note of the following limits:
Window functions are supported only in
SELECT
statements.A window function cannot contain nested window functions or nested aggregate functions.
You cannot use window functions together with aggregate functions of the same level.
Syntax
double cume_dist() over([partition_clause] [orderby_clause])
Description
This function calculates the cumulative distribution. The cumulative distribution indicates the ratio of rows whose values are greater than or equal to the values of the current row to all rows in a partition. The ratio is determined based on orderby_clause.
Parameters
partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.
Return value
A value of the DOUBLE type is returned. The return value is calculated by using the following formula: row_number_of_last_peer/partition_row_count
. row_number_of_last_peer
indicates the value returned by ROW_NUMBER that corresponds to the last row of the group to which the current row belongs. partition_row_count
indicates the number of rows in a partition to which the current row belongs.
Sample data
create table if not exists emp
(empno bigint,
ename string,
job string,
mgr bigint,
hiredate datetime,
sal bigint,
comm bigint,
deptno bigint);
tunnel upload emp.txt emp;
7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10
Examples
Group all employees by department (deptno) and calculate the cumulative distribution of employees in each group by salary. Sample statement:
select deptno, ename, sal, concat(round(cume_dist() over (partition by deptno order by sal desc)*100,2),'%') as cume_dist from emp;
The following result is returned:
+------------+------------+------------+------------+
| deptno | ename | sal | cume_dist |
+------------+------------+------------+------------+
| 10 | JACCKA | 5000 | 33.33% |
| 10 | KING | 5000 | 33.33% |
| 10 | CLARK | 2450 | 66.67% |
| 10 | WELAN | 2450 | 66.67% |
| 10 | TEBAGE | 1300 | 100.0% |
| 10 | MILLER | 1300 | 100.0% |
| 20 | SCOTT | 3000 | 40.0% |
| 20 | FORD | 3000 | 40.0% |
| 20 | JONES | 2975 | 60.0% |
| 20 | ADAMS | 1100 | 80.0% |
| 20 | SMITH | 800 | 100.0% |
| 30 | BLAKE | 2850 | 16.67% |
| 30 | ALLEN | 1600 | 33.33% |
| 30 | TURNER | 1500 | 50.0% |
| 30 | MARTIN | 1250 | 83.33% |
| 30 | WARD | 1250 | 83.33% |
| 30 | JAMES | 950 | 100.0% |
+------------+------------+------------+------------+
Related functions
CUME_DIST is a window function. For more information about the functions that are used to calculate the sum of data of columns in a window and to sort data, see Window functions.