Calculates the number of records that match the specified criteria.
Limits
Before you use window functions, take note of the following limits:
Window functions are supported only in
SELECT
statements.A window function cannot contain nested window functions or nested aggregate functions.
You cannot use window functions together with aggregate functions of the same level.
Syntax
-- Calculate the number of records in the specified column.
bigint count([distinct|all] <colname>)
-- Calculate the number of records in a window.
bigint count(*) over ([partition_clause] [orderby_clause] [frame_clause])
bigint count([distinct] <expr>[,...]) over ([partition_clause] [orderby_clause] [frame_clause])
Description
count(*)
: the total number of rows to return.count([distinct] <expr>[,...])
: When you calculate the number of rows, the rows whose expr is NULL are ignored. If multiple expr parameters exist, any rows whose expr is NULL are ignored. In addition, if the distinct keyword is specified, the number of rows after deduplication is calculated. Any rows whose expr is NULL are ignored.
Parameter description
distinct|all: optional. This parameter specifies whether to remove duplicates during the counting. The default value is all, which indicates that all records are counted. If this parameter is set to distinct, only records with distinct values are counted.
colname: required. The name of a column, which can be of any data type. The value of colname can be an asterisk (
*
).count(*)
indicates that the number of all rows is returned.expr: required. This parameter specifies the column whose values you want to count. All data types are supported. If the input value is NULL, the row that contains this value is not used for the calculation. If the DISTINCT keyword is specified, the count value of distinct values is obtained.
partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value
A value of the BIGINT type is returned.
Sample data
create table if not exists emp
(empno bigint,
ename string,
job string,
mgr bigint,
hiredate datetime,
sal bigint,
comm bigint,
deptno bigint);
tunnel upload emp.txt emp;
7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10
Examples
Example 1: Use the sal column to define a window. ORDER BY is not specified. This function returns the cumulative count from the first row to the last row in the current window. The current window includes the rows that have the same sal value. Sample statement:
select sal, count(sal) over (partition by sal) as count from emp;
The following result is returned:
+------------+------------+ | sal | count | +------------+------------+ | 800 | 1 | | 950 | 1 | | 1100 | 1 | | 1250 | 2 | -- This row is the first row of this window. The value of COUNT for the first row is the cumulative count for the second row because the two rows have the same sal value. | 1250 | 2 | -- The return value is the cumulative count from the first row to the second row in the current window. | 1300 | 2 | | 1300 | 2 | | 1500 | 1 | | 1600 | 1 | | 2450 | 2 | | 2450 | 2 | | 2850 | 1 | | 2975 | 1 | | 3000 | 2 | | 3000 | 2 | | 5000 | 2 | | 5000 | 2 | +------------+------------+
Example 2: In non-Hive-compatible mode, use the sal column to define a window. ORDER BY is specified. This function returns the cumulative count from the first row to the current row in the current window. The current window includes the rows that have the same sal value. Sample statement:
-- Disable the Hive-compatible mode. set odps.sql.hive.compatible=false; -- Execute the following statement: select sal, count(sal) over (partition by sal order by sal) as count from emp;
The following result is returned:
+------------+------------+ | sal | count | +------------+------------+ | 800 | 1 | | 950 | 1 | | 1100 | 1 | | 1250 | 1 | -- This row is the first row of this window. The cumulative count for the first row is 1. | 1250 | 2 | -- The cumulative count for the second row is 2. | 1300 | 1 | | 1300 | 2 | | 1500 | 1 | | 1600 | 1 | | 2450 | 1 | | 2450 | 2 | | 2850 | 1 | | 2975 | 1 | | 3000 | 1 | | 3000 | 2 | | 5000 | 1 | | 5000 | 2 | +------------+------------+
Example 3: In Hive-compatible mode, use the sal column to define a window. ORDER BY is specified. This function returns the cumulative count from the first row to the last row in the current window. The current window includes the rows that have the same sal value. Sample statement:
-- Enable the Hive-compatible mode. set odps.sql.hive.compatible=true; -- Execute the following statement: select sal, count(sal) over (partition by sal order by sal) as count from emp;
The following result is returned:
+------------+------------+ | sal | count | +------------+------------+ | 800 | 1 | | 950 | 1 | | 1100 | 1 | | 1250 | 2 | -- This row is the first row of this window. The value of COUNT for the first row is the cumulative count for the second row because the two rows have the same sal value. | 1250 | 2 | -- The return value is the cumulative count from the first row to the second row in the current window. | 1300 | 2 | | 1300 | 2 | | 1500 | 1 | | 1600 | 1 | | 2450 | 2 | | 2450 | 2 | | 2850 | 1 | | 2975 | 1 | | 3000 | 2 | | 3000 | 2 | | 5000 | 2 | | 5000 | 2 | +------------+------------+
Example 4: Calculate the total number of employees in all departments. Sample statement:
select count(*) from emp;
The following result is returned:
+------------+ | _c0 | +------------+ | 17 | +------------+
Example 5: Use this function with
GROUP BY
to group all employees by department (the deptno column) and calculate the number of employees in each department. Sample statement:select deptno, count(*) from emp group by deptno;
The following result is returned:
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 6 | | 20 | 5 | | 30 | 6 | +------------+------------+
Example 6: Remove duplicates when you calculate the number of departments. Sample statement:
select count(distinct deptno) from emp;
The following result is returned:
+------------+ | _c0 | +------------+ | 3 | +------------+
Related functions
COUNT is an aggregate function or a window function.
For more information about the functions that are used to calculate the average value of multiple input records and to aggregate parameters, see Aggregate functions.
For more information about the functions that are used to calculate the sum of data of columns in a window and to sort data, see Window functions.