Calculates the average value of a column or the average value of expr in a window.
Limits
Before you use window functions, take note of the following limits:
Window functions are supported only in
SELECT
statements.A window function cannot contain nested window functions or nested aggregate functions.
You cannot use window functions together with aggregate functions of the same level.
Syntax
-- Calculate the average value of a column.
DECIMAL|DOUBLE avg(<colname>)
-- Calculate the average value of expr in a window.
double avg([distinct] double <expr>) over ([partition_clause] [orderby_clause] [frame_clause])
decimal avg([distinct] decimal <expr>) over ([partition_clause] [orderby_clause] [frame_clause])
Description
Calculates the average value of a column or the average value of expr in a window.
Parameters
colname: required. Column values support all data types and can be converted into the DOUBLE type before calculation.
expr: required. This parameter specifies the expression that is used to calculate the returned result. The value is of the DOUBLE or DECIMAL type.
If the input value is of the STRING or BIGINT type, it is implicitly converted into a value of the DOUBLE type before calculation. If it is of another data type, an error is returned.
If the input value is NULL, the row that contains the value is not used for calculation.
If the distinct keyword is specified, the average value of distinct values is calculated.
partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value
If the input value of colname is null, the row that contains this value is not used for calculation. The following table describes the mappings between the data types of input data and return values.
Data type of input data
Data type of return values
TINYINT
DOUBLE
SMALLINT
DOUBLE
INT
DOUBLE
BIGINT
DOUBLE
FLOAT
DOUBLE
DOUBLE
DOUBLE
DECIMAL
DECIMAL
If the input value of expr is of the DECIMAL type, a value of the DECIMAL type is returned. If the input value is of another data type, a value of the DOUBLE type is returned. If the input value of expr is NULL, NULL is returned.
Sample data
This section provides sample source data and examples for you to understand how to use the functions. Create a table named emp and insert the sample data into the table. Sample statement:
create table if not exists emp
(empno bigint,
ename string,
job string,
mgr bigint,
hiredate datetime,
sal bigint,
comm bigint,
deptno bigint);
tunnel upload emp.txt emp;
The emp.txt file contains the following sample data:
7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10
Examples
Example 1: Use the deptno column to define a window and calculate the average value of the sal column. ORDER BY is not specified. This function returns the cumulative average value of the values from the first row to the last row in the current window. The current window includes the rows that have the same deptno column value. Sample statement:
select deptno, sal, avg(sal) over (partition by deptno) from emp;
The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 2916.6666666666665 | -- This row is the first row of this window. The return value is the cumulative average value of the values from the first row to the sixth row. | 10 | 2450 | 2916.6666666666665 | -- The return value is the cumulative average value of the values from the first row to the sixth row. | 10 | 5000 | 2916.6666666666665 | -- The return value is the cumulative average value of the values from the first row to the sixth row. | 10 | 1300 | 2916.6666666666665 | | 10 | 5000 | 2916.6666666666665 | | 10 | 2450 | 2916.6666666666665 | | 20 | 3000 | 2175.0 | | 20 | 3000 | 2175.0 | | 20 | 800 | 2175.0 | | 20 | 1100 | 2175.0 | | 20 | 2975 | 2175.0 | | 30 | 1500 | 1566.6666666666667 | | 30 | 950 | 1566.6666666666667 | | 30 | 1600 | 1566.6666666666667 | | 30 | 1250 | 1566.6666666666667 | | 30 | 1250 | 1566.6666666666667 | | 30 | 2850 | 1566.6666666666667 | +------------+------------+------------+
Example 2: Disable the Hive-compatible data type edition and use the deptno column to define a window and calculate the average value of the sal column. ORDER BY is specified. This function returns the cumulative average value of the values from the first row to the current row in the current window. The current window includes the rows that have the same deptno value. Sample statement:
-- Disable the Hive-compatible data type edition. set odps.sql.hive.compatible=false; -- Execute the following statement: select deptno, sal, avg(sal) over (partition by deptno order by sal) from emp;
The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300.0 | -- This row is the first row of this window. | 10 | 1300 | 1300.0 | -- The return value is the cumulative average value of the values in the first and second rows. | 10 | 2450 | 1683.3333333333333 | -- The return value is the cumulative average value of the values from the first row to the third row. | 10 | 2450 | 1875.0 | -- The return value is the cumulative average value of the values from the first row to the fourth row. | 10 | 5000 | 2500.0 | -- The return value is the cumulative average value of the values from the first row to the fifth row. | 10 | 5000 | 2916.6666666666665 | -- The return value is the cumulative average value of the values from the first row to the sixth row. | 20 | 800 | 800.0 | | 20 | 1100 | 950.0 | | 20 | 2975 | 1625.0 | | 20 | 3000 | 1968.75 | | 20 | 3000 | 2175.0 | | 30 | 950 | 950.0 | | 30 | 1250 | 1100.0 | | 30 | 1250 | 1150.0 | | 30 | 1500 | 1237.5 | | 30 | 1600 | 1310.0 | | 30 | 2850 | 1566.6666666666667 | +------------+------------+------------+
Example 3: Enable the Hive-compatible data type edition and use the deptno column to define a window and calculate the average value of the sal column. ORDER BY is specified. This function returns the cumulative average value of the values from the first row to the row that has the same sal value as the current row in the current window. The average values for the rows that have the same sal value are the same. The current window includes the rows that have the same deptno value. Sample statement:
-- Enable the Hive-compatible data type edition. set odps.sql.hive.compatible=true; -- Execute the following statement: select deptno, sal, avg(sal) over (partition by deptno order by sal) from emp;
The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300.0 | -- This row is the first row of this window. The average value for the first row is the cumulative average value of the values in the first and second rows because the two rows have the same sal value. | 10 | 1300 | 1300.0 | -- The return value is the cumulative average value of the values in the first and second rows. | 10 | 2450 | 1875.0 | -- The average value for the third row is the cumulative average value of the values from the first row to the fourth row because the third and fourth rows have the same sal value. | 10 | 2450 | 1875.0 | -- The return value is the cumulative average value of the values from the first row to the fourth row. | 10 | 5000 | 2916.6666666666665 | | 10 | 5000 | 2916.6666666666665 | | 20 | 800 | 800.0 | | 20 | 1100 | 950.0 | | 20 | 2975 | 1625.0 | | 20 | 3000 | 2175.0 | | 20 | 3000 | 2175.0 | | 30 | 950 | 950.0 | | 30 | 1250 | 1150.0 | | 30 | 1250 | 1150.0 | | 30 | 1500 | 1237.5 | | 30 | 1600 | 1310.0 | | 30 | 2850 | 1566.6666666666667 | +------------+------------+------------+
Example 4: Calculate the average salary (sal) of all employees. Sample statement:
select avg(sal) from emp;
The following result is returned:
+------------+ | _c0 | +------------+ | 2222.0588235294117 | +------------+
Example 5: Use this function with
GROUP BY
to group all employees by department (deptno) and calculate the average salary (sal) values of employees in each department. Sample statement:select deptno, avg(sal) from emp group by deptno;
The following result is returned:
+------------+------------+ | deptno | _c1 | +------------+------------+ | 10 | 2916.6666666666665 | | 20 | 2175.0 | | 30 | 1566.6666666666667 | +------------+------------+
Related functions
AVG is an aggregate function or a window function.
For more information about the functions that are used to calculate the average value of multiple input records and to aggregate parameters, see Aggregate functions.
For more information about the functions that are used to calculate the sum of data of columns in a window and to sort data, see Window functions.