General-purpose aggregate functions supported by Hologres - Hologres

Hologres is compatible with PostgreSQL and allows you to use the standard PostgreSQL syntax for data development. This topic describes the general-purpose aggregate functions supported by Hologres.

The functions supported by Hologres are a subset of the PostgreSQL functions. For more information about how to use these functions, see Aggregate Functions in the PostgreSQL documentation. The following table describes the general-purpose aggregate functions supported by Hologres.

Function	Description
APPROX_COUNT_DISTINCT	Calculates the number of rows that have distinct values in a column. The result of this function is an approximate value.
ARRAY_AGG	Aggregates the values of an expression into an array.
AVG	Calculates the average value of the non-NULL values returned by a BIGINT, FLOAT8, FLOAT4, or INT expression.
BIT_AND	Performs bitwise AND operations on the non-NULL values returned by a BIGINT or an INT expression.
BIT_OR	Performs bitwise OR operations on the non-NULL values returned by a BIGINT or an INT expression.
BOOL_AND	Performs AND operations on a BOOLEAN expression to check whether all values returned by the expression are true.
BOOL_OR	Performs OR operations on a BOOLEAN expression to check whether one or more values returned by the expression are true.
CORR	Calculates the correlation coefficient between two variables.
COUNT	Calculates the number of rows in a table or the number of rows defined by a BIGINT or NUMERIC expression.
COVAR_POP	Calculates the population covariance between two variables.
COVAR_SAMP	Calculates the sample covariance between two variables.
EVERY	Performs AND operations on a BOOLEAN expression to check whether all values returned by the expression are true.
MAX	Calculates the maximum value returned by a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
MAX_BY	Finds the row in which the maximum value of a column (column y) is included and returns the value of another column (column x) in the same row.
MIN	Calculates the minimum value returned by a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
MIN_BY	Finds the row in which the minimum value of a column (column y) is included and returns the value of another column (column x) in the same row.
REGR_AVGX	Calculates the average value of an independent variable (x).
REGR_AVGY	Calculates the average value of a dependent variable (y).
REGR_COUNT	Calculates the number of rows in which the values of both input parameters are not NULL.
REGR_R2	Calculates the square of the correlation coefficient.
REGR_INTERCEPT	Calculates the intercept on the y-axis of the linear regression line that best fits the dependent and independent variables.
REGR_SLOPE	Calculates the slope of the linear regression line that best fits the dependent and independent variables.
REGR_SXX	Calculates the sum of the squares of an independent variable (x) by using the following formula: `sum(X^2) - sum(X)^2/N`.
REGR_SXY	Calculates the sum of the products of a dependent variable (x) and an independent variable (y) by using the following formula: `sum(XY) - sum(X) sum(Y)/N`.
SUM	Calculates the sum of the values returned by a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
STDDEV	Calculates the sample standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.
STDDEV_POP	Calculates the population standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.
STDDEV_SAMP	Calculates the sample standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.
STRING_AGG	Concatenates the non-NULL values returned by a TEXT expression into a string by using a delimiter.
UNIQ	Calculates the number of rows that have distinct values in a column.
VARIANCE	Calculates the sample variance of the values returned by an INT or a NUMERIC expression.
VAR_POP	Calculates the population variance of the values returned by a FLOAT8, INT, or NUMERIC expression.
VAR_SAMP	Calculates the sample variance of the values returned by an INT, FLOAT8 or NUMERIC expression.

Sample data

-- Create a table.
CREATE TABLE example_table(
    c1 INT,           
    c2 BOOLEAN,      
    c3 VARCHAR(10),  
    c4 DECIMAL(3, 1),
    c5 TEXT,
    c6 FLOAT8,
    c7 FLOAT8
);

-- Insert data into the table.
INSERT INTO example_table (c1, c2, c3, c4, c5, c6, C7) VALUES
(1, true, 'a', 1.1, 3, 6, 7),
(2, false, 'b', 2.2, 4, 6, 2.6);

APPROX_COUNT_DISTINCT

Description: Calculates the number of rows that have distinct values in a column. The result of this function is an approximate value.
```
APPROX_COUNT_DISTINCT ( <column> )
```
Parameter description
column: required. The column in which you want to calculate the approximate number of rows that have distinct values.
Usage notes
The APPROX_COUNT_DISTINCT function uses the HyperLogLog algorithm to perform an inaccurate COUNT DISTINCT operation. An inaccurate COUNT DISTINCT operation helps improve query performance especially when a large number of discrete values exist in the column. This ensures that the average margin of error is in the range of 0.1% to 1%. This function is suitable for scenarios in which high query performance is required and an approximate result is acceptable. To adjust the margin of error, you can execute the following statement to change the value of the hg_experimental_approx_count_distinct_precision parameter:
```
SET hg_experimental_approx_count_distinct_precision = 20;
```
- Valid values: [12,20]. Default value: 17.
- This parameter specifies the number of bits used as the index of buckets in the HyperLogLog algorithm. A greater value indicates more buckets and higher accuracy.
- A greater value indicates that more computing time and memory are consumed. However, the overhead is still much smaller than the overhead of the COUNT DISTINCT (column) function. Therefore, we recommend that you use the APPROX_COUNT_DISTINCT function rather than the COUNT DISTINCT ( column ) function.
- If you set this parameter to a value greater than 17, Hologres uses the HyperLogLog++ algorithm to correct the error of the returned result in order to reduce the margin of error.
  For example, if you set the hg_experimental_approx_count_distinct_precision parameter to 20, the margin of error can be reduced to 0.01% to 0.2% in most cases.
Note
You can also perform an accurate COUNT DISTINCT operation by using the COUNT DISTINCT (column) function, which consumes more resources.

Examples

In the following examples, the ORDERS table in the public TPC-H dataset of MaxCompute is used. You need to only execute the related SQL statements. No additional preparations are required.

-- Create a foreign table.
IMPORT FOREIGN SCHEMA public_data
LIMIT TO (ORDERS_10g)
FROM SERVER odps_server INTO public options (if_table_exist 'update');

-- Create an internal table.
BEGIN;
CREATE TABLE ORDERS (
    O_ORDERKEY bigint NOT NULL PRIMARY KEY,
    O_CUSTKEY int NOT NULL,
    O_ORDERSTATUS text NOT NULL,
    O_TOTALPRICE DECIMAL(12, 2) NOT NULL,
    O_ORDERDATE timestamptz NOT NULL,
    O_ORDERPRIORITY text NOT NULL,
    O_CLERK text NOT NULL,
    O_SHIPPRIORITY int NOT NULL,
    O_COMMENT text NOT NULL
);

CALL set_table_property ('ORDERS', 'segment_key', 'O_ORDERDATE');
CALL set_table_property ('ORDERS', 'distribution_key', 'O_ORDERKEY');
CALL set_table_property ('ORDERS', 'bitmap_columns', 'O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT');
CALL set_table_property ('ORDERS', 'dictionary_encoding_columns', 'O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_COMMENT');

COMMENT ON COLUMN ORDERS.O_ORDERKEY IS 'Order ID';
COMMENT ON COLUMN ORDERS.O_CUSTKEY IS 'Customer ID';
COMMENT ON COLUMN ORDERS.O_ORDERSTATUS IS 'Order status';
COMMENT ON COLUMN ORDERS.O_TOTALPRICE IS 'Total price';
COMMENT ON COLUMN ORDERS.O_ORDERDATE IS 'Order date';
COMMENT ON COLUMN ORDERS.O_ORDERPRIORITY IS 'Order priority';
COMMENT ON COLUMN ORDERS.O_CLERK IS 'Cashier';
COMMENT ON COLUMN ORDERS.O_SHIPPRIORITY IS 'Shipping priority';
COMMENT ON COLUMN ORDERS.O_COMMENT IS 'Comment';

COMMIT;

-- Import data to the internal table.
INSERT INTO ORDERS SELECT * FROM ORDERS_10g;

Execute the following statement to calculate the approximate number of rows that have distinct values in the O_CUSTKEY column:
```
SELECT APPROX_COUNT_DISTINCT ( O_CUSTKEY ) FROM ORDERS;
```
The following result is returned:
```
approx_count_distinct
-----------------------
               1000422
```
Execute the following statements to set the hg_experimental_approx_count_distinct_precision parameter to 20 for all sessions and calculate the approximate number of rows that have distinct values in the O_CUSTKEY column:
```
ALTER DATABASE dbname SET hg_experimental_approx_count_distinct_precision = 20;
SELECT APPROX_COUNT_DISTINCT ( O_CUSTKEY ) FROM ORDERS;
```
The following result is returned:
```
approx_count_distinct
-----------------------
               1000422
```

Execute the following statements to set the hg_experimental_approx_count_distinct_precision parameter to 20 for the current session and calculate the approximate number of rows that have distinct values in the O_CUSTKEY column:

-- Set the hg_experimental_approx_count_distinct_precision parameter to 20 for the current session.
SET hg_experimental_approx_count_distinct_precision = 20;
SELECT APPROX_COUNT_DISTINCT ( O_CUSTKEY ) FROM ORDERS;

The following result is returned:

 approx_count_distinct
-----------------------
                998854

ARRAY_AGG

Description: Concatenates the values returned by an expression into an array.
```
ARRAY_AGG(expression)
```
Parameter description
expression: required. The expression in a format other than the following types: JSON, JSONB, TIMETZ, INTERVAL, INET, OID, UUID, and ARRAY.
Return value
A value of the ARRAY type is returned.

Examples

SELECT ARRAY_AGG(c1) FROM example_table;

The following result is returned:

 array_agg
-----------
 {2,1}

AVG

Description: Calculates the average value of the non-NULL values returned by a BIGINT, FLOAT8, FLOAT4, or INT expression.
```
AVG(expression)
```
Parameter description
expression: required. The BIGINT, FLOAT8, FLOAT4, or INT expression.
Return value
A value of the data type that is consistent with the input parameter type is returned.

Examples

SELECT AVG(c7) FROM example_table;

The following result is returned:

   avg
----------
 1.500000

BIT_AND

Description: Performs bitwise AND operations on the non-NULL values returned by a BIGINT or an INT expression.
```
BIT_AND(expression)
```
Parameter description
expression: required. The BIGINT or INT expression.
Return value
A value of the data type that is consistent with the input parameter type is returned.

Examples

SELECT BIT_AND(c1) FROM example_table;

The following result is returned:

 bit_and
---------
       0

BIT_OR

Description: Performs bitwise OR operations on the non-NULL values of a BIGINT or an INT expression.
```
BIT_OR(expression)
```
Parameter description
expression: required. The BIGINT or INT expression.
Return value
A value of the data type that is consistent with the input parameter type is returned.

Examples

SELECT BIT_OR(c1) FROM example_table;

The following result is returned:

 bit_or
--------
      3

BOOL_AND

Description: Performs AND operations on a BOOLEAN expression to check whether all values returned by the expression are true.
```
BOOL_AND(bool)
```
Parameter description
bool: required. The BOOLEAN expression.
Return value
A value of the BOOLEAN type is returned. This function returns TRUE (t) if all values returned by the expression are true and returns FALSE (f) if one or more values are false.

Examples

SELECT BOOL_AND(c2) FROM example_table;

The following result is returned:

 bool_and
----------
 f

BOOL_OR

Description: Performs OR operations on a BOOLEAN expression to check whether one or more values returned by the expression are true.
```
BOOL_OR(bool)
```
Parameter description
bool: required. The BOOLEAN expression.
Return value
A value of the BOOLEAN type is returned. This function returns TRUE (t) if one or more values returned by the expression are true and returns FALSE (f) if no value is true.

Examples

SELECT BOOL_OR(c2) FROM example_table;

The following result is returned:

 bool_or
---------
 t

CORR

Description: Calculates the correlation coefficient between two variables.
```
CORR(Y, X)
```
Note
Only parameters of the DOUBLE PRECISION, FLOAT, or FLOAT8 type are supported.
Parameter description
Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
A value of the data type that is consistent with the input parameter type is returned.

Examples

SELECT CORR(c6, c7) FROM example_table;

The following result is returned:

 corr
------

COUNT

Description: Calculates the number of rows in a table or the number of rows defined by a BIGINT or NUMERIC expression.
- Execute the following statement to calculate the number of rows in a table:
```
COUNT(*)
```
- Execute the following statement to calculate the number of rows defined by a BIGINT or NUMERIC expression:
```
COUNT(expression)
```
  Parameter description: required. This function ignores NULL values.
Return value
A value of the BIGINT type is returned.

Examples

SELECT COUNT(*) FROM example_table;

The following result is returned:

 count
-------
     2

COVAR_POP

Description: Calculates the population covariance between two variables.
```
COVAR_POP(Y, X)
```
Parameter description
Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
A value of the data type that is consistent with the input parameter type is returned.

Examples

SELECT COVAR_POP(c6, c7) FROM example_table;

The following result is returned:

      covar_pop
---------------------
 3.5527136788005e-15

COVAR_SAMP

Description: Calculates the sample covariance between two variables.
```
COVAR_SAMP(Y, X)
```
Parameter description
Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
A value of the data type that is consistent with the input parameter type is returned.

Examples

SELECT COVAR_SAMP(c6, c7) FROM example_table;

The following result is returned:

     covar_samp
--------------------
 7.105427357601e-15

EVERY

Description: Performs AND operations on a BOOLEAN expression to check whether all values returned by the expression are true.
```
EVERY(bool)
```
Parameter description
bool: required. The BOOLEAN expression.
Return value
A value of the BOOLEAN type is returned. This function returns TRUE (t) if all values returned by the expression are true and returns FALSE (f) if one or more values are false.

Examples

SELECT EVERY(c2) FROM example_table;

The following result is returned:

 every
-------
 f

MAX

Description: Calculates the maximum value of a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
```
MAX(expression)
```
Parameter description
expression: required. The BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
Return value
A value of the data type that is consistent with the input parameter type is returned.

Examples

SELECT MAX(c1) FROM example_table;

The following result is returned:

 max
-----
   2

MAX_BY

Description: Finds the row in which the maximum value of a column (column y) is included and returns the value of another column (column x) in the same row.
```
MAX_BY(x, y);
```
Note
Values of the numeric type are sorted based on the numeric values, and values of non-numeric types are sorted in alphabetical order.
Parameter description
- y: required. The name of a column whose maximum value you want to obtain. NULL values in column y are not involved in the calculation.
- x: required. The name of another column whose value you want to query.
Usage notes
Only Hologres V1.3.36 and later support the MAX_BY function.
Note
If the version of your Hologres instance is earlier than V1.3.36, join the Hologres DingTalk group to apply for an upgrade. For more information, see Obtain online support for Hologres.
Return value
- If the maximum value of column y is found in only one row, the value of column x in the same row is returned.
- If all values in column y are NULL, NULL is returned.
- If the maximum value of column y is found in multiple rows, the maximum value of the values in column x in the same rows is returned.

Examples

-- Sample data
DROP TABLE IF EXISTS test;
CREATE TABLE IF NOT EXISTS test (
    id INT,
    name TEXT,
    cost INT
);
INSERT INTO test
    VALUES (1, 'a', 100), (1, 'aa', 200), (1, 'aaa', 300), (2, 'b', 150), (2, 'bb', 300), (3, 'c', 150), (3, 'cc', 50);

Execute the following statement to find the rows in which the maximum value of the cost column is included and return the maximum value of the values in the name column in the same rows:

SELECT max_by(name, cost) FROM test;

The following result is returned:

 max_by
--------
 aaa

MIN

Description: Calculates the minimum value of a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
```
MIN(expression)
```
Parameter description
expression: required. The BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
Return value
A value of the data type that is consistent with the input parameter type is returned.

Examples

SELECT MIN(c1) FROM example_table;

The following result is returned:

 min
-----
   1

MIN_BY

Description: Finds the row in which the minimum value of a column (column y) is included and returns the value of another column (column x) in the same row.
```
MIX_BY(x, y);
```
Note
Values of the numeric type are sorted based on the numeric values, and values of non-numeric types are sorted in alphabetical order.
Parameter description
- y: required. The name of a column whose minimum value you want to obtain. NULL values in column y are not involved in the calculation.
- x: required. The name of another column whose value you want to query.
Usage notes
Only Hologres V1.3.36 and later support the MAX_BY and MIN_BY functions.
Note
If the version of your Hologres instance is earlier than V1.3.36, join the Hologres DingTalk group to apply for an upgrade. For more information, see Obtain online support for Hologres.
Return value
- If the minimum value of column y is found in only one row, the value of column x in the same row is returned.
- If all values in column y are NULL, NULL is returned.
- If the minimum value of column y is found in multiple rows, the maximum value of the values in column x in the same rows is returned.

Examples

-- Sample data
DROP TABLE IF EXISTS test;
CREATE TABLE IF NOT EXISTS test (
    id INT,
    name TEXT,
    cost INT
);
INSERT INTO test
    VALUES (1, 'a', 100), (1, 'aa', 200), (1, 'aaa', 300), (2, 'b', 150), (2, 'bb', 300), (3, 'c', 150), (3, 'cc', 50);

Execute the following statement to find the row in which the minimum value of the cost column is included and return the value of the name column in the same row:

SELECT MIX_BY(name, cost) FROM test;

The following result is returned:

min_by
--------
 cc
(1 row)

REGR_AVGX

Description: Calculates the average value of an independent variable (x).
```
REGR_AVGX(Y, X)
```
Parameter description
Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
A value of the data type that is consistent with the input parameter type is returned.

Examples

SELECT REGR_AVGX(c6, c7) FROM example_table;

The following result is returned:

 regr_avgx
-----------
       4.8

REGR_AVGY

Description: Calculates the average value of a dependent variable (y).
```
REGR_AVGY(Y, X)
```
Parameter description
Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
A value of the data type that is consistent with the input parameter type is returned.

Examples

SELECT REGR_AVGY(c6, c7) FROM example_table;

The following result is returned:

 regr_avgy
-----------
         6

REGR_COUNT

Description: Calculates the number of rows in which the values of both input parameters are not NULL.
```
REGR_COUNT(Y, X)
```
Parameter description
Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
A value of the BIGINT type is returned.

Examples

SELECT REGR_COUNT(c6, c7) FROM example_table;

The following result is returned:

 regr_count
------------
          2

REGR_R2

Description: Calculates the square of the correlation coefficient.
```
REGR_R2(Y, X)
```
Parameter description
Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
A value of the data type that is consistent with the input parameter type is returned.

Examples

SELECT REGR_R2(c6, c7) FROM example_table;

The following result is returned:

 regr_r2
---------
       1

REGR_INTERCEPT

Description: Calculates the intercept on the y-axis of the linear regression line that best fits the dependent and independent variables.
```
REGR_INTERCEPT(Y, X)
```
Parameter description
Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
A value of the data type that is consistent with the input parameter type is returned.

Examples

SELECT REGR_INTERCEPT(c6, c7) FROM example_table;

The following result is returned:

 regr_intercept
----------------
              6

REGR_SLOPE

Description: Calculates the slope of the linear regression line that best fits the dependent and independent variables.
```
REGR_SLOPE(Y, X)
```
Parameter description
Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
A value of the data type that is consistent with the input parameter type is returned.

Examples

SELECT REGR_SLOPE(c6, c7) FROM example_table;

The following result is returned:

      regr_slope
----------------------
 7.34031751818285e-16

REGR_SXX

Description: Calculates the sum of the squares of an independent variable (x) by using the following formula: sum(X^2) - sum(X)^2/N.
```
REGR_SXX(Y, X)
```
Parameter description
Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
A value of the data type that is consistent with the input parameter type is returned.

Examples

SELECT REGR_SXX(c6, c7) FROM example_table;

The following result is returned:

 regr_sxx
----------
     9.68

REGR_SXY

Description: Calculates the sum of the products of a dependent variable (x) and an independent variable (y) by using the following formula: sum(X*Y) - sum(X) * sum(Y)/N.
```
REGR_SXY(Y, X)
```
Parameter description
Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
A value of the data type that is consistent with the input parameter type is returned.

Examples

SELECT REGR_SXY(c6, c7) FROM example_table;

The following result is returned:

      regr_sxy
--------------------
 7.105427357601e-15

SUM

Description: Calculates the sum of the values returned by a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
```
SUM(expression)
```
Parameter description
expression: required. The BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
Return value
A value of the data type that is consistent with the input parameter type is returned.

Examples

SELECT SUM(c1) FROM example_table;

The following result is returned:

 sum
-----
   3

STDDEV

Description: Calculates the sample standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.
```
STDDEV(expression)
```
Parameter description
expression: required. The INT, NUMERIC, or FLOAT8 expression.
Return value
A value of the NUMERIC or FLOAT8 type is returned.

Examples

SELECT STDDEV(c1) FROM example_table;

The following result is returned:

    stddev
--------------
 0.7071067811

STDDEV_POP

Description: Calculates the population standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.
```
STDDEV_POP(expression)	
```
Parameter description
expression: required. The INT, NUMERIC, or FLOAT8 expression.
Return value
A value of the NUMERIC or FLOAT8 type is returned.

Examples

SELECT STDDEV_POP(c1) FROM example_table;

The following result is returned:

  stddev_pop
--------------
 0.5000000000

STDDEV_SAMP

Description: Calculates the sample standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.
```
STDDEV_SAMP(expression)	
```
Parameter description
expression: required. The INT, NUMERIC, or FLOAT8 expression.
Return value
A value of the NUMERIC or FLOAT8 type is returned.

Examples

SELECT STDDEV_SAMP(c1) FROM example_table;

The following result is returned:

 stddev_samp
--------------
 0.7071067812

STRING_AGG

Description: Concatenates the non-NULL values of a TEXT expression into a string by using a delimiter.
```
STRING_AGG(<expression> TEXT, <delimiter> TEXT)
```
Parameter description
- expression: required. The original string.
- delimiter: required. The delimiter that you want to use to split the string.
Return value
A value of the TEXT type is returned.

Examples

SELECT STRING_AGG(c5, '-') FROM example_table;

The following result is returned:

 string_agg
------------
 3-4

UNIQ

Description: Calculates the number of rows that have distinct values in a column.
```
UNIQ ( < column > );
```
Parameter description
column: required. The column for which you want to calculate the number of rows that have distinct values. The following data types are supported: SMALLINT, INTEGER, BIGINT, REAL, DOUBLE PRECISION, TEXT, VARCHAR, TIMESTAMP, TIMESTAMPTZ, DATE, TIMETZ, and UUID.
Usage notes
- Only Hologres V1.3 and later support the UNIQ function.
  Note
  If the version of your Hologres instance is earlier than V1.3, you can manually upgrade your Hologres instance in the Hologres console or join the Hologres DingTalk group to contact Hologres technical support. For more information about how to manually upgrade your Hologres instance in the Hologres console, see Manual upgrade (beta). For more information about how to obtain technical support, see Obtain online support for Hologres.
- The UNIQ function delivers better performance than the COUNT DISTINCT function only if the SQL statement contains the GROUP BY clause and values of the field based on which the GROUP BY operation is performed are evenly distributed.
- In most cases, the UNIQ function has better performance than the COUNT DISTINCT function if the GROUP BY key has high cardinality. In addition, the UNIQ function occupies less memory space than the COUNT DISTINCT function. If the memory space is insufficient for using the COUNT DISTINCT function, you can use the UNIQ function instead.
- The performance of the COUNT DISTINCT function is optimized in Hologres V2.1 and later for various scenarios, such as scenarios where one or more COUNT DISTINCT functions are used, data skew occurs, or no GROUP BY clause is used. In these scenarios in Hologres V2.1 or later, you do not need to use the UNIQ function because the COUNT DISTINCT function delivers high performance. For more information, see the "Optimize the COUNT DISTINCT function" section in Optimize performance of queries on Hologres internal tables.

Examples

In the following examples, the ORDERS table in the public TPC-H dataset of MaxCompute is used. You need to only execute the related SQL statements. No additional preparations are required.

-- Create a foreign table.
IMPORT FOREIGN SCHEMA public_data
LIMIT TO (ORDERS_10g)
FROM SERVER odps_server INTO public options (if_table_exist 'update');

-- Create an internal table.
BEGIN;
CREATE TABLE ORDERS (
    O_ORDERKEY bigint NOT NULL PRIMARY KEY,
    O_CUSTKEY int NOT NULL,
    O_ORDERSTATUS text NOT NULL,
    O_TOTALPRICE DECIMAL(12, 2) NOT NULL,
    O_ORDERDATE timestamptz NOT NULL,
    O_ORDERPRIORITY text NOT NULL,
    O_CLERK text NOT NULL,
    O_SHIPPRIORITY int NOT NULL,
    O_COMMENT text NOT NULL
);

CALL set_table_property ('ORDERS', 'segment_key', 'O_ORDERDATE');
CALL set_table_property ('ORDERS', 'distribution_key', 'O_ORDERKEY');
CALL set_table_property ('ORDERS', 'bitmap_columns', 'O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT');
CALL set_table_property ('ORDERS', 'dictionary_encoding_columns', 'O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_COMMENT');

COMMENT ON COLUMN ORDERS.O_ORDERKEY IS 'Order ID';
COMMENT ON COLUMN ORDERS.O_CUSTKEY IS 'Customer ID';
COMMENT ON COLUMN ORDERS.O_ORDERSTATUS IS 'Order status';
COMMENT ON COLUMN ORDERS.O_TOTALPRICE IS 'Total price';
COMMENT ON COLUMN ORDERS.O_ORDERDATE IS 'Order date';
COMMENT ON COLUMN ORDERS.O_ORDERPRIORITY IS 'Order priority';
COMMENT ON COLUMN ORDERS.O_CLERK IS 'Cashier';
COMMENT ON COLUMN ORDERS.O_SHIPPRIORITY IS 'Shipping priority';
COMMENT ON COLUMN ORDERS.O_COMMENT IS 'Comment';

COMMIT;

-- Import data to the internal table.
INSERT INTO ORDERS SELECT * FROM ORDERS_10g;

Execute the following statement to calculate the number of rows that have distinct values in the O_CUSTKEY column of the ORDERS table:
```
SELECT UNIQ ( O_CUSTKEY ) FROM ORDERS;
```
The following result is returned:
```
  uniq
--------
 999982
```
Execute the following statement to group data in the ORDERS table by O_ORDERSTATUS and calculate the number of rows that have distinct values in the O_CUSTKEY column in each order state:
```
SELECT O_ORDERSTATUS, UNIQ ( O_CUSTKEY ) FROM ORDERS GROUP BY O_ORDERSTATUS;
```
The following result is returned:
```
 o_orderstatus |  uniq
---------------+--------
 P             | 313478
 F             | 996258
 O             | 996400
```

VARIANCE

Description: Calculates the sample variance of the values returned by an INT or a NUMERIC expression.
```
VARIANCE(expression)
```
Parameter description
expression: required. The INT or NUMERIC expression.
Return value
A value of the NUMERIC type is returned.

Examples

SELECT VARIANCE(c1) FROM example_table;

The following result is returned:

   variance
--------------
 0.5000000000

VAR_POP

Description: Calculates the population variance of the values returned by a FLOAT8, INT, or NUMERIC expression.
```
VAR_POP(expression)
```
Parameter description
expression: required. The FLOAT8, INT, or NUMERIC expression.
Return value
A value of the NUMERIC or FLOAT8 type is returned.

Examples

SELECT VAR_POP(c1) FROM example_table;

The following result is returned:

   var_pop
--------------
 0.2500000000

VAR_SAMP

Description: Calculates the sample variance of the values returned by an INT, FLOAT8 or NUMERIC expression.
```
VAR_SAMP(expression)
```
Parameter description
expression: required. The INT, FLOAT8, or NUMERIC expression.
Return value
A value of the NUMERIC or FLOAT8 type is returned.

Examples

SELECT VAR_SAMP(c1) FROM example_table;

The following result is returned:

   var_samp
--------------
 0.5000000000