All Products
Search
Document Center

Hologres:General-purpose aggregate functions supported by Hologres

Last Updated:Feb 10, 2025

Hologres is compatible with PostgreSQL and allows you to use the standard PostgreSQL syntax for data development. This topic describes the general-purpose aggregate functions supported by Hologres.

The functions supported by Hologres are a subset of the PostgreSQL functions. For more information about how to use these functions, see Aggregate Functions in the PostgreSQL documentation. The following table describes the general-purpose aggregate functions supported by Hologres.

Function

Description

APPROX_COUNT_DISTINCT

Calculates the number of rows that have distinct values in a column. The result of this function is an approximate value.

ARRAY_AGG

Aggregates the values of an expression into an array.

AVG

Calculates the average value of the non-NULL values returned by a BIGINT, FLOAT8, FLOAT4, or INT expression.

BIT_AND

Performs bitwise AND operations on the non-NULL values returned by a BIGINT or an INT expression.

BIT_OR

Performs bitwise OR operations on the non-NULL values returned by a BIGINT or an INT expression.

BOOL_AND

Performs AND operations on a BOOLEAN expression to check whether all values returned by the expression are true.

BOOL_OR

Performs OR operations on a BOOLEAN expression to check whether one or more values returned by the expression are true.

CORR

Calculates the correlation coefficient between two variables.

COUNT

Calculates the number of rows in a table or the number of rows defined by a BIGINT or NUMERIC expression.

COVAR_POP

Calculates the population covariance between two variables.

COVAR_SAMP

Calculates the sample covariance between two variables.

EVERY

Performs AND operations on a BOOLEAN expression to check whether all values returned by the expression are true.

MAX

Calculates the maximum value returned by a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.

MAX_BY

Finds the row in which the maximum value of a column (column y) is included and returns the value of another column (column x) in the same row.

MIN

Calculates the minimum value returned by a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.

MIN_BY

Finds the row in which the minimum value of a column (column y) is included and returns the value of another column (column x) in the same row.

REGR_AVGX

Calculates the average value of an independent variable (x).

REGR_AVGY

Calculates the average value of a dependent variable (y).

REGR_COUNT

Calculates the number of rows in which the values of both input parameters are not NULL.

REGR_R2

Calculates the square of the correlation coefficient.

REGR_INTERCEPT

Calculates the intercept on the y-axis of the linear regression line that best fits the dependent and independent variables.

REGR_SLOPE

Calculates the slope of the linear regression line that best fits the dependent and independent variables.

REGR_SXX

Calculates the sum of the squares of an independent variable (x) by using the following formula: sum(X^2) - sum(X)^2/N.

REGR_SXY

Calculates the sum of the products of a dependent variable (x) and an independent variable (y) by using the following formula: sum(X*Y) - sum(X) * sum(Y)/N.

SUM

Calculates the sum of the values returned by a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.

STDDEV

Calculates the sample standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.

STDDEV_POP

Calculates the population standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.

STDDEV_SAMP

Calculates the sample standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.

STRING_AGG

Concatenates the non-NULL values returned by a TEXT expression into a string by using a delimiter.

UNIQ

Calculates the number of rows that have distinct values in a column.

VARIANCE

Calculates the sample variance of the values returned by an INT or a NUMERIC expression.

VAR_POP

Calculates the population variance of the values returned by a FLOAT8, INT, or NUMERIC expression.

VAR_SAMP

Calculates the sample variance of the values returned by an INT, FLOAT8 or NUMERIC expression.

Sample data

-- Create a table.
CREATE TABLE example_table(
    c1 INT,           
    c2 BOOLEAN,      
    c3 VARCHAR(10),  
    c4 DECIMAL(3, 1),
    c5 TEXT,
    c6 FLOAT8,
    c7 FLOAT8
);

-- Insert data into the table.
INSERT INTO example_table (c1, c2, c3, c4, c5, c6, C7) VALUES
(1, true, 'a', 1.1, 3, 6, 7),
(2, false, 'b', 2.2, 4, 6, 2.6);

APPROX_COUNT_DISTINCT

  • Description: Calculates the number of rows that have distinct values in a column. The result of this function is an approximate value.

    APPROX_COUNT_DISTINCT ( <column> )
  • Parameter description

    column: required. The column in which you want to calculate the approximate number of rows that have distinct values.

  • Usage notes

    The APPROX_COUNT_DISTINCT function uses the HyperLogLog algorithm to perform an inaccurate COUNT DISTINCT operation. An inaccurate COUNT DISTINCT operation helps improve query performance especially when a large number of discrete values exist in the column. This ensures that the average margin of error is in the range of 0.1% to 1%. This function is suitable for scenarios in which high query performance is required and an approximate result is acceptable. To adjust the margin of error, you can execute the following statement to change the value of the hg_experimental_approx_count_distinct_precision parameter:

    SET hg_experimental_approx_count_distinct_precision = 20;
    • Valid values: [12,20]. Default value: 17.

    • This parameter specifies the number of bits used as the index of buckets in the HyperLogLog algorithm. A greater value indicates more buckets and higher accuracy.

    • A greater value indicates that more computing time and memory are consumed. However, the overhead is still much smaller than the overhead of the COUNT DISTINCT (column) function. Therefore, we recommend that you use the APPROX_COUNT_DISTINCT function rather than the COUNT DISTINCT ( column ) function.

    • If you set this parameter to a value greater than 17, Hologres uses the HyperLogLog++ algorithm to correct the error of the returned result in order to reduce the margin of error.

      For example, if you set the hg_experimental_approx_count_distinct_precision parameter to 20, the margin of error can be reduced to 0.01% to 0.2% in most cases.

    Note

    You can also perform an accurate COUNT DISTINCT operation by using the COUNT DISTINCT (column) function, which consumes more resources.

  • Examples

    In the following examples, the ORDERS table in the public TPC-H dataset of MaxCompute is used. You need to only execute the related SQL statements. No additional preparations are required.

    -- Create a foreign table.
    IMPORT FOREIGN SCHEMA public_data
    LIMIT TO (ORDERS_10g)
    FROM SERVER odps_server INTO public options (if_table_exist 'update');
    
    -- Create an internal table.
    BEGIN;
    CREATE TABLE ORDERS (
        O_ORDERKEY bigint NOT NULL PRIMARY KEY,
        O_CUSTKEY int NOT NULL,
        O_ORDERSTATUS text NOT NULL,
        O_TOTALPRICE DECIMAL(12, 2) NOT NULL,
        O_ORDERDATE timestamptz NOT NULL,
        O_ORDERPRIORITY text NOT NULL,
        O_CLERK text NOT NULL,
        O_SHIPPRIORITY int NOT NULL,
        O_COMMENT text NOT NULL
    );
    
    CALL set_table_property ('ORDERS', 'segment_key', 'O_ORDERDATE');
    CALL set_table_property ('ORDERS', 'distribution_key', 'O_ORDERKEY');
    CALL set_table_property ('ORDERS', 'bitmap_columns', 'O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT');
    CALL set_table_property ('ORDERS', 'dictionary_encoding_columns', 'O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_COMMENT');
    
    COMMENT ON COLUMN ORDERS.O_ORDERKEY IS 'Order ID';
    COMMENT ON COLUMN ORDERS.O_CUSTKEY IS 'Customer ID';
    COMMENT ON COLUMN ORDERS.O_ORDERSTATUS IS 'Order status';
    COMMENT ON COLUMN ORDERS.O_TOTALPRICE IS 'Total price';
    COMMENT ON COLUMN ORDERS.O_ORDERDATE IS 'Order date';
    COMMENT ON COLUMN ORDERS.O_ORDERPRIORITY IS 'Order priority';
    COMMENT ON COLUMN ORDERS.O_CLERK IS 'Cashier';
    COMMENT ON COLUMN ORDERS.O_SHIPPRIORITY IS 'Shipping priority';
    COMMENT ON COLUMN ORDERS.O_COMMENT IS 'Comment';
    
    COMMIT;
    
    -- Import data to the internal table.
    INSERT INTO ORDERS SELECT * FROM ORDERS_10g;
    • Execute the following statement to calculate the approximate number of rows that have distinct values in the O_CUSTKEY column:

      SELECT APPROX_COUNT_DISTINCT ( O_CUSTKEY ) FROM ORDERS;

      The following result is returned:

      approx_count_distinct
      -----------------------
                     1000422
    • Execute the following statements to set the hg_experimental_approx_count_distinct_precision parameter to 20 for all sessions and calculate the approximate number of rows that have distinct values in the O_CUSTKEY column:

      ALTER DATABASE dbname SET hg_experimental_approx_count_distinct_precision = 20;
      SELECT APPROX_COUNT_DISTINCT ( O_CUSTKEY ) FROM ORDERS;

      The following result is returned:

      approx_count_distinct
      -----------------------
                     1000422
    • Execute the following statements to set the hg_experimental_approx_count_distinct_precision parameter to 20 for the current session and calculate the approximate number of rows that have distinct values in the O_CUSTKEY column:

      -- Set the hg_experimental_approx_count_distinct_precision parameter to 20 for the current session.
      SET hg_experimental_approx_count_distinct_precision = 20;
      SELECT APPROX_COUNT_DISTINCT ( O_CUSTKEY ) FROM ORDERS;

      The following result is returned:

       approx_count_distinct
      -----------------------
                      998854

ARRAY_AGG

  • Description: Concatenates the values returned by an expression into an array.

    ARRAY_AGG(expression)
  • Parameter description

    expression: required. The expression in a format other than the following types: JSON, JSONB, TIMETZ, INTERVAL, INET, OID, UUID, and ARRAY.

  • Return value

    A value of the ARRAY type is returned.

  • Examples

    SELECT ARRAY_AGG(c1) FROM example_table;

    The following result is returned:

     array_agg
    -----------
     {2,1}

AVG

  • Description: Calculates the average value of the non-NULL values returned by a BIGINT, FLOAT8, FLOAT4, or INT expression.

    AVG(expression)
  • Parameter description

    expression: required. The BIGINT, FLOAT8, FLOAT4, or INT expression.

  • Return value

    A value of the data type that is consistent with the input parameter type is returned.

  • Examples

    SELECT AVG(c7) FROM example_table;

    The following result is returned:

       avg
    ----------
     1.500000

BIT_AND

  • Description: Performs bitwise AND operations on the non-NULL values returned by a BIGINT or an INT expression.

    BIT_AND(expression)
  • Parameter description

    expression: required. The BIGINT or INT expression.

  • Return value

    A value of the data type that is consistent with the input parameter type is returned.

  • Examples

    SELECT BIT_AND(c1) FROM example_table;

    The following result is returned:

     bit_and
    ---------
           0

BIT_OR

  • Description: Performs bitwise OR operations on the non-NULL values of a BIGINT or an INT expression.

    BIT_OR(expression)
  • Parameter description

    expression: required. The BIGINT or INT expression.

  • Return value

    A value of the data type that is consistent with the input parameter type is returned.

  • Examples

    SELECT BIT_OR(c1) FROM example_table;

    The following result is returned:

     bit_or
    --------
          3

BOOL_AND

  • Description: Performs AND operations on a BOOLEAN expression to check whether all values returned by the expression are true.

    BOOL_AND(bool)
  • Parameter description

    bool: required. The BOOLEAN expression.

  • Return value

    A value of the BOOLEAN type is returned. This function returns TRUE (t) if all values returned by the expression are true and returns FALSE (f) if one or more values are false.

  • Examples

    SELECT BOOL_AND(c2) FROM example_table;

    The following result is returned:

     bool_and
    ----------
     f

BOOL_OR

  • Description: Performs OR operations on a BOOLEAN expression to check whether one or more values returned by the expression are true.

    BOOL_OR(bool)
  • Parameter description

    bool: required. The BOOLEAN expression.

  • Return value

    A value of the BOOLEAN type is returned. This function returns TRUE (t) if one or more values returned by the expression are true and returns FALSE (f) if no value is true.

  • Examples

    SELECT BOOL_OR(c2) FROM example_table;

    The following result is returned:

     bool_or
    ---------
     t

CORR

  • Description: Calculates the correlation coefficient between two variables.

    CORR(Y, X)
    Note

    Only parameters of the DOUBLE PRECISION, FLOAT, or FLOAT8 type are supported.

  • Parameter description

    Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.

  • Return value

    A value of the data type that is consistent with the input parameter type is returned.

  • Examples

    SELECT CORR(c6, c7) FROM example_table;

    The following result is returned:

     corr
    ------
    

COUNT

  • Description: Calculates the number of rows in a table or the number of rows defined by a BIGINT or NUMERIC expression.

    • Execute the following statement to calculate the number of rows in a table:

      COUNT(*)
    • Execute the following statement to calculate the number of rows defined by a BIGINT or NUMERIC expression:

      COUNT(expression)

      Parameter description: required. This function ignores NULL values.

  • Return value

    A value of the BIGINT type is returned.

  • Examples

    SELECT COUNT(*) FROM example_table;

    The following result is returned:

     count
    -------
         2

COVAR_POP

  • Description: Calculates the population covariance between two variables.

    COVAR_POP(Y, X)
  • Parameter description

    Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.

  • Return value

    A value of the data type that is consistent with the input parameter type is returned.

  • Examples

    SELECT COVAR_POP(c6, c7) FROM example_table;

    The following result is returned:

          covar_pop
    ---------------------
     3.5527136788005e-15

COVAR_SAMP

  • Description: Calculates the sample covariance between two variables.

    COVAR_SAMP(Y, X)
  • Parameter description

    Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.

  • Return value

    A value of the data type that is consistent with the input parameter type is returned.

  • Examples

    SELECT COVAR_SAMP(c6, c7) FROM example_table;

    The following result is returned:

         covar_samp
    --------------------
     7.105427357601e-15

EVERY

  • Description: Performs AND operations on a BOOLEAN expression to check whether all values returned by the expression are true.

    EVERY(bool)
  • Parameter description

    bool: required. The BOOLEAN expression.

  • Return value

    A value of the BOOLEAN type is returned. This function returns TRUE (t) if all values returned by the expression are true and returns FALSE (f) if one or more values are false.

  • Examples

    SELECT EVERY(c2) FROM example_table;

    The following result is returned:

     every
    -------
     f

MAX

  • Description: Calculates the maximum value of a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.

    MAX(expression)
  • Parameter description

    expression: required. The BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.

  • Return value

    A value of the data type that is consistent with the input parameter type is returned.

  • Examples

    SELECT MAX(c1) FROM example_table;

    The following result is returned:

     max
    -----
       2

MAX_BY

  • Description: Finds the row in which the maximum value of a column (column y) is included and returns the value of another column (column x) in the same row.

    MAX_BY(x, y);
    Note

    Values of the numeric type are sorted based on the numeric values, and values of non-numeric types are sorted in alphabetical order.

  • Parameter description

    • y: required. The name of a column whose maximum value you want to obtain. NULL values in column y are not involved in the calculation.

    • x: required. The name of another column whose value you want to query.

  • Usage notes

    Only Hologres V1.3.36 and later support the MAX_BY function.

    Note

    If the version of your Hologres instance is earlier than V1.3.36, join the Hologres DingTalk group to apply for an upgrade. For more information, see Obtain online support for Hologres.

  • Return value

    • If the maximum value of column y is found in only one row, the value of column x in the same row is returned.

    • If all values in column y are NULL, NULL is returned.

    • If the maximum value of column y is found in multiple rows, the maximum value of the values in column x in the same rows is returned.

  • Examples

    -- Sample data
    DROP TABLE IF EXISTS test;
    CREATE TABLE IF NOT EXISTS test (
        id INT,
        name TEXT,
        cost INT
    );
    INSERT INTO test
        VALUES (1, 'a', 100), (1, 'aa', 200), (1, 'aaa', 300), (2, 'b', 150), (2, 'bb', 300), (3, 'c', 150), (3, 'cc', 50);

    Execute the following statement to find the rows in which the maximum value of the cost column is included and return the maximum value of the values in the name column in the same rows:

    SELECT max_by(name, cost) FROM test;

    The following result is returned:

     max_by
    --------
     aaa

MIN

  • Description: Calculates the minimum value of a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.

    MIN(expression)
  • Parameter description

    expression: required. The BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.

  • Return value

    A value of the data type that is consistent with the input parameter type is returned.

  • Examples

    SELECT MIN(c1) FROM example_table;

    The following result is returned:

     min
    -----
       1

MIN_BY

  • Description: Finds the row in which the minimum value of a column (column y) is included and returns the value of another column (column x) in the same row.

    MIX_BY(x, y);
    Note

    Values of the numeric type are sorted based on the numeric values, and values of non-numeric types are sorted in alphabetical order.

  • Parameter description

    • y: required. The name of a column whose minimum value you want to obtain. NULL values in column y are not involved in the calculation.

    • x: required. The name of another column whose value you want to query.

  • Usage notes

    Only Hologres V1.3.36 and later support the MAX_BY and MIN_BY functions.

    Note

    If the version of your Hologres instance is earlier than V1.3.36, join the Hologres DingTalk group to apply for an upgrade. For more information, see Obtain online support for Hologres.

  • Return value

    • If the minimum value of column y is found in only one row, the value of column x in the same row is returned.

    • If all values in column y are NULL, NULL is returned.

    • If the minimum value of column y is found in multiple rows, the maximum value of the values in column x in the same rows is returned.

  • Examples

    -- Sample data
    DROP TABLE IF EXISTS test;
    CREATE TABLE IF NOT EXISTS test (
        id INT,
        name TEXT,
        cost INT
    );
    INSERT INTO test
        VALUES (1, 'a', 100), (1, 'aa', 200), (1, 'aaa', 300), (2, 'b', 150), (2, 'bb', 300), (3, 'c', 150), (3, 'cc', 50);

    Execute the following statement to find the row in which the minimum value of the cost column is included and return the value of the name column in the same row:

    SELECT MIX_BY(name, cost) FROM test;

    The following result is returned:

    min_by
    --------
     cc
    (1 row)

REGR_AVGX

  • Description: Calculates the average value of an independent variable (x).

    REGR_AVGX(Y, X)
  • Parameter description

    Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.

  • Return value

    A value of the data type that is consistent with the input parameter type is returned.

  • Examples

    SELECT REGR_AVGX(c6, c7) FROM example_table;

    The following result is returned:

     regr_avgx
    -----------
           4.8

REGR_AVGY

  • Description: Calculates the average value of a dependent variable (y).

    REGR_AVGY(Y, X)
  • Parameter description

    Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.

  • Return value

    A value of the data type that is consistent with the input parameter type is returned.

  • Examples

    SELECT REGR_AVGY(c6, c7) FROM example_table;

    The following result is returned:

     regr_avgy
    -----------
             6

REGR_COUNT

  • Description: Calculates the number of rows in which the values of both input parameters are not NULL.

    REGR_COUNT(Y, X)
  • Parameter description

    Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.

  • Return value

    A value of the BIGINT type is returned.

  • Examples

    SELECT REGR_COUNT(c6, c7) FROM example_table;

    The following result is returned:

     regr_count
    ------------
              2

REGR_R2

  • Description: Calculates the square of the correlation coefficient.

    REGR_R2(Y, X)
  • Parameter description

    Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.

  • Return value

    A value of the data type that is consistent with the input parameter type is returned.

  • Examples

    SELECT REGR_R2(c6, c7) FROM example_table;

    The following result is returned:

     regr_r2
    ---------
           1

REGR_INTERCEPT

  • Description: Calculates the intercept on the y-axis of the linear regression line that best fits the dependent and independent variables.

    REGR_INTERCEPT(Y, X)
  • Parameter description

    Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.

  • Return value

    A value of the data type that is consistent with the input parameter type is returned.

  • Examples

    SELECT REGR_INTERCEPT(c6, c7) FROM example_table;

    The following result is returned:

     regr_intercept
    ----------------
                  6

REGR_SLOPE

  • Description: Calculates the slope of the linear regression line that best fits the dependent and independent variables.

    REGR_SLOPE(Y, X)
  • Parameter description

    Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.

  • Return value

    A value of the data type that is consistent with the input parameter type is returned.

  • Examples

    SELECT REGR_SLOPE(c6, c7) FROM example_table;

    The following result is returned:

          regr_slope
    ----------------------
     7.34031751818285e-16

REGR_SXX

  • Description: Calculates the sum of the squares of an independent variable (x) by using the following formula: sum(X^2) - sum(X)^2/N.

    REGR_SXX(Y, X)
  • Parameter description

    Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.

  • Return value

    A value of the data type that is consistent with the input parameter type is returned.

  • Examples

    SELECT REGR_SXX(c6, c7) FROM example_table;

    The following result is returned:

     regr_sxx
    ----------
         9.68

REGR_SXY

  • Description: Calculates the sum of the products of a dependent variable (x) and an independent variable (y) by using the following formula: sum(X*Y) - sum(X) * sum(Y)/N.

    REGR_SXY(Y, X)
  • Parameter description

    Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.

  • Return value

    A value of the data type that is consistent with the input parameter type is returned.

  • Examples

    SELECT REGR_SXY(c6, c7) FROM example_table;

    The following result is returned:

          regr_sxy
    --------------------
     7.105427357601e-15

SUM

  • Description: Calculates the sum of the values returned by a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.

    SUM(expression)
  • Parameter description

    expression: required. The BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.

  • Return value

    A value of the data type that is consistent with the input parameter type is returned.

  • Examples

    SELECT SUM(c1) FROM example_table;

    The following result is returned:

     sum
    -----
       3

STDDEV

  • Description: Calculates the sample standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.

    STDDEV(expression)
  • Parameter description

    expression: required. The INT, NUMERIC, or FLOAT8 expression.

  • Return value

    A value of the NUMERIC or FLOAT8 type is returned.

  • Examples

    SELECT STDDEV(c1) FROM example_table;

    The following result is returned:

        stddev
    --------------
     0.7071067811

STDDEV_POP

  • Description: Calculates the population standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.

    STDDEV_POP(expression)	
  • Parameter description

    expression: required. The INT, NUMERIC, or FLOAT8 expression.

  • Return value

    A value of the NUMERIC or FLOAT8 type is returned.

  • Examples

    SELECT STDDEV_POP(c1) FROM example_table;

    The following result is returned:

      stddev_pop
    --------------
     0.5000000000

STDDEV_SAMP

  • Description: Calculates the sample standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.

    STDDEV_SAMP(expression)	
  • Parameter description

    expression: required. The INT, NUMERIC, or FLOAT8 expression.

  • Return value

    A value of the NUMERIC or FLOAT8 type is returned.

  • Examples

    SELECT STDDEV_SAMP(c1) FROM example_table;

    The following result is returned:

     stddev_samp
    --------------
     0.7071067812

STRING_AGG

  • Description: Concatenates the non-NULL values of a TEXT expression into a string by using a delimiter.

    STRING_AGG(<expression> TEXT, <delimiter> TEXT)
  • Parameter description

    • expression: required. The original string.

    • delimiter: required. The delimiter that you want to use to split the string.

  • Return value

    A value of the TEXT type is returned.

  • Examples

    SELECT STRING_AGG(c5, '-') FROM example_table;

    The following result is returned:

     string_agg
    ------------
     3-4

UNIQ

  • Description: Calculates the number of rows that have distinct values in a column.

    UNIQ ( < column > );
  • Parameter description

    column: required. The column for which you want to calculate the number of rows that have distinct values. The following data types are supported: SMALLINT, INTEGER, BIGINT, REAL, DOUBLE PRECISION, TEXT, VARCHAR, TIMESTAMP, TIMESTAMPTZ, DATE, TIMETZ, and UUID.

  • Usage notes

    • Only Hologres V1.3 and later support the UNIQ function.

      Note

      If the version of your Hologres instance is earlier than V1.3, you can manually upgrade your Hologres instance in the Hologres console or join the Hologres DingTalk group to contact Hologres technical support. For more information about how to manually upgrade your Hologres instance in the Hologres console, see Manual upgrade (beta). For more information about how to obtain technical support, see Obtain online support for Hologres.

    • The UNIQ function delivers better performance than the COUNT DISTINCT function only if the SQL statement contains the GROUP BY clause and values of the field based on which the GROUP BY operation is performed are evenly distributed.

    • In most cases, the UNIQ function has better performance than the COUNT DISTINCT function if the GROUP BY key has high cardinality. In addition, the UNIQ function occupies less memory space than the COUNT DISTINCT function. If the memory space is insufficient for using the COUNT DISTINCT function, you can use the UNIQ function instead.

    • The performance of the COUNT DISTINCT function is optimized in Hologres V2.1 and later for various scenarios, such as scenarios where one or more COUNT DISTINCT functions are used, data skew occurs, or no GROUP BY clause is used. In these scenarios in Hologres V2.1 or later, you do not need to use the UNIQ function because the COUNT DISTINCT function delivers high performance. For more information, see the "Optimize the COUNT DISTINCT function" section in Optimize performance of queries on Hologres internal tables.

  • Examples

    In the following examples, the ORDERS table in the public TPC-H dataset of MaxCompute is used. You need to only execute the related SQL statements. No additional preparations are required.

    -- Create a foreign table.
    IMPORT FOREIGN SCHEMA public_data
    LIMIT TO (ORDERS_10g)
    FROM SERVER odps_server INTO public options (if_table_exist 'update');
    
    -- Create an internal table.
    BEGIN;
    CREATE TABLE ORDERS (
        O_ORDERKEY bigint NOT NULL PRIMARY KEY,
        O_CUSTKEY int NOT NULL,
        O_ORDERSTATUS text NOT NULL,
        O_TOTALPRICE DECIMAL(12, 2) NOT NULL,
        O_ORDERDATE timestamptz NOT NULL,
        O_ORDERPRIORITY text NOT NULL,
        O_CLERK text NOT NULL,
        O_SHIPPRIORITY int NOT NULL,
        O_COMMENT text NOT NULL
    );
    
    CALL set_table_property ('ORDERS', 'segment_key', 'O_ORDERDATE');
    CALL set_table_property ('ORDERS', 'distribution_key', 'O_ORDERKEY');
    CALL set_table_property ('ORDERS', 'bitmap_columns', 'O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT');
    CALL set_table_property ('ORDERS', 'dictionary_encoding_columns', 'O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_COMMENT');
    
    COMMENT ON COLUMN ORDERS.O_ORDERKEY IS 'Order ID';
    COMMENT ON COLUMN ORDERS.O_CUSTKEY IS 'Customer ID';
    COMMENT ON COLUMN ORDERS.O_ORDERSTATUS IS 'Order status';
    COMMENT ON COLUMN ORDERS.O_TOTALPRICE IS 'Total price';
    COMMENT ON COLUMN ORDERS.O_ORDERDATE IS 'Order date';
    COMMENT ON COLUMN ORDERS.O_ORDERPRIORITY IS 'Order priority';
    COMMENT ON COLUMN ORDERS.O_CLERK IS 'Cashier';
    COMMENT ON COLUMN ORDERS.O_SHIPPRIORITY IS 'Shipping priority';
    COMMENT ON COLUMN ORDERS.O_COMMENT IS 'Comment';
    
    COMMIT;
    
    -- Import data to the internal table.
    INSERT INTO ORDERS SELECT * FROM ORDERS_10g;
    • Execute the following statement to calculate the number of rows that have distinct values in the O_CUSTKEY column of the ORDERS table:

      SELECT UNIQ ( O_CUSTKEY ) FROM ORDERS;

      The following result is returned:

        uniq
      --------
       999982
    • Execute the following statement to group data in the ORDERS table by O_ORDERSTATUS and calculate the number of rows that have distinct values in the O_CUSTKEY column in each order state:

      SELECT O_ORDERSTATUS, UNIQ ( O_CUSTKEY ) FROM ORDERS GROUP BY O_ORDERSTATUS;

      The following result is returned:

       o_orderstatus |  uniq
      ---------------+--------
       P             | 313478
       F             | 996258
       O             | 996400

VARIANCE

  • Description: Calculates the sample variance of the values returned by an INT or a NUMERIC expression.

    VARIANCE(expression)
  • Parameter description

    expression: required. The INT or NUMERIC expression.

  • Return value

    A value of the NUMERIC type is returned.

  • Examples

    SELECT VARIANCE(c1) FROM example_table;

    The following result is returned:

       variance
    --------------
     0.5000000000

VAR_POP

  • Description: Calculates the population variance of the values returned by a FLOAT8, INT, or NUMERIC expression.

    VAR_POP(expression)
  • Parameter description

    expression: required. The FLOAT8, INT, or NUMERIC expression.

  • Return value

    A value of the NUMERIC or FLOAT8 type is returned.

  • Examples

    SELECT VAR_POP(c1) FROM example_table;

    The following result is returned:

       var_pop
    --------------
     0.2500000000

VAR_SAMP

  • Description: Calculates the sample variance of the values returned by an INT, FLOAT8 or NUMERIC expression.

    VAR_SAMP(expression)
  • Parameter description

    expression: required. The INT, FLOAT8, or NUMERIC expression.

  • Return value

    A value of the NUMERIC or FLOAT8 type is returned.

  • Examples

    SELECT VAR_SAMP(c1) FROM example_table;

    The following result is returned:

       var_samp
    --------------
     0.5000000000