Hologres is compatible with PostgreSQL and allows you to use the standard PostgreSQL syntax for data development. This topic describes the general-purpose aggregate functions supported by Hologres.
The functions supported by Hologres are a subset of the PostgreSQL functions. For more information about how to use these functions, see Aggregate Functions in the PostgreSQL documentation. The following table describes the general-purpose aggregate functions supported by Hologres.
Function | Description |
Calculates the number of rows that have distinct values in a column. The result of this function is an approximate value. | |
Aggregates the values of an expression into an array. | |
Calculates the average value of the non-NULL values returned by a BIGINT, FLOAT8, FLOAT4, or INT expression. | |
Performs bitwise AND operations on the non-NULL values returned by a BIGINT or an INT expression. | |
Performs bitwise OR operations on the non-NULL values returned by a BIGINT or an INT expression. | |
Performs AND operations on a BOOLEAN expression to check whether all values returned by the expression are true. | |
Performs OR operations on a BOOLEAN expression to check whether one or more values returned by the expression are true. | |
Calculates the correlation coefficient between two variables. | |
Calculates the number of rows in a table or the number of rows defined by a BIGINT or NUMERIC expression. | |
Calculates the population covariance between two variables. | |
Calculates the sample covariance between two variables. | |
Performs AND operations on a BOOLEAN expression to check whether all values returned by the expression are true. | |
Calculates the maximum value returned by a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression. | |
Finds the row in which the maximum value of a column (column y) is included and returns the value of another column (column x) in the same row. | |
Calculates the minimum value returned by a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression. | |
Finds the row in which the minimum value of a column (column y) is included and returns the value of another column (column x) in the same row. | |
Calculates the average value of an independent variable (x). | |
Calculates the average value of a dependent variable (y). | |
Calculates the number of rows in which the values of both input parameters are not NULL. | |
Calculates the square of the correlation coefficient. | |
Calculates the intercept on the y-axis of the linear regression line that best fits the dependent and independent variables. | |
Calculates the slope of the linear regression line that best fits the dependent and independent variables. | |
Calculates the sum of the squares of an independent variable (x) by using the following formula: | |
Calculates the sum of the products of a dependent variable (x) and an independent variable (y) by using the following formula: | |
Calculates the sum of the values returned by a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression. | |
Calculates the sample standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression. | |
Calculates the population standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression. | |
Calculates the sample standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression. | |
Concatenates the non-NULL values returned by a TEXT expression into a string by using a delimiter. | |
Calculates the number of rows that have distinct values in a column. | |
Calculates the sample variance of the values returned by an INT or a NUMERIC expression. | |
Calculates the population variance of the values returned by a FLOAT8, INT, or NUMERIC expression. | |
Calculates the sample variance of the values returned by an INT, FLOAT8 or NUMERIC expression. |
Sample data
-- Create a table.
CREATE TABLE example_table(
c1 INT,
c2 BOOLEAN,
c3 VARCHAR(10),
c4 DECIMAL(3, 1),
c5 TEXT,
c6 FLOAT8,
c7 FLOAT8
);
-- Insert data into the table.
INSERT INTO example_table (c1, c2, c3, c4, c5, c6, C7) VALUES
(1, true, 'a', 1.1, 3, 6, 7),
(2, false, 'b', 2.2, 4, 6, 2.6);
APPROX_COUNT_DISTINCT
Description: Calculates the number of rows that have distinct values in a column. The result of this function is an approximate value.
APPROX_COUNT_DISTINCT ( <column> )
Parameter description
column: required. The column in which you want to calculate the approximate number of rows that have distinct values.
Usage notes
The
APPROX_COUNT_DISTINCT
function uses the HyperLogLog algorithm to perform an inaccurate COUNT DISTINCT operation. An inaccurate COUNT DISTINCT operation helps improve query performance especially when a large number of discrete values exist in the column. This ensures that the average margin of error is in the range of0.1% to 1%
. This function is suitable for scenarios in which high query performance is required and an approximate result is acceptable. To adjust the margin of error, you can execute the following statement to change the value of the hg_experimental_approx_count_distinct_precision parameter:SET hg_experimental_approx_count_distinct_precision = 20;
Valid values: [12,20]. Default value: 17.
This parameter specifies the number of bits used as the index of buckets in the HyperLogLog algorithm. A greater value indicates more buckets and higher accuracy.
A greater value indicates that more computing time and memory are consumed. However, the overhead is still much smaller than the overhead of the
COUNT DISTINCT (column)
function. Therefore, we recommend that you use theAPPROX_COUNT_DISTINCT
function rather than theCOUNT DISTINCT ( column )
function.If you set this parameter to a value greater than 17, Hologres uses the HyperLogLog++ algorithm to correct the error of the returned result in order to reduce the margin of error.
For example, if you set the hg_experimental_approx_count_distinct_precision parameter to 20, the margin of error can be reduced to 0.01% to 0.2% in most cases.
NoteYou can also perform an accurate COUNT DISTINCT operation by using the
COUNT DISTINCT (column)
function, which consumes more resources.Examples
In the following examples, the ORDERS table in the public TPC-H dataset of MaxCompute is used. You need to only execute the related SQL statements. No additional preparations are required.
-- Create a foreign table. IMPORT FOREIGN SCHEMA public_data LIMIT TO (ORDERS_10g) FROM SERVER odps_server INTO public options (if_table_exist 'update'); -- Create an internal table. BEGIN; CREATE TABLE ORDERS ( O_ORDERKEY bigint NOT NULL PRIMARY KEY, O_CUSTKEY int NOT NULL, O_ORDERSTATUS text NOT NULL, O_TOTALPRICE DECIMAL(12, 2) NOT NULL, O_ORDERDATE timestamptz NOT NULL, O_ORDERPRIORITY text NOT NULL, O_CLERK text NOT NULL, O_SHIPPRIORITY int NOT NULL, O_COMMENT text NOT NULL ); CALL set_table_property ('ORDERS', 'segment_key', 'O_ORDERDATE'); CALL set_table_property ('ORDERS', 'distribution_key', 'O_ORDERKEY'); CALL set_table_property ('ORDERS', 'bitmap_columns', 'O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT'); CALL set_table_property ('ORDERS', 'dictionary_encoding_columns', 'O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_COMMENT'); COMMENT ON COLUMN ORDERS.O_ORDERKEY IS 'Order ID'; COMMENT ON COLUMN ORDERS.O_CUSTKEY IS 'Customer ID'; COMMENT ON COLUMN ORDERS.O_ORDERSTATUS IS 'Order status'; COMMENT ON COLUMN ORDERS.O_TOTALPRICE IS 'Total price'; COMMENT ON COLUMN ORDERS.O_ORDERDATE IS 'Order date'; COMMENT ON COLUMN ORDERS.O_ORDERPRIORITY IS 'Order priority'; COMMENT ON COLUMN ORDERS.O_CLERK IS 'Cashier'; COMMENT ON COLUMN ORDERS.O_SHIPPRIORITY IS 'Shipping priority'; COMMENT ON COLUMN ORDERS.O_COMMENT IS 'Comment'; COMMIT; -- Import data to the internal table. INSERT INTO ORDERS SELECT * FROM ORDERS_10g;
Execute the following statement to calculate the approximate number of rows that have distinct values in the O_CUSTKEY column:
SELECT APPROX_COUNT_DISTINCT ( O_CUSTKEY ) FROM ORDERS;
The following result is returned:
approx_count_distinct ----------------------- 1000422
Execute the following statements to set the hg_experimental_approx_count_distinct_precision parameter to 20 for all sessions and calculate the approximate number of rows that have distinct values in the O_CUSTKEY column:
ALTER DATABASE dbname SET hg_experimental_approx_count_distinct_precision = 20; SELECT APPROX_COUNT_DISTINCT ( O_CUSTKEY ) FROM ORDERS;
The following result is returned:
approx_count_distinct ----------------------- 1000422
Execute the following statements to set the hg_experimental_approx_count_distinct_precision parameter to 20 for the current session and calculate the approximate number of rows that have distinct values in the O_CUSTKEY column:
-- Set the hg_experimental_approx_count_distinct_precision parameter to 20 for the current session. SET hg_experimental_approx_count_distinct_precision = 20; SELECT APPROX_COUNT_DISTINCT ( O_CUSTKEY ) FROM ORDERS;
The following result is returned:
approx_count_distinct ----------------------- 998854
ARRAY_AGG
Description: Concatenates the values returned by an expression into an array.
ARRAY_AGG(expression)
Parameter description
expression: required. The expression in a format other than the following types: JSON, JSONB, TIMETZ, INTERVAL, INET, OID, UUID, and ARRAY.
Return value
A value of the ARRAY type is returned.
Examples
SELECT ARRAY_AGG(c1) FROM example_table;
The following result is returned:
array_agg ----------- {2,1}
AVG
Description: Calculates the average value of the non-NULL values returned by a BIGINT, FLOAT8, FLOAT4, or INT expression.
AVG(expression)
Parameter description
expression: required. The BIGINT, FLOAT8, FLOAT4, or INT expression.
Return value
A value of the data type that is consistent with the input parameter type is returned.
Examples
SELECT AVG(c7) FROM example_table;
The following result is returned:
avg ---------- 1.500000
BIT_AND
Description: Performs bitwise AND operations on the non-NULL values returned by a BIGINT or an INT expression.
BIT_AND(expression)
Parameter description
expression: required. The BIGINT or INT expression.
Return value
A value of the data type that is consistent with the input parameter type is returned.
Examples
SELECT BIT_AND(c1) FROM example_table;
The following result is returned:
bit_and --------- 0
BIT_OR
Description: Performs bitwise OR operations on the non-NULL values of a BIGINT or an INT expression.
BIT_OR(expression)
Parameter description
expression: required. The BIGINT or INT expression.
Return value
A value of the data type that is consistent with the input parameter type is returned.
Examples
SELECT BIT_OR(c1) FROM example_table;
The following result is returned:
bit_or -------- 3
BOOL_AND
Description: Performs AND operations on a BOOLEAN expression to check whether all values returned by the expression are true.
BOOL_AND(bool)
Parameter description
bool: required. The BOOLEAN expression.
Return value
A value of the BOOLEAN type is returned. This function returns TRUE (t) if all values returned by the expression are true and returns FALSE (f) if one or more values are false.
Examples
SELECT BOOL_AND(c2) FROM example_table;
The following result is returned:
bool_and ---------- f
BOOL_OR
Description: Performs OR operations on a BOOLEAN expression to check whether one or more values returned by the expression are true.
BOOL_OR(bool)
Parameter description
bool: required. The BOOLEAN expression.
Return value
A value of the BOOLEAN type is returned. This function returns TRUE (t) if one or more values returned by the expression are true and returns FALSE (f) if no value is true.
Examples
SELECT BOOL_OR(c2) FROM example_table;
The following result is returned:
bool_or --------- t
CORR
Description: Calculates the correlation coefficient between two variables.
CORR(Y, X)
NoteOnly parameters of the DOUBLE PRECISION, FLOAT, or FLOAT8 type are supported.
Parameter description
Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
A value of the data type that is consistent with the input parameter type is returned.
Examples
SELECT CORR(c6, c7) FROM example_table;
The following result is returned:
corr ------
COUNT
Description: Calculates the number of rows in a table or the number of rows defined by a BIGINT or NUMERIC expression.
Execute the following statement to calculate the number of rows in a table:
COUNT(*)
Execute the following statement to calculate the number of rows defined by a BIGINT or NUMERIC expression:
COUNT(expression)
Parameter description: required. This function ignores NULL values.
Return value
A value of the BIGINT type is returned.
Examples
SELECT COUNT(*) FROM example_table;
The following result is returned:
count ------- 2
COVAR_POP
Description: Calculates the population covariance between two variables.
COVAR_POP(Y, X)
Parameter description
Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
A value of the data type that is consistent with the input parameter type is returned.
Examples
SELECT COVAR_POP(c6, c7) FROM example_table;
The following result is returned:
covar_pop --------------------- 3.5527136788005e-15
COVAR_SAMP
Description: Calculates the sample covariance between two variables.
COVAR_SAMP(Y, X)
Parameter description
Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
A value of the data type that is consistent with the input parameter type is returned.
Examples
SELECT COVAR_SAMP(c6, c7) FROM example_table;
The following result is returned:
covar_samp -------------------- 7.105427357601e-15
EVERY
Description: Performs AND operations on a BOOLEAN expression to check whether all values returned by the expression are true.
EVERY(bool)
Parameter description
bool: required. The BOOLEAN expression.
Return value
A value of the BOOLEAN type is returned. This function returns TRUE (t) if all values returned by the expression are true and returns FALSE (f) if one or more values are false.
Examples
SELECT EVERY(c2) FROM example_table;
The following result is returned:
every ------- f
MAX
Description: Calculates the maximum value of a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
MAX(expression)
Parameter description
expression: required. The BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
Return value
A value of the data type that is consistent with the input parameter type is returned.
Examples
SELECT MAX(c1) FROM example_table;
The following result is returned:
max ----- 2
MAX_BY
Description: Finds the row in which the maximum value of a column (column y) is included and returns the value of another column (column x) in the same row.
MAX_BY(x, y);
NoteValues of the numeric type are sorted based on the numeric values, and values of non-numeric types are sorted in alphabetical order.
Parameter description
y: required. The name of a column whose maximum value you want to obtain. NULL values in column y are not involved in the calculation.
x: required. The name of another column whose value you want to query.
Usage notes
Only Hologres V1.3.36 and later support the MAX_BY function.
NoteIf the version of your Hologres instance is earlier than V1.3.36, join the Hologres DingTalk group to apply for an upgrade. For more information, see Obtain online support for Hologres.
Return value
If the maximum value of column y is found in only one row, the value of column x in the same row is returned.
If all values in column y are NULL, NULL is returned.
If the maximum value of column y is found in multiple rows, the maximum value of the values in column x in the same rows is returned.
Examples
-- Sample data DROP TABLE IF EXISTS test; CREATE TABLE IF NOT EXISTS test ( id INT, name TEXT, cost INT ); INSERT INTO test VALUES (1, 'a', 100), (1, 'aa', 200), (1, 'aaa', 300), (2, 'b', 150), (2, 'bb', 300), (3, 'c', 150), (3, 'cc', 50);
Execute the following statement to find the rows in which the maximum value of the cost column is included and return the maximum value of the values in the name column in the same rows:
SELECT max_by(name, cost) FROM test;
The following result is returned:
max_by -------- aaa
MIN
Description: Calculates the minimum value of a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
MIN(expression)
Parameter description
expression: required. The BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
Return value
A value of the data type that is consistent with the input parameter type is returned.
Examples
SELECT MIN(c1) FROM example_table;
The following result is returned:
min ----- 1
MIN_BY
Description: Finds the row in which the minimum value of a column (column y) is included and returns the value of another column (column x) in the same row.
MIX_BY(x, y);
NoteValues of the numeric type are sorted based on the numeric values, and values of non-numeric types are sorted in alphabetical order.
Parameter description
y: required. The name of a column whose minimum value you want to obtain. NULL values in column y are not involved in the calculation.
x: required. The name of another column whose value you want to query.
Usage notes
Only Hologres V1.3.36 and later support the MAX_BY and MIN_BY functions.
NoteIf the version of your Hologres instance is earlier than V1.3.36, join the Hologres DingTalk group to apply for an upgrade. For more information, see Obtain online support for Hologres.
Return value
If the minimum value of column y is found in only one row, the value of column x in the same row is returned.
If all values in column y are NULL, NULL is returned.
If the minimum value of column y is found in multiple rows, the maximum value of the values in column x in the same rows is returned.
Examples
-- Sample data DROP TABLE IF EXISTS test; CREATE TABLE IF NOT EXISTS test ( id INT, name TEXT, cost INT ); INSERT INTO test VALUES (1, 'a', 100), (1, 'aa', 200), (1, 'aaa', 300), (2, 'b', 150), (2, 'bb', 300), (3, 'c', 150), (3, 'cc', 50);
Execute the following statement to find the row in which the minimum value of the cost column is included and return the value of the name column in the same row:
SELECT MIX_BY(name, cost) FROM test;
The following result is returned:
min_by -------- cc (1 row)
REGR_AVGX
Description: Calculates the average value of an independent variable (x).
REGR_AVGX(Y, X)
Parameter description
Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
A value of the data type that is consistent with the input parameter type is returned.
Examples
SELECT REGR_AVGX(c6, c7) FROM example_table;
The following result is returned:
regr_avgx ----------- 4.8
REGR_AVGY
Description: Calculates the average value of a dependent variable (y).
REGR_AVGY(Y, X)
Parameter description
Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
A value of the data type that is consistent with the input parameter type is returned.
Examples
SELECT REGR_AVGY(c6, c7) FROM example_table;
The following result is returned:
regr_avgy ----------- 6
REGR_COUNT
Description: Calculates the number of rows in which the values of both input parameters are not NULL.
REGR_COUNT(Y, X)
Parameter description
Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
A value of the BIGINT type is returned.
Examples
SELECT REGR_COUNT(c6, c7) FROM example_table;
The following result is returned:
regr_count ------------ 2
REGR_R2
Description: Calculates the square of the correlation coefficient.
REGR_R2(Y, X)
Parameter description
Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
A value of the data type that is consistent with the input parameter type is returned.
Examples
SELECT REGR_R2(c6, c7) FROM example_table;
The following result is returned:
regr_r2 --------- 1
REGR_INTERCEPT
Description: Calculates the intercept on the y-axis of the linear regression line that best fits the dependent and independent variables.
REGR_INTERCEPT(Y, X)
Parameter description
Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
A value of the data type that is consistent with the input parameter type is returned.
Examples
SELECT REGR_INTERCEPT(c6, c7) FROM example_table;
The following result is returned:
regr_intercept ---------------- 6
REGR_SLOPE
Description: Calculates the slope of the linear regression line that best fits the dependent and independent variables.
REGR_SLOPE(Y, X)
Parameter description
Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
A value of the data type that is consistent with the input parameter type is returned.
Examples
SELECT REGR_SLOPE(c6, c7) FROM example_table;
The following result is returned:
regr_slope ---------------------- 7.34031751818285e-16
REGR_SXX
Description: Calculates the sum of the squares of an independent variable (x) by using the following formula:
sum(X^2) - sum(X)^2/N
.REGR_SXX(Y, X)
Parameter description
Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
A value of the data type that is consistent with the input parameter type is returned.
Examples
SELECT REGR_SXX(c6, c7) FROM example_table;
The following result is returned:
regr_sxx ---------- 9.68
REGR_SXY
Description: Calculates the sum of the products of a dependent variable (x) and an independent variable (y) by using the following formula:
sum(X*Y) - sum(X) * sum(Y)/N
.REGR_SXY(Y, X)
Parameter description
Y and X: required. The DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.
Return value
A value of the data type that is consistent with the input parameter type is returned.
Examples
SELECT REGR_SXY(c6, c7) FROM example_table;
The following result is returned:
regr_sxy -------------------- 7.105427357601e-15
SUM
Description: Calculates the sum of the values returned by a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
SUM(expression)
Parameter description
expression: required. The BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.
Return value
A value of the data type that is consistent with the input parameter type is returned.
Examples
SELECT SUM(c1) FROM example_table;
The following result is returned:
sum ----- 3
STDDEV
Description: Calculates the sample standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.
STDDEV(expression)
Parameter description
expression: required. The INT, NUMERIC, or FLOAT8 expression.
Return value
A value of the NUMERIC or FLOAT8 type is returned.
Examples
SELECT STDDEV(c1) FROM example_table;
The following result is returned:
stddev -------------- 0.7071067811
STDDEV_POP
Description: Calculates the population standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.
STDDEV_POP(expression)
Parameter description
expression: required. The INT, NUMERIC, or FLOAT8 expression.
Return value
A value of the NUMERIC or FLOAT8 type is returned.
Examples
SELECT STDDEV_POP(c1) FROM example_table;
The following result is returned:
stddev_pop -------------- 0.5000000000
STDDEV_SAMP
Description: Calculates the sample standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.
STDDEV_SAMP(expression)
Parameter description
expression: required. The INT, NUMERIC, or FLOAT8 expression.
Return value
A value of the NUMERIC or FLOAT8 type is returned.
Examples
SELECT STDDEV_SAMP(c1) FROM example_table;
The following result is returned:
stddev_samp -------------- 0.7071067812
STRING_AGG
Description: Concatenates the non-NULL values of a TEXT expression into a string by using a delimiter.
STRING_AGG(<expression> TEXT, <delimiter> TEXT)
Parameter description
expression: required. The original string.
delimiter: required. The delimiter that you want to use to split the string.
Return value
A value of the TEXT type is returned.
Examples
SELECT STRING_AGG(c5, '-') FROM example_table;
The following result is returned:
string_agg ------------ 3-4
UNIQ
Description: Calculates the number of rows that have distinct values in a column.
UNIQ ( < column > );
Parameter description
column: required. The column for which you want to calculate the number of rows that have distinct values. The following data types are supported: SMALLINT, INTEGER, BIGINT, REAL, DOUBLE PRECISION, TEXT, VARCHAR, TIMESTAMP, TIMESTAMPTZ, DATE, TIMETZ, and UUID.
Usage notes
Only Hologres V1.3 and later support the UNIQ function.
NoteIf the version of your Hologres instance is earlier than V1.3, you can manually upgrade your Hologres instance in the Hologres console or join the Hologres DingTalk group to contact Hologres technical support. For more information about how to manually upgrade your Hologres instance in the Hologres console, see Manual upgrade (beta). For more information about how to obtain technical support, see Obtain online support for Hologres.
The UNIQ function delivers better performance than the COUNT DISTINCT function only if the SQL statement contains the GROUP BY clause and values of the field based on which the GROUP BY operation is performed are evenly distributed.
In most cases, the UNIQ function has better performance than the COUNT DISTINCT function if the GROUP BY key has high cardinality. In addition, the UNIQ function occupies less memory space than the COUNT DISTINCT function. If the memory space is insufficient for using the COUNT DISTINCT function, you can use the UNIQ function instead.
The performance of the COUNT DISTINCT function is optimized in Hologres V2.1 and later for various scenarios, such as scenarios where one or more COUNT DISTINCT functions are used, data skew occurs, or no GROUP BY clause is used. In these scenarios in Hologres V2.1 or later, you do not need to use the UNIQ function because the COUNT DISTINCT function delivers high performance. For more information, see the "Optimize the COUNT DISTINCT function" section in Optimize performance of queries on Hologres internal tables.
Examples
In the following examples, the ORDERS table in the public TPC-H dataset of MaxCompute is used. You need to only execute the related SQL statements. No additional preparations are required.
-- Create a foreign table. IMPORT FOREIGN SCHEMA public_data LIMIT TO (ORDERS_10g) FROM SERVER odps_server INTO public options (if_table_exist 'update'); -- Create an internal table. BEGIN; CREATE TABLE ORDERS ( O_ORDERKEY bigint NOT NULL PRIMARY KEY, O_CUSTKEY int NOT NULL, O_ORDERSTATUS text NOT NULL, O_TOTALPRICE DECIMAL(12, 2) NOT NULL, O_ORDERDATE timestamptz NOT NULL, O_ORDERPRIORITY text NOT NULL, O_CLERK text NOT NULL, O_SHIPPRIORITY int NOT NULL, O_COMMENT text NOT NULL ); CALL set_table_property ('ORDERS', 'segment_key', 'O_ORDERDATE'); CALL set_table_property ('ORDERS', 'distribution_key', 'O_ORDERKEY'); CALL set_table_property ('ORDERS', 'bitmap_columns', 'O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT'); CALL set_table_property ('ORDERS', 'dictionary_encoding_columns', 'O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_COMMENT'); COMMENT ON COLUMN ORDERS.O_ORDERKEY IS 'Order ID'; COMMENT ON COLUMN ORDERS.O_CUSTKEY IS 'Customer ID'; COMMENT ON COLUMN ORDERS.O_ORDERSTATUS IS 'Order status'; COMMENT ON COLUMN ORDERS.O_TOTALPRICE IS 'Total price'; COMMENT ON COLUMN ORDERS.O_ORDERDATE IS 'Order date'; COMMENT ON COLUMN ORDERS.O_ORDERPRIORITY IS 'Order priority'; COMMENT ON COLUMN ORDERS.O_CLERK IS 'Cashier'; COMMENT ON COLUMN ORDERS.O_SHIPPRIORITY IS 'Shipping priority'; COMMENT ON COLUMN ORDERS.O_COMMENT IS 'Comment'; COMMIT; -- Import data to the internal table. INSERT INTO ORDERS SELECT * FROM ORDERS_10g;
Execute the following statement to calculate the number of rows that have distinct values in the O_CUSTKEY column of the ORDERS table:
SELECT UNIQ ( O_CUSTKEY ) FROM ORDERS;
The following result is returned:
uniq -------- 999982
Execute the following statement to group data in the ORDERS table by O_ORDERSTATUS and calculate the number of rows that have distinct values in the O_CUSTKEY column in each order state:
SELECT O_ORDERSTATUS, UNIQ ( O_CUSTKEY ) FROM ORDERS GROUP BY O_ORDERSTATUS;
The following result is returned:
o_orderstatus | uniq ---------------+-------- P | 313478 F | 996258 O | 996400
VARIANCE
Description: Calculates the sample variance of the values returned by an INT or a NUMERIC expression.
VARIANCE(expression)
Parameter description
expression: required. The INT or NUMERIC expression.
Return value
A value of the NUMERIC type is returned.
Examples
SELECT VARIANCE(c1) FROM example_table;
The following result is returned:
variance -------------- 0.5000000000
VAR_POP
Description: Calculates the population variance of the values returned by a FLOAT8, INT, or NUMERIC expression.
VAR_POP(expression)
Parameter description
expression: required. The FLOAT8, INT, or NUMERIC expression.
Return value
A value of the NUMERIC or FLOAT8 type is returned.
Examples
SELECT VAR_POP(c1) FROM example_table;
The following result is returned:
var_pop -------------- 0.2500000000
VAR_SAMP
Description: Calculates the sample variance of the values returned by an INT, FLOAT8 or NUMERIC expression.
VAR_SAMP(expression)
Parameter description
expression: required. The INT, FLOAT8, or NUMERIC expression.
Return value
A value of the NUMERIC or FLOAT8 type is returned.
Examples
SELECT VAR_SAMP(c1) FROM example_table;
The following result is returned:
var_samp -------------- 0.5000000000