This topic describes how to use the SELECT statement to query data in Hologres.
Command overview
The SELECT statement queries data from zero or more tables. The main parameters are listed in the following table.
|
Parameters |
|
|
None |
|
Command syntax
The basic syntax of the SELECT statement is as follows.
[ WITH with_query [, ...] ]
SELECT [ALL | DISTINCT [ON (expression [, ...])]]
* | expression [[AS] output_name] [, ...]
[FROM from_item [, ...]]
[WHERE condition]
[GROUP BY grouping_element [, ...]]
[HAVING condition [, ...]]
[{UNION | INTERSECT | EXCEPT} [ALL] select]
[ORDER BY expression [ASC | DESC | USING operator] [, ...]]
[LIMIT {count | ALL}]
In this syntax, grouping_element and from_item are defined as follows:
-
A
grouping_elementcontains anexpression. An expression can be a column name, a constant, a function, or any combination of column names and constants that are connected by arithmetic or bitwise operators. -
from_item contains
table_name [[AS] alias [( column_alias [, ...] )]] (select) [AS] alias [( column_alias [, ...] )] from_item [NATURAL] join_type from_item [ON join_condition | USING ( join_column [, ...] )]
Command usage
The following are common SELECT methods:
-
All elements in the FROM list, which are real or virtual tables, are evaluated. If the list contains more than one element, the result is their union.
-
If a WHERE clause is specified, it filters out all rows that do not satisfy the specified condition.
-
If a
GROUP BYclause is specified or if aggregate functions are used, the rows are combined into groups based on matching values. The aggregate functions are then computed for each group. If a HAVING clause is present, it filters out groups that do not meet the specified condition. -
The SELECT output expressions are used to compute the actual output rows from the selected rows or groups.
-
You can combine multiple SELECT statements into a single result set using the UNION, INTERSECT, and EXCEPT set operators. The UNION operator returns all rows that are in either or both result sets. The INTERSECT operator returns all rows that are in both result sets. The EXCEPT operator returns rows that are in the first result set but not in the second. For all three operators, duplicate rows are removed from the final result set unless you specify ALL. By default, these operators remove duplicates, which is equivalent to specifying DISTINCT. Note that this behavior is different from a standard SELECT statement, where the default behavior is to return all rows, including duplicates.
-
If you specify an
ORDER BYclause, the returned rows are sorted in the specified order. If you do not specify anORDER BYclause, the rows are returned in an arbitrary order, which is typically the fastest order for the system to produce. -
If a LIMIT (or FETCH FIRST) or OFFSET clause is specified, the SELECT statement returns only a subset of the result rows.
Parameter descriptions
-
WITH clause
-
Command overview
The WITH clause lets you define one or more subqueries that can be referenced by name in the main query. These named subqueries are known as Common Table Expressions (CTEs). A WITH clause is typically placed before the primary SELECT statement. The syntax is as follows:
with query_name [ ( column_name [, ...] ) ] AS ( select ) -
Parameter description
Parameter
Description
query_name
Specifies the name of the current CTE. It can be any valid identifier.
column_name
Lists the column names corresponding to the return values of the subquery. This is similar to the AS semantics in a SELECT clause. The subquery can be a regular SELECT query.
CTEs are separated by commas. A later CTE definition can reference an earlier CTE. Recursive CTEs are not currently supported. In subsequent queries, you can use
query_namedirectly as a view. If nocolumn_namelist is specified, the view’scolumn_namedefaults to the column names returned by the subquery. If acolumn_namelist is specified, you must use the definedcolumn_name, and the number ofcolumn_nameentries must match the number of columns returned by the SELECT statement.
-
-
SELECT list
-
Command overview
The SELECT list, which is the part of the statement between the SELECT and FROM keywords, specifies the expressions that form the output rows of the SELECT statement. These expressions can refer to columns that are computed in the FROM clause.
-
Parameter description
Each output column in a SELECT statement has a name. In a simple SELECT statement, this name is used to label the column for display. If the SELECT statement is a subquery, the outer query uses this name to reference the column of the virtual table that is produced by the subquery. To specify a name for an output column, add
AS output_nameafter the column expression. You can omit AS, but only if the desired output name does not match any PostgreSQL keyword. To avoid potential conflicts with future keywords, we recommend that you use AS or enclose the output name in double quotes. If you do not specify a column name, PostgreSQL automatically assigns one. If the column expression is a simple column reference, the assigned name is the same as the column's name. In more complex cases that involve functions or type names, the system may generate a generic name, such as?column?.You can reference an output column by its name in the
ORDER BYandGROUP BYclauses. However, you cannot reference an output column by its name in the WHERE or HAVING clauses. In those clauses, you must use the expression itself.You can use an asterisk (*) in the output list as a shorthand for all columns of the selected rows. You can also use
table_name.*as a shorthand for all columns from a specific table. In these cases, you cannot use AS to specify a new name. The output column names will be the same as the original table column names.
-
-
FROM clause
-
Command overview
The FROM clause specifies one or more source tables for the SELECT statement. If you specify multiple source tables, the result is the Cartesian product (cross join) of all the source tables. Typically, you can add a WHERE clause with join conditions to restrict the returned rows to a small subset of the Cartesian product.
-
Parameter description
The FROM clause can include the following elements:
Element
Description
table_name
The name of an existing table or view (optionally qualified with a schema).
alias
An alternative name for a FROM item that includes an alias. Aliases make writing concise or resolve ambiguity in self-joins (where the same table is scanned multiple times). When an alias is provided, the actual table or function name is hidden. For example, given
FROM foo AS f, the rest of the SELECT must refer to this FROM item as f, not foo.select
A sub-SELECT can appear in the FROM clause. This acts as if its output were a temporary table created for the duration of the SELECT command. Note that the sub-SELECT must be enclosed in parentheses and must have an alias.
function_name
A function call can appear in the FROM clause (especially useful for functions that return result sets, but any function can be used). This acts as if the function's output were a temporary table created for the duration of the SELECT command.
You can provide an alias just like for a table. If you provide an alias, you can also specify a list of column aliases to give alternative names to one or more attributes of the function's composite return type, including any new column added by ORDINALITY (if present).
You can combine multiple function calls into a single FROM-clause item by enclosing them in
ROWS FROM(。). The output of such an item concatenates the first row of each function, then the second row of each function, and so on. If some functions produce fewer rows than others, NULLs fill the missing data positions, so the total number of returned rows always matches the function that produces the most rows.join_type
Includes the following five types:
-
[ INNER ] JOIN
For INNER and OUTER join types, you must specify a join condition: one of
NATURAL, ON join_condition, or USING (join_column [, ...])(only one). See below for meanings. ForCROSS JOIN, these clauses cannot appear. A JOIN clause combines two FROM items (called “tables” for convenience, though they can be any FROM item type). Use parentheses to control nesting order if needed. Without parentheses, JOINs nest left to right. In all cases, JOIN binding is stronger than comma-separated FROM-list items. -
LEFT [ OUTER ] JOIN
LEFT OUTER JOINreturns all rows from the restricted Cartesian product (all combinations that satisfy the join condition), plus one copy of each unmatched row from the left table. Unmatched left rows are extended to full join-table rows by inserting NULLs in right-hand columns. Note that only the JOIN clause’s own condition determines which rows match. Outer conditions are applied afterward. -
RIGHT [ OUTER ] JOIN
RIGHT OUTER JOINreturns all joined rows plus each unmatched right-hand row (extended with NULLs on the left). This is merely notational convenience, as you can convert it to aLEFT OUTER JOINby swapping left and right tables. -
FULL [ OUTER ] JOIN
FULL OUTER JOINreturns all joined rows, plus each unmatched left-hand row (extended with NULLs on the right), plus each unmatched right-hand row (extended with NULLs on the left). -
CROSS JOIN
CROSS JOINandINNER JOINproduce a simple Cartesian product—the same result as listing two tables in FROM—but constrained by the join condition (if any).CROSS JOINis equivalent toINNER JOIN ON (TRUE), meaning the condition removes no rows. These join types are purely notational convenience, as nothing they do cannot also be done with plain FROM and WHERE.
join_condition
join_conditionis an expression that evaluates to a Boolean value (similar to a WHERE clause), specifying which rows are considered matching in a join.USING ( a, b, ... )
The clause
USING (a, b...)is shorthand forON left_table.a = right_table.a AND left_table.b = right_table.b .... Additionally, USING ensures only one column from each equal pair appears in the join output.NATURAL
Shorthand for USING that lists all columns with identical names in both tables.
-
-
-
WHERE clause
-
Command overview
The optional WHERE clause has the following form:
WHERE condition -
Parameter description
Parameter
Description
condition
condition is any expression that evaluates to a Boolean result. Any row that does not satisfy this condition is removed from the output. A row satisfies the condition if substituting its actual values for variable references causes the expression to return true.
-
-
GROUP BY clause
-
Command overview
The optional
GROUP BYclause condenses all selected rows that share the same values for the grouping expressions into a single summary row. TheGROUP BYclause has the following form:GROUP BY grouping_element [, ...] -
Parameter description
The expression in a
grouping_elementcan be an input column name, the name or ordinal number of an output column from the SELECT list, or an expression that is formed from input column values. If a name is ambiguous, theGROUP BYclause interprets it as an input column name rather than an output column name.If a grouping element contains
GROUPING SETS, ROLLUP, or CUBE, theGROUP BYclause defines multiple independent grouping sets. This is equivalent to a UNION ALL of subqueries, where each subquery has its ownGROUP BYclause that corresponds to one of the grouping sets.If you use aggregate functions, they are computed across all rows in each group, which produces a single value for each group. If you use aggregate functions but do not include a
GROUP BYclause, the query treats all selected rows as a single group. You can filter the rows that are passed to an aggregate function by appending a FILTER clause to the aggregate function call. If a FILTER clause is present, only the rows that match the filter condition are included in the input for that aggregate function.If a
GROUP BYclause or any aggregate function is present, expressions in the SELECT list cannot reference ungrouped columns, except within an aggregate function or if the ungrouped column is functionally dependent on the grouped columns. This is because there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns include the primary key, or a subset of the primary key, of the table that contains the ungrouped column.All aggregate functions are computed before any scalar expressions in the HAVING clause or the SELECT list. This means, for example, that a CASE expression cannot be used to skip the computation of an aggregate function.
-
-
CUBE clause
-
CUBE
The CUBE subclause automatically creates subtotals for all combinations of the grouping columns that are specified in the GROUP BY clause. The result set includes all possible combinations of values in the dimension columns, along with aggregated values from the base rows that match those dimension combinations. CUBE returns one summary row for each group. You can use CUBE to generate cross-tabulation values. For example, if you specify three expressions
(n = 3)in the CUBE clause, the operation results in 2n = 23 = 8 groups. Rows that are grouped by all n expression values are regular rows. All other rows are super-aggregate rows. The syntax is as follows:CUBE ( { expression | ( expression [, …] ) } [, …] ) -
ROLLUP
The ROLLUP subclause is an extension of the GROUP BY clause. When you generate statistics and reports, you can use ROLLUP to return a subtotal for each group and a grand total for all groups. The syntax is as follows:
ROLLUP ( { expression | ( expression [, …] ) } [, …] ) -
GROUPING SETS
The GROUPING SETS subclause is an extension of the GROUP BY clause that lets you specify multiple GROUP BY options in a single query. This improves efficiency by allowing you to select only the grouping sets that you need. When you specify only the required data groups, the database does not need to compute the full set of aggregations that is generated by CUBE or ROLLUP. The syntax is as follows:
GROUPING SETS ( grouping_element [, …] )
-
-
DISTINCT clause
If you specify SELECT DISTINCT, all duplicate rows are removed from the result set. Only one row is kept from each group of duplicate rows.
NoteSELECT DISTINCT on columns of the ARRAY data type is not supported.
SELECT DISTINCT accountid FROM table; -
COUNT DISTINCT clause
-
Command overview
COUNT DISTINCTcounts the number of distinct values in a column. A value that appears multiple times is counted only once. Similar to the COUNT function, NULL values in the column are not included in the count. -
Syntax description
-
Syntax for exact calculation:
SELECT c1, COUNT(DISTINCT c2) FROM table GROUP BY c1 -
An exact
COUNT DISTINCTcalculation can consume significant resources. For this reason, Hologres also supports approximateCOUNT DISTINCTcalculations. The syntax is as follows:SELECT c1, approx_count_distinct(c2) FROM table GROUP BY c1
-
-
-
UNION clause
-
Command overview
The syntax of the UNION clause is as follows:
select_statement UNION [ ALL | DISTINCT ] select_statement -
Parameter description
Parameter
Description
select_statement
select_statementis any SELECT statement withoutORDER BY, LIMIT, FOR NO KEY UPDATE, FOR UPDATE, FOR SHARE, orFOR KEY SHAREclauses (if the subexpression is enclosed in parentheses,ORDER BYand LIMIT can be attached to it. Without parentheses, these clauses apply to the UNION result, not the right-hand expression).UNION
The UNION operator computes the union of rows returned by the involved SELECT statements. A row appears in the union if it appears in at least one of the two result sets. The SELECT statements used as UNION operands must produce the same number of columns, and corresponding columns must have compatible data types.
The UNION result does not contain duplicate rows unless the ALL option is specified. ALL prevents duplicate removal (so UNION ALL is usually significantly faster than UNION; use ALL whenever possible). You can explicitly specify duplicate removal by writing DISTINCT.
Multiple UNION operators in the same SELECT statement are evaluated left to right unless parentheses specify a different order.
-
-
INTERSECT clause
-
Command overview
The INTERSECT clause has the following form:
select_statement INTERSECT [ ALL | DISTINCT ] select_statement -
Parameter description
Parameter
Description
select_statement
select_statementis any SELECT statement without anORDER BY LIMITclause.INTERSECT
The INTERSECT operator computes the intersection of rows returned by the involved SELECT statements. A row appears in the intersection only if it appears in both result sets.
The INTERSECT result does not contain duplicate rows unless the ALL option is specified. With ALL, a row that appears m times in the left table and n times in the right table appears min(m, n) times in the result. You can explicitly specify duplicate removal by writing DISTINCT.
Multiple INTERSECT operators in the same SELECT statement are evaluated left to right unless parentheses specify a different order. INTERSECT has higher precedence than UNION. Thus,
A UNION B INTERSECT Cis read asA UNION (B INTERSECT C).
-
-
EXCEPT clause
-
Command overview
The EXCEPT clause has the following form:
select_statement EXCEPT [ ALL | DISTINCT ] select_statement -
Parameter description
Parameter
Description
select_statement
select_statementis any SELECT statement without anORDER BY LIMITclause.EXCEPT
The EXCEPT operator computes the set of rows that appear in the result of the left SELECT statement but not in the result of the right SELECT statement.
The EXCEPT result does not contain duplicate rows unless the ALL option is specified. With ALL, a row that appears m times in the left table and n times in the right table appears max(m - n, 0) times in the result. You can explicitly specify duplicate removal by writing DISTINCT.
Multiple EXCEPT operators in the same SELECT statement are evaluated left to right unless parentheses specify a different order. EXCEPT has the same precedence as UNION.
Currently,
FOR NO KEY UPDATE, FOR UPDATE, FOR SHARE, andFOR KEY SHAREcannot be used with EXCEPT results or any EXCEPT input.
-
-
ORDER BY clause
-
Command overview
The optional
ORDER BYclause has the following form:ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] -
Parameter description
The
ORDER BYclause sorts the result rows based on the specified expressions. If two rows are equal according to the leftmost expression, they are compared based on the next expression, and so on. If the rows are equal for all specified expressions, they are returned in an implementation-dependent order.Each expression can be the name or ordinal number of an output column from the SELECT list, or an expression that is formed from input column values.
The ordinal number refers to the position of the output column, from left to right. This feature lets you define an order for columns that do not have unique names. This is typically not necessary because you can use the AS clause to assign a unique name to an output column.
You can also use arbitrary expressions in the ORDER BY clause, including columns that are not in the SELECT output list. For example, the following statement is valid:
SELECT name FROM distributors ORDER BY code;A limitation of this feature is that when you apply an ORDER BY clause to the result of a UNION, INTERSECT, or EXCEPT clause, the
ORDER BYclause can specify only output column names or ordinal numbers, not expressions.If an ORDER BY expression is a simple name that matches both an output column name and an input column name, ORDER BY interprets it as the output column name. This behavior differs from the GROUP BY clause, which interprets an ambiguous name as an input column name. This inconsistency is for compatibility with the SQL standard.
You can add the ASC (ascending) or DESC (descending) keyword after any expression in the ORDER BY clause. If you do not specify a keyword, ASC is used by default. Alternatively, you can specify a sort operator name in a USING clause. A sort operator must be a less-than or greater-than member of a B-tree operator family. ASC is usually equivalent to USING <, and DESC is usually equivalent to USING >. However, the creator of a user-defined data type can define a specific default sort order, which might correspond to operators with other names.
If you specify
NULLS LAST, null values are sorted after non-null values. If you specifyNULLS FIRST, null values are sorted before non-null values. If you do not specify either, the default behavior for ASC isNULLS LAST, and the default behavior for DESC isNULLS FIRST. By default, nulls are treated as being greater than non-nulls. When you use a USING clause, the default nulls order depends on whether the operator is a less-than or greater-than operator.Note that ordering options apply only to the expression they follow. For example,
ORDER BY x, y DESCdiffers fromORDER BY x DESC, y DESC.
-
-
LIMIT clause
-
Command overview
The LIMIT clause consists of two independent subclauses:
LIMIT { count | ALL } OFFSET start -
Parameter description
The count parameter specifies the maximum number of rows to return. The start parameter specifies the number of rows to skip before the system starts to return rows. If you specify both parameters, the system skips the specified number of start rows before it begins to count the specified number of count rows to return.
If the count expression evaluates to NULL, it is treated as
LIMIT ALL, which means there is no limit. If the start expression evaluates to NULL, it is treated as OFFSET 0.When you use a LIMIT clause, you should also include an
ORDER BYclause to constrain the rows to a unique order. Otherwise, you will retrieve an unpredictable subset of the query's rows. For example, if you request rows 10 through 20 without specifying anORDER BYclause, there is no guarantee which rows will be returned.The query planner takes the LIMIT clause into account when it generates a query plan. Therefore, different LIMIT and OFFSET values will likely result in different query plans and different row orders. If you select different subsets of a query result using different LIMIT and OFFSET values, you might receive inconsistent results unless you enforce a predictable order with an ORDER BY clause. This is not a bug. It is an intended consequence of the fact that SQL does not guarantee a specific order for query results unless an
ORDER BYclause is used to constrain the order.If you do not use an
ORDER BYclause to enforce a deterministic subset, executing the same LIMIT query multiple times might return different subsets of the table rows. This is not a bug. Determinism is not guaranteed in such cases.
-
Usage examples
-
Join two tables
SELECT f.title, f.did, d.name, f.date_prod, f.kind FROM distributors d, films f WHERE f.did = d.did; -
WITH clause
WITH distributor_name(name) AS (SELECT name from distributors) SELECT name FROM distributor_name ORDER BY name; -
GROUP BY
SELECT kind, sum(length) AS total FROM films GROUP BY kind; -
Filtering with HAVING
SELECT kind, sum(length) AS total FROM films GROUP BY kind HAVING sum(length) < interval '5 hours'; -
GROUP BY CUBE
SELECT l_returnflag ,l_shipmode ,SUM(l_quantity) FROM public.lineitem GROUP BY cube((l_returnflag),(l_shipmode)) ORDER BY l_returnflag ,l_shipmode; -
GROUP BY ROLLUP
SELECT l_returnflag ,l_shipmode ,SUM(l_quantity) FROM public.lineitem GROUP BY ROLLUP ((l_returnflag),(l_shipmode)) ORDER BY l_returnflag ,l_shipmode; -
GROUP BY GROUPING SETS
SELECT l_returnflag ,l_shipmode ,SUM(l_quantity) FROM public.lineitem GROUP BY GROUPING SETS ((l_returnflag,l_shipmode),()) ORDER BY l_returnflag ,l_shipmode; -
ORDER BY
SELECT * FROM distributors ORDER BY name;