Common operators supported by AnalyticDB for MySQL and their properties - AnalyticDB

AnalyticDB for MySQL operators complete basic data processing logic. You can use combined operators or optimize the order and execution method of operators to improve data processing efficiency. This topic describes the common operators supported by AnalyticDB for MySQL and their properties.

Background information

AnalyticDB for MySQL operators complete basic data processing logic. A set of operators complete a set of processing rules for data based on an execution plan. In the distributed system of AnalyticDB for MySQL, most operators can complete computing jobs in parallel on multiple nodes to improve data processing efficiency.

You can use the SQL diagnostics feature of AnalyticDB for MySQL to view the operators that have properties and the operator-level diagnosis results. For more information, see Operator-level diagnosis results and the "Execution plan hierarchy chart at the operator layer" section of the Use execution plans to analyze queries topic.

Note

Only specific operators have properties.

Aggregation

AnalyticDB for MySQL is a distributed database service that allows parallel aggregate operations on multiple nodes. For more information about regular aggregation and grouping-based aggregation, see Grouping and aggregation query optimization. The Aggregation operator uses functions such as SUM(), COUNT(), and AVG() to perform regular aggregation or grouping-based aggregation on data.

The following table describes the properties of the Aggregation operator.

Property	Description
GroupByKeys	The fields that are used for grouping.
AggregationFunctions	The aggregate functions that are used, such as `SUM()`, `COUNT()`, and `AVG()`. Note If no SELECT fields are included in the GROUP BY clause and no other aggregate functions are used, the system invokes the `ARBITRARY()` function to randomly return a value from a group. For more information, see the "ARBITRARY" section of the Aggregate functions topic.
Step	The current aggregation stage. Valid values: PARTIAL: partial aggregation. FINAL: final aggregation. SINGLE: single-step aggregation, which specifies that final aggregation is directly performed.

DistinctLimit

This operator is equivalent to the DISTINCT LIMIT operations in SQL statements.

Filter

AnalyticDB for MySQL supports the following types of data filtering:

Data source filtering: Data indexes at the storage layer are used to filter data.
Note
No independent operators are used to represent the data source filtering process. Filter conditions in this process are reflected in the TableScan operator. For more information, see the "TableScan" section of this topic.
Non-data source filtering: Data at the storage layer is not indexed. You must use the Filter operator to filter data at the computing layer.

By default, AnalyticDB for MySQL creates indexes for all fields. However, the filter conditions are not pushed down in the following scenarios:

If the no_index_columns or filter_not_pushdown_columns hint is used in query statements, or the adb_config filter_not_pushdown_columns configuration is used in clusters, the filter condition pushdown feature is disabled.
Functions such as CAST are used in filter conditions.
Related columns in filter conditions do not have indexes. For example, the no_index keyword is used when you create a table, or the no_index statement is executed to delete indexes after a table is created.

In this case, AnalyticDB for MySQL filters data by using the Filter operator.

The following table describes the property of the Filter operator.

Property	Description
Filter	The filter condition of the Filter operator.

Join

This operator is equivalent to the JOIN operations in SQL statements. When you create a distributed table in AnalyticDB for MySQL, you must specify distribution fields by using the DISTRIBUTED BY clause. The type of data redistribution is determined by whether the join key is used as a distribution field. For more information about data redistribution, see the "RemoteExchange" section of this topic.

The following table describes the properties of the Join operator.

Property	Description
Criterias	The join conditions. Example: `tb_1.col_a=tb_2.col_a`.
Type	The join type. Four types of joins are available in AnalyticDB for MySQL: INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.
Method	The join algorithm. Two types of join algorithms are available in AnalyticDB for MySQL: Hash: The hash join algorithm caches a small table to the memory and joins the small table with another table by using the hash table of the small table to find matches. Index: The index join algorithm uses indexes of join keys based on the full-column indexing feature of AnalyticDB for MySQL.
JoinFilter	The filter conditions in a join.

Limit

This operator is equivalent to the LIMIT operations in SQL statements.

MarkDistinct

This operator is equivalent to the COUNT(DISTINCT) operations in SQL statements.

Note

The MarkDistinct operator is used only if an SQL statement contains two or more COUNT(DISTINCT) operations.
If an SQL statement contains only a single COUNT(DISTINCT) operation, AnalyticDB for MySQL automatically uses the Aggregation operator. For more information, see the "Aggregation" section of this topic.

Project

This operator is equivalent to the projection operations on specific fields in SQL statements, such as the CASE WHEN THEN control flow statement and the CONCAT() function.

The following table describes the property of the Project operator.

Property	Description
ProjectExpression	The projection expression.

RemoteExchange

This operator indicates the method used to transfer data from upstream stages to downstream stages. The following methods are available:

Broadcast: The data of each compute node in an upstream stage is copied and transferred to all compute nodes in a downstream stage.
Repartition: The data of each compute node in an upstream stage is partitioned based on specific rules and then distributed to specified compute nodes of a downstream stage.
Gather: The data of each compute node in an upstream stage is concentrated on a specific compute node in a downstream stage.

For more information, see the "Execution plan hierarchy chart at the stage layer" section of the Use execution plans to analyze queries topic.

RemoteSource

This operator indicates that the input data of the current stage is transferred from remote nodes over networks.

The following table describes the property of the RemoteSource operator.

Property	Description
OutputColumns	The fields that are exported by the RemoteSource operator.

StageOutput

This operator transfers the data processed in the current stage to nodes of a downstream stage over networks.

Sort

This operator is equivalent to the ORDER BY operations in SQL statements. This operator sorts the fields in the ORDER BY clause.

The following table describes the properties of the Sort operator.

Property

Description

OrderBy

The field that is used to sort data.

Orderings

The order in which data is sorted. Valid values:

ASC_NULLS_FIRST: Data is sorted in ascending order, in which NULL values are returned before non-NULL values.
ASC_NULLS_LAST: Data is sorted in ascending order, in which NULL values are returned after non-NULL values.
DESC_NULLS_FIRST: Data is sorted in descending order, in which NULL values are returned before non-NULL values.
DESC_NULLS_LAST: Data is sorted in descending order, in which NULL values are returned after non-NULL values.

SortMerge

During distributed sorting, this operator merges and sorts data transferred from multiple nodes in an upstream stage.

TableScan

This operator reads data from data sources. The underlying data sources can use indexes to filter data.

The following table describes the properties of the TableScan operator.

Property	Description
TableName	The name of the table to which the scanned data belongs.
SelectFields	The fields that are scanned by a TableScan node.
DataBase	The name of the database to which the scanned data belongs.
FilterPushDown	Specifies whether filter conditions are pushed down to a storage node. Note This property is displayed only if the current stage contains filter conditions that are pushed down.
PushedDownFilter	The filter conditions that are pushed down to a storage node. Note This property is displayed only if the current stage contains filter conditions that are pushed down.

TableWriter

After data is queried by using SQL statements that perform extract-transform-load (ETL) jobs, such as INSERT INTO and REPLACE INTO, this operator writes data to tables.

TopN

This operator is equivalent to the ORDER BY LIMIT m,n operations in SQL statements.

The following table describes the properties of the TopN operator.

Property	Description
OrderBy	The field that is used to sort data.
Count	This operator is equivalent to the ORDER BY LIMIT operations in SQL statements.
Orderings	The order in which data is sorted. Valid values: ASC_NULLS_FIRST: Data is sorted in ascending order, in which NULL values are returned before non-NULL values. ASC_NULLS_LAST: Data is sorted in ascending order, in which NULL values are returned after non-NULL values. DESC_NULLS_FIRST: Data is sorted in descending order, in which NULL values are returned before non-NULL values. DESC_NULLS_LAST: Data is sorted in descending order, in which NULL values are returned after non-NULL values.
Step	The execution step of the TopN operator. Valid values: PARTIAL: During distributed sorting, data on each node is partially sorted. SORTMERGE: Final merging and sorting are performed on data that has been partially sorted. SINGLE: Final merging and sorting are directly performed on data.
Offset	This operator is equivalent to the ORDER BY OFFSET operations in SQL statements.

Union

This operator is equivalent to the UNION operations in SQL statements.

Window

This operator is equivalent to the window functions in SQL statements. For more information, see Window functions.