AnalyticDB for MySQL operators complete basic data processing logic. You can use combined operators or optimize the order and execution method of operators to improve data processing efficiency. This topic describes the common operators supported by AnalyticDB for MySQL and their properties.
Background information
AnalyticDB for MySQL operators complete basic data processing logic. A set of operators complete a set of processing rules for data based on an execution plan. In the distributed system of AnalyticDB for MySQL, most operators can complete computing jobs in parallel on multiple nodes to improve data processing efficiency.
You can use the SQL diagnostics feature of AnalyticDB for MySQL to view the operators that have properties and the operator-level diagnosis results. For more information, see Operator-level diagnosis results and the "Execution plan hierarchy chart at the operator layer" section of the Use execution plans to analyze queries topic.
Only specific operators have properties.
Aggregation
AnalyticDB for MySQL is a distributed database service that allows parallel aggregate operations on multiple nodes. For more information about regular aggregation and grouping-based aggregation, see Grouping and aggregation query optimization. The Aggregation operator uses functions such as SUM()
, COUNT()
, and AVG()
to perform regular aggregation or grouping-based aggregation on data.
The following table describes the properties of the Aggregation operator.
Property | Description |
GroupByKeys | The fields that are used for grouping. |
AggregationFunctions | The aggregate functions that are used, such as Note If no SELECT fields are included in the GROUP BY clause and no other aggregate functions are used, the system invokes the |
Step | The current aggregation stage. Valid values:
|
DistinctLimit
This operator is equivalent to the DISTINCT LIMIT operations in SQL statements.
Filter
AnalyticDB for MySQL supports the following types of data filtering:
Data source filtering: Data indexes at the storage layer are used to filter data.
NoteNo independent operators are used to represent the data source filtering process. Filter conditions in this process are reflected in the TableScan operator. For more information, see the "TableScan" section of this topic.
Non-data source filtering: Data at the storage layer is not indexed. You must use the Filter operator to filter data at the computing layer.
By default, AnalyticDB for MySQL creates indexes for all fields. However, the filter conditions are not pushed down in the following scenarios:
If the
no_index_columns
orfilter_not_pushdown_columns
hint is used in query statements, or the adb_config filter_not_pushdown_columns configuration is used in clusters, the filter condition pushdown feature is disabled.Functions such as
CAST
are used in filter conditions.Related columns in filter conditions do not have indexes. For example, the
no_index
keyword is used when you create a table, or theno_index
statement is executed to delete indexes after a table is created.
In this case, AnalyticDB for MySQL filters data by using the Filter operator.
The following table describes the property of the Filter operator.
Property | Description |
Filter | The filter condition of the Filter operator. |
Join
This operator is equivalent to the JOIN operations in SQL statements. When you create a distributed table in AnalyticDB for MySQL, you must specify distribution fields by using the DISTRIBUTED BY
clause. The type of data redistribution is determined by whether the join key is used as a distribution field. For more information about data redistribution, see the "RemoteExchange" section of this topic.
The following table describes the properties of the Join operator.
Property | Description |
Criterias | The join conditions. Example: |
Type | The join type. Four types of joins are available in AnalyticDB for MySQL: INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN. |
Method | The join algorithm. Two types of join algorithms are available in AnalyticDB for MySQL:
|
JoinFilter | The filter conditions in a join. |
Limit
This operator is equivalent to the LIMIT operations in SQL statements.
MarkDistinct
This operator is equivalent to the COUNT(DISTINCT)
operations in SQL statements.
The MarkDistinct operator is used only if an SQL statement contains two or more
COUNT(DISTINCT)
operations.If an SQL statement contains only a single
COUNT(DISTINCT)
operation, AnalyticDB for MySQL automatically uses the Aggregation operator. For more information, see the "Aggregation" section of this topic.
Project
This operator is equivalent to the projection operations on specific fields in SQL statements, such as the CASE WHEN THEN
control flow statement and the CONCAT()
function.
The following table describes the property of the Project operator.
Property | Description |
ProjectExpression | The projection expression. |
RemoteExchange
This operator indicates the method used to transfer data from upstream stages to downstream stages. The following methods are available:
Broadcast: The data of each compute node in an upstream stage is copied and transferred to all compute nodes in a downstream stage.
Repartition: The data of each compute node in an upstream stage is partitioned based on specific rules and then distributed to specified compute nodes of a downstream stage.
Gather: The data of each compute node in an upstream stage is concentrated on a specific compute node in a downstream stage.
For more information, see the "Execution plan hierarchy chart at the stage layer" section of the Use execution plans to analyze queries topic.
RemoteSource
This operator indicates that the input data of the current stage is transferred from remote nodes over networks.
The following table describes the property of the RemoteSource operator.
Property | Description |
OutputColumns | The fields that are exported by the RemoteSource operator. |
StageOutput
This operator transfers the data processed in the current stage to nodes of a downstream stage over networks.
Sort
This operator is equivalent to the ORDER BY operations in SQL statements. This operator sorts the fields in the ORDER BY clause.
The following table describes the properties of the Sort operator.
Property | Description |
OrderBy | The field that is used to sort data. |
Orderings | The order in which data is sorted. Valid values:
|
SortMerge
During distributed sorting, this operator merges and sorts data transferred from multiple nodes in an upstream stage.
TableScan
This operator reads data from data sources. The underlying data sources can use indexes to filter data.
The following table describes the properties of the TableScan operator.
Property | Description |
TableName | The name of the table to which the scanned data belongs. |
SelectFields | The fields that are scanned by a TableScan node. |
DataBase | The name of the database to which the scanned data belongs. |
FilterPushDown | Specifies whether filter conditions are pushed down to a storage node. Note This property is displayed only if the current stage contains filter conditions that are pushed down. |
PushedDownFilter | The filter conditions that are pushed down to a storage node. Note This property is displayed only if the current stage contains filter conditions that are pushed down. |
TableWriter
After data is queried by using SQL statements that perform extract-transform-load (ETL) jobs, such as INSERT INTO and REPLACE INTO, this operator writes data to tables.
TopN
This operator is equivalent to the ORDER BY LIMIT m,n
operations in SQL statements.
The following table describes the properties of the TopN operator.
Property | Description |
OrderBy | The field that is used to sort data. |
Count | This operator is equivalent to the ORDER BY LIMIT operations in SQL statements. |
Orderings | The order in which data is sorted. Valid values:
|
Step | The execution step of the TopN operator. Valid values:
|
Offset | This operator is equivalent to the ORDER BY OFFSET operations in SQL statements. |
Union
This operator is equivalent to the UNION operations in SQL statements.
Window
This operator is equivalent to the window functions in SQL statements. For more information, see Window functions.