This topic describes the compatibility between a job and the state data used to start the job after you modify a group aggregation query for the job. In a group aggregation query, an aggregate function is used together with the GROUP BY clause.
Compatible modifications
Add, delete, or modify an aggregated field that does not contain the DISTINCT keyword. An aggregated field is generated by applying an aggregate function to a group of data records.
Partially compatible: Add an aggregated field. The value of the added field is incremented when the job starts.
Fully compatible: Delete an existing aggregated field. The state data of the deleted field is discarded.
Partially compatible: Add an aggregated field and delete an existing aggregated field at the same time. The value of the added field is incremented when the job starts. The state data of the deleted field is discarded.
Partially compatible: Modify an existing aggregated field. This modification is split into the following operations: Delete the original aggregated field and add a new aggregated field. The value of the added field is incremented when the job starts. The state data of the deleted field is discarded.
NoteIf you do not modify an aggregated field, the calculation results of the field are the same whether or not you use the state data.
-- Original SQL statement: SELECT a, SUM(b), MAX(c) FROM MyTable GROUP BY a; -- Partially compatible: Add count(c), which is an aggregated field. -- The calculation results of sum(b) and max(c) are not affected. The value of count(c) is incremented from 0 when the job starts. SELECT a, SUM(b), MAX(c), COUNT(c) FROM MyTable GROUP BY a; -- Fully compatible: Delete sum(b). -- The calculation results of max(c) are not affected. SELECT a, MAX(c) FROM MyTable GROUP BY a; -- Partially compatible: Change an aggregated field from max(c) to min(c). -- The calculation results of sum(b) are not affected. The max(c) field is considered deleted and its state data is discarded. -- The min(c) field is considered a new field and its value is incremented when the job starts. SELECT a, SUM(b), MIN(c) FROM MyTable GROUP BY a;
Fully compatible: Modify the sequence of aggregated fields that do not contain the DISTINCT keyword.
-- Original SQL statement: SELECT a, SUM(b), MAX(c) FROM MyTable GROUP BY a; -- Fully compatible: Modify the sequence of sum(b) and max(c). -- The calculation results of sum(b) and max(c) are not affected. SELECT a, MAX(c), SUM(b) FROM MyTable GROUP BY a;
Partially compatible: Modify the computation logic of an aggregated field that does not contain the DISTINCT keyword. In this case, the aggregated field is considered modified.
-- Original SQL statement: SELECT a, SUM(b), MAX(c) FROM MyTable GROUP BY a; -- Partially compatible: Change an aggregated field from max(c) to max(substring(c, 1, 5)). This modifies the computation logic by applying a substring function before aggregation. -- The calculation results of sum(b) are not affected. The max(c) field is considered deleted and its state data is discarded. -- The max(substring(c, 1, 5)) field is considered a new aggregated field and its value is incremented when the job starts. SELECT a, SUM(b), MAX(c) FROM ( SELECT a, b, SUBSTRING(c, 1, 5) AS c FROM MyTable ) GROUP BY a;
Fully compatible: Maintain the sequence of aggregate functions that contain the DISTINCT keyword when you modify the sequence of all aggregate functions.
-- Original SQL statement: INSERT INTO MySink SELECT a, MAX(b), SUM(DISTINCT b), COUNT(DISTINCT b) FROM MyTable GROUP BY a; -- Fully compatible: SUM(DISTINCT b) still precedes COUNT(DISTINCT b) after the modification. INSERT INTO MySink SELECT a, SUM(DISTINCT b), COUNT(DISTINCT b), MAX(b) FROM MyTable GROUP BY a;
Fully compatible: Do not use an aggregated field before and after the modification.
Fully compatible: Delete an aggregate function that supports retraction after a retraction operation.
-- Original SQL statement: SELECT c/2, AVG(avg_a) AS avg_avg_a, MAX(max_b) max_max_b FROM (SELECT c, MAX(b) AS max_b, AVG(a) AS avg_a FROM MyTable GROUP BY c) GROUP BY c/2; -- Fully compatible: Delete an aggregate function that supports retraction. SELECT c/2, AVG(avg_a) AS avg_avg_a FROM (SELECT c, MAX(b) AS max_b, AVG(a) AS avg_a FROM MyTable GROUP BY c) GROUP BY c/2;
Incompatible modifications
Add, delete, or modify the fields in the GROUP BY clause or change the computation logic of the fields.
-- Original SQL statement: SELECT a, SUM(b), MAX(c) FROM MyTable GROUP BY a; -- Incompatible: Add d in the GROUP BY clause. SELECT a, SUM(b), MAX(c) FROM MyTable GROUP BY a, d; -- Incompatible: Delete a from the GROUP BY clause by deleting the GROUP BY clause. SELECT SUM(b), MAX(c) FROM MyTable; -- Incompatible: Change the field in the GROUP BY clause from a to d. SELECT d, SUM(b), MIN(c) FROM MyTable GROUP BY d; -- Incompatible: Change the field in the GROUP BY clause from a to a + 1. SELECT a, SUM(b), MAX(c) FROM ( SELECT a + 1 AS a, b, c FROM MyTable ) GROUP BY a;
Add, delete, or modify an aggregated field that contains the DISTINCT keyword or change the computation logic of the fields.
-- Original SQL statement: SELECT a, SUM(b), MAX(c), SUM(DISTINCT b), COUNT(DISTINCT c) FROM MyTable GROUP BY a; -- Incompatible: Add count(distinct b). SELECT a, SUM(b), MAX(c), SUM(DISTINCT b), COUNT(DISTINCT b), COUNT(DISTINCT c) FROM MyTable GROUP BY a; -- Incompatible: Delete sum(distinct b). SELECT a, SUM(b), MAX(c), COUNT(DISTINCT c) FROM MyTable GROUP BY a; -- Incompatible: Change sum(distinct b) to avg(distinct b). SELECT a, SUM(b), MAX(c), AVG(DISTINCT b), COUNT(DISTINCT c) FROM MyTable GROUP BY a; -- Incompatible: Change count(distinct c) to count(distinct avg(c)). SELECT a, SUM(b), MAX(c), SUM(DISTINCT b), COUNT(DISTINCT c) FROM ( SELECT a, b, AVG(c) AS c from MyTable GROUP BY a, b ) GROUP BY a;
Add an aggregated field in a multi-level aggregation. The calculation results of the added field are unpredictable because retraction occurs in a multi-level aggregation. As a result, this modification is incompatible.
-- Original SQL statement: SELECT a/2, AVG(b), MIN(c) FROM ( SELECT a, SUM(b) AS b, MAX(c) AS c FROM MyTable GROUP BY a ) GROUP BY a/2; -- Incompatible: Add count(c). SELECT a/2, AVG(b), MIN(c), COUNT(c) FROM ( SELECT a, SUM(b) AS b, MAX(c) AS c FROM MyTable GROUP BY a ) GROUP BY a/2;
Delete all aggregated fields. This modification is incompatible because the state data of the fields is discarded and no state data is reused.
-- Original SQL statement: SELECT a, SUM(b), MAX(c) FROM MyTable GROUP BY a; -- Incompatible: Delete sum(b) and max(c). SELECT a FROM MyTable GROUP BY a;
Modify the sequence of aggregate functions that contain the DISTINCT keyword.
-- Original SQL statement: INSERT INTO MySink SELECT a, MAX(b), SUM(DISTINCT b), COUNT(DISTINCT b) FROM MyTable GROUP BY a; -- Incompatible: Reverse the sequence of SUM(DISTINCT b) and COUNT(DISTINCT b). INSERT INTO MySink SELECT COUNT(DISTINCT b), a, MAX(b), SUM(DISTINCT b) FROM MyTable GROUP BY a;
Add aggregated fields to a job that does not involve an aggregated field.
-- Original SQL statement: INSERT INTO MySink SELECT a, b FROM MyTable GROUP BY a,b; -- Incompatible: Add an aggregated field. INSERT INTO MySink SELECT a, b, SUM(b) FROM MyTable GROUP BY a,b;
Retain only one aggregated field and modify the computation logic of the field.
-- Original SQL statement: INSERT INTO MySink SELECT a, SUM(b), MAX(b), MAX(c) FROM MyTable GROUP BY a; -- Incompatible: Retain only the MAX(c) field and change its computation logic. INSERT INTO MySink SELECT a, MAX(c) FROM (SELECT a, b, c + 1 AS c, ts FROM MyTable) GROUP BY a;
Modify all aggregated fields in a job.
-- Original SQL statement: INSERT INTO MySink SELECT a, b, MAX(c) FROM MyTable GROUP BY a,b; -- Incompatible: Modify the only aggregated field. INSERT INTO MySink SELECT a, b, MIN(c) FROM MyTable GROUP BY a,b;
Add or modify an aggregate function that supports retraction after a retraction operation.
-- Original SQL statement: SELECT c/2, AVG(avg_a) AS avg_avg_a, MAX(max_b) max_max_b FROM (SELECT c, MAX(b) AS max_b, AVG(a) AS avg_a FROM MyTable GROUP BY c) GROUP BY c/2; -- Incompatible: Add an aggregate function that supports retraction. SELECT c/2, AVG(avg_a) AS avg_avg_a, MIN(max_b) min_max_b, MAX(max_b) max_max_b FROM (SELECT c, MAX(b) AS max_b, AVG(a) AS avg_a FROM MyTable GROUP BY c) GROUP BY c/2; -- Incompatible: Modify an aggregate function that supports retraction. SELECT c/2, AVG(avg_a) AS avg_avg_a, MIN(max_b) max_max_b FROM (SELECT c, MAX(b) AS max_b, AVG(a) AS avg_a FROM MyTable GROUP BY c) GROUP BY c/2;
Modifications that result in unknown compatibility
If you use a Python user-defined aggregate function (UDAF) before or after the modification, the system cannot determine the compatibility.
-- Unknown compatibility: Use Python UDAFs, such as the weighted_avg function shown in the following code, before or after the modification.
SELECT a, MAX(b), SUM(DISTINCT b), COUNT(DISTINCT b), weighted_avg(a, b)
FROM MyTable GROUP BY a, b;