全部产品
Search
文档中心

实时计算Flink版:Deduplication

更新时间:Jul 16, 2024

本文为您介绍Deduplication修改的可兼容性和不可兼容性详情。

可兼容的变更

  • 当基于proctime做order by且按升序排序时(order by proctime asc),可以修改除partition by key之前的字段,该修改属于完全兼容变更。

    -- 原始SQL。
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime ASC) AS rk
       FROM MyTable)
    WHERE rk = 1;
    
    -- 新增输入字段:d,该修改属于完全兼容变更。
    SELECT a, b, c, d FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime ASC) AS rk
       FROM MyTable)
    WHERE rk = 1;
    
    -- 删除输入字段:b,该修改属于完全兼容变更。
    SELECT a, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime ASC) AS rk
       FROM MyTable)
    WHERE rk = 1;
    
    -- 修改输入字段:c -> substring(c, 1, 5),该修改属于完全兼容变更。
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime ASC) AS rk
       FROM (SELECT a, b, SUBSTRING(c,1,5) as c, proctime FROM MyTable))
    WHERE rk = 1;
  • 修改partition key的顺序,该修改属于完全兼容变更。

    -- 原始SQL。
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a, b, c ORDER BY proctime ASC) as rk
       FROM MyTable)
    WHERE rk = 1;
    
    -- 修改partition key顺序,该修改属于完全兼容
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY c, a, b ORDER BY proctime ASC) as rk
       FROM MyTable)
    WHERE rk = 1;
  • 当基于rowtime进行order by或基于proctime进行order by且按降序排序(order by proctime desc) 时,若schema不变,该修改属于完全兼容变更。

    -- 原始SQL。
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime DESC) AS rk
       FROM MyTable)
    WHERE rk = 1 AND c > 10;
    
    -- 删除一个字段,但不影响Schema,属于完全兼容变更。
    SELECT a, b FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime DESC) AS rk
       FROM MyTable)
    WHERE rk = 1 AND c > 10;

不兼容的修改

  • 新增、删除、修改partition by key或者partition by key涉及字段的计算逻辑发生变化,该修改属于不兼容变更。

    -- 原始SQL。
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime ASC) AS rk
       FROM MyTable)
    WHERE rk = 1;
    
    -- 新增partition by字段:d,该修改属于不兼容变更。
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a,d ORDER BY proctime ASC) AS rk
       FROM MyTable)
    WHERE rk = 1;
    
    
    -- 删除partition by字段:a,该修改属于不兼容变更。
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (ORDER BY proctime ASC) AS rk
       FROM MyTable)
    WHERE rk = 1;
    
    -- 修改partition by字段:a -> a + 1,该修改属于不兼容变更。
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (ORDER BY proctime ASC) AS rk
       FROM (SELECT a + 1 AS a, b, c, proctime FROM MyTable))
    WHERE rk = 1;
  • 修改order by相关属性(排序字段和方向),该修改属于不兼容变更。

    -- 原始SQL。
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime ASC) AS rk
       FROM MyTable)
    WHERE rk = 1;
    
    -- 修改order key:proctime -> ts,当前修改属于不兼容变更。
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY ts ASC) AS rk
       FROM MyTable)
    WHERE rk = 1;
    
    -- 修改order:asc -> desc,当前修改属于不兼容变更。
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime DESC) AS rk
       FROM MyTable)
    WHERE rk = 1;
  • 当基于rowtime做order by或基于proctime进行order by且按降序排序(order by proctime desc) 时,修改了输入字段,或者schema发生变化。则修改属于不兼容变更。

    -- 原始SQL。
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime DESC) AS rk
       FROM MyTable)
    WHERE rk = 1;
    
    -- 新增输入字段:d,当前修改属于不兼容变更。
    SELECT a, b, c, d FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime DESC) AS rk
       FROM MyTable)
    WHERE rk = 1;
    
    -- 删除输入字段:c,当前修改属于不兼容变更。
    SELECT a, b FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime DESC) AS rk
       FROM MyTable)
    WHERE rk = 1;
    
    -- 输入字段:c -> substring(c, 1, 5),当前修改属于不兼容变更。
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime DESC) AS rk
       FROM (select a, b, substring(c, 1, 5) as c, ts from MyTable))
    WHERE rk = 1;