Realtime Compute for Apache Flink: Deduplication

Last updated: Jul 17, 2024

This topic describes which changes to a Deduplication query are compatible and which are incompatible.

Compatible changes

  • When the query orders by proctime in ascending order (ORDER BY proctime ASC), you can modify any field other than the PARTITION BY keys. Such a change is fully compatible.

    -- Original SQL.
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime ASC) AS rk
       FROM MyTable)
    WHERE rk = 1;
    
    -- Add input field d. This change is fully compatible.
    SELECT a, b, c, d FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime ASC) AS rk
       FROM MyTable)
    WHERE rk = 1;
    
    -- Remove input field b. This change is fully compatible.
    SELECT a, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime ASC) AS rk
       FROM MyTable)
    WHERE rk = 1;
    
    -- Modify input field c -> SUBSTRING(c, 1, 5). This change is fully compatible.
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime ASC) AS rk
       FROM (SELECT a, b, SUBSTRING(c,1,5) as c, proctime FROM MyTable))
    WHERE rk = 1;
  • Changing the order of the PARTITION BY keys is a fully compatible change.

    -- Original SQL.
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a, b, c ORDER BY proctime ASC) as rk
       FROM MyTable)
    WHERE rk = 1;
    
    -- Reorder the PARTITION BY keys. This change is fully compatible.
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY c, a, b ORDER BY proctime ASC) as rk
       FROM MyTable)
    WHERE rk = 1;
  • When the query orders by rowtime, or orders by proctime in descending order (ORDER BY proctime DESC), a change that leaves the schema unchanged is fully compatible.

    -- Original SQL.
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime DESC) AS rk
       FROM MyTable)
    WHERE rk = 1 AND c > 10;
    
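    -- Remove field c from the output. c is still referenced in the WHERE clause, so the schema is not affected. This change is fully compatible.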
    SELECT a, b FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime DESC) AS rk
       FROM MyTable)
    WHERE rk = 1 AND c > 10;
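
The examples in this topic assume that MyTable exposes a processing-time attribute named proctime and, for rowtime ordering, an event-time attribute such as ts. The following is a minimal sketch of such a table, followed by a rowtime-ordered variant of the schema-preserving change shown above. The column types, the datagen connector, and the 5-second watermark delay are assumptions made for illustration and are not part of the original examples.

```sql
-- Hypothetical source table; column types and the connector are assumptions.
CREATE TEMPORARY TABLE MyTable (
  a INT,
  b STRING,
  c INT,
  d STRING,
  ts TIMESTAMP(3),
  proctime AS PROCTIME(),                       -- processing-time attribute
  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND  -- declares ts as the event-time (rowtime) attribute
) WITH (
  'connector' = 'datagen'
);

-- Original SQL: keep the latest row per key, ordered by rowtime.
SELECT a, b, c FROM (
  SELECT *,
   ROW_NUMBER() OVER (PARTITION BY a ORDER BY ts DESC) AS rk
   FROM MyTable)
WHERE rk = 1 AND c > 10;

-- Remove c from the output only. c is still read by the WHERE clause,
-- so the fields consumed by the query stay the same.
SELECT a, b FROM (
  SELECT *,
   ROW_NUMBER() OVER (PARTITION BY a ORDER BY ts DESC) AS rk
   FROM MyTable)
WHERE rk = 1 AND c > 10;
```

In this sketch c is typed INT to match the c > 10 filter. The SUBSTRING examples elsewhere in this topic would instead need c to be a STRING column.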

Incompatible changes

  • Adding, removing, or modifying a PARTITION BY key, or changing the computation logic of a field used as a PARTITION BY key, is an incompatible change.

    -- Original SQL.
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime ASC) AS rk
       FROM MyTable)
    WHERE rk = 1;
    
    -- Add PARTITION BY field d. This change is incompatible.
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a,d ORDER BY proctime ASC) AS rk
       FROM MyTable)
    WHERE rk = 1;
    
    
    -- Remove PARTITION BY field a. This change is incompatible.
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (ORDER BY proctime ASC) AS rk
       FROM MyTable)
    WHERE rk = 1;
    
    -- Modify PARTITION BY field a -> a + 1. This change is incompatible.
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime ASC) AS rk
       FROM (SELECT a + 1 AS a, b, c, proctime FROM MyTable))
    WHERE rk = 1;
  • Changing the ORDER BY attributes (the sort field or the sort direction) is an incompatible change.

    -- Original SQL.
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime ASC) AS rk
       FROM MyTable)
    WHERE rk = 1;
    
    -- Change the order key: proctime -> ts. This change is incompatible.
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY ts ASC) AS rk
       FROM MyTable)
    WHERE rk = 1;
    
    -- Change the sort direction: ASC -> DESC. This change is incompatible.
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime DESC) AS rk
       FROM MyTable)
    WHERE rk = 1;
  • When the query orders by rowtime, or orders by proctime in descending order (ORDER BY proctime DESC), modifying the input fields or otherwise changing the schema is an incompatible change.

    -- Original SQL.
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime DESC) AS rk
       FROM MyTable)
    WHERE rk = 1;
    
    -- Add input field d. This change is incompatible.
    SELECT a, b, c, d FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime DESC) AS rk
       FROM MyTable)
    WHERE rk = 1;
    
    -- Remove input field c. This change is incompatible.
    SELECT a, b FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime DESC) AS rk
       FROM MyTable)
    WHERE rk = 1;
    
    -- Modify input field c -> SUBSTRING(c, 1, 5). This change is incompatible.
    SELECT a, b, c FROM (
      SELECT *,
       ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime DESC) AS rk
       FROM (SELECT a, b, SUBSTRING(c, 1, 5) AS c, proctime FROM MyTable))
    WHERE rk = 1;
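
The last rule also covers rowtime ordering, which the examples above do not show. The following sketch illustrates it, assuming that ts is an event-time (rowtime) attribute of MyTable (for example, declared with a WATERMARK clause) and that MyTable has a column d. Both are assumptions for illustration only.

```sql
-- Original SQL: keep the first row per key, ordered by rowtime.
-- Assumes ts is declared as the event-time attribute of MyTable.
SELECT a, b, c FROM (
  SELECT *,
   ROW_NUMBER() OVER (PARTITION BY a ORDER BY ts ASC) AS rk
   FROM MyTable)
WHERE rk = 1;

-- Add input field d. The query now consumes an extra field, so under
-- the rule above this is an incompatible change, even though the sort
-- direction is ASC.
SELECT a, b, c, d FROM (
  SELECT *,
   ROW_NUMBER() OVER (PARTITION BY a ORDER BY ts ASC) AS rk
   FROM MyTable)
WHERE rk = 1;
```

The contrast with the first compatible rule is that the field-level freedom described there applies only to ORDER BY proctime ASC. With rowtime ordering, the sort direction does not change the outcome.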