本文將類比業務生產環境,在巨量資料量環境下,針對單表以及多表查詢效率低下的情況,如何通過列存索引(In-Memory Column Index,簡稱IMCI)來提升查詢效能。
什麼是列存索引
列存索引是將表的全部或部分列按照列式儲存格式在PolarDB的唯讀節點上儲存一份,形成行列混合儲存。同時,針對查詢最佳化工具進行了最佳化,並新增面向列式儲存的執行運算元。這使得在處理大規模資料集時,資料分析和複雜查詢的效能能夠顯著提高。更多資訊,請參見什麼是列存索引(IMCI)。
操作流程
準備工作
類比業務生產環境情境,以便作為測試參考。
叢集
產品版本:企業版。
系列:叢集版(獨享規格)。
核心版本:8.0.1.1.45.2。
開啟熱備叢集:開啟儲存熱備叢集。
計算節點:32核256 GB (polar.mysql.x8.4xlarge),一個主節點一個唯讀節點(熱備)。
儲存類型:PSL5。
參數模板:MySQL_InnoDB_8.0_標準版_預設參數模板。
資料
基於TPC-H基準測試,資料量100 GB。
-- 查詢資料庫內表的行數和大小。 +----------+----------+-----------+-----------------+ | Database | Table | Rows | Total Size (GB) | +----------+----------+-----------+-----------------+ | tpch | customer | 13179406 | 2.59 | | tpch | lineitem | 590446240 | 87.52 | | tpch | nation | 25 | 0.00 | | tpch | orders | 142929780 | 18.70 | | tpch | part | 19354445 | 3.11 | | tpch | partsupp | 67862725 | 20.45 | | tpch | region | 5 | 0.00 | | tpch | supplier | 986923 | 0.17 | +----------+----------+-----------+-----------------+說明資料庫內表的行數和大小受到叢集內資料庫的索引、儲存引擎、統計資訊以及系統資料表等多種因素的影響。因此,實際輸出結果與上述查詢結果之間可能存在差異。
本文的TPC-H的實現基於TPC-H的基準測試,並不能與發行的TPC-H基準測試結果相比較,本文中的測試並不符合TPC-H基準測試的所有要求。
配置列存索引
添加一個列存索引唯讀節點,本文添加的節點規格與主節點相同,即32核256 GB (polar.mysql.x8.4xlarge)。具體資訊,請參見添加一個列存索引唯讀節點。
單表查詢
類比業務中某條SQL執行效率低下的情境。在未建立列存索引的情況下執行下列單表查詢SQL,並記錄其執行時間。
單表掃描及過濾
SELECT * FROM lineitem WHERE L_COMMENT > 'aaaaaaaa' AND L_COMMENT < 'aaaaaaz';-- 執行結果 Empty set (8 min 47.29 sec)單列彙總(AGG)
SELECT SUM(L_DISCOUNT) from lineitem;-- 執行結果 +-----------------+ | SUM(L_DISCOUNT) | +-----------------+ | 30001636.44 | +-----------------+ 1 row in set (2 min 6.64 sec)分組彙總(GROUP BY)
SELECT AVG(L_DISCOUNT) FROM lineitem WHERE L_SHIPDATE <= date '1998-12-01' - interval '90' day GROUP BY L_RETURNFLAG, L_LINESTATUS;-- 執行結果 +-----------------+ | AVG(L_DISCOUNT) | +-----------------+ | 0.049998 | | 0.050001 | | 0.050002 | | 0.049985 | +-----------------+ 4 rows in set (6 min 28.96 sec)極限深翻頁效能 (ORDER BY+LIMIT)
SELECT L_ORDERKEY, SUM(L_QUANTITY) FROM lineitem GROUP BY L_ORDERKEY ORDER BY SUM(L_QUANTITY) DESC LIMIT 1000000, 100;-- 執行結果 +------------+-----------------+ | L_ORDERKEY | SUM(L_QUANTITY) | +------------+-----------------+ | 25226310 | 244.00 | | ... | ... | | 494738146 | 244.00 | +------------+-----------------+ 100 rows in set (12 min 24.22 sec)建立列存索引。具體資訊,請參見建立列存索引。
ALTER TABLE lineitem COMMENT 'COLUMNAR=1 lineitem表注釋';-- 執行結果 Query OK, 0 rows affected (0.05 sec) Records: 0 Duplicates: 0 Warnings: 0查看列存索引構建的執行進度,等待列存索引構建完成。具體資訊,請參見查看列存索引構建的執行進度。
SELECT * FROM INFORMATION_SCHEMA.IMCI_ASYNC_DDL_STATS;-- 執行結果,列存索引構建中。 +-------------+------------+---------------------+---------------------+-------------+----------+------------------+--------------+-------------+-------------+-------------+------------+--------------+-----------+-------------------+-----------------+ | SCHEMA_NAME | TABLE_NAME | CREATED_AT | STARTED_AT | FINISHED_AT | STATUS | APPROXIMATE_ROWS | SCANNED_ROWS | SCAN_SECOND | SORT_ROUNDS | SORT_SECOND | BUILD_ROWS | BUILD_SECOND | AVG_SPEED | SPEED_LAST_SECOND | ESTIMATE_SECOND | +-------------+------------+---------------------+---------------------+-------------+----------+------------------+--------------+-------------+-------------+-------------+------------+--------------+-----------+-------------------+-----------------+ | tpch | lineitem | 2024-10-21 13:44:02 | 2024-10-21 13:44:02 | | Building | 590446240 | 36718757(6%) | 19 | 0 | 0 | 0(0%) | 0 | 1848522 | 0 | 299 | +-------------+------------+---------------------+---------------------+-------------+----------+------------------+--------------+-------------+-------------+-------------+------------+--------------+-----------+-------------------+-----------------+ 1 row in set, 1 warning (0.00 sec)-- 執行結果,列存索引構建完成。 +-------------+------------+---------------------+---------------------+---------------------+--------------+------------------+-----------------+-------------+-------------+-------------+------------+--------------+-----------+-------------------+-----------------+ | SCHEMA_NAME | TABLE_NAME | CREATED_AT | STARTED_AT | FINISHED_AT | STATUS | APPROXIMATE_ROWS | SCANNED_ROWS | SCAN_SECOND | SORT_ROUNDS | SORT_SECOND | BUILD_ROWS | BUILD_SECOND | AVG_SPEED | SPEED_LAST_SECOND | ESTIMATE_SECOND | +-------------+------------+---------------------+---------------------+---------------------+--------------+------------------+-----------------+-------------+-------------+-------------+------------+--------------+-----------+-------------------+-----------------+ | tpch | lineitem | 2024-10-21 13:44:02 | 2024-10-21 13:44:02 | 2024-10-21 13:50:11 | Safe to read | 590446240 | 600037902(100%) | 369 | 0 | 0 | 0(0%) | 0 | 1625058 | 0 | 0 | +-------------+------------+---------------------+---------------------+---------------------+--------------+------------------+-----------------+-------------+-------------+-------------+------------+--------------+-----------+-------------------+-----------------+ 1 row in set, 1 warning (0.00 sec)在表
lineitem建立列存索引情況下執行單表查詢SQL,並記錄執行時間。單表掃描及過濾
SELECT * FROM lineitem WHERE L_COMMENT > 'aaaaaaaa' AND L_COMMENT < 'aaaaaaz';-- 執行結果 Empty set (1.47 sec)單列彙總(AGG)
SELECT SUM(L_DISCOUNT) from lineitem;-- 執行結果 +-----------------+ | SUM(L_DISCOUNT) | +-----------------+ | 30001636.44 | +-----------------+ 1 row in set (0.06 sec)分組彙總(GROUP BY)
SELECT AVG(L_DISCOUNT) FROM lineitem WHERE L_SHIPDATE <= date '1998-12-01' - interval '90' day GROUP BY L_RETURNFLAG, L_LINESTATUS;-- 執行結果 +-----------------+ | AVG(L_DISCOUNT) | +-----------------+ | 0.050001 | | 0.050002 | | 0.049985 | | 0.049998 | +-----------------+ 4 rows in set (2.54 sec)極限深翻頁效能 (ORDER BY+LIMIT)
SELECT L_ORDERKEY, SUM(L_QUANTITY) FROM lineitem GROUP BY L_ORDERKEY ORDER BY SUM(L_QUANTITY) DESC LIMIT 1000000, 100;-- 執行結果 +------------+-----------------+ | L_ORDERKEY | SUM(L_QUANTITY) | +------------+-----------------+ | 299074498 | 244.00 | | ... | ... | | 168679332 | 244.00 | +------------+-----------------+ 100 rows in set (12.80 sec)執行時間對比(單位:秒)。
查詢類型
PolarDB 列存索引(IMCI)
PolarDB 行存
單表掃描及過濾
1.47
527.29
單列彙總(AGG)
0.06
126.64
分組彙總(GROUP BY)
2.54
388.96
極限深翻頁效能 (ORDER BY+LIMIT)
12.80
744.22
可以觀察到,在添加列存索引後,單表查詢SQL的執行效率顯著提高。
說明以上資料僅作為評估SQL執行效率的參考基準,而非絕對標準。實際上,SQL的具體執行時間會受到多種因素的動態影響,這些因素包括但不限於叢集的配置參數、當前的串連數、並發查詢的數量,以及系統資源的即時負載狀況。
多表查詢及子查詢
類比業務中某條SQL執行效率低下的情境。當SQL中所需的表或列未被列存索引完全覆蓋時,執行以下SQL並記錄其執行時間。
說明在下列SQL中,僅表
lineitem建立了列存索引,而其他表則未建立列存索引。當SQL所需的表或列未被列存索引完全覆蓋時,列存索引將無法生效。
若您不清楚當前SQL所需的表或列是否被列存索引完全覆蓋,可以通過
dbms_imci.check_columnar_index('<query_string>');預存程序來檢查SQL語句中的表是否已建立列存索引。同時,PolarDB為您提供相關預存程序用於擷取建立列存索引的DDL語句和批量擷取建立列存索引的DDL語句,助您快速建立列存索引。具體資訊,請參見列存索引DDL協助工具輔助。
多表關聯 (JOIN) 效能
SELECT COUNT(l3.L_DISCOUNT) FROM ( ( ( ( ( nation n1 STRAIGHT_JOIN nation n2 on n1.N_NATIONKEY = n2.N_NATIONKEY ) STRAIGHT_JOIN supplier on n2.N_NATIONKEY = supplier.S_NATIONKEY and S_SUPPKEY < 2000 ) STRAIGHT_JOIN lineitem AS l1 on l1.L_SUPPKEY = supplier.S_SUPPKEY ) STRAIGHT_JOIN lineitem AS l2 on l1.L_ORDERKEY = l2.L_ORDERKEY and l1.L_LINENUMBER = l2.L_LINENUMBER ) STRAIGHT_JOIN lineitem AS l3 on l2.L_ORDERKEY = l3.L_ORDERKEY and l2.L_LINENUMBER = l3.L_LINENUMBER ) GROUP BY n1.N_NAME;因查詢逾時(查詢逾時時間為7200秒),任務終止。因此,執行時間按超過7200秒進行記錄。
關聯子查詢效能
SELECT O_ORDERPRIORITY, COUNT(*) as ORDER_COUNT FROM orders WHERE O_ORDERDATE >= '1995-01-01' AND O_ORDERDATE < date_add('1995-01-01', interval '3' month) AND EXISTS ( SELECT * FROM lineitem WHERE L_ORDERKEY = O_ORDERKEY AND L_COMMITDATE < L_RECEIPTDATE ) GROUP BY O_ORDERPRIORITY ORDER BY O_ORDERPRIORITY;-- 執行結果 +-----------------+-------------+ | O_ORDERPRIORITY | ORDER_COUNT | +-----------------+-------------+ | 1-URGENT | 1028353 | | 2-HIGH | 1030059 | | 3-MEDIUM | 1028615 | | 4-NOT SPECIFIED | 1028496 | | 5-LOW | 1029615 | +-----------------+-------------+ 5 rows in set (4 min 9.51 sec)多表關聯帶子查詢效能
SELECT C_NAME, C_CUSTKEY, O_ORDERKEY, O_ORDERDATE, O_TOTALPRICE, SUM(L_QUANTITY) FROM ( SELECT * FROM orders WHERE O_ORDERKEY IN ( SELECT L_ORDERKEY FROM lineitem GROUP BY L_ORDERKEY HAVING SUM(L_QUANTITY) > 300 ) ) AS tmp, customer, lineitem WHERE C_CUSTKEY = O_CUSTKEY AND O_ORDERKEY = L_ORDERKEY GROUP BY C_NAME, C_CUSTKEY, O_ORDERKEY, O_ORDERDATE, O_TOTALPRICE ORDER BY O_TOTALPRICE DESC, O_ORDERDATE;+--------------------+-----------+------------+-------------+--------------+-----------------+ | C_NAME | C_CUSTKEY | O_ORDERKEY | O_ORDERDATE | O_TOTALPRICE | SUM(L_QUANTITY) | +--------------------+-----------+------------+-------------+--------------+-----------------+ | Customer#011472112 | 11472112 | 458304292 | 1998-02-05 | 591036.15 | 322.00 | | ... | ... | ... | ... | ... | ... | | Customer#003777694 | 3777694 | 470363105 | 1997-04-06 | 349914.00 | 302.00 | | Customer#009446411 | 9446411 | 592379937 | 1995-12-29 | 343496.05 | 304.00 | +--------------------+-----------+------------+-------------+--------------+-----------------+ 6398 rows in set (12 min 46.15 sec)為
tpch資料庫大量新增列存索引。具體資訊,請參見大量新增列存索引。CREATE COLUMNAR INDEX FOR TABLES IN tpch;-- 執行結果 +------------+-------------------+ | Table_Name | Result | +------------+-------------------+ | customer | Ok | | lineitem | Skip by no change | | nation | Ok | | orders | Ok | | part | Ok | | partsupp | Ok | | region | Ok | | supplier | Ok | +------------+-------------------+ 8 rows in set (56.74 sec)查看列存索引構建的執行進度,等待列存索引構建完成。具體資訊,請參見查看列存索引構建的執行進度。
SELECT * FROM INFORMATION_SCHEMA.IMCI_ASYNC_DDL_STATS;-- 執行結果,列存索引構建完成。 +-------------+------------+---------------------+---------------------+---------------------+--------------+------------------+-----------------+-------------+-------------+-------------+------------+--------------+-----------+-------------------+-----------------+ | SCHEMA_NAME | TABLE_NAME | CREATED_AT | STARTED_AT | FINISHED_AT | STATUS | APPROXIMATE_ROWS | SCANNED_ROWS | SCAN_SECOND | SORT_ROUNDS | SORT_SECOND | BUILD_ROWS | BUILD_SECOND | AVG_SPEED | SPEED_LAST_SECOND | ESTIMATE_SECOND | +-------------+------------+---------------------+---------------------+---------------------+--------------+------------------+-----------------+-------------+-------------+-------------+------------+--------------+-----------+-------------------+-----------------+ | tpch | region | 2024-10-21 14:43:27 | 2024-10-21 14:43:27 | 2024-10-21 14:43:27 | Safe to read | 5 | 5(100%) | 0 | 0 | 0 | 0(0%) | 0 | 150 | 0 | 0 | | tpch | lineitem | 2024-10-21 14:36:13 | 2024-10-21 14:36:13 | 2024-10-21 14:42:23 | Safe to read | 590446240 | 600037902(100%) | 370 | 0 | 0 | 0(0%) | 0 | 1620776 | 0 | 0 | | tpch | supplier | 2024-10-21 14:44:16 | 2024-10-21 14:44:16 | 2024-10-21 14:44:17 | Safe to read | 986923 | 1000000(100%) | 1 | 0 | 0 | 0(0%) | 0 | 784971 | 0 | 0 | | tpch | part | 2024-10-21 14:43:27 | 2024-10-21 14:43:27 | 2024-10-21 14:43:38 | Safe to read | 19354445 | 20000000(100%) | 11 | 0 | 0 | 0(0%) | 0 | 1784854 | 0 | 0 | | tpch | customer | 2024-10-21 14:43:19 | 2024-10-21 14:43:19 | 2024-10-21 14:43:27 | Safe to read | 13179406 | 15000000(100%) | 7 | 0 | 0 | 0(0%) | 0 | 2051651 | 0 | 0 | | tpch | nation | 2024-10-21 14:43:27 | 2024-10-21 14:43:27 | 2024-10-21 14:43:27 | Safe to read | 25 | 25(100%) | 0 | 0 | 0 | 0(0%) | 0 | 739 | 0 | 0 | | tpch | partsupp | 2024-10-21 14:43:27 | 2024-10-21 14:43:27 | 2024-10-21 14:44:16 | Safe to read | 67862725 | 80000000(100%) | 49 | 0 | 0 | 0(0%) | 0 | 1620131 | 0 | 0 | | tpch | orders | 2024-10-21 14:43:27 | 2024-10-21 14:43:27 | 2024-10-21 14:44:27 | Safe to read | 142929780 | 150000000(100%) | 59 | 0 | 0 | 0(0%) | 0 | 2501701 | 0 | 0 | +-------------+------------+---------------------+---------------------+---------------------+--------------+------------------+-----------------+-------------+-------------+-------------+------------+--------------+-----------+-------------------+-----------------+ 8 rows in set, 1 warning (0.00 sec)在為
tpch資料庫大量新增列存索引後,執行多表查詢及子查詢SQL,並記錄執行時間。多表關聯 (JOIN) 效能
SELECT COUNT(l3.L_DISCOUNT) FROM ( ( ( ( ( nation n1 STRAIGHT_JOIN nation n2 on n1.N_NATIONKEY = n2.N_NATIONKEY ) STRAIGHT_JOIN supplier on n2.N_NATIONKEY = supplier.S_NATIONKEY and S_SUPPKEY < 2000 ) STRAIGHT_JOIN lineitem AS l1 on l1.L_SUPPKEY = supplier.S_SUPPKEY ) STRAIGHT_JOIN lineitem AS l2 on l1.L_ORDERKEY = l2.L_ORDERKEY and l1.L_LINENUMBER = l2.L_LINENUMBER ) STRAIGHT_JOIN lineitem AS l3 on l2.L_ORDERKEY = l3.L_ORDERKEY and l2.L_LINENUMBER = l3.L_LINENUMBER ) GROUP BY n1.N_NAME;+----------------------+ | COUNT(l3.L_DISCOUNT) | +----------------------+ | 56930 | | ... | | 49995 | +----------------------+ 25 rows in set (6.25 sec)關聯子查詢效能
SELECT O_ORDERPRIORITY, COUNT(*) as ORDER_COUNT FROM orders WHERE O_ORDERDATE >= '1995-01-01' AND O_ORDERDATE < date_add('1995-01-01', interval '3' month) AND EXISTS ( SELECT * FROM lineitem WHERE L_ORDERKEY = O_ORDERKEY AND L_COMMITDATE < L_RECEIPTDATE ) GROUP BY O_ORDERPRIORITY ORDER BY O_ORDERPRIORITY;-- 執行結果 +-----------------+-------------+ | O_ORDERPRIORITY | ORDER_COUNT | +-----------------+-------------+ | 1-URGENT | 1028353 | | 2-HIGH | 1030059 | | 3-MEDIUM | 1028615 | | 4-NOT SPECIFIED | 1028496 | | 5-LOW | 1029615 | +-----------------+-------------+ 5 rows in set (2.49 sec)多表關聯帶子查詢效能
SELECT C_NAME, C_CUSTKEY, O_ORDERKEY, O_ORDERDATE, O_TOTALPRICE, SUM(L_QUANTITY) FROM ( SELECT * FROM orders WHERE O_ORDERKEY IN ( SELECT L_ORDERKEY FROM lineitem GROUP BY L_ORDERKEY HAVING SUM(L_QUANTITY) > 300 ) ) AS tmp, customer, lineitem WHERE C_CUSTKEY = O_CUSTKEY AND O_ORDERKEY = L_ORDERKEY GROUP BY C_NAME, C_CUSTKEY, O_ORDERKEY, O_ORDERDATE, O_TOTALPRICE ORDER BY O_TOTALPRICE DESC, O_ORDERDATE;-- 執行結果 +--------------------+-----------+------------+-------------+--------------+-----------------+ | C_NAME | C_CUSTKEY | O_ORDERKEY | O_ORDERDATE | O_TOTALPRICE | SUM(L_QUANTITY) | +--------------------+-----------+------------+-------------+--------------+-----------------+ | Customer#011472112 | 11472112 | 458304292 | 1998-02-05 | 591036.15 | 322.00 | | ... | ... | ... | ... | ... | ... | | Customer#003777694 | 3777694 | 470363105 | 1997-04-06 | 349914.00 | 302.00 | | Customer#009446411 | 9446411 | 592379937 | 1995-12-29 | 343496.05 | 304.00 | +--------------------+-----------+------------+-------------+--------------+-----------------+ 6398 rows in set (16.16 sec)執行時間對比(單位:秒)。
查詢類型
PolarDB 列存索引(IMCI)
PolarDB 行存
多表關聯 (JOIN) 效能
6.25
>7200
關聯子查詢效能
2.49
249.51
多表關聯帶子查詢效能
16.16
766.15
可以觀察到,在添加列存索引後,多表查詢及子查詢SQL的執行效率顯著提高。
說明以上資料僅作為評估SQL執行效率的參考基準,而非絕對標準。實際上,SQL的具體執行時間會受到多種因素的動態影響,這些因素包括但不限於叢集的配置參數、當前的串連數、並發查詢的數量,以及系統資源的即時負載狀況。
行列分流
在添加列存索引唯讀節點後,叢集地址的行列分流方案預設配置為行列自動分流。該配置適用於您在業務中,當OLAP類型與OLTP類型的請求通過同一應用程式訪問資料庫時,可以實現將兩類業務的讀請求按照掃描的行數進行自動分流,分別分流至列存節點或行存節點。如果在您的業務中,OLAP類型與OLTP類型的請求通過不同的應用程式訪問資料庫,則可以配置行列手動分流,為這些應用程式配置不同的資料庫連接地址,並將行存節點和列存節點分別配置到不同資料庫連接地址的服務節點,從而實現行存與列存的有效分流。具體資訊,請參見HTAP行列分流方案。
行列自動分流與手動分流的示意圖如下所示:
進階使用
更多資訊,請參見進階使用列存索引。
設定列存索引的排序鍵
詳細資料,請參見設定列索引的排序鍵。
列存索引資料群組織的基本單位為行組(Row Group),每個行組預設包含64K行資料。在每個行組中,不同的列會各自打包形成列資料區塊,這些列資料區塊按照行存未經處理資料的主鍵順序並行構建,整體上呈現無序狀態。您可以通過設定排序鍵來修改列資料區塊的排列順序,以提高查詢效能。
開啟列存索引排序功能,將參數
imci_enable_pack_order_key的值設定為ON,來開啟建立列存索引時資料排序功能。說明參數
imci_enable_pack_order_key的預設值為ON。如您此前未對該參數進行過修改,請忽略當前步驟。叢集參數在PolarDB控制台上都已加上MySQL設定檔的相容性首碼loose_。如果您需要在PolarDB控制台修改參數
imci_enable_pack_order_key,請選擇帶有首碼loose_的參數(即loose_imci_enable_pack_order_key)進行修改。具體資訊,請參見設定叢集參數和節點參數。
在未添加排序鍵時,執行下列SQL並記錄其執行時間。
SELECT L_SHIPMODE, SUM(CASE WHEN O_ORDERPRIORITY = '1-URGENT' OR O_ORDERPRIORITY = '2-HIGH' THEN 1 ELSE 0 END) AS high_line_count, SUM(CASE WHEN O_ORDERPRIORITY <> '1-URGENT' AND O_ORDERPRIORITY <> '2-HIGH' THEN 1 ELSE 0 END) AS low_line_count FROM orders, lineitem WHERE O_ORDERKEY = L_ORDERKEY AND L_SHIPMODE in ('MAIL', 'SHIP') AND L_COMMITDATE < L_RECEIPTDATE AND L_SHIPDATE < L_COMMITDATE AND L_RECEIPTDATE >= date '1994-01-01' AND L_RECEIPTDATE < date '1994-01-01' + interval '1' year GROUP BY L_SHIPMODE ORDER BY L_SHIPMODE;-- 執行結果 +------------+-----------------+----------------+ | L_SHIPMODE | high_line_count | low_line_count | +------------+-----------------+----------------+ | MAIL | 623115 | 934713 | | SHIP | 622979 | 934534 | +------------+-----------------+----------------+ 2 rows in set (4.35 sec)對錶
lineitem添加order_key屬性來構建有序列存索引資料。ALTER TABLE lineitem COMMENT='COLUMNAR=1 order_key=L_RECEIPTDATE,L_SHIPMODE lineitem表注釋';等待有序列存索引資料構建。具體資訊,請參見有序列存索引資料構建和查詢時間對比。
再次執行第二步的SQL並記錄其執行時間。
-- 執行結果 +------------+-----------------+----------------+ | L_SHIPMODE | high_line_count | low_line_count | +------------+-----------------+----------------+ | MAIL | 623115 | 934713 | | SHIP | 622979 | 934534 | +------------+-----------------+----------------+ 2 rows in set (0.88 sec)執行時間對比(單位:秒)。
有序資料集
無序資料集
0.88
4.35
說明以上資料僅作為評估SQL執行效率的參考基準,而非絕對標準。實際上,SQL的具體執行時間會受到多種因素的動態影響,這些因素包括但不限於叢集的配置參數、當前的串連數、並發查詢的數量,以及系統資源的即時負載狀況。
列存索引唯讀節點開啟Serverless
詳細資料,請參見列存索引唯讀節點開啟Serverless。
Serverless是指雲原生資料庫PolarDB所具備的動態彈性擴縮容能力。叢集中的各個節點能夠在秒級內實現彈性擴充,能夠從容應對業務負載的突增,且全程對業務運行無任何影響。在業務低負載期間,該機制能夠自動進行彈性縮容,從而有效降低業務成本。關於Serverless具體資訊,請參見Serverless。
若您的業務存在明顯的波峰波穀,或擔心當前叢集配置無法應對業務負載的突增,您可在叢集地區開啟Serverless功能。具體資訊,請參見開啟固定規格叢集的Serverless功能。
更多資訊
費用
列存索引功能可免費使用,僅收取列存索引唯讀節點的費用。列存索引唯讀節點按照普通的計算節點收費。具體資訊,請參見計算節點計費規則。同時,列存索引也會佔用一定的儲存空間。具體資訊,請參見儲存空間計費規則。
列存索引資料以列的形式進行組織,其壓縮比相對更高。與行存相比,通常可以實現3至10倍的壓縮比,所佔用的儲存空間約為行存的10%至30%。因此,儲存空間將額外增加10%至30%的資料量。
效能
查詢效能
列存索引對大多數的複雜查詢操作都有加速作用,查詢效能提升非常明顯,甚至可達到百倍。
與傳統OLAP資料庫ClickHouse相比:PolarDB MySQL版叢集開啟列存索引後,與ClickHouse效能相比各有優劣。其中在單表Scan/AGG、Join等情境中表現突出。未來的列存索引特性將在彙總加速、視窗函數等方面持續最佳化和突破。
說明詳細說明,請參見效能提升情況。
寫入效能
增加列存索引對寫入效能的影響基本在5%以內。使用Sysbench測試集測試
oltp_insert workload,增加列存索引後寫入效能下降約為3%。
專家面對面
若您對列存索引有任何問題,可通過DingTalk搜尋群號入群諮詢。您可以直接@群內專家,並附上您要諮詢的問題。DingTalk群號:27520023189。