列組統計資訊

0.0.201

本文由簡體中文內容自動轉碼而成。阿里雲不保證此自動轉碼的準確性、完整性及時效性。本文内容請以簡體中文版本為準。

AnalyticDB PostgreSQL版7.0版支援了列組（Column Group）統計資訊功能，您可以指定收集多個目標列的聯合統計資訊，在代價估算時打破獨立分布的假設，提高代價估算的準確性，進而提升查詢效能。

注意事項

列組統計資訊功能目前僅支援Planner最佳化器（原Legacy最佳化器），如何設定最佳化器，請參見參數配置。

文法

CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
    [ ( statistics_kind [, ... ] ) ]
    ON column_name, column_name [, ...]
    FROM table_name

參數介紹

參數	說明

參數	說明
statistics_name	列組統計資訊的名稱。
statistics_kind	需要收集的統計資訊種類，目前支援以下三種： ndistinct：多列組合值中不同值（DISTINCT）的數量。 dependencies：多列之間的Functional Dependency，可以理解為多列之間的相關性。 mcv：多列組合值中出現最多的值及其頻率。不指定該參數的話，以上三種類型的統計資訊都會採集。
column_name	列組統計資訊包含的列的名稱，至少需要指定兩個列。
table_name	上述列所屬的表名。

樣本

建立表ndistinct，並插入資料。

CREATE TABLE ndistinct(a int, b int, c int, d int) DISTRIBUTED BY (d);
INSERT INTO ndistinct(a, b) SELECT i/100, i/200, i/100, i FROM generate_series(1, 10000000) i;

其中a、b列的值有強相關性，不符合獨立分布的假設。

建立統計估算行數和實際行數的函數。

CREATE FUNCTION check_estimated_rows(text) returns table (estimated int, actual int)
LANGUAGE plpgsql AS
$$
DECLARE
    ln text;
    tmp text[];
    first_row bool := true;
BEGIN
    FOR ln in
        execute format('explain analyze %s', $1)
    LOOP
        IF first_row then
            first_row := false;
            tmp := regexp_match(ln, 'rows=(\d*) .* rows=(\d*)');
            return query select tmp[1]::int, tmp[2]::int;
        END IF;
    END LOOP;
END;
$$;

查看估算行數和實際行數。

SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct GROUP BY a, b');

返回資訊如下：

 estimated | actual
-----------+--------
   1000000 | 100001
(1 row)

本次樣本中估算行數為1000000行，實際行數為100001行，差距較大。

查看執行計畫。

EXPLAIN ANALYZE SELECT count(*) FROM ndistinct GROUP BY a,b,c;

執行計畫返回資訊如下，用於和建立列組統計資訊後的執行計畫進行對比：

                                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=137814.33..154481.00 rows=1000000 width=20) (actual time=3986.143..4014.601 rows=100001 loops=1)
   ->  HashAggregate  (cost=137814.33..141147.67 rows=333333 width=20) (actual time=3977.037..4002.222 rows=33661 loops=1)
         Group Key: a, b, c
         Peak Memory Usage: 0 kB
         ->  Redistribute Motion 3:3  (slice2; segments: 3)  (cost=0.00..104481.00 rows=3333333 width=12) (actual time=0.079..1492.983 rows=3366100 loops=1)
               Hash Key: a, b, c
               ->  Seq Scan on ndistinct  (cost=0.00..37814.33 rows=3333333 width=12) (actual time=0.050..1101.632 rows=3334839 loops=1)
 Planning Time: 0.161 ms
   (slice0)    Executor memory: 12336K bytes.
   (slice1)    Executor memory: 13884K bytes avg x 3 workers, 13899K bytes max (seg0).  Work_mem: 16401K bytes max.
   (slice2)    Executor memory: 37K bytes avg x 3 workers, 37K bytes max (seg0).
 Memory used:  128000kB
 Optimizer: Postgres query optimizer
 Execution Time: 4041.613 ms
(14 rows)

建立列組統計資訊，並執行ANALYZE收集統計資訊。

CREATE STATISTICS s_a_b(ndistinct) ON a,b FROM ndistinct;
ANALYZE ndistinct;

再次查看估算行數和實際行數。
```
SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct GROUP BY a, b');
```
返回資訊如下：
```
 estimated | actual
-----------+--------
     99431 | 100001
(1 row)
```
本次樣本中估算行數為99431行，實際行數為100001行，相比較建立列組統計資訊前，估算的行數更準確。

查看執行計畫。

EXPLAIN ANALYZE SELECT count(*) FROM ndistinct GROUP BY a,b,c;

執行計畫返回資訊如下，從執行計畫可以看出，因為估算更準確，所以執行計畫中產生了Partial HashAggregate，進而縮短了查詢時間：

                                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=75124.91..76782.09 rows=99431 width=20) (actual time=2854.765..2879.734 rows=100001 loops=1)
   ->  Finalize HashAggregate  (cost=75124.91..75456.34 rows=33144 width=20) (actual time=2853.610..2868.194 rows=33661 loops=1)
         Group Key: a, b, c
         Peak Memory Usage: 0 kB
         ->  Redistribute Motion 3:3  (slice2; segments: 3)  (cost=71147.67..74130.60 rows=99431 width=20) (actual time=2269.435..2759.413 rows=100983 loops=1)
               Hash Key: a, b, c
               ->  Partial HashAggregate  (cost=71147.67..72141.98 rows=99431 width=20) (actual time=2744.039..2794.808 rows=100001 loops=1)
                     Group Key: a, b, c
                     Peak Memory Usage: 0 kB
                     ->  Seq Scan on ndistinct  (cost=0.00..37814.33 rows=3333333 width=12) (actual time=0.028..454.030 rows=3334839 loops=1)
 Planning Time: 0.173 ms
   (slice0)    Executor memory: 4670K bytes.
   (slice1)    Executor memory: 3134K bytes avg x 3 workers, 3149K bytes max (seg0).  Work_mem: 5649K bytes max.
   (slice2)    Executor memory: 13848K bytes avg x 3 workers, 13848K bytes max (seg0).  Work_mem: 14353K bytes max.
 Memory used:  128000kB
 Optimizer: Postgres query optimizer
 Execution Time: 2893.470 ms
(17 rows)

反饋

本頁導讀（1, M）

注意事項

文法

參數介紹

樣本

聯絡我們

立即和Alibaba Cloud在線服務人員進行交談，獲取您想了解的產品信息以及最新折扣。

找到我們

我們支持的支付方式

關於

關於阿里雲

定價模型

產品

客戶

合作夥伴

初創公司

雲棲大會

阿里雲峰會

促銷活動

免費試用

Simple Application Server

探索

China Gateway

支援申請ICP牌照

開始使用

網誌

市場

培訓與認證

支援

聯絡銷售團隊

提交服務單

售後支援

安全檢舉

提交建議

定價計數機

資源

文檔中心

Alibaba Cloud MVP

安全與合規

新聞發布室

WHOIS

狀態

產品和解決方案

Elastic Compute Service

CDN

Anti-DDoS

Object Storage Service

電子商務

網頁託管

安全

熱門內容

日本站

ECS文檔

如何獲得域名

軟件基礎設施

學習路徑

新用戶

注意事項

文法

參數介紹

樣本

銷售支援

技術支援

聯絡我們 & 報告濫用行為

關於 Alibaba Cloud

環球網絡

快速入門

環球辦公室

2024年巴黎奧運會 New

羅蘭加洛斯球場 — 昔日榮光 New

協和廣場 —「打破」障礙 New

馬恩河畔韋爾航海體育場 — 可持續運動 New

國際廣播中心 — 吸引數十億人的影像、聲音和數據 New

客戶成功案例 New

阿里雲信任中心

合規計劃

雲端合規資源

合規常見問題

最新產品及功能 New

Cloud Forward

新聞發佈室

阿里雲電子期刊 New

Alibaba Cloud 分析師研究

公告

阿里雲出海業務 New

“橙”雲出海服務聯盟

Asia Accelerator Hot

資訊合規

China Gateway - MLPS 2.0 合規 New

China Gateway - 網絡

China Gateway - 全球加速應用程式 New

China Gateway - 安全

China Gateway - 數據安全 New

ICP 支援 Hot

China Gateway - 全域數據中台 New

China Gateway - 組織數據中端 New

China Gateway - 業務中端 New

China Gateway - 智慧客服解決方案 New

China Gateway - 網上教育

China Gateway - 網域註冊

工作在阿里雲

資深專業人士

學生和畢業生

免费试用

定價

優惠中心

減價

付出更少金錢，進行更多部署

FinOps

Elastic Compute Service (ECS)

Simple Application Server (SAS)

Elastic GPU Service

Elastic Desktop Service (EDS)

Object Storage Service (OSS)

Cloud Enterprise Network (CEN)

Web Application Firewall (WAF)

Domain Names

Lingma

Container Compute Service (ACS)

Secure Access Service Edge (SASE)

Intelligent Media Services(IMS)

Edge Security Acceleration (ESA)(Original DCDN)

Intelligent Media Management

DingTalk Enterprise

YiDA

Alibaba Cloud Model Studio

Apsara Prime - 更輕鬆選擇雲端產品

阿裡雲ECS-滿足您所有的雲託管需求

1TB CDN—立即獲享免費 1 TB 輸出流量方案

安全性—面臨攻擊？ 獲享免費安全支援

Short Message Service - 免費測試現已登場

Elastic Compute Service (ECS) Hot

CloudBox

Compute Nest

Dedicated Host Hot

ECS Bare Metal Instance

Elastic GPU Service Featured

Simple Application Server (SAS) Hot

Auto Scaling

Cloud Phone Beta

Elastic Desktop Service (EDS) Featured

Batch Compute

Elastic High Performance Computing (E-HPC)

Super Computing Cluster (SCC)

安全性—面臨攻擊？獲享免費安全支援