AnalyticDB: In-Database AI/ML overview

Last updated: Jun 19, 2025

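AnalyticDB for PostgreSQL V7.0 supports In-Database AI/ML, which runs data processing and model computation directly inside the database and so significantly reduces the cost of moving data between systems. The feature is implemented by the pgml extension, which is compatible with the interfaces of the open source PostgresML community and has been optimized for performance, functionality, and ease of use. It supports model training, fine-tuning, deployment, and inference with GPU or CPU acceleration. Mainstream machine learning libraries such as XGBoost, LightGBM, and scikit-learn are built in, which helps enterprises build intelligent analytics applications efficiently.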

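Prerequisites

  • An AnalyticDB for PostgreSQL V7.0 instance whose minor version is V7.1.1.0 or later.

    Note

    You can view the minor version on the Basic Information page of the instance in the console. If the instance does not meet this version requirement, update its minor version.

  • The instance resource type is elastic storage mode.

  • The pgml extension is installed.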

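    Note
    • The pgml extension cannot currently be installed from the console. To install it, submit a ticket to request assistance. To uninstall the extension, also submit a ticket.

    • The pgml extension cannot currently be installed or used on AnalyticDB for PostgreSQL V7.0 Economy Edition instances.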

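Metadata overview

In AnalyticDB for PostgreSQL V7.0, the In-Database AI/ML architecture is built on the pgml extension. After pgml is installed on an instance of an eligible version, the system automatically creates a schema named pgml. This schema contains the following metadata tables.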

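 Metadata table | Description
----------------+----------------------------------------------
 projects       | Project information for training jobs.
 models         | Information about trained models.
 files          | Storage information for model files.
 snapshots      | Snapshots of the datasets used for training.
 logs           | Log output produced during training.
 deployments    | Deployment information for trained models.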

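When a training job is started, the training information is automatically written to the metadata tables above.

Note

For the pgml custom types used in the metadata tables, such as task, runtime, and sampling, see Machine learning.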

projects

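The projects table records the project ID, project name, task type, creation time, and update time of each training job. The table schema and indexes are as follows.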

                                         Table "pgml.projects"
   Column   |            Type             | Collation | Nullable |               Default                
------------+-----------------------------+-----------+----------+--------------------------------------
 id         | bigint                      |           | not null | nextval('projects_id_seq'::regclass)
 name       | text                        |           | not null | 
 task       | task                        |           | not null | 
 created_at | timestamp without time zone |           | not null | clock_timestamp()
 updated_at | timestamp without time zone |           | not null | clock_timestamp()
Indexes:
    "projects_pkey" PRIMARY KEY, btree (id)
    "projects_name_idx" btree (name)
Triggers:
    projects_auto_updated_at BEFORE UPDATE ON projects FOR EACH ROW EXECUTE FUNCTION set_updated_at()
    trigger_before_insert_pgml_projects BEFORE INSERT ON projects FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_projects()
Distributed Replicated

models

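The models table records the parameters specified for model training, together with the associated project ID and snapshot ID. The table schema and indexes are as follows.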

                                           Table "pgml.models"
    Column     |            Type             | Collation | Nullable |              Default               
---------------+-----------------------------+-----------+----------+------------------------------------
 id            | bigint                      |           | not null | nextval('models_id_seq'::regclass)
 project_id    | bigint                      |           | not null | 
 snapshot_id   | bigint                      |           |          | 
 num_features  | integer                     |           | not null | 
 algorithm     | text                        |           | not null | 
 runtime       | runtime                     |           |          | 'python'::runtime
 hyperparams   | jsonb                       |           | not null | 
 status        | text                        |           | not null | 
 metrics       | jsonb                       |           |          | 
 search        | text                        |           |          | 
 search_params | jsonb                       |           | not null | 
 search_args   | jsonb                       |           | not null | 
 created_at    | timestamp without time zone |           | not null | clock_timestamp()
 updated_at    | timestamp without time zone |           | not null | clock_timestamp()
Indexes:
    "models_pkey" PRIMARY KEY, btree (id)
    "models_project_id_idx" btree (project_id)
    "models_snapshot_id_idx" btree (snapshot_id)
Triggers:
    models_auto_updated_at BEFORE UPDATE ON models FOR EACH ROW EXECUTE FUNCTION set_updated_at()
    trigger_before_insert_pgml_models BEFORE INSERT ON models FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_models_fk()
Distributed Replicated

files

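After training finishes, each file in the model directory is stored in binary form in the data column of the files table. The binary stream of each file is split into slices of 100 MB before it is stored. The table schema and indexes are as follows.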

                                         Table "pgml.files"
   Column   |            Type             | Collation | Nullable |              Default              
------------+-----------------------------+-----------+----------+-----------------------------------
 id         | bigint                      |           | not null | nextval('files_id_seq'::regclass)
 model_id   | bigint                      |           | not null | 
 path       | text                        |           | not null | 
 part       | integer                     |           | not null | 
 created_at | timestamp without time zone |           | not null | clock_timestamp()
 updated_at | timestamp without time zone |           | not null | clock_timestamp()
 data       | bytea                       |           | not null | 
Indexes:
    "files_pkey" PRIMARY KEY, btree (id)
    "files_model_id_path_part_idx" btree (model_id, path, part)
Triggers:
    files_auto_updated_at BEFORE UPDATE ON files FOR EACH ROW EXECUTE FUNCTION set_updated_at()
    trigger_before_insert_pgml_files BEFORE INSERT ON files FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_files()
Distributed Replicated
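
The slicing described above can be pictured with a minimal Python sketch. It is not the pgml implementation. The helper names `split_into_parts` and `reassemble`, the zero-based part numbering, the reading of "100 MB" as 100 * 1024 * 1024 bytes, and the 4-byte slice size used in the demo are all assumptions made for illustration. The only details taken from the table are that slices are stored in `data` and are identified by `(model_id, path, part)`.

```python
from typing import Iterator, List, Tuple

# Assumption: "100 MB" is read as 100 * 1024 * 1024 bytes; the source does not say.
SLICE_SIZE = 100 * 1024 * 1024


def split_into_parts(blob: bytes, slice_size: int = SLICE_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (part, data) pairs, one per slice. Parts are numbered from 0 (assumed)."""
    for part, offset in enumerate(range(0, len(blob), slice_size)):
        yield part, blob[offset:offset + slice_size]


def reassemble(rows: List[Tuple[int, bytes]]) -> bytes:
    """Rebuild a file by concatenating its slices in ascending part order."""
    return b"".join(data for _, data in sorted(rows, key=lambda row: row[0]))


# Demo with a 4-byte slice size so that the example stays small.
model_file = b"0123456789"
parts = list(split_into_parts(model_file, slice_size=4))

# Rows may come back in any order; sorting by part restores the original bytes.
restored = reassemble(list(reversed(parts)))
```

When reading a model file back from the files table, the same idea applies: select the rows for one `(model_id, path)` pair and concatenate `data` in `part` order.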

snapshots

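The snapshots table records snapshot information about the dataset used for training, such as the source table name and how the test set is split. The table schema and indexes are as follows.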

                                           Table "pgml.snapshots"
    Column     |            Type             | Collation | Nullable |                Default                
---------------+-----------------------------+-----------+----------+---------------------------------------
 id            | bigint                      |           | not null | nextval('snapshots_id_seq'::regclass)
 relation_name | text                        |           | not null | 
 y_column_name | text[]                      |           |          | 
 test_size     | real                        |           | not null | 
 test_sampling | sampling                    |           | not null | 
 status        | text                        |           | not null | 
 columns       | jsonb                       |           |          | 
 analysis      | jsonb                       |           |          | 
 created_at    | timestamp without time zone |           | not null | clock_timestamp()
 updated_at    | timestamp without time zone |           | not null | clock_timestamp()
 materialized  | boolean                     |           |          | false
Indexes:
    "snapshots_pkey" PRIMARY KEY, btree (id)
Triggers:
    snapshots_auto_updated_at BEFORE UPDATE ON snapshots FOR EACH ROW EXECUTE FUNCTION set_updated_at()
Distributed Replicated

logs

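The logs table records the information output during training. A single training job may produce multiple log entries. To read them in chronological order, sort by the created_at column in ascending order. The table schema and indexes are as follows.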

                                         Table "pgml.logs"
   Column   |            Type             | Collation | Nullable |             Default              
------------+-----------------------------+-----------+----------+----------------------------------
 id         | integer                     |           | not null | nextval('logs_id_seq'::regclass)
 model_id   | bigint                      |           |          | 
 project_id | bigint                      |           |          | 
 created_at | timestamp without time zone |           |          | CURRENT_TIMESTAMP
 logs       | jsonb                       |           |          | 
Indexes:
    "logs_pkey" PRIMARY KEY, btree (id)
Distributed Replicated
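
The ascending sort on created_at can be sketched as follows. To keep the example runnable without an AnalyticDB instance, it uses an in-memory SQLite database attached under the name `pgml` as a stand-in, with a reduced copy of the logs columns. The sample rows and their JSON payloads are invented for the demo. On a real instance, only the `SELECT` statement would be relevant, and the parameter placeholder would follow your PostgreSQL driver's convention rather than SQLite's `?`.

```python
import sqlite3

# Stand-in for a real instance: an in-memory SQLite database attached as "pgml"
# so that the query text can refer to pgml.logs.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS pgml")
conn.execute(
    "CREATE TABLE pgml.logs ("
    " id INTEGER PRIMARY KEY, model_id INTEGER, project_id INTEGER,"
    " created_at TEXT, logs TEXT)"
)

# Invented sample rows, deliberately inserted out of chronological order.
conn.executemany(
    "INSERT INTO pgml.logs (model_id, project_id, created_at, logs) VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2025-01-01 10:00:02", '{"step": "third"}'),
        (1, 1, "2025-01-01 10:00:00", '{"step": "first"}'),
        (2, 1, "2025-01-01 10:00:01", '{"step": "other model"}'),
        (1, 1, "2025-01-01 10:00:01", '{"step": "second"}'),
    ],
)

# The query itself: all log entries of one model, oldest first.
LOGS_BY_TIME = (
    "SELECT created_at, logs FROM pgml.logs "
    "WHERE model_id = ? ORDER BY created_at ASC"
)
rows = conn.execute(LOGS_BY_TIME, (1,)).fetchall()

for created_at, payload in rows:
    print(created_at, payload)
```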

deployments

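When a model needs to be deployed, the system creates a deployment record that associates the project ID, deployment ID, and model ID. The deployments table also records the deployment strategy. The table schema and indexes are as follows.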

                                         Table "pgml.deployments"
   Column   |            Type             | Collation | Nullable |                 Default                 
------------+-----------------------------+-----------+----------+-----------------------------------------
 id         | bigint                      |           | not null | nextval('deployments_id_seq'::regclass)
 project_id | bigint                      |           | not null | 
 model_id   | bigint                      |           | not null | 
 strategy   | strategy                    |           | not null | 
 created_at | timestamp without time zone |           | not null | clock_timestamp()
Indexes:
    "deployments_pkey" PRIMARY KEY, btree (id)
    "deployments_model_id_created_at_idx" btree (model_id)
    "deployments_project_id_created_at_idx" btree (project_id)
Triggers:
    deployments_auto_updated_at BEFORE UPDATE ON deployments FOR EACH ROW EXECUTE FUNCTION set_updated_at()
    trigger_before_insert_pgml_deployments BEFORE INSERT ON deployments FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_deployments_fk()
Distributed Replicated
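
Because each deployment record links a project to a model, the metadata tables can be joined to find which model was deployed most recently for a project. The sketch below shows one such query. It again uses an in-memory SQLite database attached as `pgml` as a stand-in, with only the columns the query needs. The project name `demo_project`, the IDs, and the strategy values `strategy_a` and `strategy_b` are hypothetical placeholders, not values defined by pgml.

```python
import sqlite3

# Stand-in for a real instance: reduced copies of three metadata tables.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS pgml")
conn.executescript(
    """
    CREATE TABLE pgml.projects (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE pgml.models (
        id INTEGER PRIMARY KEY, project_id INTEGER NOT NULL, algorithm TEXT NOT NULL);
    CREATE TABLE pgml.deployments (
        id INTEGER PRIMARY KEY, project_id INTEGER NOT NULL, model_id INTEGER NOT NULL,
        strategy TEXT NOT NULL, created_at TEXT NOT NULL);

    -- Hypothetical sample data; strategy values are placeholders.
    INSERT INTO pgml.projects VALUES (1, 'demo_project');
    INSERT INTO pgml.models VALUES (10, 1, 'xgboost'), (11, 1, 'lightgbm');
    INSERT INTO pgml.deployments VALUES
        (100, 1, 10, 'strategy_a', '2025-01-01 09:00:00'),
        (101, 1, 11, 'strategy_b', '2025-01-02 09:00:00');
    """
)

# Most recent deployment of a project, with the deployed model's algorithm.
LATEST_DEPLOYMENT = """
    SELECT p.name, d.model_id, m.algorithm, d.strategy, d.created_at
    FROM pgml.deployments AS d
    JOIN pgml.projects AS p ON p.id = d.project_id
    JOIN pgml.models AS m ON m.id = d.model_id
    WHERE p.name = ?
    ORDER BY d.created_at DESC
    LIMIT 1
"""
latest = conn.execute(LATEST_DEPLOYMENT, ("demo_project",)).fetchone()
print(latest)
```

The join relies only on the `project_id` and `model_id` columns shown in the table definition above, so the same statement should carry over to a real instance with the driver's own placeholder syntax.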