AnalyticDB for PostgreSQL V7.0 provides the in-database AI/ML feature. You can process data by using the algorithms and models provided by the feature to reduce data transfer costs. The in-database AI/ML framework is compatible with the open source PostgresML community and performs significant optimization in functionality, performance, and usability to implement training, fine-tuning, deployment, and inference of algorithm models based on GPUs and CPUs. The in-database AI/ML framework is implemented based on the pgml extension. The pgml extension is one of the components in the open source PostgresML community and integrates classic machine learning algorithms, such as XGBoost, LightGBM, and scikit-learn.
Limits
Only AnalyticDB for PostgreSQL V7.0 instances of V7.1.1.0 or later support the in-database AI/ML feature.
The AnalyticDB for PostgreSQL instances must be in elastic storage mode.
The pgml extension must be installed.
NoteThe pgml extension does not support GUI-based installation. To install the pgml extension, submit a ticket. To uninstall the pgml extension, submit a ticket.
Introduction to metadata
The in-database AI/ML framework in AnalyticDB for PostgreSQL V7.0 is implemented based on the pgml extension. After you install the pgml extension for an AnalyticDB for PostgreSQL instance, the system automatically creates a schema named pgml. The following table describes the metadata tables contained in the pgml schema.
Metadata table | Description |
projects | Records information about projects in training tasks. |
models | Records information about model training. |
files | Records information about stored model files. |
snapshots | Records information about the snapshots of training datasets. |
logs | Records information about training logs. |
deployments | Records information about model deployment. |
When a model training task is initiated, training information is automatically written to the preceding metadata tables.
For information about the custom types of the pgml extension in the metadata tables, such as task, runtime, and sampling, see Use machine learning.
projects
The projects table records the project ID, project name, task type, creation time, and update time of training tasks. Table schema and indexes:
Table "pgml.projects"
Column | Type | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+--------------------------------------
id | bigint | | not null | nextval('projects_id_seq'::regclass)
name | text | | not null |
task | task | | not null |
created_at | timestamp without time zone | | not null | clock_timestamp()
updated_at | timestamp without time zone | | not null | clock_timestamp()
Indexes:
"projects_pkey" PRIMARY KEY, btree (id)
"projects_name_idx" btree (name)
Triggers:
projects_auto_updated_at BEFORE UPDATE ON projects FOR EACH ROW EXECUTE FUNCTION set_updated_at()
trigger_before_insert_pgml_projects BEFORE INSERT ON projects FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_projects()
Distributed Replicated
models
The models table records the parameters specified for model training and other associated information, such as project ID and snapshot ID. Table schema and indexes:
Table "pgml.models"
Column | Type | Collation | Nullable | Default
---------------+-----------------------------+-----------+----------+------------------------------------
id | bigint | | not null | nextval('models_id_seq'::regclass)
project_id | bigint | | not null |
snapshot_id | bigint | | |
num_features | integer | | not null |
algorithm | text | | not null |
runtime | runtime | | | 'python'::runtime
hyperparams | jsonb | | not null |
status | text | | not null |
metrics | jsonb | | |
search | text | | |
search_params | jsonb | | not null |
search_args | jsonb | | not null |
created_at | timestamp without time zone | | not null | clock_timestamp()
updated_at | timestamp without time zone | | not null | clock_timestamp()
Indexes:
"models_pkey" PRIMARY KEY, btree (id)
"models_project_id_idx" btree (project_id)
"models_snapshot_id_idx" btree (snapshot_id)
Triggers:
models_auto_updated_at BEFORE UPDATE ON models FOR EACH ROW EXECUTE FUNCTION set_updated_at()
trigger_before_insert_pgml_models BEFORE INSERT ON models FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_models_fk()
Distributed Replicated
files
The files table records post-training model files in the data column in the binary format. Each file is divided into multiple splits of 100 MB each. Table schema and indexes:
Table "pgml.files"
Column | Type | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+-----------------------------------
id | bigint | | not null | nextval('files_id_seq'::regclass)
model_id | bigint | | not null |
path | text | | not null |
part | integer | | not null |
created_at | timestamp without time zone | | not null | clock_timestamp()
updated_at | timestamp without time zone | | not null | clock_timestamp()
data | bytea | | not null |
Indexes:
"files_pkey" PRIMARY KEY, btree (id)
"files_model_id_path_part_idx" btree (model_id, path, part)
Triggers:
files_auto_updated_at BEFORE UPDATE ON files FOR EACH ROW EXECUTE FUNCTION set_updated_at()
trigger_before_insert_pgml_files BEFORE INSERT ON files FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_files()
Distributed Replicated
snapshots
The snapshots table records information about the snapshots of training datasets, such as the table name and test set division information. Table schema and indexes:
Table "pgml.snapshots"
Column | Type | Collation | Nullable | Default
---------------+-----------------------------+-----------+----------+---------------------------------------
id | bigint | | not null | nextval('snapshots_id_seq'::regclass)
relation_name | text | | not null |
y_column_name | text[] | | |
test_size | real | | not null |
test_sampling | sampling | | not null |
status | text | | not null |
columns | jsonb | | |
analysis | jsonb | | |
created_at | timestamp without time zone | | not null | clock_timestamp()
updated_at | timestamp without time zone | | not null | clock_timestamp()
materialized | boolean | | | false
Indexes:
"snapshots_pkey" PRIMARY KEY, btree (id)
Triggers:
snapshots_auto_updated_at BEFORE UPDATE ON snapshots FOR EACH ROW EXECUTE FUNCTION set_updated_at()
Distributed Replicated
logs
The logs table records information about training logs. A training task may contain multiple training logs. You can view the logs based on the values of the created_at column in ascending order. Table schema and indexes:
Table "pgml.logs"
Column | Type | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+----------------------------------
id | integer | | not null | nextval('logs_id_seq'::regclass)
model_id | bigint | | |
project_id | bigint | | |
created_at | timestamp without time zone | | | CURRENT_TIMESTAMP
logs | jsonb | | |
Indexes:
"logs_pkey" PRIMARY KEY, btree (id)
Distributed Replicated
deployments
The deployments table records information about model deployment, such as the project ID, deployment ID, and model ID. Table schema and indexes:
Table "pgml.deployments"
Column | Type | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+-----------------------------------------
id | bigint | | not null | nextval('deployments_id_seq'::regclass)
project_id | bigint | | not null |
model_id | bigint | | not null |
strategy | strategy | | not null |
created_at | timestamp without time zone | | not null | clock_timestamp()
Indexes:
"deployments_pkey" PRIMARY KEY, btree (id)
"deployments_model_id_created_at_idx" btree (model_id)
"deployments_project_id_created_at_idx" btree (project_id)
Triggers:
deployments_auto_updated_at BEFORE UPDATE ON deployments FOR EACH ROW EXECUTE FUNCTION set_updated_at()
trigger_before_insert_pgml_deployments BEFORE INSERT ON deployments FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_deployments_fk()
Distributed Replicated