This topic describes how to create partitioned tables and replicated tables by using the CREATE TABLE
statement in AnalyticDB for MySQL and how to specify distribution keys, partition keys, indexes, partition lifecycle, and tiered storage policies for hot and cold data of tables.
Data distribution scheme of a table
The following figure shows the concepts that you need to familiarize yourself with before you create a table, such as shards, partitions, and clustered indexes.
Syntax
CREATE TABLE [IF NOT EXISTS] table_name
({column_name column_type [column_attributes] [ column_constraints ] [COMMENT 'column_comment']
| table_constraints}
[, ... ])
[table_attribute]
[partition_options]
[storage_policy]
[block_size]
[engine]
[rt_engine]
[table_properties]
[AS query_expr]
[COMMENT 'table_comment']
column_attributes:
[DEFAULT {constant | CURRENT_TIMESTAMP}]
[AUTO_INCREMENT]
column_constraints:
[{NOT NULL|NULL} ]
[PRIMARY KEY]
table_constraints:
[{INDEX|KEY} [index_name] (column_name,...)]
[{INDEX|KEY} [index_name] (column_name->'$[*]')]
[FULLTEXT [INDEX|KEY] [index_name] (column_name) [index_option]] [,...]
[PRIMARY KEY [index_name] (column_name,...)]
[CLUSTERED KEY [index_name] (column_name,...)]
[[CONSTRAINT [symbol]] FOREIGN KEY (fk_column_name) REFERENCES pk_table_name (pk_column_name)][,...]
[ANN INDEX [index_name] (column_name,...) [index_option]] [,...]
table_attribute:
DISTRIBUTED BY HASH(column_name,...) | DISTRIBUTED BY BROADCAST
partition_options:
PARTITION BY
{VALUE(column_name) | VALUE(date_format(column_name, 'format'))}
LIFECYCLE N
storage_policy:
STORAGE_POLICY= {'HOT'|'COLD'|'MIXED' {hot_partition_count=N}}
block_size:
BLOCK_SIZE= VALUE
engine:
ENGINE= 'XUANWU|XUANWU_V2'
Parameters
table_name, column_name, column_type, and COMMENT
Parameter | Description |
table_name | The name of the table. The table name must be 1 to 127 characters in length and can contain letters, digits, and underscores (_). The table name must start with a letter or an underscore (_). You can use the db_name.table_name format to specify the table that you want to create in a database. |
column_name | The name of the column that you want to add to the table. The column name must be 1 to 127 characters in length and can contain letters, digits, and underscores (_). The column name must start with a letter or an underscore (_). |
column_type | The data type of the column. For information about data types supported by AnalyticDB for MySQL, see Basic data types and Complex data types. |
COMMENT | The comment of the column or the table. |
column_attributes (DEFAULT | AUTO_INCREMENT)
DEFAULT {constant | CURRENT_TIMESTAMP}
Specifies the default value of a column. The default value can be a constant or the CURRENT_TIMESTAMP function. Other functions or variant expressions are not supported.
If no value is specified, the default value of the column is NULL
.
AUTO_INCREMENT
Specifies an auto-increment column. The data type of an auto-increment column must be BIGINT
.
AnalyticDB for MySQL assigns unique values to auto-increment columns. However, these values are not incremented in sequence and do not always start from 1.
column_constraints (NOT NULL | PRIMARY KEY)
NOT NULL
Specifies NOT NULL
columns, which cannot contain NULL
values. Columns specified as NULL
or not specified as NOT NULL
can contain NULL
values.
PRIMARY KEY
Specifies the primary key. You can specify only one column as the primary key by using column constraints. For example, you can specify the id column as the primary key by using id BIGINT NOT NULL PRIMARY KEY
. To specify multiple columns as the primary key, use a composite primary key in table_constraints.
table_constraints (Indexes)
AnalyticDB for MySQL supports a variety of indexes, including INDEX, PRIMARY KEY, CLUSTERED KEY, FOREIGN KEY, FULLTEXT INDEX, and ANN INDEX. A table can have one or more types of indexes.
INDEX | KEY
Specifies a regular index. INDEX and KEY can be used interchangeably.
By default, AnalyticDB for MySQL automatically creates indexes on all columns of the table. However, if you create an index on one or more columns when you create a table, such as an index on the id column by using INDEX (id)
, AnalyticDB for MySQL does not automatically generate indexes on other columns of the table.
PRIMARY KEY
Specifies the primary key index.
Overview
Each table can have only one primary key.
The primary key can consist of a single column, such as PRIMARY KEY (id)
, or multiple columns, such as PRIMARY KEY (id,name)
.
A composite primary key must include a distribution key and a partition key. We recommend that you place distribution and partition keys in the front section of a composite primary key.
Usage notes
You cannot perform the DELETE or UPDATE operation on tables that do not have primary keys.
If no primary key is specified, the following rules apply:
If no distribution key is specified, AnalyticDB for MySQL automatically adds the __adb_auto_id__
column to the table and uses the column as the primary key and the distribution key.
If a distribution key is specified, AnalyticDB for MySQL does not automatically add a primary key.
After a table is created, you cannot add, remove, or change primary key columns.
To ensure high performance, we recommend that you select one or a small number of numeric columns as the primary key.
Note If a table has excessive primary key columns, the following issues may occur:
Higher consumption of CPU and I/O resources. This is because AnalyticDB for MySQL checks whether duplicate primary key values exist when data is written.
Higher disk usage of primary key indexes. To view the disk usage of primary key indexes, use the Storage Analysis feature.
Slower partition rebuilding speed of BUILD jobs.
CLUSTERED KEY
Specifies a clustered index. A clustered index is configured at the partition level. It determines the physical order in which data is stored. Data in a partition is sorted and stored sequentially based on the values of the clustered index. Data records with the same or similar key values of the clustered index are stored in the same or adjacent data blocks. In range queries or equivalence filtering, using a clustered index can reduce disk I/O and speed up data reads. This is because when query conditions are the same as clustered index columns, the storage engine can read contiguous data blocks.
Applicable scenarios
A clustered index works well with range queries and equivalence filtering. A column that is frequently used in the conditions of range queries or equivalence filtering is an ideal clustered index column.
When query conditions match or partially match a clustered index column, read efficiency significantly improves. For example, you can use the user ID as the clustered index in software-as-a-service (SaaS) applications. This ensures that the records of a specific user ID are stored in the same or contiguous data blocks and improves data query performance.
Usage notes
Each table can have only one clustered index.
A clustered index can be created on a single column, such as CLUSTERED KEY index(id)
, or on multiple columns, such as CLUSTERED KEY index(id,name)
. When the clustered index involves two columns, data is first sorted based on the values of the first clustered index column. If the values of the first clustered index column are the same, data is sorted based on the values of the second clustered index column. Therefore, CLUSTERED KEY index(id,name)
and CLUSTERED KEY index(name,id)
are different clustered indexes.
To prevent reduced sorting performance, we recommend that you do not use a column that has lengthy values (such as strings larger than 10 KB) in a clustered index.
FULLTEXT INDEX | FULLTEXT KEY
Specifies a full-text index. FULLTEXT INDEX and FULLTEXT KEY can be used interchangeably. For more information about full-text indexes, see Create a full-text index.
Syntax and parameters
Syntax: [FULLTEXT [INDEX|KEY] [index_name] (column_name) [index_option]] [,...]
Parameters:
index_name: the name of the full-text index.
column_name: the name of the column on which you want to create a full-text index. The type of the column must be VARCHAR.
index_option: specifies the analyzer and custom dictionary used for the full-text index. This parameter is optional.
FOREIGN KEY
Specifies a foreign key index. Foreign key indexes are used to eliminate unnecessary joins. For more information, see Use primary and foreign key constraints to eliminate unnecessary joins.
Syntax and parameters
Supported versions:
Only AnalyticDB for MySQL clusters of V3.1.10 or later support the FOREIGN KEY clause.
Note To query the minor version of an AnalyticDB for MySQL Data Lakehouse Edition cluster, execute the SELECT adb_version();
statement. To update the minor version of a cluster, contact technical support.
Syntax: [[CONSTRAINT [symbol]] FOREIGN KEY (fk_column_name) REFERENCES pk_table_name (pk_column_name)][,...]
Parameters:
symbol: the name of the foreign key constraint. The name must be unique in a table. This parameter is optional. If you do not specify this parameter, the parser automatically uses the name of the foreign key column suffixed with _fk as the name of the foreign key constraint.
fk_column_name: the name of the foreign key column. The column must already exist.
pk_table_name: the name of the primary table. The primary table must already exist.
pk_column_name: the name of the foreign key constraint column, which is the primary key column of the primary table. The column must already exist.
Usage notes
Each table can have multiple foreign key indexes.
A foreign key index cannot consist of multiple columns, such as FOREIGN KEY (sr_item_sk, sr_ticket_number) REFERENCES store_sales(ss_item_sk,d_date_sk)
.
AnalyticDB for MySQL does not check data constraints. You must check the data constraint relationships between the primary key of the primary table and the foreign keys of the associated table.
You cannot add foreign key constraints to external tables.
ANN INDEX
Specifies a vector index. For information about vector search, see Vector search.
Syntax and parameters
Syntax: [ANN INDEX [index_name] (column_name,...) [index_option]] [,...]
Parameters:
index_name: the name of the vector index.
column_name: the name of the vector column. The vector column must be of one of the following types: ARRAY<FLOAT>, ARRAY<SMALLINT>, OR ARRAY<BYTE>. You must specify the vector dimension. For example, you can use the following syntax to create a four-dimensional vector column of the ARRAY<FLOAT> type named feature: feature array<float>(4)
.
index_option: the attribute of the vector index.
algorithm
: the algorithm that is used in the formula for calculating the distance between vectors. Set the value to HNSW_PQ, which is suitable for medium-sized datasets that are sensitive to vector dimensions and involve 1 million to 10 million records per table.
dis_function
: the formula that is used to calculate the distance between vectors. Set the value to SquaredL2. Calculation formula: (x1 - y1)^2 + (x2 - y2)^2 + ...
.
JSON INDEX
Specifies a JSON index or a JSON array index. For more information, see JSON indexes.
Syntax and parameters
JSON indexes
Supported versions:
For AnalyticDB for MySQL clusters of V3.1.5.10 or later, no JSON index is automatically created after you create a table. You must manually create JSON indexes.
For AnalyticDB for MySQL clusters earlier than V3.1.5.10, JSON indexes are automatically created for JSON columns after you create a table.
Note To query the minor version of an AnalyticDB for MySQL Data Lakehouse Edition cluster, execute the SELECT adb_version();
statement. To update the minor version of a cluster, contact technical support.
Syntax: [INDEX [index_name] (column_name|column_name->'$.json_path'.)]
Parameters:
JSON array indexes
Supported versions:
Only AnalyticDB for MySQL clusters of V3.1.10.6 or later support JSON array indexes.
Note To query the minor version of an AnalyticDB for MySQL Data Lakehouse Edition cluster, execute the SELECT adb_version();
statement. To update the minor version of a cluster, contact technical support.
Syntax: [INDEX [index_name] (column_name->'$[*]')]
Parameters:
index_name: the name of the index.
column_name->'$[*]': the name of the column on which you want to create the JSON array index. For example, vj->'$[*]' is used to create a JSON array index on the vj column.
table_attribute (Distribution key)
Determines whether a table is a standard or replicated table.
DISTRIBUTED BY HASH: specifies the table as a standard table. A standard table can make full use of the query capability of a distributed system to improve query efficiency. Each standard table can store up to hundreds of billions of data records.
DISTRIBUTED BY BROADCAST: specifies the table as a replicated table. A replicated table stores a data replica on each shard of the AnalyticDB for MySQL cluster to which the table belongs. We recommend that you store up to 20,000 rows in each replicated table.
DISTRIBUTED BY HASH (column_name,...)
Specifies the distribution key of a table. A table that has a distribution key is a standard table. AnalyticDB for MySQL calculates the hash values of the distribution key values and divides the table into shards based on the hash values. Sharding improves scalability and query performance.
Overview
Each table can have only one distribution key.
Each distribution key can contain one or more columns.
Distribution key columns must be included in the primary key columns. For example, if the distribution key is the customer_id column, the primary key must include the customer_id column.
Usage notes
If you do not specify a distribution key when you create a table, the following rules apply:
If the table has a primary key, AnalyticDB for MySQL uses the primary key as the distribution key.
If the table does not have a primary key, AnalyticDB for MySQL automatically adds the __adb_auto_id__
column to the table and uses the column as the primary key and the distribution key.
After a table is created, you cannot add, remove, or change distribution key columns. To change a distribution key, you must create a table that has the desired distribution key and migrate your data to the new table. For more information, see ALTER TABLE.
Recommendations
Select fewer columns to make the distribution key more suitable for a variety of complex queries.
Select columns whose values are evenly distributed as the distribution key, such as the transaction ID, device ID, user ID, or auto-increment columns. However, these columns are not ideal distribution keys when query conditions are limited to a small number of column values. For example, if Column A has evenly distributed values but the query condition is always A=3, setting Column A as the distribution key causes data hotspot.
Select columns that can be used to join tables as the distribution key. This way, data with the same distribution key values in two joined tables is distributed to the same shard. Join operations are performed on the same shard without the need of data transmission among shards. This minimizes data redistribution and improves query performance. For example, if your use scenario involves querying the order history of customers, you can specify the customer_id
column as the distribution key.
Do not select columns of the date, time, and timestamp types as the distribution key. The preceding columns are prone to data skew during data writes and may degrade write performance. Most queries are limited to a period of time such as a day or a month. In this case, the data that you want to query may exist on only one node, and queries cannot leverage the processing capabilities of all nodes in a distributed database system. We recommend that you select columns of the date and time types as the partition key. For more information, see the "partition_options (Partition key and lifecycle)" section of this topic.
You can use the data modeling diagnostics feature to check whether a column is a suitable distribution key and whether data skew occurs. For more information, see Storage diagnostics.
DISTRIBUTED BY BROADCAST
Specifies a replicated table. A replicated table stores a data replica on each shard of the AnalyticDB for MySQL cluster to which the table belongs. We recommend that you do not store a large amount of data in each replicated table.
Advantages: When you perform a JOIN query, you do not need to transmit the data of a replicated table across different nodes. This significantly reduces network transmission overheads and improves cluster stability in high-concurrency scenarios.
Disadvantages: When data is changed after you perform the INSERT, UPDATE, and DELETE operations in a replicated table, these changes are broadcast to all nodes of the cluster to ensure data consistency. However, this affects the overall write performance. Therefore, we recommend that you do not frequently create, delete, or modify replicated tables.
partition_options (Partition key and lifecycle)
If a single shard contains a large amount of data after you specify a distribution key for a table, you can divide the shard into partitions to speed up data filtering and improve query performance.
Benefits
Partitioning speeds up data filtering and improves query performance due to the following reasons:
Increases query speed by using the partition pruning feature. The partition pruning feature allows the system to scan only the partitions that contain data relevant to the query. This improves the query speed.
Improves index scan performance. If a table that involves an excessive number of rows, such as 50 million, is not partitioned, the index scan efficiency is low. If the table is partitioned, an index is created for each partition. This leads to more efficient index scans.
Enhances the efficiency of BUILD jobs. You can use BUILD jobs to convert the data written in real time to historical data. During the process, the system creates partitions and indexes, and clears redundant data. Indexes take effect only after BUILD jobs are complete. If a table is not partitioned, the entire table is scanned for each BUILD job. The more records a table contains, the longer the process takes and the later a new index takes effect. This affects the query performance. If a table is partitioned, only partitions that have data changes are scanned for each BUILD job. This enhances the efficiency of BUILD jobs.
Partitioning facilitates data lifecycle management.
Partitioning helps you implement tiered storage of hot and cold data based on different storage policies.
PARTITION BY
Specifies a partition key.
Syntax: PARTITION BY VALUE {(column_name) | (DATE_FORMAT(column_name, 'format'))} LIFECYCLE n
Parameters:
column_name: the name of the partition key. In the PARTITION BY VALUE(column_name)
syntax, partitioning is based on the values of the column_name
column. The partition key can be one of the following types: numeric, datetime, or a string that specifies a number.
DATE_FORMAT(column_name, 'format')
: converts a datetime column to a specified date format and partitions the data. The specified date format can only contain year, month, or day. The following date formats are supported: %Y, %y, %Y%m, %y%m, %Y%m%d, and %y%m%d. You can change the format after a table is created. For more information, see ALTER TABLE.
Usage notes
For AnalyticDB for MySQL clusters earlier than V3.2.1.0, you must use the LIFECYCLE n
parameter to specify the lifecycle of partitions when you use the PARTITION BY
clause to specify a partition key. Otherwise, an error is returned.
For AnalyticDB for MySQL clusters of V3.2.1.0 or later, the LIFECYCLE n
parameter is optional when you use the PARTITION BY
clause to specify a partition key. If you do not specify the LIFECYCLE n parameter, partition data is not deleted.
After a table is created, you cannot add partition keys, or add, remove, or change partition key columns. To add or modify a partition key, create a table that has the desired partition key and migrate data to the new table. For more information, see ALTER TABLE.
Recommendations
We recommend that you use a datetime column as the partition key.
A partition that is excessively large or small can affect read and write performance, and may even affect cluster stability. For information about the recommended partition size and the criteria for partition field reasonability, see the "Partitioned table diagnostics" section of the Storage diagnostics topic.
We recommend that you do not frequently update data in historical partitions. If you frequently update data in historical partitions, you may need to change the partition key.
LIFECYCLE n
The LIFECYCLE n parameter must be used together with the PARTITION BY
clause. You can use this parameter to manage the lifecycle of partitions. AnalyticDB for MySQL sorts partitions in descending order based on partition key values. The first n partitions are retained and the other partitions are deleted.
For AnalyticDB for MySQL clusters earlier than V3.2.1.1, the LIFECYCLE n
parameter specifies that up to n partitions can be retained on each shard. The partition lifecycle is managed at the shard level. However, if data is unevenly distributed or the amount of data is excessively small, more than n partitions may be retained on specific shards.
For AnalyticDB for MySQL clusters of V3.2.1.1 or later, the partition lifecycle is managed at the table level. The LIFECYCLE n
parameter specifies that up to n partitions can be retained in each table. However, for tables that are created before AnalyticDB for MySQL clusters are updated to V3.2.1.1 or later, the partition lifecycle is managed at the shard level. The LIFECYCLE n
parameter specifies that up to n partitions can be retained on each shard.
Example
PARTITION BY VALUE (DATE_FORMAT(date, '%Y%m%d')) LIFECYCLE 30
specifies that during partitioning, data in the date column is converted into the yyyyMMdd format and up to 30 partitions are retained. Assume that data from the 1st to the 30th day is written to corresponding partitions from Partition 20231201 to Partition 20231230. On the 31st day when data is written to Partition 20231231, the partition with the smallest partition key value (in this case, Partition 20231201) is automatically deleted because only up to 30 partitions can be retained.
STORAGE_POLICY (Storage policy)
Data Lakehouse Edition and Data Warehouse Edition in elastic mode for Cluster Edition allow you to specify a data storage policy. Storage policies vary in read/write performance and storage costs.
Valid values:
hot (default): hot storage. Data in all partitions of the entire table is stored on SSDs. Hot storage has the best read/write performance, but incurs the highest storage cost.
cold: cold storage. Data in all partitions of the entire table is stored on Object Storage Service (OSS). Compared with hot storage, cold storage has lower read/write performance, but is the least costly option.
mixed: a combination of hot storage and cold storage, also called tiered storage. This policy reduces storage costs and ensures query performance by storing frequently accessed data (hot data) on SSDs and infrequently accessed data (cold data) on OSS. If you set the STORAGE_POLICY parameter to mixed, you must use the PARTITION BY
clause to specify a partition key and the hot_partition_count
parameter to specify the number of hot partitions. If you do not specify a partition key, tiered storage does not take effect and data is stored on SSDs.
Example for tiered storage
hot_partition_count (Hot partitions)
If you set the STORAGE_POLICY
parameter to mixed, you must use hot_partition_count=n
(where n is a positive integer) to specify the number of hot partitions. AnalyticDB for MySQL sorts records in descending order based on the partition key values. The first n partitions are hot partitions and the other partitions are cold partitions.
Note If you set the STORAGE_POLICY
parameter to a value other than mixed and use hot_partition_count=n
, an error occurs.
block_size (Data blocks)
A data block is the smallest I/O unit for reading and writing data. The BLOCK_SIZE parameter specifies the number of rows stored in each data block in column-oriented storage. This parameter determines the number of rows read in each I/O operation and affects query performance based on query characteristics. For example, if BLOCK_SIZE is set to a large value for point queries, blocks are inefficiently read by the storage system. In this case, you can appropriately decrease the value of BLOCK_SIZE.
Usage notes:
Default value for a replicated table: 4096.
Default value for an AnalyticDB for MySQL Data Warehouse Edition cluster in elastic mode for Standalone Edition that has fewer than 32 cores: 8192.
Default value in other cases: 32760. If the default value of BLOCK_SIZE is 32760, BLOCK_SIZE is not displayed when you execute the SHOW CREATE TABLE
statement.
Important If you are not familiar with column-oriented storage, we recommend that you do not change the value of BLOCK_SIZE.
ENGINE (Storage engine)
Specifies the storage engine type of AnalyticDB for MySQL internal tables. You can use the storage engine for historical data analysis.
Valid values:
XUANWU (default): the XUANWU storage engine. If you do not specify the ENGINE parameter when you create a table, this value is used.
XUANWU_V2: the XUANWU_V2 storage engine. It is the next-generation storage engine that is developed based on XUANWU. Only AnalyticDB for MySQL clusters of V3.2.0 or later support the XUANWU_V2 storage engine. To use the XUANWU_V2 storage engine, you must set the ENGINE parameter to XUANWU_V2. For more information, see XUANWU_V2 engine.
Note For AnalyticDB for MySQL clusters earlier than V3.1.9.5, if you set the ENGINE
parameter to XUANWU when you create an internal table, you must set the table_properties
parameter to {"format":"columnstore"}. Otherwise, the table fails to be created.
For more information about the XUANWU storage engine, see XUANWU analytical storage engine.
AS query_expr (CTAS)
CREATE TABLE AS query_expr can be used to create a table and write the queried data to a new table. For more information, see CREATE TABLE AS SELECT (CTAS).
Examples
Create a partitioned table and configure the partition lifecycle
Create a standard table named customer. Specify login_time
, customer_id
, and phone_num
as the composite primary key, customer_id
as the distribution key, and login_time
as the partition key. Set the partition lifecycle to 30.
All partitions are sorted in descending order based on the values of the login_time
partition key. Only the first 30 partitions are retained. When data is written to the 31st partition, the partition with the smallest partition key value is automatically deleted.
Assume that data from the 1st (with login_time value 20231201) to the 30th (with login_time value 20231230) day is written to corresponding partitions from Partition 20231201 to Partition 20231230. When data with login_time value 20231231 is written to the database on the 31st day, the partition with the smallest login_time value (Partition 20231201) is automatically deleted. This way, only data within the last 30 days is retained.
CREATE TABLE customer (
customer_id BIGINT NOT NULL COMMENT 'Customer ID',
customer_name VARCHAR NOT NULL COMMENT 'Customer name',
phone_num BIGINT NOT NULL COMMENT 'Phone number',
city_name VARCHAR NOT NULL COMMENT 'City',
sex INT NOT NULL COMMENT 'Gender',
id_number VARCHAR NOT NULL COMMENT 'ID card number',
home_address VARCHAR NOT NULL COMMENT 'Home address',
office_address VARCHAR NOT NULL COMMENT 'Office address',
age INT NOT NULL COMMENT 'Age',
login_time TIMESTAMP NOT NULL COMMENT 'Logon time',
PRIMARY KEY (login_time,customer_id,phone_num)
)
DISTRIBUTED BY HASH(customer_id)
PARTITION BY VALUE(DATE_FORMAT(login_time, '%Y%m%d')) LIFECYCLE 30
COMMENT 'Customer information table';
Create a table without a distribution key
Create a table with a primary key
If you create a table that has a primary key but not a distribution key, AnalyticDB for MySQL automatically uses the primary key as the distribution key.
CREATE TABLE orders (
order_id BIGINT NOT NULL COMMENT 'Order ID',
customer_id INT NOT NULL COMMENT 'Customer ID',
order_status VARCHAR(1) NOT NULL COMMENT 'Order status',
total_price DECIMAL (15, 2) NOT NULL COMMENT 'Price',
order_date DATE NOT NULL COMMENT 'Order date',
PRIMARY KEY(order_id,order_date)
);
Query the statement that is used to create the table and verify that the primary key columns order_id and order_date are used as the distribution key.
SHOW CREATE TABLE orders;
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| orders | CREATE TABLE `orders` ( |
| | 'order_id' bigint NOT NULL COMMENT 'Order ID', |
| | 'customer_id' int NOT NULL COMMENT 'Customer ID', |
| | 'order_status' varchar(1) NOT NULL COMMENT 'Order status', |
| | 'total_price' decimal(15, 2) NOT NULL COMMENT 'Price', |
| | 'order_date' date NOT NULL COMMENT 'Order date', |
| | PRIMARY KEY (`order_id`,`order_date`) |
| | ) DISTRIBUTED BY HASH(`order_id`,`order_date`) INDEX_ALL='Y' STORAGE_POLICY='HOT' ENGINE='XUANWU' TABLE_PROPERTIES='{"format":"columnstore"}' |
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.04 sec)
Create a table without a primary key
If you create a table that does not have a primary or distribution key. AnalyticDB for MySQL adds the __adb_auto_id__
column to the table and uses the column as the primary key and the distribution key.
CREATE TABLE orders_new (
order_id BIGINT NOT NULL COMMENT 'Order ID',
customer_id INT NOT NULL COMMENT 'Customer ID',
order_status VARCHAR(1) NOT NULL COMMENT 'Order status',
total_price DECIMAL (15, 2) NOT NULL COMMENT 'Price',
order_date DATE NOT NULL COMMENT 'Order date'
);
Query the statement that is used to create the table and verify that the auto-increment column named __adb_auto_id__ is automatically added to the table and used as the primary key and the distribution key.
SHOW CREATE TABLE orders_new;
+-------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| orders_new | CREATE TABLE `orders_new` ( |
| | `__adb_auto_id__` bigint AUTO_INCREMENT, |
| | 'order_id' bigint NOT NULL COMMENT 'Order ID', |
| | 'customer_id' int NOT NULL COMMENT 'Customer ID', |
| | 'order_status' varchar(1) NOT NULL COMMENT 'Order status', |
| | 'total_price' decimal(15, 2) NOT NULL COMMENT 'Price', |
| | 'order_date' date NOT NULL COMMENT 'Order date', |
| | PRIMARY KEY (`__adb_auto_id__`) |
| | ) DISTRIBUTED BY HASH(`__adb_auto_id__`) INDEX_ALL='Y' STORAGE_POLICY='HOT' ENGINE='XUANWU' TABLE_PROPERTIES='{"format":"columnstore"}' |
+-------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.04 sec)
Create a table without a partition key
Create a table named supplier that uses the supplier_id
auto-increment column as the distribution key and is sharded based on the hash values of supplier_id
values.
CREATE TABLE supplier (
supplier_id BIGINT AUTO_INCREMENT PRIMARY KEY,
supplier_name VARCHAR,
address INT,
phone VARCHAR
)
DISTRIBUTED BY HASH(supplier_id);
Specify storage policies
Specify a cold storage policy
CREATE TABLE item (
order_id BIGINT NOT NULL,
item_id INT NOT NULL,
quantity DECIMAL(15, 2) NOT NULL,
discount DECIMAL(15, 2) NOT NULL,
shipdate DATE NOT NULL,
PRIMARY KEY (order_id,item_id,shipdate)
)
DISTRIBUTED BY HASH(item_id)
PARTITION BY VALUE(date_format(shipdate, '%Y%m')) LIFECYCLE 200
STORAGE_POLICY='COLD';
Specify a hot storage policy
CREATE TABLE item (
order_id BIGINT NOT NULL,
item_id INT NOT NULL,
quantity DECIMAL(15, 2) NOT NULL,
discount DECIMAL(15, 2) NOT NULL,
shipdate DECIMAL NOT NULL,
PRIMARY KEY (order_id,item_id,shipdate)
)
DISTRIBUTED BY HASH(item_id)
PARTITION BY VALUE(date_format(shipdate, '%Y%m')) LIFECYCLE 200
STORAGE_POLICY='HOT';
Specify a tiered storage policy and set the number of hot partitions to 16
CREATE TABLE item (
order_id BIGINT NOT NULL,
item_id INT NOT NULL,
quantity DECIMAL(15, 2) NOT NULL,
discount DECIMAL(15, 2) NOT NULL,
shipdate DATE NOT NULL,
PRIMARY KEY (order_id,item_id,shipdate)
)
DISTRIBUTED BY HASH(item_id)
PARTITION BY VALUE(date_format(shipdate, '%Y%m')) LIFECYCLE 200
STORAGE_POLICY='MIXED' HOT_PARTITION_COUNT=16;
Specify a full-text index
Create a full-text index named fidx_c on the content column.
CREATE TABLE fulltext_tb (
id INT,
content VARCHAR,
keyword VARCHAR,
FULLTEXT INDEX fidx_c(content),
PRIMARY KEY (id)
)
DISTRIBUTED BY HASH(id);
For information about how to create and change a full-text index, see Create a full-text index.
For information about full-text search, see Full-text query.
Specify a vector index
Create a table that has a four-dimensional vector column of the ARRAY<SMALLINT> type named short_feature and a four-dimensional vector column of the ARRAY<FLOAT> type named float_feature.
Create vector indexes named short_feature_index and float_feature_index based on the vector columns for the table.
CREATE TABLE fact_tb (
xid BIGINT NOT NULL,
cid BIGINT NOT NULL,
uid VARCHAR NOT NULL,
vid VARCHAR NOT NULL,
wid VARCHAR NOT NULL,
short_feature array<smallint>(4),
float_feature array<float>(4),
ann index short_feature_index(short_feature),
ann index float_feature_index(float_feature),
PRIMARY KEY (xid, cid, vid)
)
DISTRIBUTED BY HASH(xid) PARTITION BY VALUE(cid) LIFECYCLE 4;
For information about vector indexes and vector search, see Vector search.
Specify a foreign key index
Create a table named store_returns
. Use the FOREIGN KEY
clause to associate the sr_item_sk
column of the store_returns table with the primary key column customer_id
of the customer
table.
CREATE TABLE store_returns (
sr_sale_id BIGINT NOT NULL PRIMARY KEY,
sr_store_sk BIGINT,
sr_item_sk BIGINT NOT NULL,
FOREIGN KEY (sr_item_sk) REFERENCES customer (customer_id)
);
Specify a JSON array index
Create a table and create a JSON array index named idx_vj on the vj column.
CREATE TABLE json(
id INT,
vj JSON,
INDEX idx_vj(vj->'$[*]')
)
DISTRIBUTED BY HASH(id);
For information about how to create and change JSON array indexes, see the "Create a JSON array index" section of the JSON indexes topic and the "JSON array indexes" section of the ALTER TABLE topic.
FAQ
Column attributes and constraints
Does an auto-increment column always start from 1? Are all values unique?
The values of an auto-increment column are not incremented in sequence and do not always start from 1. However, all values of an auto-increment column are unique.
Distribution key, partition key, and lifecycle
What is the difference between a distribution key and a partition key?
A distribution key is used in sharding. Data in a table is distributed across different shards based on the hash values of the distribution key values. A partition key is used in partitioning. Within a shard, data is further distributed across different partitions based on the values of the partition key. The following figure shows how sharding and partitioning work.
Do I have to specify a distribution key when I create a table?
When you create a partitioned table, you do not have to specify a distribution key. If you specify a primary key but not a distribution key, AnalyticDB for MySQL automatically uses the primary key as the distribution key. If you do not specify a primary key or a distribution key, AnalyticDB for MySQL automatically adds the __adb_auto_id__
column to the table and uses the column as the distribution key and the primary key.
When you create a replicated table, you do not need to specify a distribution key. However, you must use DISTRIBUTED BY BROADCAST to specify that each storage node of the AnalyticDB for MySQL cluster stores a full copy of data.
Is the number of shards affected if I change cluster specifications?
No, the number of shards is not affected by changes to cluster specifications.
How do I query the partition information of a table?
You can execute the following statement to query the partition information of a table:
SELECT partition_id, --The name of the partition.
row_count, -- The total number of rows in the partition.
local_data_size, -- The local data storage of the partition.
index_size, -- The size of the partition index.
pk_size, -- The size of the primary key index of the partition.
remote_data_size -- The remote data storage of the partition.
FROM information_schema.kepler_partitions
WHERE schema_name = '$DB'
AND table_name ='$TABLE'
AND partition_id > 0;
Why am I unable to view the partition information after I create a partitioned table?
You cannot view the partition information after you create a partitioned table due to the following reasons:
No data is written to the table. When you create the table, you only configure a partitioning rule by specifying a partition key. Partitioning is performed based on the values of the partition key. If no data is written to the table, the partition key values are empty and no partition is created.
No BUILD job for partitions is complete. Partitions are not created in real time. You can view the partition information only after a BUILD job of the table is complete.
Solution:
Write data to the table and wait for the BUILD job to complete.
Note For information about BUILD jobs and how to query the status of a BUILD job, see BUILD.
What data types are supported for a partition key?
Partition keys can be of the following data types: numeric, datetime, or a string that specifies a number. Other data types may cause data write errors.
The following error message indicates that written partition key values do not meet the data type requirements: partition format function error
.
Can I use a function such as PARTITION BY VALUE(FROM_UNIXTIME(col, 'format')) to specify a partition key?
No, you cannot specify a partition key by using other functions. You can specify a partition key only by using one of the following functions: PARTITION BY VALUE(column_name) and PARTITION BY VALUE(date_format(column_name, 'format')). If you use any other function, an error occurs.
How do I query the partition lifecycle?
You can execute the SHOW CREATE TABLE <table_name>
statement to view the partition lifecycle. The partition lifecycle is displayed in the returned results.
Why can I still query data that is stored more than 30 days after I configure data to be retained for only 30 days by setting the value of LIFECYCLE to 30?
You can query data that is stored more than 30 days due to the following reasons:
Specific partitions have just expired and have not been deleted. An expired partition is not deleted until the BUILD job for the table is complete.
For AnalyticDB for MySQL cluster earlier than V3.2.1.1 that uses shard-level partition lifecycle management, a shard in a table contains fewer partitions than the number specified by the LIFECYCLE n parameter. This issue does not occur on tables that are created in AnalyticDB for MySQL clusters of V3.2.1.1 or later.
Root causes:
Data is not consistently written to the same shard. Assume that data is partitioned by date. Shard 1 contains Partition 20231201 to 20231230, and Shard 2 contains Partition 20231202 to 20231231. Partitions in both shards are retained because both shards have 30 partitions, which does not exceed the value 30 specified by the LIFECYCLE n parameter. Therefore, you can query data in Partition 20231201 to 20231231.
No new data is written to the table for a long time. Assume that data is partitioned by date. Shard 1 contains Partitions 20231201, 20231202, 20231203, and 20231204. No new data is written to the partitions. In this case, Shard 1 has only four partitions, which does not exceed the value 30 specified by the LIFECYCLE n parameter. Therefore, no partitions are deleted and you can still query data in Partition 20231201.
Is data in expired partitions immediately deleted?
No, partitions are not created or deleted in real time. An expired partition is not deleted until the BUILD job for the table is complete.
Indexes
How do I query a clustered index of a table?
You can execute the SHOW CREATE TABLE
statement to query the clustered index specified in a table.
Does AnalyticDB for MySQL support unique indexes?
No, AnalyticDB for MySQL does not support unique indexes. However, the primary key index of a table in AnalyticDB for MySQL is a unique index and ensures that the values of the primary key are unique.
Column-oriented storage
What does the TABLE_PROPERTIES='{"format":"columnstore"}' syntax mean in the table creation statement?
TABLE_PROPERTIES='{"format":"columnstore"}'
specifies that the storage engine uses column-oriented storage. Do not modify the syntax when you create a table.
Others
What can I modify by using the ALTER TABLE statement after I create a table?
You can execute the ALTER TABLE statement to make the following changes:
Modify the following parameters: table_name, column_name, column_type, and COMMENT.
Add and remove columns except primary key columns.
Change default column values.
Change the NOT NULL column constraint to NULL.
Create and delete indexes.
Change the date format of a partition function.
Change the partition lifecycle.
Change the storage policy.
Other changes cannot be made after a table is created. For more information, see ALTER TABLE.
How many tables can I create in each cluster?
The following limits apply to the maximum number of tables that can be created for each AnalyticDB for MySQL cluster:
Maximum number of tables that can be created for each Data Warehouse Edition cluster in reserved mode that has 1 to 20 node groups: 80000/(Number of shards/Number of node groups)
. In the formula, the result of [Number of shards/Number of node groups]
must be rounded up.
You can add more node groups to increase the maximum number of internal tables that can be created. For more information, see Scale a Data Warehouse Edition cluster.
Maximum number of internal tables that can be created for each Data Warehouse Edition cluster in elastic mode: [80000/(Number of shards/Number of EIUs)] × 2
. In the formula, the result of [Number of shards/Number of EIUs]
must be rounded up.
You can scale out elastic I/O units (EIUs) to increase the maximum number of internal tables that can be created. For more information, see Scale out elastic I/O resources.
Maximum number of internal tables that can be created for each Data Lakehouse Edition cluster: [80000/(Number of shards/Amount of reserved storage resources divided by 24 ACUs)] × 2
.
You can scale up reserved storage resources to increase the maximum number of internal tables that can be created. For more information, see Scale a Data Lakehouse Edition cluster.
Maximum number of external tables that can be created for each Data Lakehouse Edition cluster or Data Warehouse Edition cluster in elastic mode: 500,000.
Note To query the number of shards, execute the SELECT COUNT(1) FROM information_schema.kepler_meta_shards;
statement. You cannot increase or decrease the number of shards. For information about sharding, see the "shard" section of the Terms topic.
What is the default character set for AnalyticDB for MySQL?
AnalyticDB for MySQL uses UTF-8 as the default character set, which is equivalent to the utf8mb4 character set for MySQL. Other character sets are not supported.
Common errors and solutions
partition number must larger than 0
Cause: You specified a partition key but not the partition lifecycle.
Sample statement:
CREATE TABLE test (
id INT COMMENT '',
name VARCHAR(10) COMMENT '',
PRIMARY KEY (id, name)
)
DISTRIBUTED BY HASH(id) PARTITION BY VALUE(name);
Solution: Specify the partition lifecycle in the table creation statement. Sample statement:
CREATE TABLE test (
id INT COMMENT '',
name VARCHAR(10) COMMENT '',
PRIMARY KEY (id, name)
)
DISTRIBUTED BY HASH(id) PARTITION BY VALUE(name) LIFECYCLE 30;
Note This error occurs only on AnalyticDB for MySQL clusters earlier than V3.2.1.0.
Only 204800 partition allowed, the number of existing partition=>196462
Cause: The number of partitions in the cluster exceeds the upper limit of 102,400 in AnalyticDB for MySQL.
Execute the following statement to query the number of partitions in the cluster:
SELECT count(partition_id)
FROM information_schema.kepler_partitions
WHERE partition_id > 0;
Solution: Increase the partition granularity, such as changing the granularity from day to month. For information about how to change the partition granularity, see the "Change the partition function format of a table" section of the ALTER TABLE topic.
partition column 'XXX' is not found in primary index=> [YYY]
Cause: The primary key does not contain the distribution key or the partition key.
Sample statement:
CREATE TABLE test (
id INT COMMENT '',
name VARCHAR(10) COMMENT '',
PRIMARY KEY (id)
)
DISTRIBUTED BY HASH(id) PARTITION BY VALUE(name) LIFECYCLE 30;
This error also occurs if you do not specify the primary key or the distribution key. If you do not specify the primary key or the distribution key when you create a table, AnalyticDB for MySQL automatically adds the __adb_auto_id__
column to the table and uses the column as the primary key and the distribution key. In this case, the primary key contains only the __adb_auto_id__
column, but not the partition key. Therefore, this error occurs.
Sample statement:
CREATE TABLE test (
id INT COMMENT '',
name VARCHAR(10) COMMENT ''
)
PARTITION BY VALUE(name) LIFECYCLE 30;
Solution: Include the partition key in the primary key.
SemanticException:only 5000 table allowed
Cause: The number of tables in the cluster exceeds the upper limit in AnalyticDB for MySQL. The maximum number of tables that can be created for each cluster varies based on the cluster edition and specifications. For information about the limits on the maximum number of tables that can be created for different types of clusters, see Limits.
Solutions:
unsigned expr not supported
Cause: AnalyticDB for MySQL does not support the UNSIGNED attribute because it does not support unsigned numbers.
Solution: Do not specify the UNSIGNED attribute for a column in the table creation statement. Instead, you must implement the non-negative value constraint in your business code.
References
For information about how to write data to tables, see INSERT INTO.
For information about how to copy records from one table to another, and clear existing data in a partition and then batch write data to the partition, see INSERT SELECT FROM and INSERT OVERWRITE SELECT.
For information about how to import data from data sources, such as ApsaraDB RDS, MaxCompute, and OSS, into AnalyticDB for MySQL, see Data import.