Tablestore: Create a data table

Last Updated: Aug 16, 2024

This topic describes how to create a data table by calling the CreateTable operation. When you call CreateTable, you must specify the schema information and configuration information for the data table. If the data table belongs to a high-performance instance, you can configure the reserved read throughput and reserved write throughput based on your business requirements. You can also create one or more index tables when you create a data table.

Usage notes

  • It takes several seconds to load a data table after the data table is created. During this period, all read and write operations on the data table fail. Perform operations on the data table after the data table is loaded.

  • You must specify the primary key when you create a data table. A primary key consists of one to four primary key columns. Specify a name and data type for each primary key column.

  • Tablestore provides the auto-increment primary key column feature. This feature is suitable for system design scenarios that require an auto-increment primary key column, such as item IDs on e-commerce websites, user IDs on large websites, post IDs in forums, and message IDs in chat tools. For more information, see Configure an auto-increment primary key column.
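The first note above means that the first requests sent right after table creation can fail while the table is still loading. One way to handle this is to retry the first operation with a short delay. The following is a minimal sketch, not part of the Tablestore SDK: the helper name wait_until_ready, its parameters, and the retry limits are assumptions chosen for illustration, and operation stands for any callable that wraps an SDK call.

```python
import time


def wait_until_ready(operation, attempts=10, delay_seconds=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Hypothetical helper (not part of the Tablestore SDK). `operation` is any
    zero-argument callable, for example a lambda that wraps the first read or
    write on a newly created data table.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as error:  # narrow this to the SDK's error types in real code
            last_error = error
            # Wait before the next attempt, but not after the final one.
            if attempt < attempts - 1:
                sleep(delay_seconds)
    raise last_error
```

The sleep function is a parameter so that the delay can be replaced in tests. In real code, catch only the errors that indicate the table is not ready instead of every exception.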

Prerequisites

Syntax

"""
Description: You can call this operation to create a data table based on the specified schema information. 

table_meta is an instance of the tablestore.metadata.TableMeta class. table_meta specifies the name of the data table and the schema of the primary key. 
For more information, see the documentation of the TableMeta class. After you create a data table, the partitions are loaded after several seconds. You can perform operations on the data table only after the partitions are loaded. 
table_options is an instance of the tablestore.metadata.TableOptions class. table_options contains the time_to_live, max_version, and max_time_deviation parameters. 
reserved_throughput is an instance of the tablestore.metadata.ReservedThroughput class. reserved_throughput specifies the reserved read throughput and reserved write throughput. 
secondary_indexes is an array that can contain one or more instances of the tablestore.metadata.SecondaryIndexMeta class. secondary_indexes specifies the global secondary index that you want to create. 

Return value: none. 
"""

def create_table(self, table_meta, table_options, reserved_throughput, secondary_indexes=[]):

Parameters

Configure the parameters in the code based on the parameter description in the following table and the "Request syntax" section of the CreateTable topic.

Parameter

Description

table_meta

The schema information about the data table. The schema information includes the following parameters:

  • table_name: the name of the data table.

  • schema_of_primary_key: the schema of the primary key. For more information, see Primary keys and attributes.

    Note

    You do not need to specify the schema for attribute columns. Different rows in a Tablestore table can have different attribute columns. You can specify the names of attribute columns when you write data to a data table.

    • The primary key of a data table consists of one to four primary key columns. Primary key columns are sorted in the order in which they are added. For example, PRIMARY KEY (A, B, C) and PRIMARY KEY (A, C, B) have different schemas. Tablestore sorts rows based on the values of all primary key columns.

    • The first primary key column is the partition key. Data that has the same partition key is stored in the same partition. We recommend that you keep the size of data with the same partition key less than or equal to 10 GB. Otherwise, a single partition may be too large to split. We also recommend that you evenly distribute read/write operations among different partition keys to facilitate load balancing.

  • defined_columns: the predefined columns of the data table and the data types of the predefined column values. Primary key columns cannot be specified as predefined columns. You can use predefined columns as the index columns or attribute columns of index tables.
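The schema rules above (one to four primary key columns, primary key columns excluded from predefined columns, and row order that depends on column order) can be checked on the client before the request is sent. The following is a minimal sketch under stated assumptions: validate_table_schema is a hypothetical helper, not an SDK function, and the allowed primary key types are the three named in the first example of this topic (INTEGER, STRING, and BINARY). The service performs its own validation regardless.

```python
VALID_PRIMARY_KEY_TYPES = {"INTEGER", "STRING", "BINARY"}


def validate_table_schema(schema_of_primary_key, defined_columns=()):
    """Raise ValueError if the schema breaks a rule described above.

    Hypothetical client-side check. Returns the name of the partition key,
    which is the first primary key column.
    """
    if not 1 <= len(schema_of_primary_key) <= 4:
        raise ValueError("a primary key must have one to four columns")
    for column in schema_of_primary_key:
        name, column_type = column[0], column[1]
        if column_type not in VALID_PRIMARY_KEY_TYPES:
            raise ValueError("unsupported primary key type for %s: %s" % (name, column_type))
    primary_key_names = {column[0] for column in schema_of_primary_key}
    for name, _ in defined_columns:
        if name in primary_key_names:
            raise ValueError("%s is a primary key column and cannot be predefined" % name)
    return schema_of_primary_key[0][0]


# Rows are ordered by all primary key columns, so the column order matters.
rows = [{"A": 1, "B": 2, "C": 9}, {"A": 1, "B": 3, "C": 1}]
order_abc = sorted(rows, key=lambda row: (row["A"], row["B"], row["C"]))
order_acb = sorted(rows, key=lambda row: (row["A"], row["C"], row["B"]))
```

With PRIMARY KEY (A, B, C), the row with B = 2 comes first. With PRIMARY KEY (A, C, B), the row with C = 1 comes first, so the same two rows are stored in the opposite order.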

table_options

The configuration information about the data table. For more information, see Data versions and TTL.

The configuration information includes the following parameters:

  • time_to_live: the retention period of data in the table. This period is the validity period of data. Tablestore automatically deletes data when the data retention period reaches the value of time_to_live.

    The minimum time_to_live value is 86400, which is equal to one day. A value of -1 indicates that the data never expires.

    After the data table is created, you can call the UpdateTable operation to modify the value of the time_to_live parameter.

    Unit: seconds.

    Important

    If you want to create an index table for the data table, the time_to_live parameter must meet one of the following requirements:

    • The time_to_live parameter of the data table is set to -1, which means that data in the data table never expires.

    • The time_to_live parameter of the data table is set to a value other than -1 and update operations on the data table are prohibited (allow_update is set to false).

  • max_version: the maximum number of data versions that can be retained for a single attribute column. If the number of data versions in an attribute column exceeds the value of this parameter, the system deletes data of earlier versions.

    When you create a data table, you can set this parameter based on your business requirements. After the data table is created, you can call the UpdateTable operation to modify the value of the max_version parameter.

    Important

    If you want to create an index table for the data table, you must set the max_version parameter to 1.

  • max_time_deviation: the max version offset, which is the maximum difference between the current system time and the timestamp of the written data. The difference between the version number and the time at which the data is written must be less than or equal to the value of the max_time_deviation parameter. Otherwise, an error occurs when the data is written.

    The valid version range of data in an attribute column is calculated by using the following formula: Valid version range = [max{Data written time - Max Version Offset, Data written time - TTL value}, Data written time + Max Version Offset).

    When you create a data table, Tablestore uses the default value of 86400 if you do not specify a max version offset. After the data table is created, you can call the UpdateTable operation to modify the value of the max_time_deviation parameter.

    Unit: seconds.

  • allow_update: specifies whether to allow the UpdateRow operation. The default value is true, which indicates that the UpdateRow operation is allowed. If you set allow_update to false, the UpdateRow operation is prohibited.

    Important

    If you want to use the lifecycle feature of a search index, you must set this parameter to false to prohibit data writes by UpdateRow.

You can call the UpdateTable operation to modify the time_to_live and max_version parameters of a data table. For more information, see UpdateTable.
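The valid version range formula given for max_time_deviation can be worked through in a few lines. The following is a minimal sketch, not SDK code: valid_version_range is a hypothetical helper, all values are expressed in seconds for readability, and treating a time_to_live of -1 as "skip the TTL term" is an assumption based on the statement that -1 means the data never expires.

```python
def valid_version_range(written_time, max_time_deviation, time_to_live):
    """Return (lower, upper) for the half-open range [lower, upper).

    Hypothetical helper. All arguments are in seconds (an assumption of this
    sketch). A time_to_live of -1 means the data never expires, so only the
    max version offset limits the lower bound in that case.
    """
    lower = written_time - max_time_deviation
    if time_to_live != -1:
        lower = max(lower, written_time - time_to_live)
    upper = written_time + max_time_deviation
    return lower, upper


# One-day offset and one-year TTL: the offset is the tighter lower bound.
range_offset_bound = valid_version_range(1000000, 86400, 31536000)

# Two-day offset and one-day TTL: the TTL is the tighter lower bound.
range_ttl_bound = valid_version_range(1000000, 172800, 86400)
```

In the first case the range is [913600, 1086400). In the second case the TTL raises the lower bound, so the range is [913600, 1172800).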

reserved_throughput

The reserved read throughput and reserved write throughput of the data table.

You can set the reserved read throughput and reserved write throughput only to 0 for data tables in capacity instances. Reserved throughputs do not apply to these instances.

The default value 0 indicates that you are charged for all throughput on a pay-as-you-go basis.

Unit: CU.

  • If you set the reserved read throughput and reserved write throughput to a value that is greater than 0 for a data table, Tablestore reserves related resources for the data table. After you create the data table, you are charged for the reserved throughput resources. You are charged for additional throughput on a pay-as-you-go basis. For more information, see Billing overview.

  • If you set the reserved read throughput and reserved write throughput to 0, Tablestore does not reserve related resources for the data table.

secondary_indexes

The schema information about the index table. The schema information includes the following parameters:

  • index_name: the name of the index table.

  • primary_key_names: the index key columns of the index table. The index key columns are a combination of primary key columns and predefined columns of the data table.

    If you use the local secondary index feature, the first primary key column of an index table must be the same as the first primary key column of the data table.

  • defined_column_names: the indexed attribute columns. The attribute columns are a combination of predefined columns of the data table.

  • index_type: the type of the index. Valid values: IT_GLOBAL_INDEX and IT_LOCAL_INDEX.

    • If index_type is not specified or set to IT_GLOBAL_INDEX, the global secondary index feature is used.

      Tablestore automatically synchronizes data from the indexed columns and primary key columns of the data table to the columns of the index table in asynchronous mode. The synchronization latency is within a few milliseconds.

    • If index_type is set to IT_LOCAL_INDEX, the local secondary index feature is used.

      Tablestore automatically synchronizes the data from the indexed columns and primary key columns of the data table to the columns of the index table in synchronous mode. You can query the data from the index table immediately after the data is written to the data table.
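The index rules in this topic are spread across the table_options and secondary_indexes descriptions, so it can help to check them together before calling create_table. The following is a minimal sketch: check_secondary_index is a hypothetical helper, not an SDK function, and the local_index flag stands in for the index_type parameter. It only restates the constraints described above, and the service still performs its own validation.

```python
def check_secondary_index(schema_of_primary_key, defined_columns,
                          primary_key_names, defined_column_names,
                          local_index=False, time_to_live=-1,
                          max_version=1, allow_update=True):
    """Return a list of problems found; an empty list means no rule is broken.

    Hypothetical client-side check of the rules described in this topic.
    """
    problems = []
    table_pk_names = [column[0] for column in schema_of_primary_key]
    predefined_names = [column[0] for column in defined_columns]

    # Index key columns come from primary key columns and predefined columns.
    for name in primary_key_names:
        if name not in table_pk_names and name not in predefined_names:
            problems.append("index key column %s is not defined in the data table" % name)

    # Indexed attribute columns come from predefined columns only.
    for name in defined_column_names:
        if name not in predefined_names:
            problems.append("attribute column %s is not a predefined column" % name)

    # A local secondary index must start with the partition key of the data table.
    if local_index and (not primary_key_names or primary_key_names[0] != table_pk_names[0]):
        problems.append("a local secondary index must start with %s" % table_pk_names[0])

    if max_version != 1:
        problems.append("max_version must be 1")
    if time_to_live != -1 and allow_update:
        problems.append("time_to_live must be -1 unless allow_update is false")
    return problems
```

For the schema used in the examples below, an index on ['gid', 's'] passes the local index check, while an index on ['i', 's'] is reported as a problem when local_index is True because it does not start with gid.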

Examples

Create a data table without creating an index table for the data table

The following sample code provides an example on how to create a data table that contains two primary key columns. In this example, the time_to_live parameter is set to 31536000 (one year), the max_version parameter is set to 3, the max_time_deviation parameter is set to 86400 (one day), and the reserved_throughput parameter is set to (0,0).

# Import the metadata classes that are used in this example. 
from tablestore.metadata import TableMeta, TableOptions, ReservedThroughput, CapacityUnit

# Create a schema for the primary key columns of the data table, including the number, names, and types of the primary key columns. 
# The first primary key column is named pk0 and requires an INTEGER value. The first primary key column is also the partition key. 
# The second primary key column is named pk1 and requires an INTEGER value. You can also set the data type to STRING or BINARY. 
schema_of_primary_key = [('pk0', 'INTEGER'), ('pk1', 'INTEGER')]

# Create a TableMeta instance based on the name of the data table and the schema of the primary key columns. 
table_meta = TableMeta('<table_name>', schema_of_primary_key)

# Create a TableOptions instance. Set the time_to_live parameter to 31536000 to automatically delete expired data. Then, set the max_version parameter to 3 and the max_time_deviation parameter to 86400 (one day). 
table_options = TableOptions(31536000, 3, 86400)

# Set the reserved read throughput and reserved write throughput to 0. 
reserved_throughput = ReservedThroughput(CapacityUnit(0, 0))

# Call the create_table method of the client. ots_client is assumed to be an initialized OTSClient instance. 
# If no exception is thrown, the data table is created. 
try:
    ots_client.create_table(table_meta, table_options, reserved_throughput)
    print("create table succeeded.")
# If an exception is thrown, the data table fails to be created. Handle the exception. 
except Exception as e:
    print("create table failed: %s" % e)
    

For more information about the sample code, see CreateTable on GitHub.

Create a data table and a global secondary index

The following sample code provides an example on how to create a global secondary index when you create a data table:

# Create a schema for the primary key columns of the data table, including the number, names, and types of the primary key columns. 
schema_of_primary_key = [('gid', 'INTEGER'), ('uid', 'STRING')]

# Specify the predefined columns of the data table. 
defined_columns = [('i', 'INTEGER'), ('bool', 'BOOLEAN'), ('d', 'DOUBLE'), ('s', 'STRING'), ('b', 'BINARY')]

# Create a TableMeta instance based on the name of the data table, the schema of the primary key columns, and the predefined columns. 
table_meta = TableMeta('<table_name>', schema_of_primary_key, defined_columns)

# Create a TableOptions instance. Set the time_to_live parameter to -1, which specifies that the data does not expire. Then, set the max_version parameter to 1. 
table_option = TableOptions(-1, 1)

# Set the reserved read throughput and reserved write throughput to 0. 
reserved_throughput = ReservedThroughput(CapacityUnit(0, 0))

# Specify the name, index key columns (primary_key_names), and attribute columns (defined_column_names) of the secondary index. 
# The index_type parameter is not specified, so a global secondary index is created. 
secondary_indexes = [
    SecondaryIndexMeta('index1', ['i', 's'], ['bool', 'b', 'd']),
]

# Call the create_table method of the client. If no exception is thrown, the data table and secondary index are created. 
ots_client.create_table(table_meta, table_option, reserved_throughput, secondary_indexes)

Create a data table and a local secondary index

The following sample code provides an example on how to create a local secondary index when you create a data table:

# Create a schema for the primary key columns of the data table, including the number, names, and types of the primary key columns. 
schema_of_primary_key = [('gid', 'INTEGER'), ('uid', 'STRING')]

# Specify the predefined columns of the data table. 
defined_columns = [('i', 'INTEGER'), ('bool', 'BOOLEAN'), ('d', 'DOUBLE'), ('s', 'STRING'), ('b', 'BINARY')]

# Create a TableMeta instance based on the name of the data table, the schema of the primary key columns, and the predefined columns. 
table_meta = TableMeta('<table_name>', schema_of_primary_key, defined_columns)

# Create a TableOptions instance. Set the time_to_live parameter to -1, which specifies that the data does not expire. Then, set the max_version parameter to 1. 
table_option = TableOptions(-1, 1)

# Set the reserved read throughput and reserved write throughput to 0. 
reserved_throughput = ReservedThroughput(CapacityUnit(0, 0))

# Specify the name, index key columns (primary_key_names), attribute columns (defined_column_names), and index type of the secondary index. 
# Set the index_type parameter to SecondaryIndexType.LOCAL_INDEX, which specifies that a local secondary index is created. 
# The first index key column (gid) is the same as the first primary key column of the data table. 
secondary_indexes = [
    SecondaryIndexMeta('index1', ['gid', 's'], ['bool', 'b', 'd'], index_type=SecondaryIndexType.LOCAL_INDEX),
]

# Call the create_table method of the client. If no exception is thrown, the data table and secondary index are created. 
ots_client.create_table(table_meta, table_option, reserved_throughput, secondary_indexes)

References