The Wide Column model is similar to the data model of Bigtable or HBase and is suitable for scenarios such as metadata storage and big data storage. The Wide Column model stores data in data tables. A single data table can store petabyte-level data and support tens of millions of queries per second (QPS). The data tables are schema-free and support wide columns, max versions, and time to live (TTL) management. The data tables also support features such as auto-increment primary key columns, local transactions, atomic counters, filters, and conditional updates.
Introduction
The Wide Column model of Tablestore is similar to the data model of Bigtable or HBase. The Wide Column model stores data in data tables in a three-dimensional structure, which is defined by rows, columns, and time. Each row of a data table can have different columns. The attribute columns of a data table can be dynamically added or removed. When you create a data table, you do not need to define a strict schema for the attribute columns of the data table.
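The three-dimensional structure described above can be sketched as a nested mapping from primary key to column name to version timestamp. The following Python snippet is a minimal in-memory illustration only. The class `WideColumnTable` and its methods are hypothetical names introduced for this example and are not part of any Tablestore SDK.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy model: primary key -> column name -> {version timestamp (ms): value}."""

    def __init__(self):
        # No schema is declared up front; columns appear when they are written.
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, primary_key, column, value, timestamp_ms):
        # Each write adds one version of the column value, keyed by its timestamp.
        self.rows[primary_key][column][timestamp_ms] = value

    def get_latest(self, primary_key, column):
        # Return the value with the largest timestamp, or None if absent.
        versions = self.rows.get(primary_key, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def columns_of(self, primary_key):
        # Column names present in this row, in alphabetical order.
        return sorted(self.rows.get(primary_key, {}))


table = WideColumnTable()
table.put(("user_1",), "name", "Alice", 1000)
table.put(("user_1",), "name", "Alice B.", 2000)   # second version of the same column
table.put(("user_2",), "email", "bob@example.com", 1500)  # a different column set
```

In this sketch the two rows hold different attribute columns, and the `name` column of the first row holds two versions, which mirrors the row, column, and time dimensions.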
Components
The preceding figure shows the components of the Wide Column model. The following table describes the components.
Component | Description |
Primary key | A primary key uniquely identifies a row in a data table. A primary key consists of one to four primary key columns. |
Partition key | The first primary key column is called the partition key. Tablestore partitions data in a data table based on the partition key values. Rows that share the same partition key value are allocated to the same partition to ensure balanced distribution of data access requests. |
Attribute column | All columns in a row except the primary key columns are called attribute columns. Each attribute column can contain values of different versions. Tablestore does not impose limits on the number of attribute columns that can be contained in each row. |
Version | Each value in an attribute column has a unique version number. The version number is a timestamp based on which you can manage the TTL of attribute column values. For more information, see Version number. |
Data type | Tablestore supports the following data types: STRING, BINARY, DOUBLE, INTEGER, and BOOLEAN. For more information, see Data types. |
TTL | You can specify the TTL for each data table. For example, if you set the TTL to one month for a data table, Tablestore automatically deletes data that was written to the data table more than one month ago. For more information, see TTL. |
Max versions | You can set the maximum number of versions that are retained for the value in each attribute column of a data table. When the actual number of versions in an attribute column exceeds the max versions value, Tablestore asynchronously deletes earlier versions. For more information, see Max versions. |
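The interaction between TTL and max versions can be illustrated with a small pruning function. This is a sketch under stated assumptions, not Tablestore's implementation: the function name `prune_versions` is hypothetical, the pruning here runs synchronously whereas Tablestore deletes asynchronously, and both the use of `-1` for "no expiry" and the exact cutoff comparison are assumptions of this example.

```python
def prune_versions(versions, max_versions, ttl_seconds, now_ms):
    """Return the versions that survive the TTL and max-versions rules.

    versions: dict mapping version timestamp in milliseconds -> value.
    ttl_seconds: lifetime of a version; -1 means no expiry (assumption).
    """
    if ttl_seconds >= 0:
        # Drop versions whose timestamp is at or before the expiry cutoff.
        cutoff_ms = now_ms - ttl_seconds * 1000
        alive = {ts: v for ts, v in versions.items() if ts > cutoff_ms}
    else:
        alive = dict(versions)
    # Keep only the newest max_versions timestamps.
    newest_first = sorted(alive, reverse=True)[:max_versions]
    return {ts: alive[ts] for ts in newest_first}


history = {1000: "a", 2000: "b", 3000: "c", 4000: "d"}
kept_by_count = prune_versions(history, max_versions=2, ttl_seconds=-1, now_ms=5000)
kept_by_ttl = prune_versions(history, max_versions=10, ttl_seconds=2, now_ms=5000)
```

With these inputs, `kept_by_count` retains the two newest versions, and `kept_by_ttl` retains only versions written within the last two seconds.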
Core components
Data tables, rows, primary keys, and attributes are the core components of the Wide Column model of Tablestore. A data table consists of rows. Each row consists of a primary key and one or more attributes. The first primary key column is called the partition key.
The following table describes the primary key, attribute, and partition key.
For more information about data types supported by primary key columns and attribute columns, see Naming conventions and data types.
Component | Description |
Primary key | A primary key uniquely identifies a row in a data table. A primary key consists of one to four primary key columns. When you create a data table, you must specify primary key columns, including the name, data type, and sequence of the primary key columns. Tablestore indexes data in a data table based on the primary key values of the rows in the data table. By default, rows in a data table are sorted in ascending order based on the primary key values. |
Partition key | The first primary key column is called the partition key. To ensure load balancing, Tablestore automatically distributes each row to a partition and machine based on the range to which the partition key value of the row belongs. Rows that share the same partition key value belong to the same partition. A partition may store rows that have different partition key values. Tablestore splits and merges partitions based on specific rules. Note: The partition key value is the basic unit of partitioning. Data that shares the same partition key value cannot be further split. To prevent partitions from becoming too large to split, we recommend that you keep the total size of all rows that share the same partition key value at 10 GB or less. For more information about how to select a partition key, see Table operations. |
Attribute | A row can have multiple attribute columns. The number of attribute columns in a row is unlimited, and the attribute columns in each row can be different. The value of an attribute column in a row can be empty. The values in the same attribute column of multiple rows can be of different data types. An attribute column can store multiple versions of values. You can specify the number of versions of values that can be retained for an attribute column. You can also specify a TTL value for attribute column values. For more information, see Data versions and TTL. |
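Two behaviors from the preceding table lend themselves to a short illustration: rows are sorted in ascending order by primary key, and a row is routed to a partition by the range that contains its partition key value. The following Python sketch models both. The split points and the function `partition_for` are hypothetical and chosen only for this example. Tablestore determines the actual ranges itself when it splits and merges partitions.

```python
import bisect

# Hypothetical split points. Partition 0 holds keys below "g", partition 1
# holds keys in ["g", "n"), partition 2 holds ["n", "t"), partition 3 the rest.
SPLIT_POINTS = ["g", "n", "t"]


def partition_for(partition_key):
    # Range lookup: index of the range that contains the partition key value.
    return bisect.bisect_right(SPLIT_POINTS, partition_key)


# Primary keys with three columns; the first column is the partition key.
rows = [
    ("user_b", 2, "order"),
    ("user_a", 10, "order"),
    ("user_a", 2, "order"),
]

# Tuples compare column by column, which models ascending primary key order.
sorted_rows = sorted(rows)
```

In this model, rows with the same partition key value always map to the same partition, while a single partition can hold several distinct partition key values, for example `"alice"` and `"bob"` both fall into partition 0.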
Differences between the Wide Column model and the relational model
The following table describes the differences between the Wide Column model and the relational model.
Model | Feature |
Wide Column model | Three-dimensional structure (row, column, and time), schema-free, wide columns, max versions, and TTL management |
Relational model | Two-dimensional structure (row and column) and fixed schema |
Limits
For more information about the general limits on the Wide Column model, see General limits.
If you use secondary indexes or search indexes to accelerate data queries, take note of the limits on the indexes. For more information, see Secondary index limits and Search index limits.
If you use SQL to query and analyze data, take note of the limits on SQL queries. For more information, see SQL limits.
Procedure
The following table describes the steps.
Step | Operation | Description |
1 | Grant permissions to a RAM user | After you create a RAM user, grant the RAM user the minimal permissions required to access Tablestore resources. You can use system policies or custom policies to grant the permissions. If you want to use an Alibaba Cloud account or a RAM user that already has the required permissions, skip this step. Important: By default, an Alibaba Cloud account has permissions on all cloud resources. To ensure the security of your resources, we recommend that you create RAM users for your Alibaba Cloud account and authorize the RAM users to access different resources. |
2 | Activate Tablestore | Before you use the features of Tablestore, you must activate Tablestore. You need to activate Tablestore only once. You are not charged when you activate Tablestore. If Tablestore is already activated, skip this step. |
3 | Create an instance | Create a Tablestore instance in the selected region based on the instance type and the model of the table that you want to create in the instance. If an existing Tablestore instance meets your business requirements, skip this step. |
4 | Create a data table | Create a data table to store business-related data. When you create a data table, you can configure additional features based on your business requirements. Note: Proper design of the primary key and partition key can effectively prevent data hotspot issues. We recommend that you design tables by referring to Table operations. |
5 | Read and write data | You can write, update, read, and delete data in the data table. To delete data, you can manually delete the data or specify the TTL for the data table to automatically delete the data. For more information, see Delete data or Data versions and TTL. Note: Proper attribute column settings can improve the efficiency of business data usage. We recommend that you specify attribute columns by referring to Data operations. |
6 | Use indexes to accelerate queries | If data queries based on the primary key of a data table cannot meet your business requirements, you can use indexes to accelerate data queries. Tablestore provides secondary indexes and search indexes to meet data query requirements in different scenarios. |
7 | Analyze data | Use the SQL query feature or search indexes to aggregate and analyze data in the data table. Note: You can also use compute engines such as MaxCompute, Spark, Hive, HadoopMR, Function Compute, and Realtime Compute for Apache Flink to analyze data in Tablestore. For more information, see Overview. |
Billing rules
The billable items include read throughput, write throughput, storage usage, and outbound traffic over the Internet. For more information, see Billing overview.
References
You can use the Wide Column model in the Tablestore console or Tablestore CLI. For more information, see Use the Wide Column model.
To implement data center-level disaster recovery for instance data, you can create an instance of the ZRS redundancy type. For more information, see ZRS.
To ensure data storage security and network access security, you can encrypt data tables or bind a virtual private cloud (VPC) to your Tablestore instance to allow access only over the VPC. For more information, see Data encryption and Network security management.
To prevent important data from being accidentally deleted, you can use the data backup feature to back up important data on a regular basis. For more information, see Back up data in Tablestore.
To consume historical and incremental data in a data table, you can use Tunnel Service. For more information, see Overview.
To configure alert notifications for monitoring metrics, you can use CloudMonitor. For more information, see Overview.
To visualize data, you can use DataV or Grafana. For example, you can use DataV or Grafana to display data in charts. For more information, see Data visualization tools.