Tablestore: Wide Column model

Last Updated: Dec 02, 2024

The Wide Column model is similar to the data model of Bigtable or HBase and is suitable for scenarios such as metadata storage and big data storage. The Wide Column model stores data in data tables. A single data table can store petabytes of data and support tens of millions of queries per second (QPS). Data tables are schema-free and support wide columns, max versions, and time to live (TTL) management. Data tables also support features such as auto-increment primary key columns, local transactions, atomic counters, filters, and conditional updates.

Introduction

The Wide Column model of Tablestore is similar to the data model of Bigtable or HBase. The Wide Column model stores data in data tables in a three-dimensional structure, which is defined by rows, columns, and time. Each row of a data table can have different columns. The attribute columns of a data table can be dynamically added or removed. When you create a data table, you do not need to define a strict schema for the attribute columns of the data table.
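
To make the row, column, and time dimensions concrete, the following Python sketch models a data table as nested dictionaries. It is an illustration only, not the Tablestore SDK: the `table` structure and the `put` and `get_latest` helpers are hypothetical names introduced for this example, and version numbers are assumed to be millisecond timestamps.

```python
from collections import defaultdict

# Hypothetical in-memory model (not the Tablestore API):
# primary key -> column name -> {version timestamp in ms: value}
table = defaultdict(lambda: defaultdict(dict))


def put(pk, column, value, version_ms):
    """Store one value at the coordinate (row, column, time)."""
    table[pk][column][version_ms] = value


def get_latest(pk, column):
    """Return the value with the largest version number, or None if absent."""
    versions = table.get(pk, {}).get(column)
    if not versions:
        return None
    return versions[max(versions)]


# Two versions of the same column in one row.
put(("user_1",), "name", "Alice", 1000)
put(("user_1",), "name", "Alicia", 2000)
# A second row with a different column; no schema is declared for either row.
put(("user_2",), "age", 30, 1500)
```

In this sketch, `get_latest(("user_1",), "name")` returns the newest of the two stored versions, and the two rows hold different attribute columns without any schema being declared up front.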

Components

The preceding figure shows the components of the Wide Column model. The following table describes the components.

Component

Description

Primary key

A primary key uniquely identifies a row in a data table. A primary key consists of one to four primary key columns.

Partition key

The first primary key column is called the partition key. Tablestore partitions the data in a data table based on partition key values to balance data access requests. Rows that share the same partition key value are stored in the same partition.

Attribute column

All columns in a row except for the primary key columns are called attribute columns. Each attribute column can contain multiple versions of a value. Tablestore does not limit the number of attribute columns in a row.

Version

Each value in an attribute column has a unique version number. The version number is a timestamp based on which you can manage the TTL of attribute column values. For more information, see Version number.

Data type

Tablestore supports the following data types: STRING, BINARY, DOUBLE, INTEGER, and BOOLEAN. For more information, see Data types.

TTL

You can specify the TTL for each data table. For example, if you set the TTL of a data table to one month, Tablestore automatically deletes data that was written to the data table more than one month ago. For more information, see TTL.

Max versions

You can set the maximum number of versions that are retained for the value in each attribute column of a data table. When the actual number of versions in an attribute column exceeds the max versions value, Tablestore asynchronously deletes the earliest versions. For more information, see Max versions.
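
The way TTL and max versions limit what remains in an attribute column can be sketched in Python as follows. This is a simplified model under stated assumptions, not the way Tablestore implements cleanup: `visible_versions` is a hypothetical helper, version numbers are treated as millisecond timestamps, a TTL of -1 is taken to mean that data never expires, and the exact expiry boundary is a guess.

```python
def visible_versions(versions, max_versions, ttl_seconds, now_ms):
    """Return the (timestamp, value) pairs that survive both rules, newest first.

    versions: dict that maps a version timestamp (ms) to a value.
    ttl_seconds: -1 means "never expires" (assumption for this sketch).
    """
    newest_first = sorted(versions.items(), reverse=True)
    if ttl_seconds != -1:
        # Drop versions whose timestamp is older than the TTL window.
        cutoff = now_ms - ttl_seconds * 1000
        newest_first = [(ts, v) for ts, v in newest_first if ts >= cutoff]
    # Keep at most max_versions of the newest remaining versions.
    return newest_first[:max_versions]


history = {1000: "a", 2000: "b", 3000: "c"}
latest_two = visible_versions(history, max_versions=2, ttl_seconds=-1, now_ms=10_000)
```

Here `latest_two` keeps only the two newest versions because no TTL applies. With a short TTL, every version can expire even if the max versions value would allow it to stay.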

Core components

Data tables, rows, primary keys, and attributes are the core components of the Wide Column model of Tablestore. A data table consists of rows. Each row consists of a primary key and one or more attributes. The first primary key column is called the partition key.

The following table describes the primary key, attribute, and partition key.

Note

For more information about data types supported by primary key columns and attribute columns, see Naming conventions and data types.

Component

Description

Primary key

A primary key uniquely identifies a row in a data table. A primary key consists of one to four primary key columns. When you create a data table, you must specify primary key columns, including the name, data type, and sequence of the primary key columns.

Tablestore indexes data in a data table based on the primary key values of the rows in the data table. By default, rows in a data table are sorted in ascending order based on the primary key values.

Partition key

The first primary key column is called the partition key. To ensure load balancing, Tablestore automatically distributes a row of data to the corresponding partition and machine based on the range to which the partition key value of the row belongs. Rows that share the same partition key value belong to the same partition. A partition may store rows that have different partition key values. Tablestore splits and merges partitions based on specific rules.

Note

A partition key value is the basic unit of data partitioning. Data that shares the same partition key value cannot be split further. To prevent partitions from becoming too large to split, we recommend that you keep the total size of all rows that share the same partition key value at no more than 10 GB. For more information about how to select a partition key, see Table operations.

Attribute

A row can have multiple attribute columns. The number of attribute columns in a row is unlimited, and the attribute columns in each row can be different. The value of an attribute column in a row can be empty. The values in the same attribute column of multiple rows can be of different data types.

An attribute column can store multiple versions of values. You can specify the number of versions of values that can be retained for an attribute column. You can also specify a TTL value for attribute column values. For more information, see Data versions and TTL.
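
The primary key ordering and range-based partitioning described above can be sketched in Python. This is a minimal model, not Tablestore's placement logic: the `rows` data, the `split_points` boundaries, and the `partition_of` helper are hypothetical and only illustrate the idea of mapping a partition key value to a range.

```python
from bisect import bisect_right

# Rows keyed by a two-column primary key; the first column is the partition key.
rows = {
    ("user_2", 1): {"event": "login"},
    ("user_1", 2): {"event": "click"},
    ("user_1", 1): {"event": "login"},
    ("user_3", 1): {"event": "login"},
}

# Python compares tuples column by column, which mirrors ascending
# primary key order across multiple primary key columns.
sorted_keys = sorted(rows)

# Hypothetical range boundaries: partition 0 holds partition key values below
# "user_2", partition 1 holds ["user_2", "user_3"), partition 2 holds the rest.
split_points = ["user_2", "user_3"]


def partition_of(primary_key):
    """Pick the partition whose range contains the partition key value."""
    return bisect_right(split_points, primary_key[0])
```

Both `("user_1", 1)` and `("user_1", 2)` map to the same partition in this sketch because they share a partition key value. This is also why rows with the same partition key value cannot be split across partitions.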

Differences between the Wide Column model and the relational model

The following table describes the differences between the Wide Column model and the relational model.

Model

Feature

Wide Column model

Three-dimensional structure (row, column, and time), schema-free, wide columns, max versions, and TTL management

Relational model

Two-dimensional structure (row and column) and fixed schema

Limits

For more information about the general limits on the Wide Column model, see General limits.

  • If you use secondary indexes or search indexes to accelerate data queries, take note of the limits on the indexes. For more information, see Secondary index limits and Search index limits.

  • If you use SQL to query and analyze data, take note of the limits on SQL queries. For more information, see SQL limits.

Procedure

The following table describes the steps.

Step

Operation

Description

1

Grant permissions on Tablestore resources to a RAM user

After you create a RAM user, grant the RAM user the minimum permissions required to access Tablestore resources. You can use system policies or custom policies to grant the permissions.

If you want to use an Alibaba Cloud account or a RAM user that has the required permissions to access Tablestore resources, skip this step.

Important

By default, an Alibaba Cloud account has permissions on all cloud resources. To ensure the security of your resources, we recommend that you create RAM users for your Alibaba Cloud account and authorize the RAM users to access different resources.

2

Activate Tablestore

Before you use the features of Tablestore, you must activate Tablestore.

You need to activate Tablestore only once. You are not charged when you activate Tablestore. If Tablestore is activated, skip this step.

3

Create a Tablestore instance

Important
  • Before you create a Tablestore instance, determine the model of the table that you want to create in the instance, and choose the instance type based on your business characteristics and your requirements for read and write performance and costs. For more information, see Billing overview and Instances.

  • The search index, Tunnel Service, SQL query, data delivery, data encryption, control policy, data backup, and zone-redundant storage (ZRS) features of the Wide Column model are supported only in specific regions. Select a region that supports the required features to create an instance. For more information, see Features and regions.

Create a Tablestore instance in the selected region based on the model of the table that you want to create in the instance and the instance type.

If an existing Tablestore instance meets your business requirements, skip this step.

4

Create a data table

Note

Proper design of the primary key and partition key can effectively prevent data hotspot issues. We recommend that you design tables by referring to Table operations.

Create a data table to store business-related data. When you create a data table, you can configure the following features based on your business requirements:

  • If you want to use attribute columns to query data, you can create secondary indexes to accelerate queries.

  • You can enable data at rest encryption (DARE) by specifying data encryption settings for the data table.

  • In system design scenarios that require an auto-increment primary key column, such as item IDs on e-commerce websites, user IDs on large websites, post IDs in forums, and message IDs in chat tools, you can specify an auto-increment primary key column when you create a data table.

5

Perform basic operations on data

Note

Proper attribute column settings can improve the efficiency of business data usage. We recommend that you specify attribute columns by referring to Data operations.

You can write, update, read, and delete data in the data table.

  1. Write data to the data table. For more information, see Write data.

  2. Read data from the data table based on the primary key. For more information, see Read data.

To delete data, you can manually delete the data or specify the TTL for the data table to automatically delete the data. For more information, see Delete data or Data versions and TTL.

6

Use indexes to accelerate queries

If data queries based on the primary key of a data table cannot meet your business requirements, you can use indexes to accelerate data queries. Tablestore provides secondary indexes and search indexes to meet data query requirements in different scenarios.

  • Secondary index: allows you to query data based on the attribute columns of a data table. Tablestore provides global secondary indexes and local secondary indexes to meet different requirements for read consistency.

    Secondary indexes are suitable for scenarios in which the columns that you want to query are known in advance, the number of columns that you want to query is small, and the values of all primary key columns or of a primary key prefix can be determined.

  • Search index: uses inverted indexes, Bkd-trees, and column stores for various query scenarios.

    Search indexes are suitable for query and analysis scenarios in which queries based on the primary key and secondary indexes cannot meet your business requirements. For example, search indexes support queries based on non-primary key columns, Boolean queries, relational queries, full-text search, geo queries, prefix queries, fuzzy queries, nested queries, and exists queries.

7

Analyze data

Use the SQL query feature or search indexes to aggregate and analyze data in the data table.

  • SQL query: You can execute the SELECT statement to use features such as JOIN functions, full-text search, aggregation, arithmetic operations, relational operations, logical operations, grouping by field value, nested query of search indexes, query on ARRAY fields of search indexes, and JSON functions. For more information, see Query data.

  • Search index aggregation: You can perform aggregation operations to obtain the minimum value, maximum value, sum, average value, count and distinct count of rows, and percentile statistics. You can also perform aggregation operations to group results by field value, range, geographical location, filter, histogram, or date histogram, query the rows in grouped query results, and perform nested queries.

Note

You can also use compute engines such as MaxCompute, Spark, Hive, HadoopMR, Function Compute, and Realtime Compute for Apache Flink to analyze data in Tablestore. For more information, see Overview.
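
The idea behind the secondary indexes in step 6, which is to look up rows by an attribute column instead of the primary key, can be sketched in Python. This is a conceptual model only and does not use the Tablestore SDK: `data_table`, `build_index`, and `query_index` are hypothetical names, and the way rows without the indexed column are handled is an assumption of this sketch.

```python
from bisect import bisect_left

# Hypothetical data table: primary key (user_id,) -> attribute columns.
data_table = {
    ("u1",): {"city": "Hangzhou", "age": 30},
    ("u2",): {"city": "Beijing", "age": 25},
    ("u3",): {"city": "Hangzhou", "age": 41},
    ("u4",): {"age": 19},  # no "city" column; rows may have different columns
}


def build_index(table, column):
    """Build a sorted list of (indexed value, primary key) pairs.

    Rows that lack the indexed column are skipped in this sketch.
    """
    return sorted((attrs[column], pk) for pk, attrs in table.items() if column in attrs)


def query_index(index, value):
    """Return the primary keys of rows whose indexed column equals value."""
    # (value,) sorts before every (value, pk) entry, so this finds the first match.
    start = bisect_left(index, (value,))
    result = []
    for indexed_value, pk in index[start:]:
        if indexed_value != value:
            break
        result.append(pk)
    return result


city_index = build_index(data_table, "city")
```

Because the index is sorted by the indexed value first, `query_index(city_index, "Hangzhou")` locates the matching range with a binary search instead of scanning every row of the data table.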

Billing rules

The billable items include read throughput, write throughput, storage usage, and outbound traffic over the Internet. For more information, see Billing overview.

References

  • You can use the Wide Column model in the Tablestore console or Tablestore CLI. For more information, see Use the Wide Column model.

  • To implement data center-level disaster recovery for instance data, you can create an instance of the ZRS redundancy type. For more information, see ZRS.

  • To ensure data storage security and network access security, you can encrypt data tables or bind a virtual private cloud (VPC) to your Tablestore instance to allow access only over the VPC. For more information, see Data encryption and Network security management.

  • To prevent important data from being accidentally deleted, you can use the data backup feature to back up important data on a regular basis. For more information, see Back up data in Tablestore.

  • To consume historical and incremental data in a data table, you can use Tunnel Service. For more information, see Overview.

  • To configure alert notifications for monitoring metrics, you can use CloudMonitor. For more information, see Overview.

  • To visualize data, you can use DataV or Grafana. For example, you can use DataV or Grafana to display data in charts. For more information, see Data visualization tools.