What data model is used by LindormTSDB - Lindorm - Alibaba Cloud Documentation Center

This topic describes the data model of LindormTSDB and the terms that are related to the data model.

Terms

In typical time series scenarios, such as IoT, application monitoring, and Industrial Internet scenarios, data sources continuously generate time series data at regular intervals. A row of time series data is described by elements such as tags, a timestamp, and fields. Data that has the same characteristics is stored in the same table. The following table describes the elements.

Element	Description
table	A time series table stores data of the same type. For example, you can create a time series table to store monitored data that is generated by air quality sensors.
tag	A tag describes a data source. In most cases, a tag value remains unchanged over time. For example, you can create tags for a sensor based on information such as the sensor ID and the region where the sensor is deployed. LindormTSDB automatically indexes tags and allows you to query data by tag across multiple dimensions. A tag consists of a tag key and a tag value, both of which must be of the STRING data type. When you define a time series table, you can specify a tag column as the primary key of the table to explicitly control how data is sharded across multiple nodes. This can improve the performance of your workloads.
timestamp	A timestamp indicates the point in time when a row of data is generated. You can specify the timestamp of data when you write data to LindormTSDB. The system can also automatically generate the timestamp of the data that is written to a table.
field	A field describes a metric of a data source. In most cases, field values change over time. For example, you can create fields for a sensor based on information such as temperature and humidity. You do not need to define a fixed schema for field columns when you create a time series table. You can add or delete field columns while your business is running. A field consists of a field key and a field value. A field key must be of the STRING data type. For more information about the data types supported by field values, see Data types.
data point	A data point is a field value that is generated by a data source at a point in time. When you query data in a database or write data to a database, the number of data points is used as a statistical metric.
time series	A time series is the sequence of values that a metric of a data source produces over time. A time series is determined by its tags: in a time series table, a time series consists of the data rows that have the same tag values. Operations such as downsampling, aggregation, and interpolation are performed on the data in a time series. You can aggregate time series data by using functions such as sum, count, max, and min. When a database stores data, it stores data that belongs to the same time series in the same table. This makes access to time series data more efficient and helps LindormTSDB compress time series data.
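
The relationship between these elements can be sketched in a few lines of Python. This is only an in-memory illustration and not the LindormTSDB API: the `Row` class, the helper functions, and the sample sensor values are all hypothetical. It assumes that rows with the same tag values in the same table form one time series, and that each field value at a timestamp counts as one data point.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Row:
    """One row of time series data: tags, a timestamp, and fields."""
    table: str
    tags: tuple        # sorted (tag key, tag value) pairs, both strings
    timestamp: datetime
    fields: tuple      # sorted (field key, field value) pairs


def make_row(table, tags, timestamp, fields):
    # Sorting makes rows with the same tag values compare equal on `tags`.
    return Row(table, tuple(sorted(tags.items())), timestamp,
               tuple(sorted(fields.items())))


def group_into_series(rows):
    """Group rows so that each (table, tag values) pair is one time series."""
    series = defaultdict(list)
    for row in rows:
        series[(row.table, row.tags)].append(row)
    return dict(series)


def count_data_points(rows):
    """Count one data point per field value per timestamp."""
    return sum(len(row.fields) for row in rows)


# Illustrative values only; fields do not need a fixed schema per row.
t0 = datetime(2020, 10, 24, 0, 0, tzinfo=timezone.utc)
rows = [
    make_row("sensor", {"id": "S1", "region": "north"}, t0,
             {"temperature": 21.5, "humidity": 40.0}),
    make_row("sensor", {"id": "S1", "region": "north"}, t0.replace(minute=1),
             {"temperature": 21.7}),
    make_row("sensor", {"id": "S2", "region": "south"}, t0,
             {"temperature": 25.0, "humidity": 55.0}),
]

series = group_into_series(rows)
print(len(series))              # 2 time series (S1 and S2)
print(count_data_points(rows))  # 5 data points
```

The second row carries only a temperature value, which mirrors the point that field columns are not bound to a fixed schema.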

A LindormTSDB database provides a TTL-based management mechanism and supports various data operations, such as aggregation, downsampling, and interpolation.

Operation	Description
aggregation	Aggregation refers to operations such as grouping data points, counting data points, and calculating the sum of data points in one or more time series.
downsampling	If the time range of your query is large and the sampling rate of your raw data is high, you can perform downsampling to reduce the precision of the query results. For example, if your raw data is sampled at a granularity of seconds, you can perform downsampling in your query to sample the data at a granularity of hours. This reduces the number of data points in your result set.
interpolation	If no data points are collected in a specific time window in a time series, you can perform interpolation to fill values.
TTL	Time to live (TTL) is the validity period during which your data is retained. After the validity period elapses, the data is removed. By default, data is retained permanently.
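
To make these operations concrete, the following Python sketch applies them to a small list of `(timestamp, value)` pairs. It is a simplified illustration under stated assumptions, not how LindormTSDB implements them: the function names are hypothetical, downsampling uses an average per window, interpolation carries the previous value forward (one of several possible fill policies), and the sample values are made up.

```python
from datetime import datetime, timedelta, timezone


def downsample_avg(points, window):
    """Average (timestamp, value) points into fixed, epoch-aligned windows."""
    step = int(window.total_seconds())
    buckets = {}
    for ts, value in points:
        start = int(ts.timestamp()) // step * step
        buckets.setdefault(start, []).append(value)
    return {
        datetime.fromtimestamp(start, tz=timezone.utc): sum(vals) / len(vals)
        for start, vals in sorted(buckets.items())
    }


def fill_previous(windows, start, end, window):
    """Interpolate empty windows by carrying the previous value forward."""
    filled, last, t = {}, None, start
    while t < end:
        last = windows.get(t, last)
        filled[t] = last
        t += window
    return filled


def apply_ttl(points, now, ttl):
    """Keep points within the validity period; ttl=None retains everything."""
    if ttl is None:
        return list(points)
    return [(ts, v) for ts, v in points if now - ts <= ttl]


t0 = datetime(2020, 10, 24, tzinfo=timezone.utc)
points = [
    (t0, 10.0),
    (t0 + timedelta(seconds=30), 20.0),
    (t0 + timedelta(seconds=130), 40.0),
]
minute = timedelta(minutes=1)

# Downsampling: second-level data reduced to one value per minute.
per_minute = downsample_avg(points, minute)

# Interpolation: the window starting at 00:01 has no data and is filled.
filled = fill_previous(per_minute, t0, t0 + 3 * minute, minute)

# Aggregation: sum, count, max, and min over the raw data points.
values = [v for _, v in points]
totals = {"sum": sum(values), "count": len(values),
          "max": max(values), "min": min(values)}

# TTL: with a 9-minute validity period, only the newest point survives.
kept = apply_ttl(points, now=t0 + timedelta(minutes=10),
                 ttl=timedelta(minutes=9))

print(per_minute, filled, totals, kept, sep="\n")
```

Passing `ttl=None` to `apply_ttl` models the default of permanent retention.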

Sample scenarios

A wind turbine power station has multiple intelligent wind turbines. A table named Wind-generators is created to store information about the wind turbines. Tags are used to describe the wind turbines. The tags include ID, model, and manufacturer. Each of the wind turbines reports field values, such as the values of power and wind speed. The field values are written to a cloud database in LindormTSDB in real time.

The following figure shows data that corresponds to the wind turbines used in the preceding scenario.

A LindormTSDB database supports highly concurrent writes, provides a high compression ratio, and can meet the requirements of the following types of queries:

You can query data points for a specified metric within a specified time range in a specified time series.
For example, you can query the values of wind speed for a wind turbine whose ID is 7AD45EC within a 30-minute time range from 2020-10-24T00:00:00Z to 2020-10-24T00:30:00Z.
You can query data points for multiple metrics within a specified time range in a specified time series.
For example, you can query the values of power and wind speed from 2020-10-24T00:00:00Z to 2020-10-25T00:00:00Z for a wind turbine whose ID is 7AD45EC and perform downsampling on the values at a granularity of five minutes.
You can perform aggregation on data points for a specified metric within a specified time range in a specified time series.
For example, you can perform aggregation to query the average value of power from 2020-10-24T00:00:00Z to 2020-10-24T00:30:00Z for a wind turbine whose ID is 7AD45EC.
You can query the latest values for multiple metrics in a specified time series.
For example, you can query the latest values of power and wind speed for a wind turbine whose ID is 7AD45EC.
You can perform an aggregate query on data points in multiple time series based on a tag.
For example, you can run an aggregate query to obtain the average value of wind speed at 2020-10-24T00:00:00Z across the wind turbines that share the tag value KingWind.
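
The query types above can be mimicked over an in-memory list to show what each one selects. The following Python sketch is an illustration only and does not use LindormTSDB SQL or any Lindorm client. The helper names, the second turbine ID, and all power and wind speed values are made up. It also assumes that KingWind is a value of the manufacturer tag and that time ranges include the start and exclude the end. Downsampling is not repeated here.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean

T0 = datetime(2020, 10, 24, 0, 0, tzinfo=timezone.utc)
KW = "KingWind"  # assumed to be a manufacturer tag value

# Made-up rows in the form (tags, timestamp, fields).
ROWS = [
    ({"id": "7AD45EC", "manufacturer": KW}, T0,
     {"power": 100.0, "wind_speed": 8.0}),
    ({"id": "7AD45EC", "manufacturer": KW}, T0 + timedelta(minutes=10),
     {"power": 120.0, "wind_speed": 9.0}),
    ({"id": "7AD45EC", "manufacturer": KW}, T0 + timedelta(minutes=40),
     {"power": 90.0, "wind_speed": 7.0}),
    ({"id": "HYPOTHETICAL-2", "manufacturer": KW}, T0,
     {"power": 80.0, "wind_speed": 6.0}),
]


def matches(row_tags, wanted):
    """True if the row carries every requested tag value."""
    return all(row_tags.get(k) == v for k, v in wanted.items())


def select(rows, tags, start, end, metrics):
    """Return (timestamp, {metric: value}) pairs with start <= ts < end."""
    out = [
        (ts, {m: fields[m] for m in metrics if m in fields})
        for row_tags, ts, fields in rows
        if matches(row_tags, tags) and start <= ts < end
    ]
    return sorted(out, key=lambda item: item[0])


def latest(rows, tags, metrics):
    """Return the requested metrics from the newest matching row."""
    candidates = [(ts, f) for row_tags, ts, f in rows if matches(row_tags, tags)]
    _, fields = max(candidates, key=lambda item: item[0])
    return {m: fields[m] for m in metrics if m in fields}


half_hour = T0 + timedelta(minutes=30)
turbine = {"id": "7AD45EC"}

# One metric, one time series, 30-minute range.
speeds = [f["wind_speed"]
          for _, f in select(ROWS, turbine, T0, half_hour, ["wind_speed"])]

# Multiple metrics, one time series, one-day range.
both = select(ROWS, turbine, T0, T0 + timedelta(days=1), ["power", "wind_speed"])

# Aggregation of one metric in one time series.
avg_power = mean(f["power"]
                 for _, f in select(ROWS, turbine, T0, half_hour, ["power"]))

# Latest values of multiple metrics.
newest = latest(ROWS, turbine, ["power", "wind_speed"])

# Aggregation across multiple time series that share a tag value.
fleet_avg = mean(
    f["wind_speed"]
    for _, f in select(ROWS, {"manufacturer": KW}, T0,
                       T0 + timedelta(seconds=1), ["wind_speed"])
)

print(speeds, len(both), avg_power, newest, fleet_avg)
```

The last query is the only one that spans more than one time series: filtering on the manufacturer tag instead of the turbine ID pulls in rows from every turbine that carries that tag value.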