By Zhang Youdong (Linqing) from the ApsaraDB team
Apache IoTDB (Database for Internet of Things) is a database designed specifically for IoT time series data. It provides data collection, storage, and analysis functions. On the cloud, IoTDB offers an integrated solution with high-performance data reading and writing and rich query capabilities. It customizes an efficient directory organization structure for IoT scenarios and integrates seamlessly with big data systems, such as Apache Hadoop, Spark, and Flink. On edge nodes, it provides lightweight TsFile management: data can be written to a local TsFile, basic query capabilities are available, and TsFile data can be synchronized to the cloud.
TsFile is a file format customized for storing time series data on IoT devices. It is organized in a tree directory structure. One TsFile can store the data of multiple devices, and each device contains multiple measurements (metrics). The following figure shows a TsFile that contains the data of two devices, which are identified as d1 and d2. Each device contains three monitoring metrics: s1, s2, and s3.
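As a rough illustration of that layout, the sketch below models one file as a nested mapping from device to measurement to time-ordered points. The names `TsFileSketch`, `write`, and `read` are hypothetical and are not part of the IoTDB codebase; the real TsFile is a binary format, so this only mirrors the logical tree.

```python
class TsFileSketch:
    """Hypothetical in-memory stand-in for one TsFile: device -> measurement -> points."""

    def __init__(self) -> None:
        # devices[device][measurement] is a list of (timestamp, value) pairs
        self.devices: dict[str, dict[str, list[tuple[int, float]]]] = {}

    def write(self, device: str, measurement: str, timestamp: int, value: float) -> None:
        series = self.devices.setdefault(device, {}).setdefault(measurement, [])
        series.append((timestamp, value))

    def read(self, device: str, measurement: str) -> list[tuple[int, float]]:
        # Return the points ordered by timestamp; unknown series yield an empty list
        points = self.devices.get(device, {}).get(measurement, [])
        return sorted(points, key=lambda point: point[0])


# Two devices (d1, d2), each with three measurements (s1, s2, s3), as in the text
tsfile = TsFileSketch()
for device in ("d1", "d2"):
    for measurement in ("s1", "s2", "s3"):
        tsfile.write(device, measurement, timestamp=2, value=1.5)
        tsfile.write(device, measurement, timestamp=1, value=0.5)

print(tsfile.read("d1", "s2"))
```

The point of the nesting is that a single file can answer a query for any (device, measurement) pair it contains without touching the other series.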
The TsFile is a multi-level mapping table: TsFileMetadata ==> TimeSeriesMetadata ==> ChunkMetadata ==> Chunk.
TsFileMetadata describes an entire TsFile. It contains metadata information, such as version information, the location of MetadataIndexNode, and the total number of chunks.
Each TimeSeriesMetadata points to the metadata information of a device, that is, the ChunkMetadata list.
ChunkMetadata points to the ChunkHeader location and corresponds to the final chunk data.
The built-in query engine in IoTDB parses all user commands, generates a plan, submits the plan to the corresponding executor, and returns the result set. Through the query engine, IoTDB provides a JDBC API, which is simple and easy to use.
IoTDB> CREATE TIMESERIES root.ln.wf01.wt01.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN
IoTDB> CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE
IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status) values(100,true);
IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status,temperature) values(200,false,20.71)
IoTDB> SELECT status FROM root.ln.wf01.wt01
+-----------------------+------------------------+
|                   Time|root.ln.wf01.wt01.status|
+-----------------------+------------------------+
|1970-01-01T08:00:00.100|                    true|
|1970-01-01T08:00:00.200|                   false|
+-----------------------+------------------------+
Total line number = 2
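The timestamps 100 and 200 in the INSERT statements appear as times just after 08:00 on January 1, 1970. A plausible reading, assumed here rather than stated in the session, is that the timestamps are milliseconds since the Unix epoch and the client displays them in a UTC+8 time zone. The helper `render_timestamp` below is a hypothetical illustration of that conversion, not an IoTDB function.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
UTC_PLUS_8 = timezone(timedelta(hours=8))  # assumed client time zone


def render_timestamp(epoch_ms: int, tz: timezone = UTC_PLUS_8) -> str:
    """Format an epoch-millisecond timestamp as wall-clock time in the given zone."""
    local = (EPOCH + timedelta(milliseconds=epoch_ms)).astimezone(tz)
    # Drop the offset suffix so only the local date and time remain
    return local.replace(tzinfo=None).isoformat(timespec="milliseconds")


print(render_timestamp(100))  # 1970-01-01T08:00:00.100
print(render_timestamp(200))  # 1970-01-01T08:00:00.200
```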
The metadata model of IoTDB is organized in a tree structure. An instance contains multiple storage groups that are similar to the concept of namespace and database. A storage group contains multiple devices. Each device contains multiple measurements. The time series data corresponding to measurements is stored in TsFile chunks. To facilitate data expiration, each storage group segments data by time range and stores data in different directories. By default, data is segmented by week.
//Storage Group storage structure
data
-- sequence
-- [Storage group name 1]
------ [Time partition ID 1]
-------- xxxx.tsfile
-------- xxxx.resource
------ [Time partition ID 2]
-- [Storage group name 2]
-- unsequence
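The layout above can be sketched as a function that maps a timestamp to its partition directory. This is a minimal sketch under two assumptions that the text does not confirm: timestamps are in milliseconds, and the time partition ID is the timestamp divided by the partition interval. The function names and the storage group name `root.ln` are illustrative only.

```python
import posixpath

WEEK_MS = 7 * 24 * 60 * 60 * 1000  # one week in milliseconds, the default segment length


def time_partition_id(timestamp_ms: int, interval_ms: int = WEEK_MS) -> int:
    # Assumption: the partition ID is the number of whole intervals since time zero
    return timestamp_ms // interval_ms


def sequence_partition_dir(storage_group: str, timestamp_ms: int) -> str:
    # Mirrors data/sequence/[storage group name]/[time partition ID] from the listing
    partition = str(time_partition_id(timestamp_ms))
    return posixpath.join("data", "sequence", storage_group, partition)


print(sequence_partition_dir("root.ln", 100))
print(sequence_partition_dir("root.ln", WEEK_MS + 5))
```

Because each directory covers a fixed time range, expiring old data can be done per directory instead of per data point.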
The IoTDB storage engine is designed based on the LSM tree structure. Written data is first recorded in the write-ahead log (WAL). It is then written to the memtable in memory and gradually flushed to TsFiles on disk in the background. The TsFiles on disk are compacted based on certain rules to ensure query efficiency.
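The write path described above can be sketched with a toy store. This is a generic LSM-style illustration, not IoTDB's implementation: the class `TinyLsmStore` and its methods are hypothetical, Python lists stand in for the on-disk WAL and files, and flushing happens synchronously rather than in the background.

```python
class TinyLsmStore:
    """Toy LSM-style store: WAL append, in-memory memtable, flush to sorted runs."""

    def __init__(self, memtable_limit: int = 4) -> None:
        self.wal: list[tuple[str, int, float]] = []        # stands in for the on-disk log
        self.memtable: dict[tuple[str, int], float] = {}   # (series, timestamp) -> value
        self.flushed_files: list[list[tuple[tuple[str, int], float]]] = []
        self.memtable_limit = memtable_limit

    def write(self, series: str, timestamp: int, value: float) -> None:
        self.wal.append((series, timestamp, value))  # 1. record the write in the log
        self.memtable[(series, timestamp)] = value   # 2. apply it to the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                             # 3. spill to a file when full

    def flush(self) -> None:
        if not self.memtable:
            return
        self.flushed_files.append(sorted(self.memtable.items()))
        self.memtable.clear()
        self.wal.clear()  # the logged entries now live in a flushed file

    def compact(self) -> None:
        merged: dict[tuple[str, int], float] = {}
        for run in self.flushed_files:  # oldest first, so newer values overwrite older
            merged.update(run)
        self.flushed_files = [sorted(merged.items())] if merged else []

    def query(self, series: str) -> list[tuple[int, float]]:
        result: dict[int, float] = {}
        for run in self.flushed_files:
            for (name, timestamp), value in run:
                if name == series:
                    result[timestamp] = value
        for (name, timestamp), value in self.memtable.items():  # newest data wins
            if name == series:
                result[timestamp] = value
        return sorted(result.items())


store = TinyLsmStore(memtable_limit=2)
store.write("root.ln.wf01.wt01.temperature", 200, 20.71)
store.write("root.ln.wf01.wt01.temperature", 100, 19.5)   # triggers a flush
store.write("root.ln.wf01.wt01.temperature", 300, 21.0)   # stays in the memtable
print(store.query("root.ln.wf01.wt01.temperature"))
```

A query has to consult every flushed file plus the memtable, which is why compaction into fewer files helps query efficiency.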
IoTDB can be deployed on edge nodes and in the cloud. Generally, data collected on edge nodes needs to be synchronized to a remote end for further analysis and processing. IoTDB provides a synchronization tool to synchronize TsFile data on terminals or devices to the cloud.
IoTDB supports seamless connection with existing big data processing systems, including Hive and Spark. IoTDB provides connectors, such as hive-tsfile, spark-tsfile, and spark-iotdb, so Hive and Spark can directly access the TsFile data and IoTDB data.
HDFS or local disks are used for storage. Using HDFS ensures high availability at the storage layer, but not at the computing layer.