An Analysis of Apache Database for IoT (IoTDB)

By Zhang Youdong (Linqing) from the ApsaraDB team

Apache Database for IoT (IoTDB) is a database designed specifically for IoT time series data. It provides data collection, storage, and analysis functions. On the cloud, IoTDB offers an integrated solution with high-performance reads and writes and rich query capabilities. It uses an efficient, tree-shaped directory organization tailored to IoT scenarios and integrates seamlessly with big data systems such as Apache Hadoop, Spark, and Flink. On edge nodes, it provides lightweight TsFile management: data can be written to local TsFiles, basic queries are supported, and TsFile data can be synchronized to the cloud.

TsFile

TsFile is a file format customized for storing time series data from IoT devices. It is organized in a tree-shaped directory structure. One TsFile can store the data of multiple devices, and each device contains multiple measurements (metrics). The following figure shows a TsFile that contains the data of two devices, identified as d1 and d2. Each device has three monitoring metrics: s1, s2, and s3.

A TsFile is organized as a multi-level index: TsFileMetadata ==> MetadataIndexNode ==> TimeSeriesMetadata ==> ChunkMetadata ==> Chunk.

TsFileMetadata describes the entire TsFile and contains metadata such as the version information, the location of the MetadataIndexNode, and the total number of chunks.
MetadataIndexNode contains multiple TimeSeriesMetadata entries. Each TimeSeriesMetadata describes one time series (a measurement of a device) and holds that series' ChunkMetadata list.
ChunkMetadata records the location of a ChunkHeader, which is followed by the actual chunk data.
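The lookup order through these levels can be illustrated with a small in-memory sketch. All class and field names below are hypothetical simplifications that mirror the names in the text, and the index node is flattened to a device -> measurement map; the real TsFile reader deserializes these structures from the file and uses a more elaborate index. The sketch walks from the file-level metadata down to the chunk offsets of one series that overlap a query time range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory model of the TsFile metadata chain. Names are illustrative;
// the real TsFile classes differ and are read from the file rather than built in memory.
class TsFileIndexSketch {

    // ChunkMetadata: offset of the ChunkHeader in the file plus the chunk's time range.
    static final class ChunkMetadata {
        final long chunkHeaderOffset, startTime, endTime;
        ChunkMetadata(long offset, long start, long end) {
            chunkHeaderOffset = offset; startTime = start; endTime = end;
        }
    }

    // TimeseriesMetadata: one series (device + measurement) and its ChunkMetadata list.
    static final class TimeseriesMetadata {
        final List<ChunkMetadata> chunks;
        TimeseriesMetadata(List<ChunkMetadata> chunks) { this.chunks = chunks; }
    }

    // MetadataIndexNode, flattened here to device -> measurement -> TimeseriesMetadata.
    static final class MetadataIndexNode {
        final Map<String, Map<String, TimeseriesMetadata>> devices = new TreeMap<>();
    }

    // TsFileMetadata: the file-level entry point that leads to the index.
    static final class TsFileMetadata {
        final MetadataIndexNode index = new MetadataIndexNode();
    }

    // Walks TsFileMetadata ==> MetadataIndexNode ==> TimeseriesMetadata ==> ChunkMetadata
    // and returns the ChunkHeader offsets of chunks overlapping [from, to].
    static List<Long> findChunkOffsets(TsFileMetadata file, String device,
                                       String measurement, long from, long to) {
        List<Long> offsets = new ArrayList<>();
        Map<String, TimeseriesMetadata> series = file.index.devices.get(device);
        if (series == null) return offsets;          // device not in this file
        TimeseriesMetadata ts = series.get(measurement);
        if (ts == null) return offsets;              // measurement not in this file
        for (ChunkMetadata cm : ts.chunks) {
            // keep chunks whose [startTime, endTime] overlaps the query range
            if (cm.endTime >= from && cm.startTime <= to) offsets.add(cm.chunkHeaderOffset);
        }
        return offsets;
    }

    // One device d1 with measurement s1 split into two chunks (made-up offsets).
    static TsFileMetadata sample() {
        TsFileMetadata file = new TsFileMetadata();
        Map<String, TimeseriesMetadata> d1 = new TreeMap<>();
        d1.put("s1", new TimeseriesMetadata(List.of(
                new ChunkMetadata(100L, 0L, 999L),
                new ChunkMetadata(2048L, 1000L, 1999L))));
        file.index.devices.put("d1", d1);
        return file;
    }

    public static void main(String[] args) {
        // Only the second chunk of d1.s1 covers timestamps 1500..1600.
        System.out.println(findChunkOffsets(sample(), "d1", "s1", 1500L, 1600L)); // [2048]
    }
}
```

Because each level narrows the search (file, then series, then chunks by time range), a reader only needs to seek to the chunks that can contain matching points.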

Query Engine

The built-in query engine in IoTDB parses all user commands, generates a plan, submits the plan to the corresponding executor, and returns the result set. On top of the query engine, IoTDB provides a simple, easy-to-use JDBC API. The following CLI session creates two time series, inserts data, and queries it:

IoTDB> CREATE TIMESERIES root.ln.wf01.wt01.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN
IoTDB> CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE

IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status) values(100,true)
IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status,temperature) values(200,false,20.71)

IoTDB> SELECT status FROM root.ln.wf01.wt01
+-----------------------+------------------------+
|                   Time|root.ln.wf01.wt01.status|
+-----------------------+------------------------+
|1970-01-01T08:00:00.100|                    true|
|1970-01-01T08:00:00.200|                   false|
+-----------------------+------------------------+
Total line number = 2
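The parse, plan, execute, and return flow of the query engine can be illustrated with a deliberately tiny sketch. Everything here is hypothetical: the `QueryEngineSketch` and `QueryPlan` names, the regex-based parser, and the in-memory store are assumptions for illustration only, and the sketch understands just the `SELECT <measurement> FROM <device path>` form used in the session above. IoTDB's real planner and executors are far more general.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the parse -> plan -> execute -> result-set flow.
// It understands only "SELECT <measurement> FROM <device path>" and is not IoTDB's planner.
class QueryEngineSketch {

    // Plan produced from the statement: the single series path to scan.
    static final class QueryPlan {
        final String seriesPath;
        QueryPlan(String seriesPath) { this.seriesPath = seriesPath; }
    }

    private static final Pattern SELECT = Pattern.compile(
            "(?i)^\\s*SELECT\\s+(\\w+)\\s+FROM\\s+([\\w.]+)\\s*;?\\s*$");

    // Parse + plan: turn statement text into a QueryPlan.
    static QueryPlan plan(String sql) {
        Matcher m = SELECT.matcher(sql);
        if (!m.matches()) {
            throw new IllegalArgumentException("unsupported statement: " + sql);
        }
        // device path + "." + measurement, e.g. root.ln.wf01.wt01.status
        return new QueryPlan(m.group(2) + "." + m.group(1));
    }

    // Execute: scan one series and return "time,value" rows in time order (the result set).
    static List<String> execute(QueryPlan plan, Map<String, TreeMap<Long, Object>> store) {
        List<String> rows = new ArrayList<>();
        TreeMap<Long, Object> series = store.getOrDefault(plan.seriesPath, new TreeMap<>());
        for (Map.Entry<Long, Object> e : series.entrySet()) {
            rows.add(e.getKey() + "," + e.getValue());
        }
        return rows;
    }

    // In-memory stand-in for stored data: series path -> (timestamp -> value).
    static Map<String, TreeMap<Long, Object>> sampleStore() {
        Map<String, TreeMap<Long, Object>> store = new HashMap<>();
        TreeMap<Long, Object> status = new TreeMap<>();
        status.put(200L, false);
        status.put(100L, true);   // inserted out of order; TreeMap keeps time order
        store.put("root.ln.wf01.wt01.status", status);
        return store;
    }

    public static void main(String[] args) {
        QueryPlan p = plan("SELECT status FROM root.ln.wf01.wt01");
        System.out.println(execute(p, sampleStore())); // [100,true, 200,false]
    }
}
```

Separating the plan from its execution is what lets one engine serve several front ends (CLI, JDBC) with the same executors.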

Metadata Management

The metadata model of IoTDB is organized as a tree. An instance contains multiple storage groups, which are similar to namespaces or databases. A storage group contains multiple devices, and each device contains multiple measurements. The time series data of each measurement is stored in TsFile chunks. To make data expiration easier, each storage group segments its data by time range and stores each segment in a separate directory. By default, data is segmented by week.
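The tree-shaped model can be sketched by treating each dot-separated segment of a series path as one tree level. This is a hypothetical illustration, not IoTDB's actual metadata tree class: the `MetadataTreeSketch` name and its methods are assumptions, and in this sketch the storage group (for example `root.ln`) is simply an inner node rather than a level that is declared separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a tree-shaped metadata model:
// root -> storage group -> ... -> device -> measurement. Not IoTDB's real class.
class MetadataTreeSketch {

    static final class Node {
        final String name;
        final Map<String, Node> children = new TreeMap<>(); // sorted child names
        Node(String name) { this.name = name; }
    }

    private final Node root = new Node("root");

    // Registers a full series path such as "root.ln.wf01.wt01.status".
    // Each dot-separated segment becomes one tree level; missing levels are created.
    void createTimeseries(String path) {
        String[] parts = path.split("\\.");
        if (parts.length < 2 || !parts[0].equals("root")) {
            throw new IllegalArgumentException("path must start with root.: " + path);
        }
        Node cur = root;
        for (int i = 1; i < parts.length; i++) {
            cur = cur.children.computeIfAbsent(parts[i], Node::new);
        }
    }

    // Lists the child names under a path; for a device path these are its measurements.
    List<String> childrenOf(String path) {
        Node cur = root;
        String[] parts = path.split("\\.");
        for (int i = 1; i < parts.length; i++) {
            cur = cur.children.get(parts[i]);
            if (cur == null) return List.of();   // path not registered
        }
        return new ArrayList<>(cur.children.keySet());
    }

    public static void main(String[] args) {
        MetadataTreeSketch tree = new MetadataTreeSketch();
        tree.createTimeseries("root.ln.wf01.wt01.status");
        tree.createTimeseries("root.ln.wf01.wt01.temperature");
        System.out.println(tree.childrenOf("root.ln.wf01.wt01")); // [status, temperature]
    }
}
```

Because series share path prefixes, listing all measurements of a device or all devices under a storage group is a single subtree walk.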

// Storage group directory structure
data
-- sequence
---- [Storage group name 1]
------ [Time partition ID 1]
-------- xxxx.tsfile
-------- xxxx.resource
------ [Time partition ID 2]
---- [Storage group name 2]
-- unsequence
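Weekly time partitioning can be sketched as mapping each timestamp to a partition ID and then to a directory under the layout shown. The details are assumptions for illustration: millisecond timestamps, a one-week interval, a partition ID computed as the floor of timestamp / interval, and a directory named after the plain numeric ID. IoTDB's own configuration and naming may differ.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of weekly time partitioning, assuming millisecond timestamps.
// Partition ID = floor(timestamp / one week); directory = data/sequence/<sg>/<partition ID>.
class TimePartitionSketch {

    static final long WEEK_MS = 7L * 24 * 60 * 60 * 1000; // 604,800,000 ms

    // Maps a timestamp to its time partition ID.
    static long partitionId(long timestampMs) {
        // floorDiv keeps negative (pre-1970) timestamps in the correct partition.
        return Math.floorDiv(timestampMs, WEEK_MS);
    }

    // Builds the directory a sequence TsFile for this timestamp would be placed in.
    static Path sequenceDir(String dataDir, String storageGroup, long timestampMs) {
        return Paths.get(dataDir, "sequence", storageGroup, Long.toString(partitionId(timestampMs)));
    }

    public static void main(String[] args) {
        long t = 1_600_000_000_000L; // 2020-09-13T12:26:40Z
        System.out.println(partitionId(t));                    // 2645
        System.out.println(sequenceDir("data", "root.ln", t)); // data/sequence/root.ln/2645 on Unix-like systems
    }
}
```

With one directory per partition, expiring old data amounts to deleting whole partition directories instead of rewriting files.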

Storage Engine

The IoTDB storage engine is based on a log-structured merge (LSM) tree. Written data is first recorded in the write-ahead log (WAL) and then written to a memtable in memory, which is flushed to TsFiles on disk in the background. TsFiles on disk are compacted according to certain rules to maintain query efficiency.
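The LSM-style write path can be illustrated with a minimal in-memory sketch for a single time series. It is hypothetical throughout: the class name, the size-based flush threshold, the inline (rather than background) flush, and the "newest file wins" compaction rule are assumptions chosen for clarity, and "files" are just sorted maps standing in for TsFiles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory sketch of an LSM-style write path for one time series:
// WAL append -> memtable -> flush to an immutable sorted "file" -> compaction.
class LsmWriteSketch {

    final List<String> wal = new ArrayList<>();                   // write-ahead log records
    TreeMap<Long, Double> memtable = new TreeMap<>();              // sorted by timestamp
    final List<TreeMap<Long, Double>> files = new ArrayList<>();   // flushed "TsFiles", oldest first
    private final int flushThreshold;

    LsmWriteSketch(int flushThreshold) { this.flushThreshold = flushThreshold; }

    void write(long time, double value) {
        wal.add(time + "," + value);   // 1. log the write first for durability
        memtable.put(time, value);     // 2. then apply it in memory
        if (memtable.size() >= flushThreshold) {
            flush();                   // 3. done in the background in real systems; inline here
        }
    }

    void flush() {
        if (memtable.isEmpty()) return;
        files.add(memtable);           // the memtable becomes an immutable sorted file
        memtable = new TreeMap<>();
        wal.clear();                   // flushed data no longer needs its WAL records
    }

    // Merges all flushed files into one; for duplicate timestamps the newer file wins.
    void compact() {
        TreeMap<Long, Double> merged = new TreeMap<>();
        for (TreeMap<Long, Double> f : files) {
            merged.putAll(f);          // later (newer) files overwrite earlier values
        }
        files.clear();
        if (!merged.isEmpty()) files.add(merged);
    }

    // Point query: memtable first, then files from newest to oldest.
    Double read(long time) {
        if (memtable.containsKey(time)) return memtable.get(time);
        for (int i = files.size() - 1; i >= 0; i--) {
            Double v = files.get(i).get(time);
            if (v != null) return v;
        }
        return null;
    }

    public static void main(String[] args) {
        LsmWriteSketch lsm = new LsmWriteSketch(2);
        lsm.write(100L, 20.5);
        lsm.write(200L, 20.7);   // reaches the threshold: flushed as file #1
        lsm.write(100L, 21.0);   // newer value for t=100, still in the memtable
        lsm.flush();             // file #2
        System.out.println(lsm.files.size());                          // 2
        lsm.compact();
        System.out.println(lsm.files.size() + " " + lsm.read(100L));  // 1 21.0
    }
}
```

Compaction reduces the number of files a query must consult, which is why it matters for read efficiency even though it does not change the logical data.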

Synchronization Tool

IoTDB can be deployed both on edge nodes and in the cloud. Data collected on edge nodes generally needs to be synchronized to a remote end for further analysis and processing. IoTDB provides a synchronization tool that synchronizes TsFile data from terminals or devices to the cloud.
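One way to picture incremental file synchronization is a sender that, in each round, ships only the TsFiles it has not shipped before. This is a hypothetical sketch, not IoTDB's sync protocol: the `SyncSketch` class, the `Transport` callback, and the rule that only names ending in `.tsfile` are eligible are all assumptions, and the actual network transfer is abstracted away.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of incremental TsFile synchronization from an edge node.
// Each round sends only files not sent before; the transfer is a pluggable callback.
class SyncSketch {

    interface Transport { void send(String fileName); }

    private final Set<String> alreadySynced = new HashSet<>();

    // Returns the files sent in this round, in name order.
    List<String> syncRound(Collection<String> localFiles, Transport transport) {
        List<String> pending = new ArrayList<>();
        for (String f : localFiles) {
            if (f.endsWith(".tsfile") && !alreadySynced.contains(f)) pending.add(f);
        }
        pending.sort(null);            // natural (name) order
        for (String f : pending) {
            transport.send(f);         // if this throws, f is not marked and is retried later
            alreadySynced.add(f);      // mark only after a successful send
        }
        return pending;
    }

    public static void main(String[] args) {
        SyncSketch sync = new SyncSketch();
        List<String> sent = new ArrayList<>();
        System.out.println(sync.syncRound(List.of("2.tsfile", "1.tsfile", "1.tsfile.resource"), sent::add)); // [1.tsfile, 2.tsfile]
        System.out.println(sync.syncRound(List.of("1.tsfile", "2.tsfile", "3.tsfile"), sent::add));          // [3.tsfile]
    }
}
```

Tracking what has already been sent keeps each round proportional to new data rather than to everything stored on the edge node.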

Connector

IoTDB integrates seamlessly with existing big data processing systems, including Hive and Spark. It provides connectors such as hive-tsfile, spark-tsfile, and spark-iotdb, so that Hive and Spark can directly access TsFile files and data stored in IoTDB.

Summary

Benefits

IoTDB provides a data model tailored to IoT scenarios, offers JDBC access, and supports integrated deployment across the edge and the cloud.
IoTDB can use the Hadoop Distributed File System (HDFS) for storage. In addition, it provides multiple connectors that integrate seamlessly with the existing big data ecosystem.
TsFile is an open storage format with a simple device model that is easy to understand.

Limitations

The TsFile library is currently available only in Java, and its resource usage is high for lightweight edge devices. This limits its use on terminals and devices.
Currently, only a standalone version is available for the cloud, which cannot handle data ingestion from a massive number of devices.

Data is stored on HDFS or local disks. Storing data on HDFS ensures high availability of the storage layer, but not of the computing layer.
