The data delivery service of Tablestore delivers full or incremental data from Tablestore to Object Storage Service (OSS) in near real time. You can use this feature to store historical data in OSS at low cost and to perform offline or quasi-real-time analysis on large volumes of data.
Scenarios
You can use the data delivery service to meet your business requirements in the following scenarios:
Tiered storage of hot data and cold data
You can combine the data delivery service with the time to live (TTL) feature of Tablestore. Full data is stored in OSS at low cost, and Tablestore keeps the hot data that you query and analyze at low latency.
Full data backup
You can use the data delivery service to deliver full data of a table in Tablestore to an OSS bucket for backup and archiving.
Large-scale data analysis in real time
You can use the data delivery service to deliver incremental data of Tablestore to OSS in real time (every 2 minutes). Delivered data is partitioned by system time and stored in the Parquet format. The high read bandwidth of OSS, combined with scan optimizations for Parquet data, lets you perform efficient real-time data analysis.
Features
The data delivery service has the following features:
The data delivery service automatically pulls both full and incremental data of Tablestore. The pulled data is delivered to OSS for persistent storage when its volume reaches the preset size or when no data has been delivered to OSS for 2 minutes.
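The two delivery triggers above (a size threshold and a 2-minute interval) follow a common size-or-time batching pattern. The following is a minimal sketch of that pattern, not the service's actual implementation. The class `DeliveryBuffer`, its `max_bytes` threshold, and the `flush` callback that stands in for writing one batch to OSS are all hypothetical names introduced for illustration.

```python
import time
from typing import Callable, List

# The 2-minute interval comes from the text; everything else here is hypothetical.
FLUSH_INTERVAL_SECONDS = 120


class DeliveryBuffer:
    """Hypothetical size-or-time batching buffer (illustration only)."""

    def __init__(self, max_bytes: int, flush: Callable[[List[bytes]], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_bytes = max_bytes      # assumed "preset size" threshold
        self.flush_fn = flush           # stands in for "write one batch to OSS"
        self.clock = clock              # injectable clock, useful for testing
        self.records: List[bytes] = []
        self.size = 0
        self.last_delivery = clock()

    def add(self, record: bytes) -> None:
        """Buffer a pulled record; deliver immediately once the size threshold is hit."""
        self.records.append(record)
        self.size += len(record)
        if self.size >= self.max_bytes:
            self._deliver()

    def tick(self) -> None:
        """Call periodically: deliver pending data if 2 minutes passed since the last delivery."""
        if self.records and self.clock() - self.last_delivery >= FLUSH_INTERVAL_SECONDS:
            self._deliver()

    def _deliver(self) -> None:
        self.flush_fn(self.records)     # hand the current batch to the sink
        self.records = []               # start a fresh batch
        self.size = 0
        self.last_delivery = self.clock()
```

Injecting the clock keeps the sketch deterministic: a test can advance a fake clock past 120 seconds and confirm that a small batch is delivered by the time trigger rather than the size trigger.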
The data delivery service allows you to deliver data in the following modes: incremental, full, and differential. All delivered data is stored in the Parquet format.
The data delivery service lets you monitor when data delivery is complete. The DescribeDeliveryTask operation returns the time at which data delivery was completed.
Benefits
Ease of use
To deliver data from Tablestore to OSS, you only need to complete a few simple configurations in the Tablestore console. Delivery tasks run automatically and their throughput scales with the load, so no monitoring or O&M is required. Service-level agreements (SLAs) are guaranteed.
A complete set of data delivery modes
The data delivery service provides the following modes: incremental, full, and differential. In incremental mode, delivery tasks deliver data in quasi-real time: they pull the latest data, cache it, and write it to OSS after two minutes.
Seamless integration with the computing ecosystem
The data delivery service is compatible with open source ecosystem standards and Hive naming conventions. Delivered data is stored in the Parquet format, so you can use E-MapReduce (EMR) to analyze the data in OSS directly through an external table.
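Hive naming conventions encode partition columns in directory names as `key=value` segments, which is what lets an external table map time-partitioned files to queryable columns. The following minimal sketch shows that convention. The OSS prefix and the `year/month/day/hour` layout are assumptions for illustration only; the actual path layout depends on how you configure the delivery task.

```python
from datetime import datetime, timezone
from typing import Dict


def hive_partition_path(prefix: str, ts: datetime) -> str:
    """Build a Hive-style (key=value) partition directory for a delivery time.

    The year/month/day/hour layout is a hypothetical example, not the
    documented layout used by the data delivery service.
    """
    ts = ts.astimezone(timezone.utc)  # normalize to UTC before formatting
    return (f"{prefix.rstrip('/')}/year={ts.year:04d}/month={ts.month:02d}"
            f"/day={ts.day:02d}/hour={ts.hour:02d}/")


def parse_partition(path: str) -> Dict[str, str]:
    """Recover partition columns from a Hive-style path.

    This mirrors how Hive-compatible engines derive partition values from
    directory names; only segments containing '=' are treated as partitions.
    """
    return dict(seg.split("=", 1) for seg in path.strip("/").split("/") if "=" in seg)
```

For example, a delivery at 09:30 UTC on 2021-03-07 under the hypothetical prefix `oss://my-bucket/tablestore/orders` maps to `.../year=2021/month=03/day=07/hour=09/`. A query that filters on these columns lets the engine skip directories outside the requested range.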
Tiered storage and access experience
After data is delivered to OSS, you can choose the data source that fits each analysis scenario: data in Tablestore data tables, data in index tables, or data delivered to OSS.
Usage notes
The data delivery service is available in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), and China (Shenzhen).
Procedure
1. Create a delivery task to deliver Tablestore data to OSS. For more information, see Quick start and Use Tablestore SDKs to deliver Tablestore data to OSS.
2. Use EMR to analyze the Tablestore data that is delivered to OSS. For more information, see Use EMR.