TableStoreWriter is a utility class provided by Tablestore SDK for Java that encapsulates the operations used to import data with high concurrency and high throughput. You can use TableStoreWriter to write data to Tablestore data tables concurrently, specify row-level callbacks, and customize the writer configuration.
Background information
TableStoreWriter is suitable only for the Wide Column model.
Why use TableStoreWriter?
In scenarios such as logging and IoT tracing, devices generate a large amount of data in a short period of time, and that data must be written to a database. The database must sustain highly concurrent writes at a throughput of tens of thousands or even millions of rows per second. However, a single BatchWriteRow request in Tablestore can write at most 200 rows.
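To make the 200-row limit concrete, the following minimal sketch shows the chunking that a caller would otherwise have to do by hand before issuing batch writes. It uses only the Java standard library. The class `BatchSplitSketch` and its `split` method are hypothetical names introduced for illustration and are not part of Tablestore SDK for Java.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper only; not part of Tablestore SDK for Java.
class BatchSplitSketch {
    // Row limit of a single BatchWriteRow request, as stated above.
    static final int MAX_ROWS_PER_BATCH = 200;

    // Splits rows into consecutive batches of at most maxRows rows each.
    static <T> List<List<T>> split(List<T> rows, int maxRows) {
        if (maxRows <= 0) {
            throw new IllegalArgumentException("maxRows must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += maxRows) {
            int end = Math.min(start + maxRows, rows.size());
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 450; i++) {
            rows.add(i);
        }
        // 450 rows are split into batches of 200, 200, and 50 rows.
        List<List<Integer>> batches = split(rows, MAX_ROWS_PER_BATCH);
        System.out.println(batches.size() + " batches");
    }
}
```

At millions of rows per second, this manual chunking would also have to be combined with concurrency control and failure handling, which is the work that TableStoreWriter is described as taking over.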
Features of TableStoreWriter
To meet the concurrent write performance requirements of the preceding scenarios, Tablestore provides TableStoreWriter, an easy-to-use, high-performance data import utility class built on Tablestore SDK for Java. TableStoreWriter encapsulates the operations for importing data to Tablestore data tables with high concurrency and high throughput, and lets you specify row-level callbacks and custom configurations.
Scenarios
If your business scenario has the following characteristics, you can use TableStoreWriter to write data to Tablestore:
High throughput is required because the applications write with high concurrency.
The write latency of a single row of data is not critical.
Data can be written asynchronously (the producer-consumer mode can be used).
The same row of data can be written repeatedly without causing problems.
The following items describe the common use scenarios of TableStoreWriter:
Log storage
Log storage systems store a large volume of log data, provide high throughput, and can asynchronously process data. You can use TableStoreWriter to write data from log storage systems to Tablestore with high concurrency and high throughput. When you use TableStoreWriter, repeated writes do not affect the validity of log data.
Messaging system
Messaging systems, such as instant messaging, must process a large number of messages (for example, the write amplification of group chat messages), provide high throughput, and can tolerate a latency of up to hundreds of milliseconds for a single message. Message processing can be performed asynchronously. Each message has a unique ID and its content cannot be changed, so repeated writes do not cause exceptions. When you use Tablestore to store data in messaging systems, you can use TableStoreWriter to quickly write data to Tablestore.
Consumption of messages from distributed queues
Distributed queues are widely used in complex distributed systems. Distributed queues not only provide high-performance message delivery but also decouple dependencies between modules to simplify the system architecture.
If your business architecture uses distributed queues and one of the consumer tasks is to import data to Tablestore, consider using TableStoreWriter. TableStoreWriter is based on the producer-consumer mode and is suitable for scenarios similar to distributed queues.
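The consumer task described above can be sketched with a standard `BlockingQueue` standing in for the distributed queue. This is a minimal illustration under stated assumptions, not how TableStoreWriter is implemented: the class `QueueConsumerSketch`, the `consume` method, and the use of plain strings as rows are all hypothetical, and each collected batch is only a stand-in for one batch write request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; not part of Tablestore SDK for Java.
class QueueConsumerSketch {
    // Consumes up to `total` rows from the queue and groups them into batches
    // of at most maxBatchRows rows. Returns the batches that would be written.
    static List<List<String>> consume(BlockingQueue<String> queue, int total, int maxBatchRows) {
        List<List<String>> batches = new ArrayList<>();
        int seen = 0;
        try {
            while (seen < total) {
                // Wait for the first row, then drain whatever is already buffered.
                String first = queue.poll(1, TimeUnit.SECONDS);
                if (first == null) {
                    break; // no row arrived in time; stop consuming
                }
                List<String> batch = new ArrayList<>();
                batch.add(first);
                queue.drainTo(batch, maxBatchRows - 1);
                seen += batch.size();
                batches.add(batch); // stand-in for one batch write request
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return batches;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1024);
        // The producer plays the role of the upstream message source.
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 500; i++) {
                    queue.put("row-" + i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        List<List<String>> batches = consume(queue, 500, 200);
        producer.join();
        int rows = 0;
        for (List<String> batch : batches) {
            rows += batch.size();
        }
        // The number of batches depends on thread timing.
        System.out.println(rows + " rows in " + batches.size() + " batches");
    }
}
```

The producer never waits for a write to finish, which is why this pattern fits workloads that need throughput but do not need low latency for any single row.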
Architecture
TableStoreWriter is a utility class built on top of the operations at the SDK processing layer. The following items describe the relationship between TableStoreWriter and Tablestore SDK for Java:
TableStoreWriter depends on the AsyncClient asynchronous operation provided by Tablestore SDK for Java.
TableStoreWriter uses the BatchWriteRow operation of Tablestore SDK for Java to import data to Tablestore.
The retries that TableStoreWriter performs on a single failed row depend on the RetryStrategy of Tablestore SDK for Java.
You can also use the operations provided by Tablestore SDK for Java to write data to Tablestore. Compared with the operations provided by Tablestore SDK for Java, TableStoreWriter is optimized in terms of performance and ease of use and provides the following features:
Asynchronous operations: Fewer threads are used to provide higher concurrency.
Automatic data aggregation: A buffer queue is used in the memory to maximize the number of write requests sent to Tablestore at the same time. This improves the write throughput.
Producer-consumer mode: This mode facilitates asynchronous processing and data aggregation.
High-performance data exchange queues: Disruptor RingBuffer, which uses the multi-producer and single-consumer mode, is used to provide better performance.
Hiding of complex request encapsulation: The details of calling the BatchWriteRow operation are hidden. Dirty data is filtered out by pre-checks, and request limits, such as the limit on the number or size of rows that can be imported in a batch, are handled automatically. Dirty data refers to rows whose schema differs from the schema of the table, whose size exceeds the upper limit, or whose number of columns exceeds the upper limit.
Row-level callbacks: Compared with the request-level callbacks provided by Tablestore SDK for Java, TableStoreWriter provides row-level callbacks to allow the business logic to process data at the row level.
Row-level retry attempts: If request-level retry attempts fail, row-level retry attempts are performed for specific error codes to ensure the highest possible write success rate.
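The interplay of pre-checks, row-level callbacks, and row-level retries in the list above can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: `RowLevelWriterSketch`, `RowCallback`, `BatchSender`, and the `MAX_ROW_CHARS` limit are hypothetical and are not types or limits of Tablestore SDK for Java. Rows are plain strings, and the sketch retries every rejected row rather than only rows that fail with specific error codes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; none of these types belong to Tablestore SDK for Java.
class RowLevelWriterSketch {
    // Hypothetical row-level callback: notified once per row, not once per request.
    interface RowCallback {
        void onCompleted(String row);
        void onFailed(String row, String reason);
    }

    // Hypothetical stand-in for a batch write; returns the rows that were rejected.
    interface BatchSender {
        List<String> send(List<String> batch);
    }

    static final int MAX_ROW_CHARS = 32; // illustrative limit, not a real Tablestore limit

    static void write(List<String> rows, int maxBatchRows, int maxRowRetries,
                      BatchSender sender, RowCallback callback) {
        if (maxBatchRows <= 0) {
            throw new IllegalArgumentException("maxBatchRows must be positive");
        }
        // Pre-check: dirty rows are reported to the callback and never sent.
        List<String> clean = new ArrayList<>();
        for (String row : rows) {
            if (row.isEmpty() || row.length() > MAX_ROW_CHARS) {
                callback.onFailed(row, "dirty row");
            } else {
                clean.add(row);
            }
        }
        // Send clean rows in batches, then handle each row of a batch individually.
        for (int start = 0; start < clean.size(); start += maxBatchRows) {
            int end = Math.min(start + maxBatchRows, clean.size());
            List<String> batch = new ArrayList<>(clean.subList(start, end));
            Set<String> rejected = new HashSet<>(sender.send(batch));
            for (String row : batch) {
                if (rejected.contains(row)) {
                    retryRow(row, maxRowRetries, sender, callback);
                } else {
                    callback.onCompleted(row);
                }
            }
        }
    }

    // Row-level retry: resend a single rejected row up to maxRowRetries times.
    static void retryRow(String row, int maxRowRetries, BatchSender sender, RowCallback callback) {
        for (int attempt = 0; attempt < maxRowRetries; attempt++) {
            if (sender.send(Collections.singletonList(row)).isEmpty()) {
                callback.onCompleted(row);
                return;
            }
        }
        callback.onFailed(row, "retries exhausted");
    }

    public static void main(String[] args) {
        Set<String> seenFlaky = new HashSet<>();
        // Fake sender: "bad" rows always fail, "flaky" rows fail on the first attempt only.
        BatchSender sender = batch -> {
            List<String> rejected = new ArrayList<>();
            for (String row : batch) {
                if (row.startsWith("bad") || (row.startsWith("flaky") && seenFlaky.add(row))) {
                    rejected.add(row);
                }
            }
            return rejected;
        };
        RowCallback printer = new RowCallback() {
            public void onCompleted(String row) { System.out.println("ok: " + row); }
            public void onFailed(String row, String reason) {
                System.out.println("failed: '" + row + "' (" + reason + ")");
            }
        };
        write(Arrays.asList("a", "", "flaky-1", "bad-1", "b"), 2, 3, sender, printer);
    }
}
```

Because the callback fires per row, the business logic can react to exactly the rows that failed instead of reprocessing an entire failed request.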
What to do next
Get started with TableStoreWriter. For more information, see Use TableStoreWriter to concurrently write data.