Migrate the recommendation service of a leading online education company to Lindorm

Highlights: throughput that is three times higher than that of self-built databases, write latency that is 1/10 of that of self-built HBase clusters, and scalability that suits scale-out for promotional events

Challenges

The performance of self-built open source HBase clusters cannot meet the requirements of writing and calculating hundreds of thousands of events per second.
The stability and availability of self-built open source HBase clusters cannot be ensured because of deficiencies in their garbage collection (GC) mechanism.
The storage costs of self-built open source HBase clusters grow significantly as the amount of stored data increases.
The O&M of self-built open source HBase clusters cannot be performed on a unified platform. Users must therefore scale their HBase clusters in or out manually, which leads to O&M failures and high costs.

Solution

Lindorm has the following features to support a large number of concurrent operations with high throughput:
1. An optimized Lindorm Group Commit data writing mechanism that can improve the performance of batch write operations by three times.
2. The triplicate architecture of Lindorm Log Consensus (LLC), which writes data by using quorum-based algorithms to reduce the write latency by 50%.
3. Linear scalability that supports tens of millions of read and write operations on a single table without the need for database and table partitioning.
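The first two features rely on general techniques that can be illustrated with a minimal sketch. The code below is not Lindorm's implementation. The class `GroupCommitLog`, the function `quorum_ack_latency`, the batch size, and the replica latencies are all hypothetical names and values chosen for illustration. The sketch shows how group commit lets several records share one expensive sync, and how a quorum acknowledgement with three replicas lets a write return as soon as the second-fastest replica responds instead of waiting for the slowest one.

```python
class GroupCommitLog:
    """Toy write-ahead log that makes several records durable with one sync."""

    def __init__(self, max_batch):
        self.max_batch = max_batch  # hypothetical batch-size limit
        self.pending = []           # records waiting for the next sync
        self.durable = []           # records that have already been synced
        self.sync_count = 0         # number of (expensive) sync calls so far

    def append(self, record):
        """Buffer a record and sync once the batch is full."""
        self.pending.append(record)
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self):
        """Make all pending records durable with a single sync."""
        if not self.pending:
            return
        self.durable.extend(self.pending)
        self.pending.clear()
        self.sync_count += 1


def quorum_ack_latency(replica_latencies_ms, quorum):
    """Return the time at which `quorum` replicas have acknowledged a write."""
    if not 1 <= quorum <= len(replica_latencies_ms):
        raise ValueError("quorum must be between 1 and the replica count")
    return sorted(replica_latencies_ms)[quorum - 1]


if __name__ == "__main__":
    log = GroupCommitLog(max_batch=4)
    for i in range(10):
        log.append(f"event-{i}")
    log.flush()  # sync the final partial batch
    print(log.sync_count)  # 3 syncs instead of 10

    latencies = [2.0, 3.0, 40.0]  # hypothetical: one replica is slow
    print(quorum_ack_latency(latencies, quorum=2))  # 3.0, majority of three
    print(max(latencies))  # 40.0, if every replica had to acknowledge
```

In this toy model, a single slow replica does not delay the acknowledgement, which is one reason quorum-based writes tend to have lower and more stable latency than writes that wait for every replica.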

Lindorm implements an optimized GC mechanism that significantly reduces the maximum response latency for 99.9% of requests (the P99.9 latency). As a result, requests in the recommendation service are handled with more stable latency.

Lindorm supports optimized compression algorithms that reduce storage costs by up to 50%.
Lindorm provides the hot and cold data separation feature. Hot data and cold data from the same table can be stored on different storage media. This way, the storage costs of the recommendation service can be reduced without modifying the application code.
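The idea behind hot and cold data separation can be sketched as follows. This is an illustration under stated assumptions, not Lindorm's actual mechanism or configuration: the 30-day boundary, the function names `storage_tier` and `monthly_cost`, and the unit prices are all hypothetical. The sketch classifies a row by its age and compares the cost of keeping everything on hot storage with the cost of moving older rows to a cheaper medium.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical boundary: rows older than this are treated as cold.
COLD_BOUNDARY = timedelta(days=30)


def storage_tier(written_at, now, cold_boundary=COLD_BOUNDARY):
    """Return "cold" for rows older than the boundary, otherwise "hot"."""
    return "cold" if now - written_at > cold_boundary else "hot"


def monthly_cost(hot_gb, cold_gb, hot_price_per_gb, cold_price_per_gb):
    """Total cost when hot and cold data sit on differently priced media."""
    return hot_gb * hot_price_per_gb + cold_gb * cold_price_per_gb


if __name__ == "__main__":
    now = datetime(2024, 3, 1, tzinfo=timezone.utc)
    print(storage_tier(now - timedelta(days=2), now))   # hot
    print(storage_tier(now - timedelta(days=90), now))  # cold

    # Hypothetical prices in arbitrary units per GB per month.
    print(monthly_cost(1000, 0, 10, 2))   # 10000, everything on hot storage
    print(monthly_cost(200, 800, 10, 2))  # 3600, 80% of the data moved to cold
```

Because the tier is decided by the storage layer from the age of the data, the application keeps reading and writing the same table, which matches the claim that no application code changes are needed.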

Lindorm uses an architecture in which storage resources are decoupled from computing resources. Storage nodes and compute nodes can be scaled based on business requirements, and scaling operations do not interrupt the services of applications. Lindorm automatically balances the data and requests in the service and frees users from the O&M of the service.

Benefits

After the recommendation service was migrated to Lindorm, the service can support 200,000 write operations per second. Compared with the service deployed on self-built HBase clusters, the throughput of the service is three times higher and the write latency of the service is reduced to 1/10.
After the recommendation service was migrated to Lindorm, the data of the service is compressed at a ratio that is twice as high as the ratio in the service deployed on self-built HBase clusters. This way, the storage costs of the service are reduced by more than 50%. In addition, the hot and cold data separation feature of Lindorm can further reduce the storage costs.
Lindorm implements a GC mechanism that is optimized based on the ZGC provided by AJDK. This way, the requests in the recommendation service are handled with more stable latency. No faults have been reported since the service was migrated to Lindorm.
The service can be easily scaled in or out on the O&M platform of Lindorm to meet the scaling requirements of promotion events during the Spring Festival. In addition, the unified O&M platform of Lindorm significantly reduces the O&M costs and the unexpected issues caused by manual operations.
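The storage benefit listed above links a doubled compression ratio to a cost reduction of about half. The arithmetic can be checked with a short sketch. The function names and the example ratios (3 and 6) are hypothetical, and the sketch assumes that storage cost is proportional to the stored size.

```python
def stored_size(raw_size_gb, compression_ratio):
    """Size on disk for a given raw size and compression ratio (raw / stored)."""
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return raw_size_gb / compression_ratio


def storage_reduction(old_ratio, new_ratio):
    """Fraction of stored size saved when the compression ratio improves."""
    return 1 - old_ratio / new_ratio


if __name__ == "__main__":
    print(stored_size(600, 3))      # 200.0 GB at the hypothetical old ratio
    print(stored_size(600, 6))      # 100.0 GB once the ratio doubles
    print(storage_reduction(3, 6))  # 0.5, a 50% reduction
```

Under this assumption, doubling the compression ratio accounts for a 50% reduction on its own, and any further savings would have to come from other measures such as hot and cold data separation.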