The apack plug-in is developed by the Alibaba Cloud Elasticsearch team. This plug-in provides the physical replication and vector search features. This topic describes only the physical replication feature. This feature greatly reduces CPU overhead and improves write performance in scenarios such as logging and time series analytics. In these scenarios, replica shards are configured for indexes, large amounts of data are written, and a short delay in data visibility is acceptable.
Prerequisites
- An Alibaba Cloud Elasticsearch V6.7.0 or V7.10.0 cluster is created. If you create a V6.7.0 cluster, make sure that the kernel version of the cluster is V1.2.0 or later. In this topic, a V6.7.0 cluster is used. For more information about how to create a cluster, see Create an Alibaba Cloud Elasticsearch cluster.
- The apack plug-in is installed for the cluster.
Only Elasticsearch V6.7.0 and V7.10.0 clusters support the apack plug-in. If you use an Elasticsearch V6.7.0 cluster whose kernel version is earlier than V1.2.0, you must update the kernel of the cluster before you can use the apack plug-in. For more information, see Upgrade the version of a cluster. If the kernel version of your V6.7.0 cluster is V1.2.0 or later, the apack plug-in is installed for the cluster by default and cannot be removed. You can go to the Plug-ins page to check whether the plug-in is installed.
Note: After the apack plug-in is installed, you can use both the physical replication and vector search features. For more information about how to use the vector search feature, see Use the aliyun-knn plug-in.
Background information
Basic principle of the physical replication feature:
If the feature is disabled, replication works the same way as in open source Elasticsearch. After the node that stores a primary shard receives a write request, the system writes the data to the primary shard. Then, the system forwards the request to the nodes where the replica shards of the primary shard reside and writes the index data to the replica shards. In this process, index data is written to the primary shard, its replica shards, and the translogs of all of these shards.
If the feature is enabled, index data is written only to the primary shard, its translog, and the translogs of its replica shards. This ensures data reliability and consistency. Each time the primary shard is refreshed, the system copies the incremental index data to the replica shards of the primary shard over the network. This feature delays data visibility by several milliseconds but significantly improves the write performance of a cluster.
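The difference between the two write paths can be illustrated with a toy model. The following Python sketch is not the plug-in's implementation. The `Shard` class, its fields, and both functions are hypothetical names introduced only for illustration, and the model assumes one primary shard, one replica shard, and a single refresh after all documents are written.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Toy shard: a translog, an unrefreshed buffer, and searchable segments."""
    translog: list = field(default_factory=list)
    buffer: list = field(default_factory=list)
    segments: list = field(default_factory=list)
    index_ops: int = 0  # number of CPU-heavy indexing operations performed

    def index(self, doc):
        # Indexing appends to the translog and builds index data in memory.
        self.translog.append(doc)
        self.buffer.append(doc)
        self.index_ops += 1

    def refresh(self):
        # A refresh turns buffered documents into a searchable segment.
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer = []


def write_without_physical_replication(primary, replica, docs):
    """Feature disabled: every document is indexed on both shards."""
    for doc in docs:
        primary.index(doc)
        replica.index(doc)
    primary.refresh()
    replica.refresh()


def write_with_physical_replication(primary, replica, docs):
    """Feature enabled: the replica only logs the operation, then copies segments."""
    for doc in docs:
        primary.index(doc)
        replica.translog.append(doc)  # translog only, no indexing work
    already_copied = len(replica.segments)
    primary.refresh()
    # Copy only the segments created since the last copy (incremental data).
    replica.segments.extend(primary.segments[already_copied:])


if __name__ == "__main__":
    p, r = Shard(), Shard()
    write_with_physical_replication(p, r, ["doc-1", "doc-2", "doc-3"])
    print(p.index_ops, r.index_ops)
```

In this model, both paths leave the replica with the same segments and the same translog entries, but only the first path makes the replica repeat the indexing work. This is the CPU saving that the feature targets.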
- Test environment
- Cluster configuration: five data nodes, each of which offers 8 vCPUs, 32 GiB of memory, and one 2-TiB standard SSD
- Dataset: the 74-GiB nyc_taxis dataset of Rally, which is provided by open source Elasticsearch
- Index configuration: five primary shards, and one replica shard for each primary shard (default configuration)
- Test result
| Service | Write speed (documents/s) |
| --- | --- |
| Open source Elasticsearch 6.7.0 | 127305 |
| Alibaba Cloud Elasticsearch V6.7.0 (with the physical replication feature enabled) | 184592 |
- Test conclusion
In this test, Alibaba Cloud Elasticsearch with the physical replication feature enabled writes about 45% more documents per second than open source Elasticsearch (184592 compared with 127305).
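The 45% figure follows directly from the two write speeds in the test result. The following Python sketch reproduces the arithmetic. The function and constant names are hypothetical and are used only for illustration.

```python
def improvement_percent(baseline, candidate):
    """Relative improvement of candidate over baseline, as a percentage."""
    return (candidate - baseline) / baseline * 100


# Write speeds from the test result, in documents per second.
OPEN_SOURCE_DOCS_PER_S = 127305
PHYSICAL_REPLICATION_DOCS_PER_S = 184592

gain = improvement_percent(OPEN_SOURCE_DOCS_PER_S, PHYSICAL_REPLICATION_DOCS_PER_S)
print(f"{gain:.1f}%")  # prints "45.0%"
```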
Precautions
- The physical replication feature of the apack plug-in works at the index level. By default, this feature is disabled for indexes that were created before the plug-in was installed and is enabled for indexes that are created after the plug-in is installed. If your indexes were created before the plug-in was installed, you must enable the feature for them before you can use it.
- You can disable the physical replication feature for an index. However, before you disable this feature, disable the index.
- Before you enable the physical replication feature for an index, disable the index and set the number of replica shards for the index to 0.
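The last precaution can be read as a short sequence of requests. The following Python sketch only builds that sequence as plain data and sends nothing to a cluster. It rests on several assumptions that this topic does not confirm: "disable the index" is interpreted as closing the index with the standard `_close` API, the replica count is set to 0 while the index is still open, and the `index.replication.type` setting shown for new indexes can also be applied to a closed index through the `_settings` API. The function name is hypothetical.

```python
def enable_physical_replication_requests(index):
    """Return ordered (method, path, body) tuples; no request is sent.

    Assumed order: drop replicas, close the index, switch the replication
    type, and reopen the index. Verify the order against your cluster.
    """
    return [
        # Assumption: the replica count is changed while the index is open.
        ("PUT", f"/{index}/_settings", {"index.number_of_replicas": 0}),
        # Assumption: "disable the index" means closing it.
        ("POST", f"/{index}/_close", None),
        # Assumption: the setting used for new indexes also applies here.
        ("PUT", f"/{index}/_settings", {"index.replication.type": "segment"}),
        ("POST", f"/{index}/_open", None),
    ]


if __name__ == "__main__":
    for method, path, body in enable_physical_replication_requests("index-1"):
        print(method, path, body if body is not None else "")
```

After the index is reopened, the number of replica shards can be set back to the value that you require.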
Enable the physical replication feature for a new index
PUT index-1
{
  "settings": {
    "index.replication.type": "segment"
  }
}