LindormSearch is the search engine service of Lindorm. It runs on a distributed cluster that consists of multiple nodes. Before you purchase LindormSearch, evaluate the resource capacity that your cluster requires. This topic provides general suggestions to help you plan your cluster capacity.
Evaluate the storage capacity
- Number of replicas: The recommended default is 0 replicas. LindormSearch uses distributed shared storage, so if a node fails, your data can be automatically migrated to other nodes to keep the service available. If you require high reliability, we recommend that you set the number of replicas to 1.
- Index bloat: In most cases, index data is about 20% larger than the source data.
- Search engine overhead: We recommend that you reserve 20% of the storage for operations such as transaction logging and periodic compaction.
- Reserved storage for the OS: By default, 5% of the storage is reserved for the OS.
- System security threshold: To keep the cluster stable, we recommend that you reserve 20% of the storage. When storage usage reaches 80%, a text message alert is automatically sent.
Based on the preceding factors, you can estimate the required storage by using the following formula:
Required storage = Storage occupied by the source data × 1.9
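A minimal Python sketch of this estimate follows. It applies the documented 1.9 factor as-is (it does not try to decompose the factor into the individual overheads) and checks usage against the 80% alert level. The function and constant names are illustrative, not part of any LindormSearch API, and 500 GB is an arbitrary example input.

```python
# Factor taken from the storage formula; applied as-is, not decomposed.
STORAGE_FACTOR = 1.9
# Storage usage ratio at which a text message alert is sent.
ALERT_USAGE_RATIO = 0.80


def required_storage_gb(source_data_gb: float) -> float:
    """Estimate the storage to purchase for a given amount of source data."""
    return source_data_gb * STORAGE_FACTOR


def usage_triggers_alert(used_gb: float, total_gb: float) -> bool:
    """Return True when storage usage has reached the alert threshold."""
    return used_gb / total_gb >= ALERT_USAGE_RATIO


source_gb = 500
print(f"Source data: {source_gb} GB -> purchase about {required_storage_gb(source_gb):.0f} GB")
```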
Evaluate the computing resources
The complexity and volume of queries and data writes vary by business scenario. We recommend that you evaluate your computing resource requirements before you select resources, and then run a test to confirm that the selected resources are sufficient. The following general suggestions apply:
- Select at least two nodes to prevent single points of failure.
- We recommend that you select high-specification nodes, such as nodes with 16 CPU cores and 64 GB of memory.
- If the test shows that the computing resources do not meet your business requirements, scale up first. For example, upgrade the node specifications from 4 CPU cores and 16 GB of memory to 8 CPU cores and 32 GB of memory. Then, determine whether to scale out by adding nodes.
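The scale-up-then-scale-out order can be sketched in Python as follows. The specification ladder is hypothetical: 4 cores/16 GB and 8 cores/32 GB come from the scale-up example and 16 cores/64 GB from the recommended specification, while the specifications you can actually purchase may differ. Scaling out only after the largest specification is reached is one simple reading of the guidance, not a documented LindormSearch policy.

```python
from typing import NamedTuple


class Plan(NamedTuple):
    cores: int
    memory_gb: int
    nodes: int


# Hypothetical specification ladder, ordered from smallest to largest.
SPEC_LADDER = [(4, 16), (8, 32), (16, 64)]
# At least two nodes to avoid a single point of failure.
MIN_NODES = 2


def next_plan(current: Plan) -> Plan:
    """Suggest the next configuration to test when compute is insufficient.

    Scale up to the next specification first; scale out (add a node)
    only when the current specification is already the largest one.
    """
    spec = (current.cores, current.memory_gb)
    position = SPEC_LADDER.index(spec)  # raises ValueError for unknown specs
    if position + 1 < len(SPEC_LADDER):
        cores, memory_gb = SPEC_LADDER[position + 1]
        return Plan(cores, memory_gb, max(current.nodes, MIN_NODES))
    return Plan(current.cores, current.memory_gb, max(current.nodes + 1, MIN_NODES))


print(next_plan(Plan(4, 16, 2)))
print(next_plan(Plan(16, 64, 2)))
```

The first call upgrades to 8 cores/32 GB on the same two nodes; the second adds a third node because 16 cores/64 GB is the top of the hypothetical ladder.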
Evaluate the number of shards
Each index is divided into multiple shards. When data is written, a hash algorithm automatically distributes documents across shards based on their document IDs. Follow these suggestions when you configure the number of shards:
- Keep the size of a single shard between 20 GB and 50 GB.
- Set the number of shards to an integer multiple of the number of nodes. For example, if your instance has two nodes, set the number of shards of the index to 2.
- If your business data, such as log data or order data, is time-based, we recommend that you use the alias feature provided by the system. With this feature, new indexes are continuously created and old indexes are periodically deleted. For more information about how to use aliases, see Use sharding (aliases).
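The following Python sketch combines the two shard-count suggestions: it picks the fewest shards that keep each shard at or below 50 GB, then rounds up to an integer multiple of the node count. `index_size_gb` is assumed to be the expected size of the index data (for example, source data plus about 20% index bloat); the function name is illustrative.

```python
import math

MIN_SHARD_GB = 20  # lower bound of the suggested shard size
MAX_SHARD_GB = 50  # upper bound of the suggested shard size


def plan_shards(index_size_gb: float, node_count: int) -> int:
    """Return the smallest multiple of node_count that keeps shards <= MAX_SHARD_GB."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Fewest shards that keep every shard within the upper bound.
    min_shards = max(1, math.ceil(index_size_gb / MAX_SHARD_GB))
    # Round up to the next integer multiple of the node count.
    return math.ceil(min_shards / node_count) * node_count


for size_gb, nodes in [(80, 2), (300, 4), (10, 2)]:
    shards = plan_shards(size_gb, nodes)
    per_shard = size_gb / shards
    note = "" if per_shard >= MIN_SHARD_GB else " (below the 20 GB lower bound)"
    print(f"{size_gb} GB on {nodes} nodes -> {shards} shards, {per_shard:.1f} GB each{note}")
```

For small indexes, rounding up to a multiple of the node count can leave shards below 20 GB, as the 10 GB case shows. The sketch lets the node-multiple rule take precedence in that case, which is a design choice rather than documented behavior.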