Specifications

This topic describes the specifications of DashVector clusters and how to select a cluster based on your business requirements.

Cluster types

DashVector provides three types of clusters for different business scenarios:

Performance-optimized type: Suitable for scenarios that require high QPS and low query latency. This type also provides the highest write efficiency. It is recommended for workloads that involve high concurrency, high traffic, low latency requirements, or high write efficiency requirements.
Storage-optimized type (recommended): Provides five times the storage capacity of a performance-optimized cluster of the same specification, so it can store and manage more vector data and is better suited to scenarios with large data volumes. Storage-optimized clusters perform well and meet the requirements of most scenarios. This is the most cost-effective type and is therefore recommended.
Free trial: Intended for testing and evaluation only. A free trial cluster cannot be used in online production environments. It is valid for one month, and you can apply for another trial after it expires. Free trial clusters are subject to certain limits. For more information, see Limits.

Important

A free trial cluster is valid for one month. After the free trial period ends, the cluster is automatically released, and all data in the cluster is deleted and cannot be restored. To keep the cluster, upgrade it to a paid cluster within 30 days of its creation. Alibaba Cloud is not responsible for data that is deleted when a free trial cluster expires without being upgraded to a paid cluster.

Cluster specifications

DashVector provides clusters of different specifications, which differ mainly in storage capacity.

Note

If you require a cluster of higher specifications, have feedback, or need more technical support, contact us in one of the following ways:

Official DingTalk group: 25130022704
Email address for technical support: dashvector@service.aliyun.com

Reference about storage capacity

Cluster type	Cluster specification	Number of documents (FP32 vectors of 768 dimensions)	Number of documents (FP32 vectors of 1,536 dimensions)
Performance-optimized type	P.small	500,000	250,000
Performance-optimized type	P.large	1,000,000	500,000
Performance-optimized type	P.2xlarge	2,000,000	1,000,000
Performance-optimized type	P.4xlarge	4,000,000	2,000,000
Performance-optimized type	P.8xlarge	8,000,000	4,000,000
Performance-optimized type	P.16xlarge	16,000,000	8,000,000
Storage-optimized type	S.small	2,500,000	1,250,000
Storage-optimized type	S.large	5,000,000	2,500,000
Storage-optimized type	S.2xlarge	10,000,000	5,000,000
Storage-optimized type	S.4xlarge	20,000,000	10,000,000
Storage-optimized type	S.8xlarge	40,000,000	20,000,000
Storage-optimized type	S.16xlarge	80,000,000	40,000,000

Important

The data in the preceding table was verified by capacity tests and is for reference only.

The documents used in the capacity tests contain only primary keys and vectors, with no fields. The primary keys are strings converted from auto-increment integers that start at 0. In most production scenarios, fields are required and occupy storage space. Therefore, the number of documents that you can actually store is smaller than the number shown in the preceding table.
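The capacity figures above can be turned into a rough sizing helper. The following Python sketch is illustrative only and is not part of DashVector or its SDK. It assumes that capacity scales inversely with vector dimension, which matches the two columns of the table (1,536 dimensions stores half as many documents as 768 dimensions) but is not confirmed for other dimensions. The function names and the `headroom` parameter are hypothetical. The headroom reserves space for fields, which the capacity tests did not include.

```python
import math

# Reference capacities copied from the table above: documents per
# specification for FP32 vectors of 768 dimensions (no fields).
CAPACITY_768 = {
    "P.small": 500_000,
    "P.large": 1_000_000,
    "P.2xlarge": 2_000_000,
    "P.4xlarge": 4_000_000,
    "P.8xlarge": 8_000_000,
    "P.16xlarge": 16_000_000,
    "S.small": 2_500_000,
    "S.large": 5_000_000,
    "S.2xlarge": 10_000_000,
    "S.4xlarge": 20_000_000,
    "S.8xlarge": 40_000_000,
    "S.16xlarge": 80_000_000,
}


def estimated_capacity(spec: str, dimension: int) -> int:
    """Estimate the document capacity of a specification.

    ASSUMPTION: capacity is inversely proportional to the vector
    dimension. The table only confirms this for 768 and 1,536.
    """
    return CAPACITY_768[spec] * 768 // dimension


def smallest_spec(num_docs: int, dimension: int,
                  cluster_type: str = "S", headroom: float = 0.3):
    """Return the smallest specification of the given type ("P" or "S")
    that fits num_docs plus a safety margin, or None if none fits.

    `headroom` is a hypothetical margin for fields and growth; the
    default of 30% is an arbitrary placeholder, not a documented value.
    """
    required = math.ceil(num_docs * (1 + headroom))
    candidates = sorted(
        (estimated_capacity(spec, dimension), spec)
        for spec in CAPACITY_768
        if spec.startswith(cluster_type + ".")
    )
    for capacity, spec in candidates:
        if capacity >= required:
            return spec
    return None
```

For example, `smallest_spec(3_000_000, 768)` evaluates the storage-optimized specifications with a 30% margin. Treat the result as a starting point and validate it against your own data, because fields and metadata reduce the usable capacity.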

Reference about search performance

Cluster type	Cluster specification	QPS (topk=10)	RT_p99 (topk=10)	QPS (topk=100)	RT_p99 (topk=100)	QPS (topk=250)	RT_p99 (topk=250)	QPS (topk=1000)	RT_p99 (topk=1000)
Performance-optimized type	P.large (based on one million FP32 vectors of 768 dimensions)	962.6	< 30 ms	429.7	< 30 ms	387.5	< 45 ms	134.7	< 250 ms
Storage-optimized type	S.large (based on five million FP32 vectors of 768 dimensions)	297.6	< 30 ms	112.5	< 30 ms	107.4	< 50 ms	37.1	< 300 ms

Important

The data in the preceding table was obtained from performance tests on the Cohere dataset and is for reference only, because results can vary with the data distribution of different datasets.
The documents used in the performance tests contain only primary keys and vectors, with no fields. The primary keys are strings converted from auto-increment integers that start at 0.
DashVector optimizes vector indexes in the backend on a schedule. The optimization is usually complete 4 hours after data is written, at which point performance is optimal.
For all cluster specifications, the QPS at full capacity is consistent with or higher than the values in the preceding table and does not decrease as the data volume grows. For example, a P.2xlarge cluster can reach a QPS of 600 or higher for topk=100 requests even when two million FP32 vectors of 768 dimensions occupy all of its storage capacity.

Number of replicas

DashVector allows you to set the number of replicas to an integer from 1 to 5. All replicas contain the same data. QPS increases linearly with the number of replicas, and more replicas also provide higher service availability. If your production environment has high requirements for service availability, we recommend that you use at least two replicas.

Note

Changing the number of replicas affects only QPS and service availability. It does not affect storage capacity.
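Because QPS grows linearly with the number of replicas, a target throughput can be translated into an approximate replica count. The following Python sketch illustrates the arithmetic. It assumes that the QPS values in the search performance table were measured with a single replica, which the table does not state, and the function name is hypothetical. It is not a DashVector API.

```python
import math

# Reference QPS from the search performance table, keyed by
# (specification, topk). ASSUMPTION: measured with one replica.
REFERENCE_QPS = {
    ("P.large", 10): 962.6,
    ("P.large", 100): 429.7,
    ("P.large", 250): 387.5,
    ("P.large", 1000): 134.7,
    ("S.large", 10): 297.6,
    ("S.large", 100): 112.5,
    ("S.large", 250): 107.4,
    ("S.large", 1000): 37.1,
}

# The number of replicas must be an integer from 1 to 5.
MIN_REPLICAS, MAX_REPLICAS = 1, 5


def replicas_for_target_qps(spec: str, topk: int, target_qps: float):
    """Return the smallest replica count whose linearly scaled QPS
    meets target_qps, or None if more than 5 replicas would be needed.
    """
    per_replica_qps = REFERENCE_QPS[(spec, topk)]
    needed = max(MIN_REPLICAS, math.ceil(target_qps / per_replica_qps))
    return needed if needed <= MAX_REPLICAS else None
```

For example, `replicas_for_target_qps("P.large", 100, 1000)` divides 1,000 by 429.7 and rounds up to 3. If availability matters, use the larger of this result and 2, in line with the recommendation above.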