OpenSearch: Proxima Builder

Last Updated: Aug 27, 2024

LinearBuilder

Parameter

Type

Default value

Description

proxima.linear.builder.column_major_order

string

false

Specifies how the features of an index are laid out when the index is built. Valid values: false and true. false: features are stored row by row (row-major order). true: features are stored column by column (column-major order).
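To make the two layouts concrete, the following Python sketch flattens the same two vectors in row-major and column-major order. The function names are hypothetical. The sketch only illustrates the two orderings and does not reproduce the storage format that LinearBuilder actually writes.

```python
from typing import List

# Illustrative only: not LinearBuilder's actual storage format.


def row_major(features: List[List[float]]) -> List[float]:
    """Row by row: every dimension of vector 0, then vector 1, and so on."""
    return [value for vector in features for value in vector]


def column_major(features: List[List[float]]) -> List[float]:
    """Column by column: dimension 0 of every vector, then dimension 1, and so on."""
    dim = len(features[0]) if features else 0
    return [vector[d] for d in range(dim) for vector in features]


if __name__ == "__main__":
    features = [[1.0, 2.0, 3.0],  # vector 0
                [4.0, 5.0, 6.0]]  # vector 1
    print(row_major(features))     # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(column_major(features))  # [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
```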

QcBuilder

Parameter

Type

Default value

Description

proxima.qc.builder.train_sample_count

uint32

0

The number of samples used for training. If you set this parameter to 0, all data is used as training data.

proxima.qc.builder.thread_count

uint32

0

The number of threads used to build the index. If you set this parameter to 0, the number of threads equals the number of CPU cores available to OpenSearch Vector Search Edition.
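The "0 means default" behavior of proxima.qc.builder.train_sample_count and proxima.qc.builder.thread_count can be sketched in Python as follows. The helper names are hypothetical. Capping the sample count at the amount of available data is an assumption, not documented behavior.

```python
import os

# Hypothetical helpers: they illustrate the "0 means default" convention only.


def resolve_train_sample_count(configured: int, total_samples: int) -> int:
    """0 means 'use all data as training data'."""
    if configured == 0:
        return total_samples
    # Assumption: a value larger than the available data is capped.
    return min(configured, total_samples)


def resolve_thread_count(configured: int) -> int:
    """0 means 'one thread per CPU core'."""
    if configured == 0:
        # os.cpu_count() can return None when the core count is unknown.
        return os.cpu_count() or 1
    return configured


if __name__ == "__main__":
    print(resolve_train_sample_count(0, 50_000))       # 50000
    print(resolve_train_sample_count(10_000, 50_000))  # 10000
    print(resolve_thread_count(0))  # number of CPU cores on this machine
```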

proxima.qc.builder.centroid_count

string

Optional

The number of centroids used for clustering. Hierarchical (multi-level) clustering is supported. Separate the centroid counts of different levels with an asterisk (*).

Example value for single-level clustering: 1000.

Example value for two-level clustering: 100*100.

For two-level clustering, we recommend that you specify more centroids for the first level than for the second level. This generally produces better results than using fewer centroids for the first level. As a rule of thumb, the first level should have about 10 times as many centroids as the second level.

If you do not specify this parameter, the system automatically infers an appropriate number of centroids. We recommend that you let the system infer this value.
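The following Python sketch parses a centroid_count value such as 100*100 into per-level counts. The helper names are hypothetical. The sketch assumes that every first-level cluster is split into the second-level count, which makes the number of leaf clusters the product of the levels. The value 1000*100 below is an illustrative value chosen to follow the 10x rule of thumb.

```python
from math import prod
from typing import List


def parse_centroid_count(value: str) -> List[int]:
    """Split a value such as '100*100' into per-level centroid counts."""
    levels = [int(part) for part in value.split("*")]  # int() raises ValueError on bad input
    if any(count <= 0 for count in levels):
        raise ValueError(f"centroid counts must be positive: {value!r}")
    return levels


def leaf_cluster_count(value: str) -> int:
    """Assumption: each first-level cluster is split into the next level's count."""
    return prod(parse_centroid_count(value))


def level_ratio(value: str) -> float:
    """For a two-level value, how many times more centroids level 1 has than level 2."""
    first, second = parse_centroid_count(value)
    return first / second


if __name__ == "__main__":
    print(parse_centroid_count("1000"))   # [1000]
    print(leaf_cluster_count("100*100"))  # 10000
    print(level_ratio("1000*100"))        # 10.0
```

The format example 100*100 has a ratio of 1.0. Under the rule of thumb, a configuration closer to 1000*100 would be preferred.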

proxima.qc.builder.cluster_class

string

OptKmeansCluster

The clustering method to use. For more information, see Proxima Cluster parameters.

proxima.qc.builder.cluster_auto_tuning

bool

false

Specifies whether to automatically tune the number of centroids.

proxima.qc.builder.cluster_params_in_level_

IndexParams

-

The parameters of the clustering method for a specific level. For more information, see Proxima Cluster parameters.

Append the level number to the parameter name, and specify parameters for each level, starting from the first level.

For example, the parameter for the first level is proxima.qc.builder.cluster_params_in_level_1.
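The following Python sketch shows how one cluster_params_in_level_<n> entry per level, numbered from 1, could be generated from a centroid_count value. IndexParams is modeled as a plain dict, and the helper name is hypothetical. For the keys that actually belong inside each entry, see Proxima Cluster parameters.

```python
from typing import Any, Dict

PREFIX = "proxima.qc.builder.cluster_params_in_level_"


def per_level_cluster_params(centroid_count: str,
                             params: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Return one entry per level, numbered from 1 (IndexParams modeled as a dict)."""
    num_levels = len(centroid_count.split("*"))
    # Copy params so that each level can be adjusted independently later.
    return {f"{PREFIX}{level}": dict(params) for level in range(1, num_levels + 1)}


if __name__ == "__main__":
    for key in per_level_cluster_params("100*100", {}):
        print(key)
    # proxima.qc.builder.cluster_params_in_level_1
    # proxima.qc.builder.cluster_params_in_level_2
```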

proxima.qc.builder.optimizer_class

string

HcBuilder

The builder optimizer used to index the centroids, which improves the precision of classification. The builder optimizer also determines the searcher optimizer that queries candidate centroids online. For example, if you set this parameter to HcBuilder, HcSearcher is used to query candidate centroids online. Valid values: HcBuilder, HnswBuilder, SsgBuilder, and LinearBuilder.

proxima.qc.builder.optimizer_params

IndexParams

-

The parameters of the builder optimizer and searcher optimizer selected by proxima.qc.builder.optimizer_class. For example, if you set proxima.qc.builder.optimizer_class to HnswBuilder, you can specify the following parameters:

proxima.hnsw.builder.max_neighbor_count: 100
proxima.hnsw.searcher.max_scan_ratio: 0.1
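The HnswBuilder example above can be expressed as a nested configuration, sketched here in Python with IndexParams modeled as a dict. The searcher_for helper is hypothetical. It assumes that the searcher name is formed by replacing Builder with Searcher, a pairing the documentation confirms only for HcBuilder and HcSearcher.

```python
from typing import Any, Dict

# Valid values listed for proxima.qc.builder.optimizer_class.
VALID_OPTIMIZERS = {"HcBuilder", "HnswBuilder", "SsgBuilder", "LinearBuilder"}


def searcher_for(builder_class: str) -> str:
    """Hypothetical: derive the online searcher name from the builder name.

    Assumption: 'XxxBuilder' pairs with 'XxxSearcher'; the documentation
    confirms this only for HcBuilder -> HcSearcher.
    """
    if builder_class not in VALID_OPTIMIZERS:
        raise ValueError(f"unsupported optimizer_class: {builder_class}")
    return builder_class[: -len("Builder")] + "Searcher"


# IndexParams modeled as a nested dict; the real configuration format may differ.
qc_params: Dict[str, Any] = {
    "proxima.qc.builder.optimizer_class": "HnswBuilder",
    "proxima.qc.builder.optimizer_params": {
        "proxima.hnsw.builder.max_neighbor_count": 100,
        "proxima.hnsw.searcher.max_scan_ratio": 0.1,
    },
}

if __name__ == "__main__":
    print(searcher_for(qc_params["proxima.qc.builder.optimizer_class"]))  # HnswSearcher
```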

proxima.qc.builder.converter_class

string

-

The converter class. If the Measure parameter is set to InnerProduct, the engine performs the conversion automatically, and OpenSearch Vector Search Edition uses the L2 norm to search documents.
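One well-known way to answer an inner-product query with an L2 search is to append an extra coordinate to each data vector so that all data vectors have the same norm. The following Python sketch demonstrates that reduction. It is a standard technique shown for illustration and is not necessarily the conversion the engine performs. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_sq(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def augment_data(vectors: List[List[float]]) -> List[List[float]]:
    # Append sqrt(M^2 - ||x||^2), where M is the largest norm, so that every
    # augmented data vector has norm M.
    max_norm_sq = max(dot(v, v) for v in vectors)
    return [list(v) + [math.sqrt(max_norm_sq - dot(v, v))] for v in vectors]


def augment_query(query: Sequence[float]) -> List[float]:
    # The query gets a 0 in the extra coordinate.
    return list(query) + [0.0]


def best_by_inner_product(query: Sequence[float], vectors: List[List[float]]) -> int:
    return max(range(len(vectors)), key=lambda i: dot(query, vectors[i]))


def best_by_l2_after_augmentation(query: Sequence[float], vectors: List[List[float]]) -> int:
    # ||q' - x'||^2 = ||q||^2 + M^2 - 2 q.x, so the L2 nearest neighbor of q'
    # is the data vector with the largest inner product with q.
    data = augment_data(vectors)
    q = augment_query(query)
    return min(range(len(data)), key=lambda i: l2_sq(q, data[i]))


if __name__ == "__main__":
    vectors = [[1.0, 0.0], [3.0, 4.0], [0.0, 2.0]]
    query = [0.6, 0.8]
    print(best_by_inner_product(query, vectors))          # 1
    print(best_by_l2_after_augmentation(query, vectors))  # 1
```

Without the augmentation, a plain L2 search would return index 0 ([1.0, 0.0]) for this query. This is why inner-product search needs a conversion before it can run as an L2 search.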

proxima.qc.builder.converter_params

IndexParams

-

The parameters used to initialize the converter specified by proxima.qc.builder.converter_class.

proxima.qc.builder.quantizer_class

string

-

The quantizer. By default, no quantizer is used. Valid values: Int8QuantizerConverter, HalfFloatConverter, and DoubleBitConverter. In most cases, specifying a quantizer improves performance and reduces index size. However, retrieval accuracy may decrease in some scenarios.
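The size and precision trade-off behind the quantizer options can be illustrated with two common scalar schemes: symmetric int8 with one scale per vector, and IEEE 754 half precision. The following Python sketch is generic. Int8QuantizerConverter and HalfFloatConverter may use different schemes, DoubleBitConverter is not covered, and the function names are hypothetical.

```python
import struct
from typing import Dict, List, Tuple


def quantize_int8(vector: List[float]) -> Tuple[List[int], float]:
    """Symmetric int8: one scale per vector maps values into [-127, 127]."""
    max_abs = max((abs(x) for x in vector), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(x / scale) for x in vector], scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    return [code * scale for code in codes]


def to_half_and_back(vector: List[float]) -> List[float]:
    """Round-trip through IEEE 754 half precision (struct format 'e', 2 bytes each)."""
    packed = struct.pack(f"<{len(vector)}e", *vector)
    return list(struct.unpack(f"<{len(vector)}e", packed))


def byte_sizes(vector: List[float]) -> Dict[str, int]:
    """Encoded size of one vector in each representation."""
    n = len(vector)
    codes, _ = quantize_int8(vector)
    return {
        "float32": len(struct.pack(f"<{n}f", *vector)),
        "float16": len(struct.pack(f"<{n}e", *vector)),
        "int8": len(struct.pack(f"<{n}b", *codes)),  # plus one float scale per vector
    }


if __name__ == "__main__":
    vector = [0.1, -1.0, 0.25, 2.0]
    codes, scale = quantize_int8(vector)
    print(byte_sizes(vector))  # {'float32': 16, 'float16': 8, 'int8': 4}
    print("int8 restored:", dequantize_int8(codes, scale))
    print("half restored:", to_half_and_back(vector))  # 0.1 is not exact in half precision
```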

proxima.qc.builder.quantizer_params

IndexParams

-

The parameters of the quantizer specified by proxima.qc.builder.quantizer_class.

proxima.qc.builder.optimizer_quantizer_class

string

-

The name of the quantizer used to quantize centroids.

proxima.qc.builder.optimizer_quantizer_params

IndexParams

-

The parameters of the quantizer specified by proxima.qc.builder.optimizer_quantizer_class.

proxima.qc.builder.quantize_by_centroid

bool

false

Specifies whether to perform quantization based on centroids when proxima.qc.builder.quantizer_class is set. This parameter takes effect only when proxima.qc.builder.quantizer_class is set to Int8QuantizerConverter.

proxima.qc.builder.store_original_features

bool

false

Specifies whether to retain the raw features. If proxima.qc.builder.quantizer_class is set, IndexProvider returns the quantized features. To obtain the raw features, set this parameter to true.

HnswBuilder

Parameter

Type

Default value

Description

proxima.hnsw.builder.max_neighbor_count

uint32

100

The maximum number of neighbors of each node in the graph. A larger value improves graph connectivity but also increases the build cost and the index size.
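You can get a rough sense of how max_neighbor_count affects index size by counting neighbor IDs. The following Python sketch assumes that each node stores max_neighbor_count 4-byte IDs. It ignores upper graph layers, the vectors themselves, and metadata, so the result is only a rough estimate under these assumptions, not the real index size. The helper name is hypothetical.

```python
def neighbor_list_bytes(num_nodes: int, max_neighbor_count: int, id_bytes: int = 4) -> int:
    """Rough estimate: every node stores max_neighbor_count neighbor IDs (assumption)."""
    return num_nodes * max_neighbor_count * id_bytes


if __name__ == "__main__":
    for m in (50, 100, 200):
        mib = neighbor_list_bytes(1_000_000, m) / 2**20
        print(f"max_neighbor_count={m}: about {mib:.0f} MiB of neighbor IDs")
```

Under these assumptions, the estimate grows linearly with max_neighbor_count. For one million nodes, a value of 100 corresponds to 400,000,000 bytes (about 381 MiB) of neighbor IDs.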

proxima.hnsw.builder.efconstruction

uint32

500

The size of the neighborhood that is scanned while the graph is being built. A larger value produces a higher-quality graph but slows down index building. We recommend that you start with a value of 400.

proxima.hnsw.builder.thread_count

uint32

0

The number of threads used to build the index. If you set this parameter to 0, the number of threads equals the number of CPU cores available to OpenSearch Vector Search Edition.