OpenSearch: Proxima cluster parameters

Last Updated: Aug 27, 2024

1. Clustering

1.1 KmeansCluster and BatchKmeansCluster

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| proxima.general.cluster.count | UINT32 | 0 | The number of centroids. |
| proxima.kmeans.cluster.count | UINT32 | 0 | The number of centroids. This parameter takes precedence over proxima.general.cluster.count and is overridden by the K value of suggest. |
| proxima.kmeans.cluster.shard_factor | FLOAT | 16.0f | The factor for tuning multi-thread concurrency. |
| proxima.kmeans.cluster.epsilon | DOUBLE | FL_EPSILON | The precision of clustering convergence. |
| proxima.kmeans.cluster.max_iterations | UINT32 | 20 | The maximum number of iterations. |
| proxima.kmeans.cluster.purge_empty | BOOL | false | Specifies whether to delete empty centroids. |
| proxima.kmeans.cluster.seeker_class | STRING | LinearSeeker | The class of the algorithm for seeking centroids. |
| proxima.kmeans.cluster.seeker_params | IndexParams | | The parameters of the centroid-seeking algorithm class, specified as an IndexParams object. |
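The precedence among the three sources of the centroid count can be sketched as follows. This is an illustrative sketch only, not the Proxima API: the helper `resolve_cluster_count` and the use of a plain dictionary for parameters are hypothetical, and it assumes that a value of 0 (the documented default) means "not set".

```python
from typing import Mapping, Optional


def resolve_cluster_count(
    params: Mapping[str, int],
    algo: str,
    suggest_k: Optional[int] = None,
) -> int:
    """Pick the effective centroid count.

    Precedence (highest first): K value of suggest,
    proxima.<algo>.cluster.count, proxima.general.cluster.count.
    Assumption: 0 or None means the value is not set.
    """
    if suggest_k:
        return suggest_k
    specific = params.get(f"proxima.{algo}.cluster.count", 0)
    if specific:
        return specific
    return params.get("proxima.general.cluster.count", 0)


params = {
    "proxima.general.cluster.count": 256,
    "proxima.kmeans.cluster.count": 1024,
}
print(resolve_cluster_count(params, "kmeans"))
print(resolve_cluster_count(params, "kmeans", suggest_k=64))
```

The same rule applies to the algorithm-specific count parameters in the other sections, with `algo` replaced by the corresponding prefix such as `minibatchkmeans` or `kmedoids`.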

1.2 GpuKmeansCluster

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| proxima.general.cluster.count | UINT32 | 0 | The number of centroids. |
| proxima.kmeans.cluster.count | UINT32 | 0 | The number of centroids. This parameter takes precedence over proxima.general.cluster.count and is overridden by the K value of suggest. |
| proxima.kmeans.cluster.epsilon | DOUBLE | FL_EPSILON | The precision of clustering convergence. |
| proxima.kmeans.cluster.max_iterations | UINT32 | 100 | The maximum number of iterations. |
| proxima.kmeans.cluster.purge_empty | BOOL | false | Specifies whether to delete empty centroids. |

1.3 MiniBatchKmeansCluster

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| proxima.general.cluster.count | UINT32 | 0 | The number of centroids. |
| proxima.minibatchkmeans.cluster.count | UINT32 | 0 | The number of centroids. This parameter takes precedence over proxima.general.cluster.count and is overridden by the K value of suggest. |
| proxima.minibatchkmeans.cluster.shard_factor | FLOAT | 16.0f | The factor for tuning multi-thread concurrency. |
| proxima.minibatchkmeans.cluster.epsilon | DOUBLE | FL_EPSILON | The precision of clustering convergence. |
| proxima.minibatchkmeans.cluster.max_iterations | UINT32 | 20 | The maximum number of iterations. |
| proxima.minibatchkmeans.cluster.purge_empty | BOOL | false | Specifies whether to delete empty centroids. |
| proxima.minibatchkmeans.cluster.try_count | UINT32 | 20 | The number of attempts. The minimum value is 1. |
| proxima.minibatchkmeans.cluster.batch_count | UINT32 | 0 | The number of features that are sampled for batch training. If the value is 0, the actual value is the total number of features divided by the number of attempts. |
| proxima.minibatchkmeans.cluster.seeker_class | STRING | LinearSeeker | The class of the algorithm for seeking centroids. |
| proxima.minibatchkmeans.cluster.seeker_params | IndexParams | | The parameters of the centroid-seeking algorithm class. |
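The default rule for batch_count can be illustrated with a short calculation. The function below is a hypothetical helper, not part of Proxima, and it assumes integer division because the source does not state how the quotient is rounded.

```python
def effective_batch_count(
    total_features: int,
    try_count: int = 20,
    batch_count: int = 0,
) -> int:
    """Return the number of features sampled per batch.

    A nonzero batch_count is used as is. A value of 0 falls back to
    total_features / try_count (integer division is an assumption).
    """
    if try_count < 1:
        raise ValueError("try_count must be at least 1")
    if batch_count != 0:
        return batch_count
    return total_features // try_count


# With the default try_count of 20, one million features give batches of 50000.
print(effective_batch_count(1_000_000))
```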

1.4 BikmeansCluster

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| proxima.general.cluster.count | UINT32 | 0 | The number of centroids. |
| proxima.bikmeans.cluster.count | UINT32 | 0 | The number of centroids. This parameter takes precedence over proxima.general.cluster.count and is overridden by the K value of suggest. |
| proxima.bikmeans.cluster.init_count | UINT32 | 0 | The number of centroids for clustering initialization in the first phase. If the value is 0, the actual value is the total number of features divided by four. |
| proxima.bikmeans.cluster.purge_empty | BOOL | false | Specifies whether to delete empty centroids. |
| proxima.bikmeans.cluster.first_class | STRING | KmeansCluster | The clustering method in the first phase. |
| proxima.bikmeans.cluster.first_params | IndexParams | | The parameters of the clustering method in the first phase. |
| proxima.bikmeans.cluster.second_class | STRING | KmeansCluster | The clustering method in the second phase. |
| proxima.bikmeans.cluster.second_params | IndexParams | | The parameters of the clustering method in the second phase. |
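As a small worked example of the init_count fallback described above, the following sketch computes the effective value. The helper name is hypothetical and integer division is an assumption, because the source does not specify rounding.

```python
def effective_init_count(total_features: int, init_count: int = 0) -> int:
    """Return the first-phase initialization count for BikmeansCluster.

    A value of 0 falls back to a quarter of the total number of features
    (integer division is an assumption).
    """
    if init_count != 0:
        return init_count
    return total_features // 4


print(effective_init_count(40_000))        # fallback: a quarter of the features
print(effective_init_count(40_000, 512))   # explicit value is used as is
```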

1.5 KmeansppCluster

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| proxima.general.cluster.count | UINT32 | 0 | The number of centroids. |
| proxima.kmeanspp.cluster.count | UINT32 | 0 | The number of centroids. This parameter takes precedence over proxima.general.cluster.count and is overridden by the K value of suggest. |
| proxima.kmeanspp.cluster.shard_factor | FLOAT | 16.0f | The factor for tuning multi-thread concurrency. |
| proxima.kmeanspp.cluster.class | STRING | KmeansCluster | The clustering method that is called after the centroids are initialized. |
| proxima.kmeanspp.cluster.params | IndexParams | | The parameters of the clustering method. |
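The relationship between the class parameter and its nested parameters can be pictured as follows. This is a hypothetical representation that models IndexParams as nested Python dictionaries; it is not the Proxima API. It also assumes that the nested object uses the parameter names of the selected clustering method, here those of KmeansCluster from section 1.1.

```python
# Hypothetical model: the outer dict holds KmeansppCluster parameters, and the
# nested dict stands in for the IndexParams handed to the follow-up method.
kmeanspp_params = {
    "proxima.kmeanspp.cluster.count": 1024,
    "proxima.kmeanspp.cluster.class": "KmeansCluster",
    "proxima.kmeanspp.cluster.params": {
        "proxima.kmeans.cluster.max_iterations": 30,
        "proxima.kmeans.cluster.purge_empty": True,
    },
}


def follow_up_clusterer(params: dict) -> tuple:
    """Return (class name, nested parameters) of the method run after initialization.

    Falls back to the documented default class, KmeansCluster, and to an
    empty parameter set when nothing is configured.
    """
    name = params.get("proxima.kmeanspp.cluster.class", "KmeansCluster")
    nested = params.get("proxima.kmeanspp.cluster.params", {})
    return name, nested


name, nested = follow_up_clusterer(kmeanspp_params)
print(name, nested)
```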

1.6 Kmc2Cluster/AFKmc2Cluster

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| proxima.general.cluster.count | UINT32 | 0 | The number of centroids. |
| proxima.kmc2.cluster.count | UINT32 | 0 | The number of centroids. This parameter takes precedence over proxima.general.cluster.count and is overridden by the K value of suggest. |
| proxima.kmc2.cluster.shard_factor | FLOAT | 2.5f | The factor for tuning multi-thread concurrency. |
| proxima.kmc2.cluster.markov_chain_length | UINT32 | 0u | The length of the Markov chain. If the value is 0, the actual value is the number of threads multiplied by the concurrency factor. |
| proxima.kmc2.cluster.class | STRING | KmeansCluster | The clustering method that is called after the centroids are initialized. |
| proxima.kmc2.cluster.params | IndexParams | | The parameters of the clustering method. |
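The fallback for markov_chain_length can be worked through with a short calculation. The sketch below is a hypothetical helper, not part of Proxima. It assumes that the "concurrency factor" is the value of proxima.kmc2.cluster.shard_factor and that the product is truncated to an integer; the source states neither explicitly.

```python
def effective_chain_length(
    thread_count: int,
    shard_factor: float = 2.5,
    markov_chain_length: int = 0,
) -> int:
    """Return the Markov chain length used by Kmc2Cluster/AFKmc2Cluster.

    A nonzero markov_chain_length is used as is. A value of 0 falls back to
    thread_count * shard_factor, truncated to an integer (assumption).
    """
    if markov_chain_length != 0:
        return markov_chain_length
    return int(thread_count * shard_factor)


# 8 threads with the default factor of 2.5 give a chain length of 20.
print(effective_chain_length(8))
```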

1.7 KmedoidsCluster

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| proxima.general.cluster.count | UINT32 | 0 | The number of centroids. |
| proxima.kmedoids.cluster.count | UINT32 | 0 | The number of centroids. This parameter takes precedence over proxima.general.cluster.count and is overridden by the K value of suggest. |
| proxima.kmedoids.cluster.shard_factor | FLOAT | 16.0f | The factor for tuning multi-thread concurrency. |
| proxima.kmedoids.cluster.epsilon | DOUBLE | FL_EPSILON | The precision of clustering convergence. |
| proxima.kmedoids.cluster.max_iterations | UINT32 | 20 | The maximum number of iterations. |
| proxima.kmedoids.cluster.purge_empty | BOOL | false | Specifies whether to delete empty centroids. |
| proxima.kmedoids.cluster.bench_ratio | FLOAT | 0.1f | The ratio of candidate points. |
| proxima.kmedoids.cluster.only_means | BOOL | false | Specifies whether to use only the mean value as a candidate point. If set to true, the algorithm degrades to k-means. |
| proxima.kmedoids.cluster.without_means | BOOL | false | Specifies whether to exclude the mean value from the candidate points. |
| proxima.kmedoids.cluster.seeker_class | STRING | LinearSeeker | The class of the algorithm for seeking centroids. |
| proxima.kmedoids.cluster.seeker_params | IndexParams | | The parameters of the centroid-seeking algorithm class, specified as an IndexParams object. |

1.8 StratifiedCluster

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| proxima.general.cluster.count | UINT32 | 0 | The total number of centroids at the second layer. |
| proxima.stratified.cluster.count | UINT32 | 0 | The total number of centroids at the second layer. This parameter takes precedence over proxima.general.cluster.count and is overridden by the K value of suggest. |
| proxima.stratified.cluster.first_class | STRING | KmeansCluster | The clustering method used at the first layer. |
| proxima.stratified.cluster.second_class | STRING | KmeansCluster | The clustering method used at the second layer. |
| proxima.stratified.cluster.first_count | UINT32 | 0 | The number of centroids to cluster at the first layer. |
| proxima.stratified.cluster.second_count | UINT32 | 0 | The number of centroids to cluster at the second layer. |
| proxima.stratified.cluster.first_params | IndexParams | | The parameters of the clustering method used at the first layer. |
| proxima.stratified.cluster.second_params | IndexParams | | The parameters of the clustering method used at the second layer. |
| proxima.stratified.cluster.auto_tuning | BOOL | false | |

2. Clustering estimation

2.1 GapstatsClusterEstimater