OpenSearch: Perform stress testing on vector indexes

Last Updated: May 29, 2024

Preparations

Decompress the gist.tar.gz package and use the gist_base.fvecs file in the decompressed directory.
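
To inspect the decompressed file before generating data, the following minimal sketch reads an .fvecs file. It assumes the layout commonly used for this dataset family: each vector is stored as a 4-byte little-endian integer dimension followed by that many 4-byte floats. The read_fvecs helper is hypothetical and is not part of the provided scripts.

```python
import struct


def read_fvecs(path, limit=None):
    """Read vectors from an .fvecs file (assumed layout: int32 dim + dim float32)."""
    vectors = []
    with open(path, "rb") as f:
        while limit is None or len(vectors) < limit:
            header = f.read(4)
            if not header:
                break  # clean end of file
            if len(header) < 4:
                raise ValueError("truncated dimension header")
            (dim,) = struct.unpack("<i", header)
            payload = f.read(4 * dim)
            if len(payload) != 4 * dim:
                raise ValueError("truncated vector payload")
            vectors.append(list(struct.unpack(f"<{dim}f", payload)))
    return vectors
```

For example, read_fvecs("gist_base.fvecs", limit=1) returns only the first vector, which is enough to check the dimension without loading the whole file.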

  • Install Python 3 and related libraries.

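h5py
json (part of the Python standard library; no installation required)
numpy
sklearn (installed as the scikit-learn package)
alibabacloud_ha3engine_vector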

Generate data

  • Run the prepare_data.py script to generate data. The script supports vector data that is stored in .hdf5, .fvecs, .bvecs, and .ivecs files. In this example, an .hdf5 file is used.

python3 prepare_data.py -i ./gist-960-euclidean.hdf5 
  • After the script is run, view the data/ subdirectory that is generated in the script directory.

    • Confirm that a file named gist-960-euclidean.hdf5.data has been generated.

    • Check that the file contains the expected number of data rows.

wc -l data/gist-960-euclidean.hdf5.data

1000000 data/gist-960-euclidean.hdf5.data
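
The same check can be scripted when wc is not available. This is a minimal sketch that assumes the generated file holds one data row per line, as the wc -l check implies. The count_rows and check_row_count helpers are hypothetical names, not part of the provided scripts.

```python
def count_rows(path):
    """Count the lines in a file without loading it into memory."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def check_row_count(path, expected):
    """Raise ValueError if the file does not contain the expected number of rows."""
    actual = count_rows(path)
    if actual != expected:
        raise ValueError(f"{path}: expected {expected} rows, found {actual}")
    return actual


# Example call (path and count taken from the step above):
# check_row_count("data/gist-960-euclidean.hdf5.data", 1_000_000)
```

Unlike wc -l, which counts newline characters, this sketch also counts a final line that lacks a trailing newline.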

Purchase an OpenSearch Vector Search Edition instance

For more information, see Purchase an OpenSearch Vector Search Edition instance.

Create a table

References:

Push data

  • Run the push_data.py script to push data.

  • Parameters:

    • -i: the data file generated by prepare_data.py.

    • -t: the table name.

    • -u: the username.

    • -p: the password.

    • -e: the instance ID.

python3 push_data.py -i data/gist-960-euclidean.hdf5.data -t gist -u ${user_name} -p ${password} -e ${instance_id}

Generate a query

  • Run the prepare_query.py script to randomly generate queries from the raw data.

python3 prepare_query.py -i gist-960-euclidean.hdf5 -c 10000 -t gist
  • Obtain the query.data file that is generated in the data/ subdirectory.
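
The prepare_query.py script itself is not reproduced here. As a rough illustration of drawing random queries from raw vectors, the sketch below samples vectors and serializes each one as a JSON request body. The build_queries function and the field names tableName, vector, and topK are assumptions made for illustration; check the actual query.data file for the format that the script produces.

```python
import json
import random


def build_queries(vectors, count, table_name, top_k=10, seed=None):
    """Sample `count` vectors at random and wrap each in a JSON query body.

    The field names below are assumed for illustration only.
    """
    if not vectors:
        raise ValueError("no vectors to sample from")
    rng = random.Random(seed)
    # Sample with replacement so that count may exceed the number of vectors.
    picked = rng.choices(vectors, k=count)
    return [
        json.dumps({"tableName": table_name, "vector": list(v), "topK": top_k})
        for v in picked
    ]
```

Passing a fixed seed makes the query set reproducible, which helps when comparing results across stress testing runs.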

Use wrk to perform stress testing

  • wrk is an open source tool that sends HTTP requests for stress testing. For more information, visit https://github.com/wg/wrk.

  • Download wrk from GitHub, and then build it, for example by running make in the wrk directory.

git clone https://github.com/wg/wrk.git
  • Run the search.lua script for stress testing.

    • Copy the script to the wrk/scripts/ directory.

cp search.lua wrk/scripts/
  • Calculate the signature and set headers["authorization"] in the request function of the script to the signature.

-- wrk calls request() for each request. Each call takes the next pre-generated query and wraps around at the end of the list.
request = function ()
  local query = query_table[count]
  count = (count + 1)%query_count
  local headers = {}
  headers["authorization"] = "Basic xxxx" -- The signature information.
  headers["Content-Type"] = "application/json"
  return wrk.format("POST", nil, headers, query)
end
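
The "Basic xxxx" placeholder suggests the standard HTTP Basic scheme, in which the value is the Base64 encoding of username:password. The following minimal sketch computes the header value under that assumption. The basic_auth_header helper is a hypothetical name; confirm the expected signature format for your instance before use.

```python
import base64


def basic_auth_header(user_name, password):
    """Build an HTTP Basic authorization value (assumes standard Basic auth)."""
    raw = f"{user_name}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")
```

Paste the returned string into headers["authorization"] in search.lua in place of "Basic xxxx".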

  • Perform stress testing.

    • -c: the number of concurrent connections.

    • -t: the number of threads for sending the requests.

    • -d: the duration of the stress test.

    • -s: the Lua script to run.

    • --latency: prints detailed latency statistics in the results.

./wrk -c24 -d100s -t8 -s scripts/search.lua http://ha-cn-xxxxxx.ha.aliyuncs.com/vector-service/query --latency

View metrics

Script download links