TairSearch performance whitepaper

Last updated: Oct 13, 2023

This topic describes the methods that are used to test the write and query performance of TairSearch and RediSearch, and provides the test results.

TairSearch is a full-text search data structure developed in-house for Tair. It uses a query syntax similar to that of Elasticsearch to provide efficient full-text search. For more information, see Search.

Test description

Client test environment

| Item | Description |
| --- | --- |
| Host of the client | An Elastic Compute Service (ECS) instance of the ecs.g7.8xlarge type. For more information, see Overview of instance families. |
| Region and zone | Zone K in China (Hangzhou) |
| Operating system | CentOS 7.9 64-bit |

Database test environment

A Tair database and a self-managed Redis database are hosted on the same ECS instance.

Table 1. Tair database

| Item | Description |
| --- | --- |
| Tair version | Tair DRAM-based instance that runs the minor version 5.0.30 and is compatible with Redis 5.0. |
| Number of I/O threads | 4 |
| CPU resource | 6 vCPUs. Sample command: taskset -c 1-6 ./src/redis-server redis.conf. |

Table 2. Self-managed Redis database

| Item | Description |
| --- | --- |
| Redis version | 7.0.10 |
| RediSearch version | 2.6.6. The database must have the CONCURRENT_WRITE_MODE parameter set to true. |
| RedisJSON version | 2.4.6 |
| Number of I/O threads | 4 |
| CPU resource | 6 vCPUs. Sample command: taskset -c 1-6 ./src/redis-server redis.conf. |

Test data

The test data is a collection of articles in Chinese and a collection of articles in English from Wikimedia. For more information, see Index of /zhwiki/latest/ and Index of /enwiki/latest/.

Examples:

{
    "id":"History_of_Pakistan",
    "title":"History of Pakistan",
    "url":"https://en.wikipedia.org/wiki/History_of_Pakistan",
    "abstract":"The history of Pakistan for the period preceding the country's independence in 1947Pakistan was created as the Dominion of Pakistan on 14 August 1947 after the end of British rule in, and partition of British India. is shared with that of Afghanistan, India."
}
{
    "id":"Wikipedia:哲学",
    "title":"Wikipedia:哲学",
    "url":"https://zh.wikipedia.org/wiki/%E5%93%B2%E5%AD%A6",
    "abstract":"哲学()是研究普遍的、基本问题的学科,包括存在、知识、价值、理智、心灵、语言等领域。哲学与其他学科不同之处在於哲学有独特之思考方式,例如批判的方式、通常是系统化的方法,并以理性论证为基础。"
}

Test tool

Download the binary executable file that matches your operating system. The file for Darwin is named TairSearchBench.Darwin, the file for Linux is named TairSearchBench.Linux, and the file for Windows is named TairSearchBench.Windows.

In this example, TairSearchBench.Linux is used. Run the ./TairSearchBench.Linux --help command to check how to use the tool.

Usage of ./TairSearchBench.linux:
  -a string
        The address(ip:port) of network to connect 
        # The endpoint of the Tair instance. 
  -c int
        Benchmark concurrency (default 30)
        # The number of tests that can be run concurrently. Default value: 30. 
  -d uint
        Specify the number of seconds for the benchmark (default 30)
        # The duration of the test. When the duration ends, the test is terminated. Default value: 30. Unit: seconds. 
  -e string
        The engine backend to run [tairsearch/redisearch]
        # Specify TairSearch or RediSearch as the engine that the instance runs. 
  -f string
        Input file to ingest data from (wikipedia abstracts)
        # The path of the input data file (Wikipedia abstracts) from which documents are read. 
  -h string
        Print usage (default "help")
        # Display the usage of the tool. 
  -j string
        Specify the big json file to write
        # Specify the path of the JSON file to be written. 
  -n uint
        Specify the number of times to benchmark (default 100000)
        # The total number of operations to perform for a test. Default value: 100000. 
  -o int
        Overwrite the doc (We will write the document with the same document id)
        # Specify whether to overwrite the original document. Valid values: 1 (true) and 0 (false). Default value: 0. 
  -p string
        The password of redis to connect
        # The password of the instance. 
  -q string
        Search query string to benchmark
        # The query statement that is used to run tests. 
  -s uint
        Specify the compress threshold for tairsearch (default 10000000000)
        # Specify the compression threshold for TairSearch. If the size of a document exceeds the threshold, the document is compressed. Unit: bytes. Default value: 10000000000 (approximately 10 GB). 
  -t string
        Specify the type of benchmark [write/search/readwrite]
        # Set the test type to write, search, or readwrite. 
  -z string
        Specify the analyzer to use for query (default "standard")
        # Specify the analyzer for the query. Default value: standard. 

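Before you perform testing, bind the test tool to a dedicated range of vCPUs on the ECS instance so that it runs on about 20 vCPUs that do not overlap with the vCPUs assigned to the databases. Sample command: taskset -c 10-30 ./TairSearchBench.linux. The range 10-30 is inclusive and therefore covers 21 vCPUs.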
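The CPU ranges passed to taskset can be checked with a short script. The following is a minimal sketch, not part of the test tool: the function name parse_cpu_list is hypothetical, and it handles only comma-separated values and inclusive ranges, not every form that taskset accepts.

```python
def parse_cpu_list(spec: str) -> list[int]:
    """Expand a CPU list such as "1-6" or "0,2,10-30" into sorted CPU ids."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part}")
            # Ranges are inclusive on both ends.
            cpus.update(range(start, end + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


server_cpus = parse_cpu_list("1-6")    # range used for redis-server
client_cpus = parse_cpu_list("10-30")  # range used for the test tool

print(len(server_cpus))   # 6
print(len(client_cpus))   # 21
print(set(server_cpus).isdisjoint(client_cpus))  # True
```

The check confirms that the database and the test tool are pinned to non-overlapping vCPUs.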

Preparations

Create a schema (index). Examples:

  • TairSearch

    {
        "settings": {
            "compress_doc": {
                "size": "user-defined compression threshold",
                "enable": true
            }
        },
        "mappings": {
            "properties": {
                "id":       {"type": "keyword"},
                "url":      {"type": "keyword", "index": false},
                "title":    {"type": "text", "analyzer": "user-defined analyzer"},
                "abstract": {"type": "text", "analyzer": "user-defined analyzer"},
                "url_len":  {"type": "integer"},
                "abstract_len":  {"type": "integer"},
                "title_len":  {"type": "integer"}
            }
        }
    }
  • RediSearch

    // SCHEMA
    $.id AS id TEXT
    $.url AS url TEXT NOINDEX
    $.title AS title TEXT
    $.abstract AS abstract TEXT
    $.abstract_len AS abstract_len NUMERIC
    $.url_len AS url_len NUMERIC
    $.title_len AS title_len NUMERIC
    
    // If the test data consists of documents in Chinese, add LANGUAGE CHINESE to the index definition. 

Test commands and test results

In the following tests, each write test writes one million documents, and each query test runs against one million documents. Query tests issue one million queries unless the command specifies a smaller count: several match and bool query commands pass -n 100000. Each test that combines write and query operations runs for 60 seconds (-d 60).
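The test commands in the following sections share the same structure and differ mainly in the engine, test type, data file, and query. The following sketch generates such a command line from the flags documented in the tool usage. The helper name bench_command and its keyword arguments are hypothetical and are not part of the tool.

```python
import shlex


def bench_command(engine, test_type, *, data_file=None, query=None, count=None,
                  duration=None, overwrite=False, analyzer=None, concurrency=20,
                  address="127.0.0.1:6379", cpus="10-30",
                  binary="./TairSearchBench.linux"):
    """Build the argv list for one benchmark run, pinned to the given vCPUs."""
    argv = ["taskset", "-c", cpus, binary, "-t", test_type, "-e", engine]
    if data_file:
        argv += ["-f", data_file]
    argv += ["-c", str(concurrency)]
    if count is not None:
        argv += ["-n", str(count)]
    if duration is not None:
        argv += ["-d", str(duration)]
    if overwrite:
        argv += ["-o", "1"]
    if query:
        argv += ["-q", query]
    argv += ["-a", address]
    if analyzer:
        argv += ["-z", analyzer]
    return argv


cmd = bench_command("tairsearch", "write",
                    data_file="./enwiki-latest-abstract.xml", count=1000000)
print(shlex.join(cmd))
```

Switching the first argument to redisearch yields the matching RediSearch command, which keeps the two runs identical in every other parameter.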

Write data in English

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t write -e tairsearch -f ./enwiki-latest-abstract.xml -c 20 -n 1000000 -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t write -e redisearch -f ./enwiki-latest-abstract.xml -c 20 -n 1000000 -a 127.0.0.1:6379

Results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) | Memory used (GB) |
| --- | --- | --- | --- | --- |
| TairSearch | 22,615.15 | 0.874 | 1.735 | 1.39 |
| RediSearch | 18,295.10 | 1.092 | 2.352 | 1.67 |
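The QPS and average latency values are linked by the concurrency setting. Assuming that the test tool works as a closed loop in which each of the 20 clients (-c 20) sends its next request only after the previous one completes, Little's law gives concurrency ≈ QPS × average latency. The following sketch applies this check to the results above. The closed-loop behavior is an assumption about the tool and is not stated in this topic.

```python
def requests_in_flight(qps: float, avg_latency_ms: float) -> float:
    """Little's law: average number of in-flight requests = throughput * latency."""
    return qps * (avg_latency_ms / 1000.0)


# (QPS, average latency in ms) from the English write test.
results = {
    "TairSearch": (22615.15, 0.874),
    "RediSearch": (18295.10, 1.092),
}

for engine, (qps, latency_ms) in results.items():
    print(f"{engine}: {requests_in_flight(qps, latency_ms):.2f}")
# TairSearch: 19.77
# RediSearch: 19.98
```

Both values are close to the configured concurrency of 20, which suggests that the reported QPS and latency figures are consistent with each other.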

Write data in Chinese

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t write -e tairsearch -f ./zhwiki-latest-abstract.xml -c 20 -n 1000000 -a 127.0.0.1:6379 -z jieba
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t write -e redisearch -f ./zhwiki-latest-abstract.xml -c 20 -n 1000000 -a 127.0.0.1:6379 -z chinese

Results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) | Memory used (GB) |
| --- | --- | --- | --- | --- |
| TairSearch | 13,980.41 | 1.427 | 3.275 | 1.87 |
| RediSearch | 10,924.40 | 1.830 | 3.857 | 1.83 |

Note

TairSearch has a higher memory usage than RediSearch because the jieba analyzer is used and more fine-grained tokens are generated.

Overwrite data in English

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t write -e tairsearch -f ./enwiki-latest-abstract.xml -c 20 -n 1000000 -o 1 -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t write -e redisearch -f ./enwiki-latest-abstract.xml -c 20 -n 1000000 -o 1 -a 127.0.0.1:6379

Results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) | Memory used (GB) |
| --- | --- | --- | --- | --- |
| TairSearch | 9,775.03 | 2.041 | 3.974 | 0.0002 |
| RediSearch | 22,239.67 | 0.898 | 1.38 | 0.165 |

Note

When you perform an overwrite operation, RediSearch marks the original document for later deletion. This causes additional memory usage. In comparison, TairSearch deletes the original document in real time.

Overwrite data in Chinese

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t write -e tairsearch -f ./zhwiki-latest-abstract.xml -c 20 -n 1000000 -o 1 -a 127.0.0.1:6379 -z jieba
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t write -e redisearch -f ./zhwiki-latest-abstract.xml -c 20 -n 1000000 -o 1 -a 127.0.0.1:6379

Results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) | Memory used (GB) |
| --- | --- | --- | --- | --- |
| TairSearch | 6,194.15 | 3.206 | 6.456 | 0.025 (including the memory used by the jieba analyzer dictionary) |
| RediSearch | 25,096.18 | 0.796 | 1.338 | 0.671 |

Note

When you perform an overwrite operation, RediSearch marks the original document for later deletion. This causes additional memory usage. In comparison, TairSearch deletes the original document in real time.

Use a term statement to query data in English

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e tairsearch -c 20 -n 1000000 -q '{"query":{"term":{"abstract":"hello"}}}' -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e redisearch -c 20 -n 1000000 -q "@abstract:hello" -a 127.0.0.1:6379

Results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) |
| --- | --- | --- | --- |
| TairSearch | 45,501.13 | 0.437 | 0.563 |
| RediSearch | 28,513.87 | 0.700 | 0.833 |

Use a term statement to query data in Chinese

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e tairsearch -c 20 -n 1000000 -q '{"query":{"term":{"abstract":"你好"}}}' -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e redisearch -c 20 -n 1000000 -q "@abstract:你好" -a 127.0.0.1:6379

Results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) |
| --- | --- | --- | --- |
| TairSearch | 40,670.47 | 0.489 | 0.635 |
| RediSearch | 24,437.48 | 0.817 | 1.331 |

Use a match statement to query data in English

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e tairsearch -c 20 -n 1000000 -q '{"query":{"match":{"abstract":{"operator":"and","query":"chinese history"}}}}' -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e redisearch -c 20 -n 1000000 -q "@abstract:chinese history" -a 127.0.0.1:6379

Results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) |
| --- | --- | --- | --- |
| TairSearch | 24,548.94 | 0.812 | 0.971 |
| RediSearch | 2,420.66 | 8.261 | 8.523 |

Use a match statement to query data in Chinese

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e tairsearch -c 20 -n 100000 -q '{"query":{"match":{"abstract":{"operator":"and","query":"中国的历史"}}}}' -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e redisearch -c 20 -n 100000 -q "@abstract:中国的历史" -a 127.0.0.1:6379 -z chinese

Results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) |
| --- | --- | --- | --- |
| TairSearch | 6,601.05 | 3.027 | 3.669 |
| RediSearch | 889.37 | 22.486 | 22.985 |

Use a bool statement to query data in English

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e tairsearch -c 20 -n 100000 -q '{"query":{"bool":{"must":[{"term":{"abstract":"war"}},{"term":{"abstract":"japanese"}},{"range":{"abstract_len":{"gt":500}}}],"must_not":{"term":{"abstract":"America"}},"should":[{"term":{"abstract":"chinese"}},{"term":{"abstract":"china"}}]}}}' -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e redisearch -c 20 -n 100000 -q "@abstract:(war japanese -America (chinese|china)) @abstract_len:[500 +inf]"  -a 127.0.0.1:6379

Results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) |
| --- | --- | --- | --- |
| TairSearch | 4,554.22 | 4.388 | 5.702 |
| RediSearch | 1,124.08 | 17.791 | 18.444 |
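The bool query in this test combines required terms, an excluded term, alternative terms, and a numeric range. The following sketch builds both forms of the query from the same inputs to show how the TairSearch JSON maps to the RediSearch query string. The helper names are hypothetical, and the mapping covers only the query shape used in this test.

```python
import json


def tair_bool_query(must_terms, must_not_term, should_terms, len_field, min_len,
                    field="abstract"):
    """Build the TairSearch bool query as a Python dict."""
    must = [{"term": {field: term}} for term in must_terms]
    must.append({"range": {len_field: {"gt": min_len}}})
    return {"query": {"bool": {
        "must": must,
        "must_not": {"term": {field: must_not_term}},
        "should": [{"term": {field: term}} for term in should_terms],
    }}}


def redisearch_bool_query(must_terms, must_not_term, should_terms, len_field,
                          min_len, field="abstract"):
    """Build the RediSearch query string used for the same test."""
    text = f"{' '.join(must_terms)} -{must_not_term} ({'|'.join(should_terms)})"
    return f"@{field}:({text}) @{len_field}:[{min_len} +inf]"


params = (["war", "japanese"], "America", ["chinese", "china"], "abstract_len", 500)
tair_query = tair_bool_query(*params)
redis_query = redisearch_bool_query(*params)

print(json.dumps(tair_query, separators=(",", ":")))
print(redis_query)
```

Note that a RediSearch numeric range written as [500 +inf] includes the value 500, whereas the gt operator in the TairSearch query excludes it, so the two queries can differ for documents whose abstract_len is exactly 500.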

Use a bool statement to query data in Chinese

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e tairsearch -c 20 -n 100000 -q '{"query":{"bool":{"must":[{"term":{"abstract":"战争"}},{"term":{"abstract":"日本"}},{"range":{"abstract_len":{"gt":500}}}],"must_not":{"term":{"abstract":"美国"}},"should":[{"term":{"abstract":"中国"}},{"term":{"abstract":"亚洲"}}]}}}' -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e redisearch -c 20 -n 1000000 -q "@abstract:(战争 日本 -美国 (中国|亚洲)) @abstract_len:[500 +inf]" -a 127.0.0.1:6379 -z chinese

Results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) |
| --- | --- | --- | --- |
| TairSearch | 2,619.00 | 7.623 | 18.42 |
| RediSearch | 1,199.76 | 16.669 | 17.064 |

Use a range statement to query data in English

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e tairsearch -c 20 -n 1000000 -q '{"query":{"range":{"abstract_len":{"lte":420, "gte":400}}}}' -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e redisearch -c 20 -n 1000000 -q "@abstract_len:[400,420]" -a 127.0.0.1:6379

Results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) |
| --- | --- | --- | --- |
| TairSearch | 2,840.02 | 7.038 | 8.599 |
| RediSearch | 1,307.02 | 15.300 | 16.817 |

Use a prefix statement to query data in English

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e tairsearch -c 20 -n 1000000 -q '{"query":{"prefix":{"abstract":"happiness"}}}' -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e redisearch -c 20 -n 1000000 -q "@abstract:happiness*" -a 127.0.0.1:6379

Results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) |
| --- | --- | --- | --- |
| TairSearch | 36,491.10 | 0.545 | 0.688 |
| RediSearch | 25,558.92 | 0.781 | 0.930 |

Use a prefix statement to query data in Chinese

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e tairsearch -c 20 -n 1000000 -q '{"query":{"prefix":{"abstract":"开心"}}}' -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e redisearch -c 20 -n 1000000 -q "@abstract:开心*" -a 127.0.0.1:6379 -z chinese

Results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) |
| --- | --- | --- | --- |
| TairSearch | 41,308.71 | 0.481 | 0.638 |
| RediSearch | 27,457.86 | 0.727 | 1.234 |

Write data and use a term statement to query data

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t readwrite -e tairsearch -f ./enwiki-latest-abstract.xml -c 20 -d 60 -q '{"query":{"term":{"abstract":"hello"}}}' -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t readwrite -e redisearch -f ./enwiki-latest-abstract.xml -c 20 -d 60 -q "@abstract:hello" -a 127.0.0.1:6379

Results

| Engine | Average write QPS | Average write latency (ms) | Average query QPS | Average query latency (ms) |
| --- | --- | --- | --- | --- |
| TairSearch | 14,699.77 | 1.359 | 16,224.03 | 1.232 |
| RediSearch | 11,386.75 | 1.755 | 11,386.70 | 1.755 |

Write data and use a bool statement to query data

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t readwrite -e tairsearch -f ./enwiki-latest-abstract.xml -c 20 -d 60 -q '{"query":{"bool":{"must":[{"term":{"abstract":"war"}},{"term":{"abstract":"japanese"}},{"range":{"abstract_len":{"gt":500}}}],"must_not":{"term":{"abstract":"America"}},"should":[{"term":{"abstract":"chinese"}},{"term":{"abstract":"china"}}]}}}' -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t readwrite -e redisearch -f ./enwiki-latest-abstract.xml -c 20 -d 60 -q "@abstract:(war japanese -America (chinese|china)) @abstract_len:[500 +inf]" -a 127.0.0.1:6379

Results

| Engine | Average write QPS | Average write latency (ms) | Average query QPS | Average query latency (ms) |
| --- | --- | --- | --- | --- |
| TairSearch | 9,589.18 | 2.085 | 10,504.31 | 1.903 |
| RediSearch | 5,284.01 | 3.784 | 5,283.96 | 3.784 |

Summary

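TairSearch uses multi-core parallel computing technology and inverted indexes designed specifically for text search to deliver high throughput and low latency. In the tests above, TairSearch delivered higher QPS and lower average latency than RediSearch in every write, query, and mixed read/write test. RediSearch delivered higher QPS in the overwrite tests, in which TairSearch used less memory. Additionally, TairSearch uses a dedicated data structure for document compression to reduce memory usage and save costs without compromising read and write performance.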
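To compare the two engines across tests, the QPS values reported in this topic can be reduced to a ratio per test. The following sketch computes TairSearch QPS divided by RediSearch QPS. The numbers are copied from the result tables, and the dictionary layout is only illustrative.

```python
# test name -> (TairSearch QPS, RediSearch QPS), copied from the result tables.
QPS = {
    "write (English)": (22615.15, 18295.10),
    "write (Chinese)": (13980.41, 10924.40),
    "overwrite (English)": (9775.03, 22239.67),
    "overwrite (Chinese)": (6194.15, 25096.18),
    "term (English)": (45501.13, 28513.87),
    "term (Chinese)": (40670.47, 24437.48),
    "match (English)": (24548.94, 2420.66),
    "match (Chinese)": (6601.05, 889.37),
    "bool (English)": (4554.22, 1124.08),
    "bool (Chinese)": (2619.00, 1199.76),
    "range (English)": (2840.02, 1307.02),
    "prefix (English)": (36491.10, 25558.92),
    "prefix (Chinese)": (41308.71, 27457.86),
}


def qps_ratio(tair_qps: float, redisearch_qps: float) -> float:
    """Return how many times the TairSearch QPS is of the RediSearch QPS."""
    return tair_qps / redisearch_qps


ratios = {name: qps_ratio(tair, redi) for name, (tair, redi) in QPS.items()}

for name, ratio in ratios.items():
    print(f"{name:<20} {ratio:5.2f}x")
```

A ratio above 1 means that TairSearch processed more requests per second in that test. The ratios are below 1 only for the two overwrite tests.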