Tair (Redis® OSS-Compatible): TairSearch performance whitepaper

Last Updated: Nov 21, 2024

This topic describes the methods that are used to test the write and query performance of TairSearch and RediSearch, and provides the test results.

TairSearch is an in-house full-text search data structure of Tair. TairSearch uses query syntax that is similar to that of Elasticsearch to implement effective full-text search. For more information, see Search.

Test description

Client test environment

| Item | Description |
| --- | --- |
| Host of the client | Elastic Compute Service (ECS) instance of the ecs.g7.8xlarge type. For more information, see Overview of instance families. |
| Region and zone | Zone K in the China (Hangzhou) region |
| Operating system | CentOS 7.9 64-bit |

Database test environment

A Tair database and a Redis database are hosted on the same ECS instance. The following table describes the databases.

Table 1. Self-managed Tair database

| Item | Description |
| --- | --- |
| Tair version | Redis 5.0-compatible DRAM-based instance of version 5.0.30 |
| Number of I/O threads | 4 |
| CPU | 6 vCPUs. Sample command: taskset -c 1-6 ./src/redis-server redis.conf. |

Table 2. Self-managed Redis database

| Item | Description |
| --- | --- |
| Redis version | 7.0.10 |
| RediSearch version | 2.6.6. The CONCURRENT_WRITE_MODE parameter is set to true. |
| RedisJSON version | 2.4.6 |
| Number of I/O threads | 4 |
| CPU | 6 vCPUs. Sample command: taskset -c 1-6 ./src/redis-server redis.conf. |

Test data

The test data is a collection of articles in Chinese and a collection of articles in English from Wikimedia. For more information, see Index of /zhwiki/latest/ and Index of /enwiki/latest/.

Examples:

{
    "id":"History_of_Pakistan",
    "title":"History of Pakistan",
    "url":"https://en.wikipedia.org/wiki/History_of_Pakistan",
    "abstract":"The history of Pakistan for the period preceding the country's independence in 1947Pakistan was created as the Dominion of Pakistan on 14 August 1947 after the end of British rule in, and partition of British India. is shared with that of Afghanistan, India."
}
{
    "id":"Wikipedia:哲学",
    "title":"Wikipedia:哲学",
    "url":"https://zh.wikipedia.org/wiki/%E5%93%B2%E5%AD%A6",
    "abstract":"哲学()是研究普遍的、基本问题的学科,包括存在、知识、价值、理智、心灵、语言等领域。哲学与其他学科不同之处在於哲学有独特之思考方式,例如批判的方式、通常是系统化的方法,并以理性论证为基础。"
}

Test tool

Download the binary executable file that matches your operating system. The file for Darwin is named TairSearchBench.Darwin, the file for Linux is named TairSearchBench.Linux, and the file for Windows is named TairSearchBench.Windows.

In this example, TairSearchBench.Linux is used. Run the ./TairSearchBench.Linux --help command to check how to use the tool.

Usage of ./TairSearchBench.linux:
  -a string
        The address(ip:port) of network to connect 
        # The endpoint of the instance. 
  -c int
        Benchmark concurrency (default 30)
        # The number of tests that can be run concurrently. Default value: 30. 
  -d uint
        Specify the number of seconds for the benchmark (default 30)
        # The duration of the test. When the duration ends, the test is terminated. Default value: 30. Unit: seconds. 
  -e string
        The engine backend to run [tairsearch/redisearch]
        # Specify TairSearch or RediSearch as the engine that the instance runs. 
  -f string
        Input file to ingest data from (wikipedia abstracts)
        # The path of the input data file (Wikipedia abstracts) from which the test data is read. 
  -h string
        Print usage (default "help")
        # Display the usage of the tool. 
  -j string
        Specify the big json file to write
        # Specify the path of the JSON file to be written. 
  -n uint
        Specify the number of times to benchmark (default 100000)
        # The total number of operations to perform for a test. Default value: 100000. 
  -o int
        Overwrite the doc (We will write the document with the same document id)
        # Specify whether to overwrite the original document. Valid values: 1 (true) and 0 (false). Default value: 0. 
  -p string
        The password of redis to connect
        # The password of the instance. 
  -q string
        Search query string to benchmark
        # The query statement that is used to run tests. 
  -s uint
        Specify the compress threshold for tairsearch (default 10000000000)
        # Specify the compression threshold for TairSearch. If the size of a document exceeds the threshold, the document is compressed. Unit: bytes. Default value: 10000000000 (about 10 GB). 
  -t string
        Specify the type of benchmark [write/search/readwrite]
        # Set the test type to write, search, or readwrite. 
  -z string
        Specify the analyzer to use for query (default "standard")
        # Specify the analyzer for the query. Default value: standard.

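Before you perform testing, bind the test tool to a dedicated range of vCPUs on the ECS instance, separate from vCPUs 1 to 6 that the databases use. The tests in this topic use the range 10-30, which is inclusive and therefore covers 21 vCPUs rather than 20. Sample command: taskset -c 10-30 ./TairSearchBench.linux.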
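The CPU ranges passed to taskset are inclusive on both ends, which is easy to miscount. The following minimal sketch expands a CPU list and counts its entries. The helper name parse_cpu_list is hypothetical, and the sketch handles only comma-separated IDs and simple ranges, not the stride form that taskset also accepts.

```python
def parse_cpu_list(spec: str) -> list[int]:
    """Expand a taskset-style CPU list such as "1-6" or "0,2,10-30" into CPU IDs."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if end < start:
                raise ValueError(f"invalid range: {part}")
            cpus.update(range(start, end + 1))  # both ends are inclusive
        else:
            cpus.add(int(part))
    return sorted(cpus)


if __name__ == "__main__":
    # The two ranges used in this topic: databases and benchmark tool.
    for spec in ("1-6", "10-30"):
        print(spec, "->", len(parse_cpu_list(spec)), "CPUs")
```

Checking that the database range and the tool range are disjoint is a one-line set comparison on the two returned lists.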

Preparations

Create a schema (index). Examples:

  • TairSearch

    {
        "settings": {
            "compress_doc": {
                "size": "user-defined compression threshold",
                "enable": true
            }
        },
        "mappings": {
            "properties": {
                "id":       {"type": "keyword"},
                "url":      {"type": "keyword", "index": false},
                "title":    {"type": "text", "analyzer": "user-defined analyzer"},
                "abstract": {"type": "text", "analyzer": "user-defined analyzer"},
                "url_len":  {"type": "integer"},
                "abstract_len":  {"type": "integer"},
                "title_len":  {"type": "integer"}
            }
        }
    }
  • RediSearch

    \\ SCHEMA
    $.id AS id TEXT
    $.url AS url TEXT NOINDEX
    $.title AS title TEXT
    $.abstract AS abstract TEXT
    $.abstract_len AS abstract_len NUMERIC
    $.url_len AS url_len NUMERIC
    $.title_len AS title_len NUMERIC
    
    \\ If the test data is documents in Chinese, add LANGUAGE CHINESE to the index definition, before the SCHEMA keyword.
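Both schemas index three length fields (url_len, title_len, and abstract_len) that do not appear in the sample documents under Test data. This topic does not say how they are populated. The following sketch assumes that each one is the character count of the corresponding string field and that it is computed before the document is written. The helper name with_length_fields is hypothetical, and the test tool may derive these values differently.

```python
import json


def with_length_fields(doc: dict) -> dict:
    """Return a copy of doc with url_len, title_len and abstract_len added.

    Assumption: each *_len field is the number of characters in the
    corresponding string field. The benchmark tool may compute it differently.
    """
    enriched = dict(doc)  # leave the caller's document untouched
    for field in ("url", "title", "abstract"):
        enriched[f"{field}_len"] = len(doc.get(field, ""))
    return enriched


if __name__ == "__main__":
    sample = {
        "id": "History_of_Pakistan",
        "title": "History of Pakistan",
        "url": "https://en.wikipedia.org/wiki/History_of_Pakistan",
        "abstract": "(abstract text goes here)",  # placeholder, not real test data
    }
    print(json.dumps(with_length_fields(sample), ensure_ascii=False, indent=2))
```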

Test commands and test results

In the following tests, one million documents are written for each write test. Each query test runs against one million documents and issues one million queries, except where the command specifies -n 100000 (some of the match and bool query tests). Each test that combines write and query operations is configured to run for 60 seconds.

Write data in English

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t write -e tairsearch -f ./enwiki-latest-abstract.xml -c 20 -n 1000000 -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t write -e redisearch -f ./enwiki-latest-abstract.xml -c 20 -n 1000000 -a 127.0.0.1:6379
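The commands in this topic all follow the same pattern, so they can be generated instead of typed by hand. The following sketch assembles the argument list from the flags listed in the tool's help output. The function name build_bench_command and its parameter names are hypothetical. Only the flags themselves come from this topic.

```python
import shlex


def build_bench_command(test_type, engine, address, *, data_file=None, query=None,
                        concurrency=20, count=None, duration=None,
                        overwrite=False, analyzer=None,
                        cpu_list="10-30", binary="./TairSearchBench.linux"):
    """Assemble the argv list for one benchmark run, mirroring this topic's commands."""
    if test_type not in ("write", "search", "readwrite"):
        raise ValueError(f"unknown test type: {test_type}")
    if engine not in ("tairsearch", "redisearch"):
        raise ValueError(f"unknown engine: {engine}")
    argv = ["taskset", "-c", cpu_list, binary, "-t", test_type, "-e", engine]
    if data_file is not None:
        argv += ["-f", data_file]          # input file with Wikipedia abstracts
    argv += ["-c", str(concurrency)]
    if count is not None:
        argv += ["-n", str(count)]         # total number of operations
    if duration is not None:
        argv += ["-d", str(duration)]      # run time in seconds
    if overwrite:
        argv += ["-o", "1"]                # rewrite documents under the same ID
    if query is not None:
        argv += ["-q", query]
    argv += ["-a", address]
    if analyzer is not None:
        argv += ["-z", analyzer]
    return argv


if __name__ == "__main__":
    cmd = build_bench_command("write", "tairsearch", "127.0.0.1:6379",
                              data_file="./enwiki-latest-abstract.xml", count=1000000)
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Returning a list rather than a string means the result can be passed directly to subprocess.run without going through a shell, which avoids quoting problems with JSON queries.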

Test results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) | Memory used (GB) |
| --- | --- | --- | --- | --- |
| TairSearch | 22615.15 | 0.874 | 1.735 | 1.39 |
| RediSearch | 18295.10 | 1.092 | 2.352 | 1.67 |
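One way to sanity-check these figures is Little's law: if each of the 20 concurrent workers sends its next request as soon as the previous one completes, throughput is roughly the concurrency divided by the average latency. This topic does not state that the tool runs in this closed-loop fashion, so treat that as an assumption. The following sketch applies the formula to the QPS and average latency values in the table above.

```python
def expected_qps(concurrency: int, avg_latency_ms: float) -> float:
    """Throughput predicted by Little's law for a closed-loop load generator."""
    return concurrency / (avg_latency_ms / 1000.0)


# (engine, reported QPS, reported average latency in ms), from the table above
rows = [
    ("TairSearch", 22615.15, 0.874),
    ("RediSearch", 18295.10, 1.092),
]

for engine, reported, latency_ms in rows:
    predicted = expected_qps(20, latency_ms)  # -c 20 in the commands
    gap = abs(predicted - reported) / reported
    print(f"{engine}: predicted {predicted:.0f} QPS, "
          f"reported {reported:.0f} QPS, gap {gap:.1%}")
```

Both predictions land within a few percent of the reported QPS, which is consistent with the closed-loop assumption. The same check can be applied to the other result tables.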

Write data in Chinese

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t write -e tairsearch -f ./zhwiki-latest-abstract.xml -c 20 -n 1000000 -a 127.0.0.1:6379 -z jieba
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t write -e redisearch -f ./zhwiki-latest-abstract.xml -c 20 -n 1000000 -a 127.0.0.1:6379 -z chinese

Test results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) | Memory used (GB) |
| --- | --- | --- | --- | --- |
| TairSearch | 13980.41 | 1.427 | 3.275 | 1.87 |
| RediSearch | 10924.40 | 1.830 | 3.857 | 1.83 |

Note

In this test, TairSearch uses slightly more memory than RediSearch (1.87 GB compared with 1.83 GB) because it uses the jieba analyzer, which generates more fine-grained tokens.

Overwrite data in English

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t write -e tairsearch -f ./enwiki-latest-abstract.xml -c 20 -n 1000000 -o 1 -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t write -e redisearch -f ./enwiki-latest-abstract.xml -c 20 -n 1000000 -o 1 -a 127.0.0.1:6379

Test results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) | Memory used (GB) |
| --- | --- | --- | --- | --- |
| TairSearch | 9775.03 | 2.041 | 3.974 | 0.0002 |
| RediSearch | 22239.67 | 0.898 | 1.38 | 0.165 |

Note

When you perform an overwrite operation, RediSearch marks the original document for later deletion. This causes additional memory usage. In comparison, TairSearch deletes the original document in real time.

Overwrite data in Chinese

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t write -e tairsearch -f ./zhwiki-latest-abstract.xml -c 20 -n 1000000 -o 1 -a 127.0.0.1:6379 -z jieba
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t write -e redisearch -f ./zhwiki-latest-abstract.xml -c 20 -n 1000000 -o 1 -a 127.0.0.1:6379

Test results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) | Memory used (GB) |
| --- | --- | --- | --- | --- |
| TairSearch | 6194.15 | 3.206 | 6.456 | 0.025 (including the memory used by the jieba analyzer dictionary) |
| RediSearch | 25096.18 | 0.796 | 1.338 | 0.671 |

Note

When you perform an overwrite operation, RediSearch marks the original document for later deletion. This causes additional memory usage. In comparison, TairSearch deletes the original document in real time.

Use a term statement to query data in English

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e tairsearch -c 20 -n 1000000 -q '{"query":{"term":{"abstract":"hello"}}}' -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e redisearch -c 20 -n 1000000 -q "@abstract:hello" -a 127.0.0.1:6379

Test results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) |
| --- | --- | --- | --- |
| TairSearch | 45501.13 | 0.437 | 0.563 |
| RediSearch | 28513.87 | 0.700 | 0.833 |

Use a term statement to query data in Chinese

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e tairsearch -c 20 -n 1000000 -q '{"query":{"term":{"abstract":"你好"}}}' -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e redisearch -c 20 -n 1000000 -q "@abstract:你好" -a 127.0.0.1:6379

Test results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) |
| --- | --- | --- | --- |
| TairSearch | 40670.47 | 0.489 | 0.635 |
| RediSearch | 24437.48 | 0.817 | 1.331 |

Use a match statement to query data in English

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e tairsearch -c 20 -n 1000000 -q '{"query":{"match":{"abstract":{"operator":"and","query":"chinese history"}}}}' -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e redisearch -c 20 -n 1000000 -q "@abstract:chinese history" -a 127.0.0.1:6379

Test results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) |
| --- | --- | --- | --- |
| TairSearch | 24548.94 | 0.812 | 0.971 |
| RediSearch | 2420.66 | 8.261 | 8.523 |

Use a match statement to query data in Chinese

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e tairsearch -c 20 -n 100000 -q '{"query":{"match":{"abstract":{"operator":"and","query":"中国的历史"}}}}' -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e redisearch -c 20 -n 100000 -q "@abstract:中国的历史" -a 127.0.0.1:6379  -analyzer jieba

Test results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) |
| --- | --- | --- | --- |
| TairSearch | 6601.05 | 3.027 | 3.669 |
| RediSearch | 889.37 | 22.486 | 22.985 |

Use a bool statement to query data in English

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e tairsearch -c 20 -n 100000 -q '{"query":{"bool":{"must":[{"term":{"abstract":"war"}},{"term":{"abstract":"japanese"}},{"range":{"abstract_len":{"gt":500}}}],"must_not":{"term":{"abstract":"America"}},"should":[{"term":{"abstract":"chinese"}},{"term":{"abstract":"china"}}]}}}' -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e redisearch -c 20 -n 100000 -q "@abstract:(war japanese -America (chinese|china)) @abstract_len:[500 +inf]"  -a 127.0.0.1:6379
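Long bool queries are easy to break when they are edited by hand inside a shell command. The following sketch builds the same TairSearch query as a Python dictionary, serializes it to the compact one-line form used above, and quotes it for the shell. It uses only the standard json and shlex modules. The variable names are illustrative.

```python
import json
import shlex

# must: all clauses are required; must_not: excluded; should: optional clauses.
bool_query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"abstract": "war"}},
                {"term": {"abstract": "japanese"}},
                {"range": {"abstract_len": {"gt": 500}}},
            ],
            "must_not": {"term": {"abstract": "America"}},
            "should": [
                {"term": {"abstract": "chinese"}},
                {"term": {"abstract": "china"}},
            ],
        }
    }
}

# Compact separators reproduce the single-line form used in the command.
# ensure_ascii=False keeps Chinese terms readable if the query contains them.
query_arg = json.dumps(bool_query, separators=(",", ":"), ensure_ascii=False)

# shlex.quote wraps the JSON in single quotes so the shell passes it unchanged.
print("-q", shlex.quote(query_arg))
```

Swapping the term values for the Chinese ones produces the query for the Chinese test in the same way.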

Test results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) |
| --- | --- | --- | --- |
| TairSearch | 4554.22 | 4.388 | 5.702 |
| RediSearch | 1124.08 | 17.791 | 18.444 |

Use a bool statement to query data in Chinese

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e tairsearch -c 20 -n 100000 -q '{"query":{"bool":{"must":[{"term":{"abstract":"战争"}},{"term":{"abstract":"日本"}},{"range":{"abstract_len":{"gt":500}}}],"must_not":{"term":{"abstract":"美国"}},"should":[{"term":{"abstract":"中国"}},{"term":{"abstract":"亚洲"}}]}}}' -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e redisearch -c 20 -n 1000000 -q "@abstract:(日本 -美国 (中国|亚洲)) @abstract_len:[500 +inf]"  -a 127.0.0.1:6379 -analyzer jieba

Test results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) |
| --- | --- | --- | --- |
| TairSearch | 2619.00 | 7.623 | 18.42 |
| RediSearch | 1199.76 | 16.669 | 17.064 |

Use a range statement to query data in English

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e tairsearch -c 20 -n 1000000 -q '{"query":{"range":{"abstract_len":{"lte":420, "gte":400}}}}' -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e redisearch -c 20 -n 1000000 -q "@abstract_len:[400,420]" -a 127.0.0.1:6379

Test results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) |
| --- | --- | --- | --- |
| TairSearch | 2840.02 | 7.038 | 8.599 |
| RediSearch | 1307.02 | 15.300 | 16.817 |

Use a prefix statement to query data in English

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e tairsearch -c 20 -n 1000000 -q '{"query":{"prefix":{"abstract":"happiness"}}}' -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e redisearch -c 20 -n 1000000 -q "@abstract:happiness*" -a 127.0.0.1:6379

Test results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) |
| --- | --- | --- | --- |
| TairSearch | 36491.10 | 0.545 | 0.688 |
| RediSearch | 25558.92 | 0.781 | 0.930 |

Use a prefix statement to query data in Chinese

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e tairsearch -c 20 -n 1000000 -q '{"query":{"prefix":{"abstract":"开心"}}}' -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t search -e redisearch -c 20 -n 1000000 -q "@abstract:开心*" -a 127.0.0.1:6379 -z chinese

Test results

| Engine | QPS | Average latency (ms) | 99th percentile latency (ms) |
| --- | --- | --- | --- |
| TairSearch | 41308.71 | 0.481 | 0.638 |
| RediSearch | 27457.86 | 0.727 | 1.234 |

Write data and use a term statement to query data

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t readwrite -e tairsearch -f ./enwiki-latest-abstract.xml -c 20 -d 60 -q '{"query":{"term":{"abstract":"hello"}}}' -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t readwrite -e redisearch -f ./enwiki-latest-abstract.xml -c 20 -d 60 -q "@abstract:hello" -a 127.0.0.1:6379

Test results

| Engine | Average write QPS | Average write latency (ms) | Average query QPS | Average query latency (ms) |
| --- | --- | --- | --- | --- |
| TairSearch | 14699.77 | 1.359 | 16224.03 | 1.232 |
| RediSearch | 11386.75 | 1.755 | 11386.70 | 1.755 |

Write data and use a bool statement to query data

Commands

  • TairSearch

    taskset -c 10-30 ./TairSearchBench.linux -t readwrite -e tairsearch -f ./enwiki-latest-abstract.xml -c 20 -d 60 -q '{"query":{"bool":{"must":[{"term":{"abstract":"war"}},{"term":{"abstract":"japanese"}},{"range":{"abstract_len":{"gt":500}}}],"must_not":{"term":{"abstract":"America"}},"should":[{"term":{"abstract":"chinese"}},{"term":{"abstract":"china"}}]}}}' -a 127.0.0.1:6379
  • RediSearch

    taskset -c 10-30 ./TairSearchBench.linux -t readwrite -e redisearch -f ./enwiki-latest-abstract.xml -c 20 -d 60 -q "@abstract:(war japanese -America (chinese|china)) @abstract_len:[500 +inf]" -a 127.0.0.1:6379

Test results

| Engine | Average write QPS | Average write latency (ms) | Average query QPS | Average query latency (ms) |
| --- | --- | --- | --- | --- |
| TairSearch | 9589.18 | 2.085 | 10504.31 | 1.903 |
| RediSearch | 5284.01 | 3.784 | 5283.96 | 3.784 |

Summary

TairSearch uses multi-core parallel computing technology and inverted indexes designed specifically for text search to deliver high throughput and low latency. Additionally, TairSearch uses a dedicated data structure for document compression to reduce memory usage and save costs without compromising read and write performance.
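To compare the engines across tests, the QPS values from the result tables can be reduced to one ratio per test. The following sketch does that for the write, overwrite, and query tests. The numbers are copied from the tables in this topic, and the names results and qps_ratio are illustrative. A ratio above 1 means that TairSearch reported higher QPS, and a ratio below 1 means that RediSearch did, which is the case for the two overwrite tests. The TairSearch and RediSearch commands for the Chinese bool test differ in their -n values and query terms, so that ratio is less direct than the others.

```python
# (test, TairSearch QPS, RediSearch QPS), copied from the result tables in this topic
results = [
    ("write, English", 22615.15, 18295.10),
    ("write, Chinese", 13980.41, 10924.40),
    ("overwrite, English", 9775.03, 22239.67),
    ("overwrite, Chinese", 6194.15, 25096.18),
    ("term, English", 45501.13, 28513.87),
    ("term, Chinese", 40670.47, 24437.48),
    ("match, English", 24548.94, 2420.66),
    ("match, Chinese", 6601.05, 889.37),
    ("bool, English", 4554.22, 1124.08),
    ("bool, Chinese", 2619.00, 1199.76),
    ("range, English", 2840.02, 1307.02),
    ("prefix, English", 36491.10, 25558.92),
    ("prefix, Chinese", 41308.71, 27457.86),
]


def qps_ratio(tair_qps: float, redisearch_qps: float) -> float:
    """TairSearch QPS divided by RediSearch QPS for the same test."""
    return tair_qps / redisearch_qps


for name, tair, redi in results:
    print(f"{name:<20} {qps_ratio(tair, redi):6.2f}x")
```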