ApsaraDB for SelectDB is designed to provide high-performance and user-friendly data analytics services. SelectDB delivers high performance in scenarios such as wide table aggregation, multi-table join queries, and high-concurrency point queries. This topic describes how to use the TPC-DS benchmark to test the performance of complex queries in SelectDB within a Linux environment and provides the test results.
Overview
The Transaction Processing Performance Council Decision Support Benchmark (TPC-DS) is a decision support benchmark developed by the Transaction Processing Performance Council (TPC). It evaluates the performance of different systems in handling complex queries and large-scale data analytics.
TPC-DS is designed to simulate complex decision-making in real-world business scenarios. It tests system performance using a series of complex queries and data operations. These operations include joins, aggregations, sorting, filtering, and subqueries. The queries cover a range of scenarios from simple to complex, such as report generation, data mining, and online analytical processing (OLAP).
Preparations
Step 1: Prepare a target instance
Prepare a SelectDB instance.
If you already have a target instance, verify that its configuration meets the following requirements.
If you do not have a target instance, you can create one.
The instance used for this performance test must meet the following requirements.
The SelectDB Core version must be 4.1 or later.
If your target instance has a version earlier than 4.1, you must upgrade the instance. For more information, see Upgrade the kernel version.
Specifications must be 96 cores and 384 GB or higher. This test uses an instance with 96 cores and 384 GB.
Cluster cache space must be 1,200 GB or larger. This test uses 1,200 GB of cache space.
Set the
streaming_load_max_mbparameter to its maximum value.During the test, the tool uploads the test dataset to SelectDB using Stream Load. The data volume for this test exceeds the default Stream Load limit of 10,240 MB. You must change the BE parameter streaming_load_max_mb to the maximum value of 10,240,000 MB. For more information about how to change parameters, see Parameter settings.
Create a destination database for the test data.
If you already have a destination database, you can skip this step.
Connect to the instance. For more information, see Connect to an ApsaraDB for SelectDB instance using a MySQL client.
Create a database.
The destination database for this test is named test_db. The statement is as follows:
CREATE DATABASE test_db;
Step 2: Prepare a test server
The following installation scripts are for Linux servers. If your server uses a different operating system, you must modify the scripts as needed.
Usage notes
Take note of the following for your server:
If you plan to use Git to download the TPC-DS test tool on the server, you must enable public network access for the server.
New ECS instance: When you purchase the ECS instance, for the Public IP parameter, select Assign Public IPv4 Address.
Existing ECS instance without public network access: To enable public network access for an existing ECS instance, see Enable public network access.
The data files generated for this test dataset are approximately 1,000 GB. Ensure that the server has sufficient memory.
Create a target server.
If you already have a target server, you can skip this step.
If you do not have a target server, you can create a custom ECS instance. Select Alibaba Cloud Linux for the image.
Install the required tools for the test.
Install the MySQL client.
yum install mysqlInstall unzip.
yum install unzipInstall jq.
yum install jqInstall CURL.
yum install curl
(Optional) Install Git.
This test uses Git to download the TPC-DS tool. If you have already obtained the TPC-DS tool by other means and plan to upload it to the server manually, you can skip this step.
yum install git
Step 3: Ensure network connectivity
Ensure that the target server where the TPC-DS test tool is installed can communicate with the SelectDB instance.
Request a public endpoint for the SelectDB instance. For more information, see Apply for and release a public endpoint.
If the server on which you want to install the TPC-DS test tool is an Alibaba Cloud server and is in the same virtual private cloud (VPC) as the ApsaraDB for SelectDB instance, you can skip this step.
Add the IP address of the server where you want to install the TPC-DS test tool to the IP address whitelist of the ApsaraDB for SelectDB instance. For more information, see Configure an IP address whitelist.
Step 4: Understand the test dataset
Standard TPC-DS test datasets are often very different from actual business scenarios. Some tests also involve parameter tuning for the test dataset. Therefore, the results from a standard test dataset reflect database performance only in specific scenarios. We recommend that you use your actual business data for further testing.
In this test, 1,000 GB of data is generated by TPC-DS and imported into a SelectDB instance to evaluate the performance of the SelectDB instance. The following section describes the data tables in the 1,000 GB test dataset.
TPC-DS table name | Number of rows |
customer_demographics | 1,920,800 |
reason | 65 |
warehouse | 20 |
date_dim | 73,049 |
catalog_sales | 1,439,980,416 |
call_center | 42 |
inventory | 783,000,000 |
catalog_returns | 143,996,756 |
household_demographics | 7,200 |
customer_address | 6,000,000 |
income_band | 20 |
catalog_page | 30,000 |
item | 300,000 |
web_returns | 71,997,522 |
web_site | 54 |
promotion | 1,500 |
web_sales | 720,000,376 |
store | 1,002 |
web_page | 3,000 |
time_dim | 86,400 |
store_returns | 287,999,764 |
store_sales | 2,879,987,999 |
ship_mode | 20 |
customer | 12,000,000 |
Procedure
The following scripts are for Linux servers. If your server uses a different operating system, you must modify the scripts as needed.
Step 1: Log on to the target server
If your server is an Alibaba Cloud ECS instance, see Connect to an ECS instance for logon instructions.
For other types of servers, see their respective product documentation.
Step 2: Install the TPC-DS test tool
Download the tool.
This test uses Git to download the tool. The script is as follows:
git clone https://github.com/apache/doris.git && cd ./doris/tools/tpcds-toolsAlternatively, you can download the tool from tpcds-tools and then manually upload it to the target server.
Compilation tools
Run the following script to compile the tool.
sh bin/build-tpcds-tools.sh
Step 3: Generate the TPC-DS test dataset
The larger the data volume, the longer it takes to generate the data. The time required depends on server performance.
Run the script in the installation directory of the test tool to generate the test dataset.
The syntax is as follows:
sh bin/gen-tpcds-data.sh -s <yourAimDataNum>Parameters:
yourAimDataNum:
Description: The size of the data to generate using TPC-DS.
Unit: GB
This is a medium-scale test that requires generating a 1,000 GB (1 TB) test dataset. This step may take a long time. We recommend that you run this task in the background. The statement is as follows:
nohup sh bin/gen-tpcds-data.sh -s 1000 > gen-tpcds-data.log 2>&1 &The execution result is saved to the gen-tpcds-data.log file in the tool's installation directory. You can view this file to check whether the process ran correctly.
The test dataset is saved to the tpcds-data directory within the bin directory of the tool's installation directory. The data files have a .dat extension.
Step 4: Use a script to create test tables for SelectDB
Configure the SelectDB instance.
Before you run the table creation script, you must configure the SelectDB instance in the
doris-cluster.conffile. This file is located in thetpcds-tools/conf/directory of the test tool installation directory. The following is an example:# Any FE host export FE_HOST='selectdb-cn-******.selectdbfe.rds.aliyuncs.com' # The http_port from fe.conf export FE_HTTP_PORT=8080 # The query_port from fe.conf export FE_QUERY_PORT=9030 # Doris username export USER='admin' # Doris password export PASSWORD='******' # The database where the TPC-DS tables are located export DB='test_db'Parameters:
Parameter name
Parameter description
FE_HOST
The endpoint of the SelectDB instance.
You can obtain the VPC endpoint or public endpoint of the instance from the Network Information section of the instance details page in the SelectDB console.
FE_HTTP_PORT
The HTTP protocol port of the SelectDB instance.
The default port of the SelectDB instance is 8080.
You can obtain the HTTP protocol port of the instance from the Network Information section of the instance details page in the SelectDB console.
FE_QUERY_PORT
The MySQL protocol port of the SelectDB instance.
The default port of the SelectDB instance is 9030.
You can obtain the MySQL protocol port of the instance from the Network Information section of the instance details page in the SelectDB console.
USER
The account of the SelectDB instance.
After you create a SelectDB instance, the system automatically creates an admin account for the instance.
PASSWORD
The password of the SelectDB instance account.
If you set USER to the admin account but have forgotten the password, you can reset the admin password for the instance in the console.
DB
The name of the database in the SelectDB instance where you will import data.
Create tables.
In the installation directory of the test tool, run the following script to create the test tables. After you run the script, the tables described in Step 4: Understand the test dataset are created in the destination database of the SelectDB instance.
sh bin/create-tpcds-tables.sh -s 1000
Step 5: Import data into SelectDB
The larger the data volume, the longer it takes to import the data. The time required depends on server performance.
In the installation directory of the test tool, run the following script to import the TPC-DS test dataset into the SelectDB instance:
sh bin/load-tpcds-data.shThis is a medium-scale test, and you need to import the generated 1,000 GB (1 TB) test dataset into SelectDB. This step may take a long time. We recommend that you run this task in the background. The statement is as follows:
nohup sh bin/load-tpcds-data.sh > load-tpcds-data.log 2>&1 &The execution result is saved to the load-tpcds-data.log file in the tool's installation directory. You can view this file to check whether the process ran correctly.
Step 6: Test query performance
Test the performance of SQL queries in batches
ImportantThe larger the data volume, the longer the batch test takes. The time required depends on server performance.
You can run the TPC-DS test SQL scripts to execute the SQL queries in the test set in batches.
The syntax is as follows.
sh bin/run-tpcds-queries.sh -s <yourAimDataNum>Parameters:
yourAimDataNum: Specifies the scale of the dataset for the query. This value must be the same as the scale used to generate the data. For example, if you used
-s 1000to generate the data, you must also use-s 1000to run the queries.After the script is executed, the console window displays the performance of each SQL statement in the test set against SelectDB.
This medium-scale test queries a 1000 GB (1 TB) test dataset and may take a long time to complete. You can run this task in the background. The statement is as follows.
nohup sh bin/run-tpcds-queries.sh -s 1000 > run-tpcds-queries.log 2>&1 &The query results are saved to the run-tpcds-queries.log file in the tool installation directory. You can view this file to check the query status and review the test results. For the test results on the 1000 GB dataset, see Test results.
Test the performance of a single SQL query
You can also follow these steps to test the performance of a specific SQL statement in SelectDB:
Connect to the SelectDB instance. For more information, see Connect to an ApsaraDB for SelectDB instance using DMS.
Execute the target SQL statement. You can obtain the target SQL statement from TPCDS-Query-SQL.
NoteThe SQL statements in TPCDS-Query-SQL are the same as those executed in the batch test script.
Test results
The following TPC-DS 1000 GB query performance test results were obtained using a SelectDB version 4.1.1 instance configured with 96 cores, 384 GB of memory, and a 1200 GB cluster cache.
Query | TPC-DS 1000 GB (s) |
Q1 | 0.4 |
Q2 | 3.05 |
Q3 | 0.22 |
Q4 | 7.16 |
Q5 | 0.51 |
Q6 | 0.19 |
Q7 | 0.38 |
Q8 | 0.32 |
Q9 | 4.48 |
Q10 | 0.28 |
Q11 | 4.51 |
Q12 | 0.1 |
Q13 | 0.49 |
Q14_1 | 4.73 |
Q14 | 4.79 |
Q15 | 0.26 |
Q16 | 0.27 |
Q17 | 0.9 |
Q18 | 0.48 |
Q19 | 0.18 |
Q20 | 0.1 |
Q21 | 0.08 |
Q22 | 0.98 |
Q23_1 | 12.94 |
Q23 | 12.78 |
Q24_1 | 2.29 |
Q24 | 2.32 |
Q25 | 0.63 |
Q26 | 0.16 |
Q27 | 0.39 |
Q28 | 4.12 |
Q29 | 0.59 |
Q30 | 0.2 |
Q31 | 1.19 |
Q32 | 0.06 |
Q33 | 0.28 |
Q34 | 0.51 |
Q35 | 0.88 |
Q36 | 0.34 |
Q37 | 0.07 |
Q38 | 4.09 |
Q39_1 | 0.27 |
Q39 | 0.3 |
Q40 | 0.12 |
Q41 | 0.04 |
Q42 | 0.1 |
Q43 | 0.88 |
Q44 | 1.34 |
Q45 | 0.2 |
Q46 | 0.91 |
Q47 | 1.71 |
Q48 | 0.45 |
Q49 | 0.77 |
Q50 | 0.75 |
Q51 | 4.32 |
Q52 | 0.1 |
Q53 | 0.2 |
Q54 | 0.94 |
Q55 | 0.1 |
Q56 | 0.25 |
Q57 | 0.96 |
Q58 | 0.22 |
Q59 | 6 |
Q60 | 0.29 |
Q61 | 0.47 |
Q62 | 0.77 |
Q63 | 0.18 |
Q64 | 0.99 |
Q65 | 3.42 |
Q66 | 0.31 |
Q67 | 17.92 |
Q68 | 0.79 |
Q69 | 0.52 |
Q70 | 1.69 |
Q71 | 0.39 |
Q72 | 2.73 |
Q73 | 0.44 |
Q74 | 3.97 |
Q75 | 3.55 |
Q76 | 0.99 |
Q77 | 0.27 |
Q78 | 11.98 |
Q79 | 2.06 |
Q80 | 0.49 |
Q81 | 0.36 |
Q82 | 0.19 |
Q83 | 0.17 |
Q84 | 0.11 |
Q85 | 0.48 |
Q86 | 0.41 |
Q87 | 4.79 |
Q88 | 3.43 |
Q89 | 0.29 |
Q90 | 0.13 |
Q91 | 0.1 |
Q92 | 0.05 |
Q93 | 1.04 |
Q94 | 0.2 |
Q95 | 0.27 |
Q96 | 0.43 |
Q97 | 3.39 |
Q98 | 0.17 |
Q99 | 1.59 |
Total | 171.45 |