This topic describes how to use the TPC Benchmark H (TPC-H) to test the online analytical processing (OLAP) performance of a PolarDB for MySQL 8.0 cluster of Cluster Edition. You can follow the instructions in this topic to run the test and compare the results across databases.
Usage notes
The test in this topic is implemented based on the TPC-H benchmark but does not meet all the requirements of an official TPC-H test. Therefore, the test results may not be comparable with published TPC-H results.
Test environment
The Elastic Compute Service (ECS) instance and PolarDB cluster that are used in the test are deployed in the same zone of the same region.
The network type is Virtual Private Cloud (VPC).
Note: Make sure that the ECS instance and the PolarDB cluster are in the same VPC.
The following list describes the information of the PolarDB cluster that is used in the test:
The specifications are 32 cores and 256 GB of memory (polar.mysql.x8.4xlarge).
The version is MySQL 8.0.1 or MySQL 8.0.2.
The edition is Cluster and the compute node type is Dedicated.
The cluster consists of a primary node and a read-only node.
The primary endpoint is used to connect to the cluster in the test. For more information about how to view the primary endpoint of a PolarDB cluster, see Manage the endpoints of a cluster.
The following list describes the information of the ECS instance that is used in the test:
The instance type is ecs.c5.4xlarge.
An ultra disk of 1,000 GB is attached to the instance.
The image of the instance is 64-bit CentOS 7.0.
Test tool
TPC-H is a commonly used benchmark that is developed and released by the Transaction Processing Performance Council (TPC) to evaluate the analytical query capabilities of databases. The TPC-H benchmark contains eight tables and 22 complex SQL statements. Most of the queries contain JOIN clauses on several tables, subqueries, and GROUP BY clauses.
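For illustration, the following statement has the typical shape of a TPC-H query: a multi-table JOIN, a filter, an aggregation, and a GROUP BY clause. This is a simplified sketch written against the TPC-H schema, not one of the 22 official statements:
-- Simplified TPC-H-style query (illustration only, not an official statement)
select o_orderpriority, count(*) as order_count
from orders
    join lineitem on l_orderkey = o_orderkey
where o_orderdate >= '1995-01-01'
group by o_orderpriority
order by o_orderpriority;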
Install TPC-H tools
Run the commands in this topic as a user with root permissions.
The TPC-H version used in this topic is TPC-H_Tools_v2.18.0. You must register on the TPC website before you can download the tool.
To install TPC-H on the ECS instance, perform the following steps:
Register and download the TPC-H package. Then, upload the TPC-H package to the ECS instance. For more information, see Upload on-premises files to ECS instances.
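For example, you can upload the package by using scp; a sketch, assuming a hypothetical local package path and ECS public IP address (replace both with your own values):
# Upload the downloaded package to the ECS instance
scp ./dbgen.zip root@<ecs-public-ip>:/root/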
Run the following command to decompress the uploaded TPC-H package. In the command, the dbgen.zip package name is used as an example. Replace it with the actual file name.
unzip dbgen.zip
Go to the directory where the decompressed file is located. In the following command, the dbgen directory is specified as an example. Replace it with the actual directory name.
cd dbgen
Run the following command to copy the makefile file:
cp makefile.suite makefile
Run the following command to install GCC:
If you are using the CentOS operating system, run the following command to install GCC:
sudo yum install gcc
If you are using the Ubuntu operating system, run the following command to install GCC:
sudo apt install gcc
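To confirm that the compiler is installed, check its version:
gcc --version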
Modify the CC, DATABASE, MACHINE, and WORKLOAD parameters in the makefile file.
Open the makefile file.
vim makefile
Press the i key and then modify the CC, DATABASE, MACHINE, and WORKLOAD parameters.
################
## CHANGE NAME OF ANSI COMPILER HERE
################
CC       = gcc
# Current values for DATABASE are: INFORMIX, DB2, ORACLE,
#                                  SQLSERVER, SYBASE, TDAT (Teradata)
# Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
#                                  SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are: TPCH
DATABASE = MYSQL
MACHINE  = LINUX
WORKLOAD = TPCH
Press the Esc key, enter :wq!, and then press the Enter key to save your changes and exit the edit mode.
Modify the tpcd.h file to add a new macro definition.
Open the tpcd.h file.
vim tpcd.h
Press the i key and add the following macro definition:
#ifdef MYSQL
#define GEN_QUERY_PLAN  "EXPLAIN PLAN"
#define START_TRAN      "START TRANSACTION"
#define END_TRAN        "COMMIT"
#define SET_OUTPUT      ""
#define SET_ROWCOUNT    "limit %d;\n"
#define SET_DBASE       "use %s;\n"
#endif
Press the Esc key, enter :wq, and then press the Enter key to save your changes and exit the edit mode.
Compile the file.
make
After the compilation is complete, the following executable files are generated in the directory:
dbgen: the tool that is used to generate data. If you use the InfiniDB test script, you must use this tool to generate data for TPC-H.
qgen: the tool that is used to generate SQL statements. This tool generates the initial statements for the test. Queries generated from different seeds differ from each other. To ensure that test results are repeatable, use the 22 complex SQL statements of TPC-H. You can click here to download the 22 statements.
Use dbgen to generate test data.
./dbgen -s 100
The -s parameter specifies the scale factor. For example, -s 100 generates approximately 100 GB of test data.
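Generating 100 GB of data in a single process can take a long time. dbgen can split the generation into chunks that run in parallel; a sketch, assuming the -C (number of chunks) and -S (which chunk to build) options of dbgen v2.18:
# Generate the 100 GB data set in four parallel chunks
./dbgen -s 100 -C 4 -S 1 &
./dbgen -s 100 -C 4 -S 2 &
./dbgen -s 100 -C 4 -S 3 &
./dbgen -s 100 -C 4 -S 4 &
wait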
Use qgen to generate queries.
Note: To ensure that test results are repeatable, skip this step and use the 22 complex SQL statements of TPC-H. You can click here to download the 22 statements.
Copy qgen and dists.dss to the queries directory.
cp qgen queries
cp dists.dss queries
Use the following script to generate the queries:
#!/usr/bin/bash
for i in {1..22}
do
  ./qgen -d $i -s 100 > db"$i".sql
done
Test procedure
Connect to the PolarDB cluster from the ECS instance. For more information, see Connect to a cluster.
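A sketch of the connection command, assuming placeholder values for the primary endpoint and account (replace them with your own):
mysql -h <primary-endpoint> -P 3306 -u <user> -p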
Create a database.
create database tpch100g;
Create the tables.
use tpch100g;
source ./dss.ddl
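To verify that the tables were created, list them; the eight TPC-H tables (customer, lineitem, nation, orders, part, partsupp, region, supplier) should appear:
show tables;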
Note: The dss.ddl file is located in the dbgen directory of the TPC-H tools package.
Load data.
Create a file named load.ddl on the ECS instance.
touch load.ddl
Open the load.ddl file.
vim load.ddl
Add and save the following script:
load data local INFILE 'customer.tbl' INTO TABLE customer FIELDS TERMINATED BY '|';
load data local INFILE 'region.tbl' INTO TABLE region FIELDS TERMINATED BY '|';
load data local INFILE 'nation.tbl' INTO TABLE nation FIELDS TERMINATED BY '|';
load data local INFILE 'supplier.tbl' INTO TABLE supplier FIELDS TERMINATED BY '|';
load data local INFILE 'part.tbl' INTO TABLE part FIELDS TERMINATED BY '|';
load data local INFILE 'partsupp.tbl' INTO TABLE partsupp FIELDS TERMINATED BY '|';
load data local INFILE 'orders.tbl' INTO TABLE orders FIELDS TERMINATED BY '|';
load data local INFILE 'lineitem.tbl' INTO TABLE lineitem FIELDS TERMINATED BY '|';
Connect to the PolarDB cluster and load the data from the load.ddl file.
source ./load.ddl
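Note that LOAD DATA LOCAL INFILE works only when the local_infile capability is enabled on both the server and the client; otherwise, MySQL rejects the statements with a "Loading local data is disabled" error. A sketch of connecting with the client-side option enabled, assuming placeholder connection values:
mysql --local-infile=1 -h <primary-endpoint> -P 3306 -u <user> -p -D tpch100g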
Create primary keys and foreign keys.
source ./dss.ri
In the following example, the tpch100g database is used. Replace the content in the dss.ri file of the TPC-H tools package with the following content:
use tpch100g;

-- ALTER TABLE REGION DROP PRIMARY KEY;
-- ALTER TABLE NATION DROP PRIMARY KEY;
-- ALTER TABLE PART DROP PRIMARY KEY;
-- ALTER TABLE SUPPLIER DROP PRIMARY KEY;
-- ALTER TABLE PARTSUPP DROP PRIMARY KEY;
-- ALTER TABLE ORDERS DROP PRIMARY KEY;
-- ALTER TABLE LINEITEM DROP PRIMARY KEY;
-- ALTER TABLE CUSTOMER DROP PRIMARY KEY;

-- For table REGION
ALTER TABLE REGION ADD PRIMARY KEY (R_REGIONKEY);

-- For table NATION
ALTER TABLE NATION ADD PRIMARY KEY (N_NATIONKEY);
ALTER TABLE NATION ADD FOREIGN KEY NATION_FK1 (N_REGIONKEY) references REGION(R_REGIONKEY);
COMMIT WORK;

-- For table PART
ALTER TABLE PART ADD PRIMARY KEY (P_PARTKEY);
COMMIT WORK;

-- For table SUPPLIER
ALTER TABLE SUPPLIER ADD PRIMARY KEY (S_SUPPKEY);
ALTER TABLE SUPPLIER ADD FOREIGN KEY SUPPLIER_FK1 (S_NATIONKEY) references NATION(N_NATIONKEY);
COMMIT WORK;

-- For table PARTSUPP
ALTER TABLE PARTSUPP ADD PRIMARY KEY (PS_PARTKEY,PS_SUPPKEY);
COMMIT WORK;

-- For table CUSTOMER
ALTER TABLE CUSTOMER ADD PRIMARY KEY (C_CUSTKEY);
ALTER TABLE CUSTOMER ADD FOREIGN KEY CUSTOMER_FK1 (C_NATIONKEY) references NATION(N_NATIONKEY);
COMMIT WORK;

-- For table LINEITEM
ALTER TABLE LINEITEM ADD PRIMARY KEY (L_ORDERKEY,L_LINENUMBER);
COMMIT WORK;

-- For table ORDERS
ALTER TABLE ORDERS ADD PRIMARY KEY (O_ORDERKEY);
COMMIT WORK;

-- For table PARTSUPP
ALTER TABLE PARTSUPP ADD FOREIGN KEY PARTSUPP_FK1 (PS_SUPPKEY) references SUPPLIER(S_SUPPKEY);
COMMIT WORK;
ALTER TABLE PARTSUPP ADD FOREIGN KEY PARTSUPP_FK2 (PS_PARTKEY) references PART(P_PARTKEY);
COMMIT WORK;

-- For table ORDERS
ALTER TABLE ORDERS ADD FOREIGN KEY ORDERS_FK1 (O_CUSTKEY) references CUSTOMER(C_CUSTKEY);
COMMIT WORK;

-- For table LINEITEM
ALTER TABLE LINEITEM ADD FOREIGN KEY LINEITEM_FK1 (L_ORDERKEY) references ORDERS(O_ORDERKEY);
COMMIT WORK;
ALTER TABLE LINEITEM ADD FOREIGN KEY LINEITEM_FK2 (L_PARTKEY,L_SUPPKEY) references PARTSUPP(PS_PARTKEY,PS_SUPPKEY);
COMMIT WORK;
(Optional) Create indexes.
#!/usr/bin/bash
host=$1
port=$2
user=$3
password=$4
db=$5
sqls=("create index i_s_nationkey on supplier (s_nationkey);"
"create index i_ps_partkey on partsupp (ps_partkey);"
"create index i_ps_suppkey on partsupp (ps_suppkey);"
"create index i_c_nationkey on customer (c_nationkey);"
"create index i_o_custkey on orders (o_custkey);"
"create index i_o_orderdate on orders (o_orderdate);"
"create index i_l_orderkey on lineitem (l_orderkey);"
"create index i_l_partkey on lineitem (l_partkey);"
"create index i_l_suppkey on lineitem (l_suppkey);"
"create index i_l_partkey_suppkey on lineitem (l_partkey, l_suppkey);"
"create index i_l_shipdate on lineitem (l_shipdate);"
"create index i_l_commitdate on lineitem (l_commitdate);"
"create index i_l_receiptdate on lineitem (l_receiptdate);"
"create index i_n_regionkey on nation (n_regionkey);"
"analyze table supplier"
"analyze table part"
"analyze table partsupp"
"analyze table customer"
"analyze table orders"
"analyze table lineitem"
"analyze table nation"
"analyze table region")

for sql in "${sqls[@]}"
do
    mysql -h$host -P$port -u$user -p$password -D$db -e "$sql"
done
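A sketch of how to run the script, assuming it is saved as create_index.sh (a hypothetical file name) and placeholder connection values; the five arguments map to the host, port, user, password, and db variables in the script:
bash create_index.sh <primary-endpoint> 3306 <user> <password> tpch100g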
Note: To measure the performance improvement brought by parallel queries, execute the following script to preload the indexes into the buffer pool:
#!/bin/bash
host=$1
port=$2
user=$3
password=$4
dbname=$5
MYSQL="mysql -h$host -P$port -u$user -p$password -D$dbname"

if [ -z ${dbname} ]; then
    echo "dbname not defined."
    exit 1
fi

table_indexes=(
    "supplier PRIMARY"
    "supplier i_s_nationkey"
    "part PRIMARY"
    "partsupp PRIMARY"
    "partsupp i_ps_partkey"
    "partsupp i_ps_suppkey"
    "customer PRIMARY"
    "customer i_c_nationkey"
    "orders PRIMARY"
    "orders i_o_custkey"
    "orders i_o_orderdate"
    "lineitem PRIMARY"
    "lineitem i_l_orderkey"
    "lineitem i_l_partkey"
    "lineitem i_l_suppkey"
    "lineitem i_l_partkey_suppkey"
    "lineitem i_l_shipdate"
    "lineitem i_l_commitdate"
    "lineitem i_l_receiptdate"
    "nation i_n_regionkey"
    "nation PRIMARY"
    "region PRIMARY"
)

for table_index in "${table_indexes[@]}"
do
    ti=($table_index)
    table=${ti[0]}
    index=${ti[1]}
    SQL="select count(*) from ${table} force index(${index})"
    echo "$MYSQL -e '$SQL'"
    $MYSQL -e "$SQL"
done
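The preload script takes the same five arguments as the index-creation script; a sketch, assuming it is saved as preload.sh (a hypothetical file name):
bash preload.sh <primary-endpoint> 3306 <user> <password> tpch100g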
Execute queries.
#!/usr/bin/env bash
# Run the 22 TPC-H queries in sequence and record the wall-clock time of each.
host=$1
port=$2
user=$3
password=$4
database=$5
resfile=$6

echo "start test run at"`date "+%Y-%m-%d %H:%M:%S"`|tee -a ${resfile}.out
for (( i=1; i<=22; i=i+1 ))
do
    queryfile="Q"${i}".sql"
    start_time=`date "+%s.%N"`
    echo "run query ${i}"|tee -a ${resfile}.out
    mysql -h ${host} -P${port} -u${user} -p${password} $database -e "source $queryfile;" |tee -a ${resfile}.out
    end_time=`date "+%s.%N"`
    start_s=${start_time%.*}
    start_nanos=${start_time#*.}
    end_s=${end_time%.*}
    end_nanos=${end_time#*.}
    # Borrow one second if the nanosecond part underflows.
    if [ "$end_nanos" -lt "$start_nanos" ]; then
        end_s=$(( 10#$end_s - 1 ))
        end_nanos=$(( 10#$end_nanos + 10**9 ))
    fi
    time=$(( 10#$end_s - 10#$start_s )).`printf "%03d\n" $(( (10#$end_nanos - 10#$start_nanos)/10**6 ))`
    echo ${queryfile} "run cost "${time}" second, start at "`date -d @$start_time "+%Y-%m-%d %H:%M:%S"`" stop at "`date -d @$end_time "+%Y-%m-%d %H:%M:%S"` >> ${resfile}.time
done
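A sketch of how to invoke the script, assuming it is saved as run_queries.sh (a hypothetical file name) and the 22 statements are stored as Q1.sql through Q22.sql in the current directory. The query output is appended to the <resfile>.out file and the per-query timings to the <resfile>.time file:
bash run_queries.sh <primary-endpoint> 3306 <user> <password> tpch100g tpch100g_run1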
Test results
For more information about parallel query performance test results, see Performance test results in parallel query scenarios.