MaxCompute is built for high-performance queries over terabytes, petabytes, or even exabytes of data. This topic describes how to run the TPC-DS big data benchmark by using the public datasets and the test tool that MaxCompute provides, so that you can verify the performance of MaxCompute.
Preparations
Prepare an environment.
Before you perform a TPC-DS test, activate MaxCompute and create a project. For more information, see Create a project.
Activate MaxCompute Query Acceleration (MCQA) for a subscription MaxCompute project. For more information, see MaxCompute Query Acceleration.
Prepare a test tool.
MaxCompute provides a TPC-DS automated performance test tool to help you quickly complete a TPC-DS test and automatically generate test results.
Important: The test tool runs only on Linux with Java Development Kit (JDK) 1.7 or later installed.
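The Linux and JDK requirement above can be checked before you download the tool. The following is a minimal sketch, not part of the test tool: the function names jdk_version_ok, detect_java_version, and check_prerequisites are hypothetical, and the parsing assumes that java -version prints a line that contains version "…", as common JDK builds do.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with the test tool).

# Returns 0 if the given Java version string is 1.7 or later, 1 otherwise.
# Accepts legacy strings ("1.8.0_292") and modern ones ("11.0.2", "17").
jdk_version_ok() {
    version="$1"
    major="${version%%.*}"          # text before the first dot
    if [ "$major" = "1" ]; then
        rest="${version#*.}"        # drop the leading "1."
        major="${rest%%.*}"         # legacy scheme: second field is the release
    fi
    case "$major" in
        ''|*[!0-9]*) return 1 ;;    # not a plain number: treat as unsupported
    esac
    [ "$major" -ge 7 ]
}

# Prints the version string reported by the java binary on PATH.
# Assumption: "java -version" writes a line such as: java version "1.8.0_292"
detect_java_version() {
    java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Checks the operating system and the JDK version, and reports the first problem.
check_prerequisites() {
    [ "$(uname -s)" = "Linux" ] || { echo "The test tool requires Linux."; return 1; }
    command -v java >/dev/null 2>&1 || { echo "No java binary found on PATH."; return 1; }
    jdk_version_ok "$(detect_java_version)" || { echo "JDK 1.7 or later is required."; return 1; }
    echo "Prerequisites look fine."
}
```

To use the sketch, save it to a file, source it in your shell, and call check_prerequisites.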
You can click mc_tpcds_benchmark to download the package of the test tool and run the following command on the Linux server to decompress the package:
unzip mc_tpcds_benchmark.zip
The following code shows the directory structure of the decompressed file.
.
|_t1c7039e3-2a1d-451b-bfda-d14c49016243-tpc-ds-tool.zip
|_config
|_init_tools.sh
|_load_table.sh
|_logs
|_odps_clt
|_patches
|_pt.sh
|_queries_1
|_queries_1.quality
|_queries_10
|_queries_100
|_queries_1000
|_queries_10000
|_queries_100000
|_querygen.sh
|_results
|_run_stream.sh
|_run_stream.sh.offline
|_sqls
|_start_session_only.sh
|_start_session.sql
|_start_session.sql_tmp
|_tools_file
|_tt.sh
|_v2.10.1rc3
Obtain a test dataset.
MaxCompute provides public datasets, so you do not need to prepare test data. All test data is stored in the public project BIGDATA_PUBLIC_DATASET of MaxCompute. For more information, see Overview.
TPC-DS test datasets are available in four sizes: 10 GB, 100 GB, 1 TB, and 10 TB. The following table describes the datasets.
Type | Description | Dataset name | Schema name |
TPC-DS | TPC-DS is a decision support benchmark that models several generally applicable aspects of a decision support system, including queries and data maintenance. TPC-DS enables emerging technologies, such as big data systems, to perform benchmark tests. | TPC-DS 10-GB performance test dataset | tpcds_10g |
TPC-DS | | TPC-DS 100-GB performance test dataset | tpcds_100g |
TPC-DS | | TPC-DS 1-TB performance test dataset | tpcds_1t |
TPC-DS | | TPC-DS 10-TB performance test dataset | tpcds_10t |
Procedure
Modify the configuration file of the test tool
Go to the mc_tpcds_benchmark directory of the decompressed package of the test tool and modify the config file. The following table describes the configuration items that you need to modify.
Configuration item | Description | Value |
ODPS_CLT_CMD | The absolute path of the executable file of the MaxCompute client. The client that is provided in the package is odps_clt in the working directory. You can modify the related configuration. For more information, see Install and configure the MaxCompute client. | Example: /xxxxx/mc_tpcds_benchmark/odps_clt/bin/odpscmd. |
PROJECT | The MaxCompute project that is used for the test. | Example: tpcds_test. |
SF | The data size of the TPC-DS test. Unit: GB. 1 indicates 1 GB. 1000 indicates 1 TB. You can change the value based on your test requirements. | Default value: 1000 |
SQL_FLAGS | The built-in flag parameters of MaxCompute. You do not need to modify these parameters. | Keep the default. |
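Before you start a run, it can help to sanity-check the values you entered. The following sketch is an illustration only: sf_to_schema and check_client_path are hypothetical helpers, not part of the test tool. The mapping pairs each SF value with the public schema of the same size from the dataset table. It assumes that the tool reads the public dataset whose size matches SF, which this topic implies but does not state.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with the test tool).

# Maps an SF value (in GB) to the public TPC-DS schema of the same size.
# Only the four sizes listed in the dataset table are recognized.
sf_to_schema() {
    case "$1" in
        10)    echo "tpcds_10g" ;;
        100)   echo "tpcds_100g" ;;
        1000)  echo "tpcds_1t" ;;
        10000) echo "tpcds_10t" ;;
        *)     echo "No public TPC-DS dataset is listed for SF=$1" >&2; return 1 ;;
    esac
}

# Verifies that a candidate ODPS_CLT_CMD value is an absolute path
# that points to an executable file.
check_client_path() {
    case "$1" in
        /*) ;;                                  # absolute path: continue
        *)  echo "ODPS_CLT_CMD must be an absolute path: $1" >&2; return 1 ;;
    esac
    [ -x "$1" ] || { echo "Not an executable file: $1" >&2; return 1; }
}
```

For example, sf_to_schema 1000 prints tpcds_1t, which matches the default SF value of 1000.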
Start the test
Run the following command in the mc_tpcds_benchmark directory to start the TPC-DS test:
nohup sh pt.sh > pt.log 2>&1 &
The command runs the test in the background and redirects its output to the pt.log file in the mc_tpcds_benchmark directory. You can run the following command to follow the logs of the job:
tail -f pt.log
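Because the test runs in the background, you may also want to know whether it is still running. The following sketch wraps the start command from this topic and records the process ID. The function names and the pt.pid file are hypothetical additions, not part of the test tool.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with the test tool).

# Starts the benchmark in the background, as in this topic, and stores
# the process ID in pt.pid (a file name chosen for this sketch).
start_benchmark() {
    nohup sh pt.sh > pt.log 2>&1 &
    echo "$!" > pt.pid
}

# Returns 0 while the process with the given PID is still alive.
is_running() {
    kill -0 "$1" 2>/dev/null
}

# Prints a one-line status based on the PID stored in pt.pid.
# "finished" only means that the process has exited; check pt.log
# to find out whether the run succeeded.
benchmark_status() {
    if [ ! -f pt.pid ]; then
        echo "not started"
        return 1
    fi
    if is_running "$(cat pt.pid)"; then
        echo "running"
    else
        echo "finished"
    fi
}
```

Run start_benchmark in the mc_tpcds_benchmark directory, and call benchmark_status from the same directory at any time.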
View the execution information about MaxCompute jobs
You can view the execution information about a job on the Jobs page in the MaxCompute console. For more information, see Manage jobs.
View test results
If the execution is successful, a test result file named console_test_result.csv is generated in the mc_tpcds_benchmark directory. You can view test results in the file, including the total test duration, the execution time of each query, and the related LogView information.
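If you want a quick total without opening the file in a spreadsheet, you can sum a column from the command line. The following is a sketch under stated assumptions: this topic does not document the column layout of console_test_result.csv, so the column number is a parameter that you must set after you inspect the file. The sketch also assumes that the first line is a header and that fields contain no quoted commas, which may not hold for LogView links.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the test tool).

# sum_csv_column FILE COLUMN
# Adds up the numeric values in the given 1-based column of a
# comma-separated file, skips the first line, and ignores non-numeric cells.
sum_csv_column() {
    awk -F',' -v col="$2" '
        NR > 1 && $col ~ /^[0-9.]+$/ { total += $col }
        END { printf "%.2f\n", total }
    ' "$1"
}
```

For example, if the execution time turned out to be in the second column, sum_csv_column console_test_result.csv 2 would print the sum of that column.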