This article provides a general overview of using Terraform to achieve the goal. If you prefer using a graphical interface, you can refer to the following article:
Starter Guide | AnalyticDB for PostgreSQL TPC-H Benchmark Testing in One Click
As stated in the TPC Benchmark™ H (TPC-H) specification:
“TPC-H is a decision support benchmark. It consists of a suite of business-oriented ad hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions.”
For more information, see TPC-H specifications.
This implementation of TPC-H is derived from the TPC-H Benchmark and is not comparable to published TPC-H Benchmark results, as this implementation does not comply with all the requirements of the TPC-H Benchmark.
This is the ER(Entity Relationship) diagram of 8 tables in TPC-H.
(source: TPC Benchmark H Standard Specification)
In TPC-H, SF (Scale Factor) is used to describe the amount of data, and 1 SF corresponds to 1GB unit. 100 SF is 100GB. The data volume corresponding to 1 SF is only the total data volume of the 8 tables, excluding the space occupation such as indexes, and more space needs to be reserved when preparing data. The data volume of each table under the 100GB data set is as follows:
Table Name | Row Count |
---|---|
customer | 15,000,000 |
lineitem | 600,037,902 |
nation | 25 |
orders | 150,000,000 |
part | 20,000,000 |
partsupp | 80,000,000 |
region | 5 |
supplier | 1,000,000 |
Run the terraform script to initialize the resources (in this tutorial, we use ECS and AnalyticDB PostgreSQL. OSS bucket will also be used for storing big TPC-H data set, and we will manually create the bucket later). Please specify the necessary information and region to deploy.
After the Terraform script execution finished, the ECS and AnalyticDB PostgreSQL instance information are listed as below.
Please log on to ECS with ECS EIP
.
ssh root@<ECS_EIP>
Initialize and mount the data disk.
fdisk -u /dev/vdb
There will be some promote for the configuration, please follow the guide as shown in the image below.
Then input the following commands to finish the data disk mount.
fdisk -lu /dev/vdb
mkfs -t ext4 /dev/vdb1
cp /etc/fstab /etc/fstab.bak
echo `blkid /dev/vdb1 | awk '{print $2}' | sed 's/\"//g'` /mnt ext4 defaults 0 0 >> /etc/fstab
mount /dev/vdb1 /mnt
df -h
Install GIT, clone this github project and generate TPC-H data set. https://github.com/alibabacloud-howto/solution-adbpg-labs/blob/master/benchmark-tpc-h/data_gen_100gb.sh this file will generate 100GB data set. If you want to generate other size, please modify accordingly.
yum install -y git
cd /mnt
git clone https://github.com/alibabacloud-howto/solution-adbpg-labs.git
sh /mnt/solution-adbpg-labs/benchmark-tpc-h/data_gen_100gb.sh
Since we configured to generate the TPC-H data in 8 partitions in parallel, when input top
command, there shows 8 dbgen
processes generating data.
It will take for a while for these dbgen
processes to finish the data generation. When dbgen
processes disappear in the top
view, then the data generation finish successfully.
Run the command df -h
, it shows that 100+G Used
under /mnt
, which is the size of generated TPC-H data set in 100 SF.
Then create bucket in OSS for TPC-H data set.
Upload TPC-H data files to OSS bucket for parallel loading to AnalyticDB PostgreSQL later. Please update the parameters accordingly in the file https://github.com/alibabacloud-howto/solution-adbpg-labs/blob/master/benchmark-tpc-h/upload_tpch_oss.sh.
OSS_ENDPOINT
: the OSS endpoint of the bucket created for TPC-H data setOSS_BUCKET
: the bucket created for TPC-H data setAK_ID
: your Alibaba Cloud account access keyAK_SECRET
: your Alibaba Cloud account access secretThen run the following commands:
cd /mnt
wget http://gosspublic.alicdn.com/ossutil/1.7.3/ossutil64
chmod 755 ossutil64
sh /mnt/solution-adbpg-labs/benchmark-tpc-h/upload_tpch_oss.sh
After the script finished, it will show 8 folders in OSS bucket for 8 tables correspondingly.
Create user account in AnalyticDB PostgreSQL:
adbpg
N1cetest
Download and setup AnalyticDB for PostgreSQL client.
Note:
Check if the dependency library is missing by running rpm -q compat-openssl10
. If the dependency compat-openssl10 is not missing, the execution can proceed. If compat-openssl10 is missing, you need to install the library by executing yum install -y compat-openssl10
.
cd /mnt
wget http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/attach/181125/cn_zh/1598426198114/adbpg_client_package.el7.x86_64.tar.gz
tar -xzvf adbpg_client_package.el7.x86_64.tar.gz
cd /mnt/adbpg_client_package/bin
vim ~/.pgpass
Input the following line in ~/.pgpass
file, <AnalyticDB PostgreSQL connection string>
is the connection string of the AnalyticDB PostgreSQL cluster.
Note:
Regarding .pgpass
Most access to the database, including the psql program, is done through the libpq library. This library includes a feature that allows you to place the passwords needed to connect as a user in a file called .pgpass (or the file referenced by PGPASSFILE).
The main purpose of this file is for password-free login.
<AnalyticDB PostgreSQL connection string>:5432:adbpg:adbpg:N1cetest
Then run the commands to create TPC-H tables.
chmod 0600 ~/.pgpass
cd /mnt/adbpg_client_package/bin
./psql -h<AnalyticDB PostgreSQL connection string> -Uadbpg adbpg -f /mnt/solution-adbpg-labs/benchmark-tpc-h/tpch-ddl.sql
Load TPC-H data set from OSS into AnalyticDB PostgreSQL. Please update the parameters accordingly in the file https://github.com/alibabacloud-howto/solution-adbpg-labs/blob/master/benchmark-tpc-h/load_tpch_oss_data.sql before executing the following commands.
oss://adbpg-tpch-bechmark-hongkong
: change to your target TPC-H bucket accordingly<ACCESS KEY>
: your Alibaba Cloud account access key<ACCESS SECRET>
: your Alibaba Cloud account access secretoss-cn-hongkong-internal.aliyuncs.com
: change to the endpoint of your target TPC-H bucket accordinglycd /mnt/adbpg_client_package/bin
./psql -h<AnalyticDB PostgreSQL connection string> -Uadbpg adbpg -f /mnt/solution-adbpg-labs/benchmark-tpc-h/load_tpch_oss_data.sql
After loading finished, run the SELECT COUNT(*)
to verify the row count in 8 tables.
And please also run ANALYZE
to collect statistics for tables for optimizer to generate better query execution plan.
Please update the parameters accordingly in the file https://github.com/alibabacloud-howto/solution-adbpg-labs/blob/master/benchmark-tpc-h/query.sh before execution.
ADB_PG_URL
: AnalyticDB PostgreSQL cluster connection stringADB_PG_USER
: AnalyticDB PostgreSQL cluster account user name (no need to change if you follow this guide to use adbpg
)sh /mnt/solution-adbpg-labs/benchmark-tpc-h/query.sh
All the TPC-H queries are located in https://github.com/alibabacloud-howto/solution-adbpg-labs/tree/master/benchmark-tpc-h/tpch_query.
Deploy and Run Apache OFBiz on Alibaba Cloud with ApsaraDB RDS for MySQL
ApsaraDB - May 20, 2024
Alibaba Cloud New Products - August 10, 2020
ApsaraDB - October 29, 2024
ApsaraDB - October 20, 2020
ApsaraDB - March 12, 2020
Alibaba Clouder - September 28, 2020
It's correct. There is no need to define a foreign key, index, etc., because it uses similar names in DDL for the columns that refer to the same value. For example, 'NATIONKEY' is configured as 'c_nationkey' in the CUSTOMER table and as 'n_nationkey' in the NATION table. Then when running the TPC query workload (DML), it uses a WHERE condition like 'WHERE c_nationkey = n_nationkey' to filter.
An online MPP warehousing service based on the Greenplum Database open source program
Learn MoreAlibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn MoreAnalyticDB for MySQL is a real-time data warehousing service that can process petabytes of data with high concurrency and low latency.
Learn MoreAn on-demand database hosting service for PostgreSQL with automated monitoring, backup and disaster recovery capabilities
Learn MoreMore Posts by ApsaraDB
5109060302059819 August 12, 2022 at 11:01 am
You are not created the relationships after tables creation and data loading. Is it correct?
5126080747602209 March 13, 2024 at 8:26 am
It's correct. There is no need to define a foreign key, index, etc., because it uses similar names in DDL for the columns that refer to the same value. For example, 'NATIONKEY' is configured as 'c_nationkey' in the CUSTOMER table and as 'n_nationkey' in the NATION table. Then when running the TPC query workload (DML), it uses a WHERE condition like 'WHERE c_nationkey = n_nationkey' to filter.