Overview
TPC-H is a decision support benchmark consisting of a suite of business-oriented ad hoc queries and concurrent data modifications. The queries and data populating the database have been chosen to have broad industry-wide relevance. TPC-H illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions.
This solution deploys and runs TPC-H on Alibaba Cloud with AnalyticDB for PostgreSQL.
Reference Architecture
Steps
Deploy Resources
Use this main.tf file in Terraform to provision ECS, EIP, and AnalyticDB for PostgreSQL instances from this solution.
Deploy and Run TPC-H
1. Mount data disk on ECS for TPC-H data set
1.1 Get the EIP of the ECS and then log on to ECS via SSH. Please use the account root/N1cetest; the password has been predefined in Terraform script for this tutorial.
ssh root@EIP_ECS
1.2 Initialize and mount the data disk.
fdisk -u /dev/vdb
1.3 Run the following command to finish the data disk mount.
fdisk -lu /dev/vdb
mkfs -t ext4 /dev/vdb1
cp /etc/fstab /etc/fstab.bak
echo `blkid /dev/vdb1 | awk '{print $2}' | sed 's/\"//g'` /mnt ext4 defaults 0 0 >> /etc/fstab
mount /dev/vdb1 /mnt
df -h
2. Generate the TPC-H 100GB data set and upload it to OSS
2.1 Install Git, clone this Github project file and generate the TPC-H data set.
Note: The project file will generate 100GB data set. If you want to generate data set in another size, please modify the file accordingly.
Note: The project file will generate 100GB data set. If you want to generate data set in another size, please modify the file accordingly.
yum install -y git
cd /mnt
git clone https://github.com/alibabacloud-howto/solution-adbpg-labs.git
sh /mnt/solution-adbpg-labs/benchmark-tpc-h/data_gen_100gb.sh
2.2 Run the command
df -h
to verify that the 100G TPC-H data set has been generated in /mnt
.
2.3 Create a bucket in OSS for TPC-H data set.
2.4 Upload TPC-H data files to OSS bucket for parallel loading to AnalyticDB PostgreSQL. Then update the parameters accordingly in the upload configuration file.
Note: The key parameters in the configuration file are listed below.
Note: The key parameters in the configuration file are listed below.
OSS_ENDPOINT
: The OSS endpoint of the bucket created for the TPC-H data setOSS_BUCKET
: The bucket created for the TPC-H data setAK_ID
: Alibaba Cloud account access keyAK_SECRET
: Alibaba Cloud account access secret
2.5 Run the following command to activate the configuration file. For each table, a folder will be created in the OSS bucket accordingly.
cd /mnt
wget http://gosspublic.alicdn.com/ossutil/1.7.3/ossutil64
chmod 755 ossutil64
sh /mnt/solution-adbpg-labs/benchmark-tpc-h/upload_tpch_oss.sh
3. Create TPC-H schema in AnalyticDB PostgreSQL and load data from OSS
3.1 Create the following user account in AnalyticDB PostgreSQL.
Name: adbpg
Password: N1cetest
Name: adbpg
Password: N1cetest
3.2 Download and setup AnalyticDB for PostgreSQL client.
cd /mnt
wget http://mirror.centos.org/centos/8/AppStream/x86_64/os/Packages/compat-openssl10-1.0.2o-3.el8.x86_64.rpm
rpm -i compat-openssl10-1.0.2o-3.el8.x86_64.rpm
wget http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/attach/181125/cn_zh/1598426198114/adbpg_client_package.el7.x86_64.tar.gz
tar -xzvf adbpg_client_package.el7.x86_64.tar.gz
cd /mnt/adbpg_client_package/bin
vim ~/.pgpass
3.3 Input the following line in the
~/.pgpass
file. AnalyticDB PostgreSQL connection string
is the connection string of the AnalyticDB PostgreSQL cluster.AnalyticDB PostgreSQL connection string:5432:adbpg:adbpg:N1cetest
3.4 Run the commands to create TPC-H tables.
chmod 0600 ~/.pgpass
cd /mnt/adbpg_client_package/bin
./psql -h -Uadbpg adbpg -f /mnt/solution-adbpg-labs/benchmark-tpc-h/tpch-ddl.sql
3.5 Update the parameters accordingly in the load configuration file. Then, run the following command to load TPC-H data set from OSS into AnalyticDB PostgreSQL:
Note: The key parameters in the configuration file are listed below.
Note: The key parameters in the configuration file are listed below.
oss://adbpg-tpch-bechmark-hongkong
: It changes to your target TPC-H bucket accordingly.ACCESS KEY
: Alibaba Cloud account access keyACCESS SECRET
: Alibaba Cloud account access secretoss-cn-hongkong-internal.aliyuncs.com
: It changes to the endpoint of your target TPC-H bucket accordingly.cd /mnt/adbpg_client_package/bin
./psql -h -Uadbpg adbpg -f /mnt/solution-adbpg-labs/benchmark-tpc-h/load_tpch_oss_data.sql
3.6 After loading is completed, run
SELECT COUNT(*)
to verify the row count in all the tables, and run ANALYZE
to collect statistics of tables for further optimization.
4. Run TPC-H query benchmark
4.1 Update the parameters accordingly in the query configuration file. Then, run the following command:
Note: The key parameters in the configuration file are listed below.
Note: The key parameters in the configuration file are listed below.
ADB_PG_URL
: AnalyticDB PostgreSQL cluster connection stringADB_PG_USER
: AnalyticDB PostgreSQL cluster account user name (no need to change if you follow this guide to use the default user name adbpg
)sh /mnt/solution-adbpg-labs/benchmark-tpc-h/query.sh
4.2 You can view all the TPC-H queries in this directory. For detailed instruction on how to run the queries, refer to Running TPC-H Queries on AnalyticDB for PostgreSQL.
Quick Start
Quick Start—Terraform Template for Initializing Resources
Quick Start—Configuration File for Generating Data Set
Quick Start—Configuration File for Uploading Data Set
Quick Start—Configuration File for Loading Data Set to Database
Quick Start—Configuration File For TPC-H Queries
Tutorial for Beginners—Running TPC-H Queries on AnalyticDB for PostgreSQL
Reach Alibaba Cloud experts for support
Contact Us