When you run a complex query and want to quickly obtain query results, you can use the rds_duckdb extension of ApsaraDB RDS for PostgreSQL to implement analytical processing (AP) query acceleration. The extension exports local tables as column-oriented tables and uses vectorization capabilities to significantly accelerate complex queries without the need to modify original query statements. This allows you to obtain the query results in a convenient and efficient manner.
The rds_duckdb extension is available for users in the whitelist. If you want to use the extension, contact us.
Overview
The rds_duckdb extension of ApsaraDB RDS for PostgreSQL is developed based on DuckDB, which is efficient and resource-friendly. This extension enhances analytical query capabilities. You can use the extension to export the local tables of an ApsaraDB RDS for PostgreSQL instance as column-oriented tables and enable the AP query acceleration feature. This significantly accelerates complex queries to meet the requirements of analytical services.
Prerequisites
The RDS instance runs PostgreSQL 16.
The RDS instance runs a minor engine version of 20240830 or later.
The
shared_preload_libraries
parameter is set to'rds_duckdb'
. For more information about how to configure parameters, see Modify the parameters of an ApsaraDB RDS for PostgreSQL instance.
Usage notes
You cannot synchronize the data of column-oriented tables between the primary and secondary RDS instances.
You cannot export the data of column-oriented tables to implement automatic incremental synchronization.
Create and delete the extension
You must use a privileged account to create or delete the extension.
Create the extension
CREATE EXTENSION rds_duckdb;
Delete the extension
DROP EXTENSION rds_duckdb;
Manage a column-oriented table
Create a column-oriented table
Execute the following statement to export a local table of the RDS instance as a column-oriented table. The local table can be a user table, materialized view, or foreign table. The column-oriented table is used to accelerate analytical queries.
SELECT rds_duckdb.create_duckdb_table('Name of the local table');
Refresh a column-oriented table
Execute the following statement to refresh the exported column-store table based on the latest data in the local table of the RDS instance and update the table schema and data:
SELECT rds_duckdb.refresh_duckdb_table('Local table');
Delete a column-oriented table
SELECT rds_duckdb.drop_duckdb_table('Local table');
Manage the AP query acceleration feature
The rds_duckdb extension can be used to accelerate read-only queries. After you enable the AP query acceleration feature, if SQL statements need to be executed to query data from tables and DuckDB column-oriented tables exist for the tables, the SQL statements are executed in DuckDB to accelerate the queries. If the SQL statements are unsupported DML or DDL statements or column-oriented tables do not exist, the SQL statements are executed in ApsaraDB RDS for PostgreSQL.
The system displays a warning similar to WARNING: Trying to execute an operation with non-duckdb tables(test), fallback to PG
for the SQL statements that are executed in ApsaraDB RDS for PostgreSQL. Content in the parentheses () indicates the tables of the RDS instance for which DuckDB column-oriented tables do not exist.
The system also displays the WARNING: Modification operations on DuckDB tables are currently not supported, fallback to PG
warning for the SQL statements that are not used for read-only queries.
Enable the AP query acceleration feature
SET rds_duckdb.execution = on;
Configure the parameters for AP query acceleration
You can configure parameters in a session to manage the performance of the AP query acceleration. Sample commands:
SET rds_duckdb.worker_threads = 32;
SET rds_duckdb.memory_limit = 16384;
Parameter | Description | Suggestion |
rds_duckdb.worker_threads | The number of worker threads that are used for AP query acceleration. Valid values: 1 to 255. Default value: 1, which indicates that only one worker thread is used. |
|
rds_duckdb.memory_limit | The maximum memory that can be used for AP query acceleration. Unit: MB. You do not need to add a unit when you configure this parameter. Valid values: 1 to INT32_MAX. Default value: 100, which indicates that the upper limit is 100 MB. |
|
For more information about DuckDB parameters, see Configuration.
Disable the AP query acceleration feature
SET rds_duckdb.execution = off;
Examples
Test the performance of the rds_duckdb extension
In this example, a TPC-H test is performed on a Linux server to test the performance improvement of the rds_duckdb extension on complex queries.
Create an Elastic Compute Service (ECS) instance and generate test data. For more information, see Create a subscription ECS instance on the Quick Launch tab.
Download and install PostgreSQL from the PostgreSQL official website.
Download dbgen.
wget https://github.com/electrum/tpch-dbgen/archive/refs/heads/master.zip yum install -y unzip zip unzip master.zip cd tpch-dbgen-master/ echo "#define EOL_HANDLING 1" >> config.h # Delete the vertical bars (|) at the end of each row of data. make ./dbgen --help
Generate test data.
In this example, the storage path of the test data is
/data/test
, and the size of the test data is 100 GB. You can select the data storage path and the data size based on your business requirements../dbgen -s 100 mkdir /data/test/tpch_data mv *.tbl /data/test/tpch_data
NoteThe size of the data volume directly affects the query speed. Scale Factor (SF) is used in TPC-H to describe the amount of data. In TPC-H, 1 SF indicates 1 GB of data. By analogy, 100 SF indicates 100 GB of data. The data volume of 1 SF indicates only the total data volume of eight tables. Data related to elements such as indexes is not included. When you prepare data, you must reserve sufficient storage.
Create an RDS instance based on the description in Prerequisites and import the test data.
Create an RDS instance. For more information, see Create an ApsaraDB RDS for PostgreSQL instance.
Create a privileged account. For more information, see Create an account.
Connect to the RDS instance. For more information, see Connect to an ApsaraDB RDS for PostgreSQL instance.
Execute the following SQL statements to create eight tables for the TPC-H test:
CREATE TABLE customer(c_custkey BIGINT NOT NULL, c_name VARCHAR NOT NULL, c_address VARCHAR NOT NULL, c_nationkey INTEGER NOT NULL, c_phone VARCHAR NOT NULL, c_acctbal DECIMAL(15,2) NOT NULL, c_mktsegment VARCHAR NOT NULL, c_comment VARCHAR NOT NULL); CREATE TABLE lineitem(l_orderkey BIGINT NOT NULL, l_partkey BIGINT NOT NULL, l_suppkey BIGINT NOT NULL, l_linenumber BIGINT NOT NULL, l_quantity DECIMAL(15,2) NOT NULL, l_extendedprice DECIMAL(15,2) NOT NULL, l_discount DECIMAL(15,2) NOT NULL, l_tax DECIMAL(15,2) NOT NULL, l_returnflag VARCHAR NOT NULL, l_linestatus VARCHAR NOT NULL, l_shipdate DATE NOT NULL, l_commitdate DATE NOT NULL, l_receiptdate DATE NOT NULL, l_shipinstruct VARCHAR NOT NULL, l_shipmode VARCHAR NOT NULL, l_comment VARCHAR NOT NULL); CREATE TABLE nation(n_nationkey INTEGER NOT NULL, n_name VARCHAR NOT NULL, n_regionkey INTEGER NOT NULL, n_comment VARCHAR NOT NULL); CREATE TABLE orders(o_orderkey BIGINT NOT NULL, o_custkey BIGINT NOT NULL, o_orderstatus VARCHAR NOT NULL, o_totalprice DECIMAL(15,2) NOT NULL, o_orderdate DATE NOT NULL, o_orderpriority VARCHAR NOT NULL, o_clerk VARCHAR NOT NULL, o_shippriority INTEGER NOT NULL, o_comment VARCHAR NOT NULL); CREATE TABLE part(p_partkey BIGINT NOT NULL, p_name VARCHAR NOT NULL, p_mfgr VARCHAR NOT NULL, p_brand VARCHAR NOT NULL, p_type VARCHAR NOT NULL, p_size INTEGER NOT NULL, p_container VARCHAR NOT NULL, p_retailprice DECIMAL(15,2) NOT NULL, p_comment VARCHAR NOT NULL); CREATE TABLE partsupp(ps_partkey BIGINT NOT NULL, ps_suppkey BIGINT NOT NULL, ps_availqty BIGINT NOT NULL, ps_supplycost DECIMAL(15,2) NOT NULL, ps_comment VARCHAR NOT NULL); CREATE TABLE region(r_regionkey INTEGER NOT NULL, r_name VARCHAR NOT NULL, r_comment VARCHAR NOT NULL); CREATE TABLE supplier(s_suppkey BIGINT NOT NULL, s_name VARCHAR NOT NULL, s_address VARCHAR NOT NULL, s_nationkey INTEGER NOT NULL, s_phone VARCHAR NOT NULL, s_acctbal DECIMAL(15,2) NOT NULL, s_comment VARCHAR NOT NULL); \dt # View the imported data tables.
Import the generated test data.
COPY customer FROM '/data/test/tpch_data/customer.tbl' DELIMITER '|'; COPY lineitem FROM '/data/test/tpch_data/lineitem.tbl' DELIMITER '|'; COPY nation FROM '/data/test/tpch_data/nation.tbl' DELIMITER '|'; COPY orders FROM '/data/test/tpch_data/orders.tbl' DELIMITER '|'; COPY part FROM '/data/test/tpch_data/part.tbl' DELIMITER '|'; COPY partsupp FROM '/data/test/tpch_data/partsupp.tbl' DELIMITER '|'; COPY region FROM '/data/test/tpch_data/region.tbl' DELIMITER '|'; COPY supplier FROM '/data/test/tpch_data/supplier.tbl' DELIMITER '|'; \dt + # View the details of the imported data tables.
Install the rds_duckdb extension and generate column-oriented tables.
Install the rds_duckdb extension.
CREATE EXTENSION rds_duckdb;
Generate column-oriented tables.
# Create column-oriented tables for the local tables of the RDS instance. SELECT rds_duckdb.create_duckdb_table('customer'); SELECT rds_duckdb.create_duckdb_table('lineitem'); SELECT rds_duckdb.create_duckdb_table('nation'); SELECT rds_duckdb.create_duckdb_table('orders'); SELECT rds_duckdb.create_duckdb_table('part'); SELECT rds_duckdb.create_duckdb_table('partsupp'); SELECT rds_duckdb.create_duckdb_table('region'); SELECT rds_duckdb.create_duckdb_table('supplier');
Configure the parameters for AP query acceleration.
# Specify the number of threads and memory limit based on your test machine. Unit of the memory limit: MB. SET rds_duckdb.worker_threads = 32; SET rds_duckdb.memory_limit = 8192;
Enable the AP query acceleration feature.
SET rds_duckdb.execution = on; # Start timing. \timing on # Redirect the result to a file. \o /data/test/tpch_data/tpch_out
Execute the following 22 TPC-H SQL statements to test the query performance:
View SQL execution plans
You can execute the EXPLAIN
statement to view the execution plans of the SQL statements after the AP query acceleration feature is enabled and disabled.
The following sample code provides an example on the execution plan of the SQL statements after the AP query acceleration feature is enabled:
The following sample code provides an example on the execution plan of the SQL statements after the AP query acceleration feature is disabled: