By digoal
PolarDB's is a cloud-native database that features a computing-storage separation architecture. It offers affordable data storage, efficient scalability, high-speed parallel computing capabilities, and fast data search and processing. By combining PolarDB with computing algorithms, it enables the conversion of data into productivity and drives business data output.
This article discusses how the open-source version of PolarDB utilizes duckdb_fdw to support the parquet columnar storage file and efficient OLAP (Online Analytical Processing).
The test environment used in this article is macos + docker. For more information on deploying PolarDB, please see:
1. Deployment
A higher version of CMake is required.
https://cmake.org/download
wget https://github.com/Kitware/CMake/releases/download/v3.25.1/cmake-3.25.1.tar.gz
tar -zxvf cmake-3.25.1.tar.gz
cd cmake-3.25.1
./configure
make -j 4
make install
[postgres@67e1eed1b4b6 duckdb]$ export PATH=/usr/local/bin:$PATH
[postgres@67e1eed1b4b6 duckdb]$ which cmake
/usr/local/bin/cmake
git clone --depth 1 https://github.com/cwida/duckdb
cd duckdb
make -j 4
cd ~/duckdb
cp build/release/tools/sqlite3_api_wrapper/libsqlite3_api_wrapper.so ~/tmp_basedir_polardb_pg_1100_bld/lib/
cp tools/sqlite3_api_wrapper/include/sqlite3.h ~/tmp_basedir_polardb_pg_1100_bld/include/server/
cp build/release/src/libduckdb.so ~/tmp_basedir_polardb_pg_1100_bld/lib/
cd ~
git clone --depth 1 https://github.com/alitrack/duckdb_fdw
cd duckdb_fdw
USE_PGXS=1 make
USE_PGXS=1 make install
[postgres@67e1eed1b4b6 duckdb_fdw]$ USE_PGXS=1 make
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -pipe -Wall -grecord-gcc-switches -I/usr/include/et -O3 -Wp,-D_FORTIFY_SOURCE=2 -fPIC -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -c -o connection.o connection.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -pipe -Wall -grecord-gcc-switches -I/usr/include/et -O3 -Wp,-D_FORTIFY_SOURCE=2 -fPIC -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -c -o option.o option.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -pipe -Wall -grecord-gcc-switches -I/usr/include/et -O3 -Wp,-D_FORTIFY_SOURCE=2 -fPIC -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -c -o deparse.o deparse.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -pipe -Wall -grecord-gcc-switches -I/usr/include/et -O3 -Wp,-D_FORTIFY_SOURCE=2 -fPIC -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -c -o sqlite_query.o sqlite_query.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -pipe -Wall -grecord-gcc-switches -I/usr/include/et -O3 -Wp,-D_FORTIFY_SOURCE=2 -fPIC -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -c -o duckdb_fdw.o duckdb_fdw.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -pipe -Wall -grecord-gcc-switches -I/usr/include/et -O3 -Wp,-D_FORTIFY_SOURCE=2 -fPIC -shared -o duckdb_fdw.so connection.o option.o deparse.o sqlite_query.o duckdb_fdw.o -L/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib -Wl,-rpath,'$ORIGIN/../lib' -L/opt/rh/llvm-toolset-7.0/root/usr/lib64 -Wl,--as-needed -Wl,-rpath,'/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib',--enable-new-dtags -lsqlite3_api_wrapper
/opt/rh/llvm-toolset-7.0/root/usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2 -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -flto=thin -emit-llvm -c -o connection.bc connection.c
/opt/rh/llvm-toolset-7.0/root/usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2 -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -flto=thin -emit-llvm -c -o option.bc option.c
/opt/rh/llvm-toolset-7.0/root/usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2 -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -flto=thin -emit-llvm -c -o deparse.bc deparse.c
/opt/rh/llvm-toolset-7.0/root/usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2 -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -flto=thin -emit-llvm -c -o sqlite_query.bc sqlite_query.c
/opt/rh/llvm-toolset-7.0/root/usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2 -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -flto=thin -emit-llvm -c -o duckdb_fdw.bc duckdb_fdw.c
[postgres@67e1eed1b4b6 duckdb_fdw]$ USE_PGXS=1 make install
/usr/bin/mkdir -p '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib'
/usr/bin/mkdir -p '/home/postgres/tmp_basedir_polardb_pg_1100_bld/share/extension'
/usr/bin/mkdir -p '/home/postgres/tmp_basedir_polardb_pg_1100_bld/share/extension'
/usr/bin/install -c -m 755 duckdb_fdw.so '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/duckdb_fdw.so'
/usr/bin/install -c -m 644 .//duckdb_fdw.control '/home/postgres/tmp_basedir_polardb_pg_1100_bld/share/extension/'
/usr/bin/install -c -m 644 .//duckdb_fdw--1.0.sql .//duckdb_fdw--1.0--1.1.sql '/home/postgres/tmp_basedir_polardb_pg_1100_bld/share/extension/'
/usr/bin/mkdir -p '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode/duckdb_fdw'
/usr/bin/mkdir -p '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode'/duckdb_fdw/
/usr/bin/install -c -m 644 connection.bc '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode'/duckdb_fdw/./
/usr/bin/install -c -m 644 option.bc '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode'/duckdb_fdw/./
/usr/bin/install -c -m 644 deparse.bc '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode'/duckdb_fdw/./
/usr/bin/install -c -m 644 sqlite_query.bc '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode'/duckdb_fdw/./
/usr/bin/install -c -m 644 duckdb_fdw.bc '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode'/duckdb_fdw/./
cd '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode' && /opt/rh/llvm-toolset-7.0/root/usr/bin/llvm-lto -thinlto -thinlto-action=thinlink -o duckdb_fdw.index.bc duckdb_fdw/connection.bc duckdb_fdw/option.bc duckdb_fdw/deparse.bc duckdb_fdw/sqlite_query.bc duckdb_fdw/duckdb_fdw.bc
2. Load the plug-in
create extension duckdb_fdw;
3. Test
cd ~/duckdb/build/release
./duckdb /home/postgres/db
COPY (select generate_series as id, md5(random()::text) as info, now()::timestamp+(generate_series||' second')::interval as crt_time from generate_series(1,100)) TO '/home/postgres/t1.parquet' (FORMAT 'PARQUET');
COPY (select generate_series as cid, md5(random()::text) as info, now()::timestamp+(generate_series||' second')::interval as crt_time from generate_series(1,100)) TO '/home/postgres/t2.parquet' (FORMAT 'PARQUET');
COPY (select (floor(random()*100)+1)::int as gid, (floor(random()*100)+1)::int as cid, (random()*10)::int as c1, (random()*100)::int as c2, (random()*1000)::int as c3, (random()*10000)::int as c4, (random()*100000)::int as c5 from generate_series(1,1000000)) TO '/home/postgres/t3.parquet' (FORMAT 'PARQUET');
create view t1 as select * from read_parquet('/home/postgres/t1.parquet');
create view t2 as select * from read_parquet('/home/postgres/t2.parquet');
create view t3 as select * from read_parquet('/home/postgres/t3.parquet');
checkpoint;
4. Check the performance
psql
CREATE SERVER DuckDB_server FOREIGN DATA WRAPPER duckdb_fdw OPTIONS (database '/home/postgres/db');
IMPORT FOREIGN SCHEMA public FROM SERVER DuckDB_server INTO public;
postgres=# \det
List of foreign tables
Schema | Table | Server
--------+-------+---------------
public | t1 | duckdb_server
public | t2 | duckdb_server
public | t3 | duckdb_server
(3 rows)
postgres=# explain verbose select count(distinct gid) from t3;
QUERY PLAN
-------------------------------------------------------------
Foreign Scan (cost=1.00..1.00 rows=1 width=8)
Output: (count(DISTINCT gid))
SQLite query: SELECT count(DISTINCT "gid") FROM main."t3"
(3 rows)
postgres=# \timing
Timing is on.
select count(distinct gid) from t3;
select count(distinct gid),count(*) from t3;
explain verbose select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.gid;
select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.gid;
select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.gid join t2 on t3.cid=t2.cid;
explain verbose select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.gid join t2 on t3.cid=t2.cid;
5. Import the data to the local PolarDB database and do the same query to check the time.
PostgreSQL local computing
postgres=# create unlogged table lt1 as select * from t1;
SELECT 100
postgres=# create unlogged table lt2 as select * from t2;
SELECT 100
postgres=# create unlogged table lt3 as select * from t3;
SELECT 10000000
postgres=# \timing
Timing is on.
postgres=# select count(distinct gid) from lt3;
count
-------
100
(1 row)
Time: 14891.356 ms (00:14.891)
Open Source PolarDB Uses pg_trgm GIN Indexes to Implement Efficient Fuzzy Queries
Alibaba Cloud New Products - January 19, 2021
ApsaraDB - April 19, 2019
降云 - January 12, 2021
ApsaraDB - June 7, 2022
Apache Flink Community China - June 8, 2021
Alibaba EMR - May 20, 2022
Alibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn MoreAlibaba Cloud PolarDB for Xscale (PolarDB-X) is a cloud-native high-performance distributed database service independently developed by Alibaba Cloud.
Learn MoreAlibaba Cloud PolarDB for MySQL is a cloud-native relational database service 100% compatible with MySQL.
Learn MoreMulti-source metrics are aggregated to monitor the status of your business and services in real time.
Learn MoreMore Posts by digoal