本文介绍高性能版Spark全密态计算引擎与AnalyticDB for MySQL Spark的性能测试流程及对比结果,通过对比可知,在处理大规模数据集时,高性能版Spark全密态计算引擎相较于AnalyticDB for MySQL Spark 3.2.0版本的性能提升了0.9倍,具体性能对比详情请参见本文。
本测试并未完全依照TPC-H基准测试规范,而是基于该测试规范进行了修改。本测试结果不能等同于完全遵守TPC-H测试规范所获得的测试结果,因此不能与完全遵守该测试规范获得的测试结果进行对比。
集群的产品系列为湖仓版。
集群与OSS存储空间位于相同地域。
已创建Job型资源组。具体操作,请参见新建资源组。
已创建数据库账号。
如果是通过阿里云账号访问,只需创建高权限账号。具体操作,请参见创建高权限账号。
如果是通过RAM用户访问,需要创建高权限账号和普通账号并且将RAM用户绑定到普通账号上。具体操作,请参见创建数据库账号和绑定或解绑RAM用户与数据库账号。
已为RAM用户授予AliyunADBFullAccess、AliyunADBSparkProcessingDataRole和AnalyticDB MySQL库表的读写权限。具体操作,请参见账号授权。
配置项 | 说明 | 备注 |
配置项 | 说明 | 备注 |
地域及可用区 | 华东1(杭州)可用区K | 不涉及。 |
网络类型 | 专有网络VPC | 不涉及。 |
产品系列 | 湖仓版集群 | 不涉及。 |
Spark资源规格 | 2xlarge(16 core 64 GB,8台) | 不涉及。 |
执行引擎 |
| 对比AnalyticDB for MySQL Spark和高性能版Spark全密态计算引擎的性能。 |
测试数据量 | 明文数据:1TB 密文数据:1TB | 明文数据与密文数据的文件类型均为Parquet。 |
存储 | OSS存储空间 | 不涉及。 |
本文示例以TPC-H数据集为例。
在TPC官网下载TPC-H标准的数据生成工具DBGEN。
构造测试数据。更多信息,请参见构建数据。
./dbgen -s $scale -C $chunks -S $i -f
TPC-H测试数据集包含customer
、lineitem
、nation
、orders
、part
、partsupp
、region
、supplier
八张表。
将上述步骤生成的八张表转换为Parquet格式。本文以customer
为例,示例如下:
val df = spark.read
.format("csv")
.option("header", "false")
.option("delimiter", "|")
.schema(StructType {
StructField("c_custkey", IntegerType, true) ::
StructField("c_name", StringType, true) ::
StructField("c_address", StringType, true) ::
StructField("c_nationkey", IntegerType, true) ::
StructField("c_phone", StringType, true) ::
StructField("c_acctbal", FloatType, true) ::
StructField("c_mktsegment", StringType, true) ::
StructField("c_comment", StringType, true) ::
Nil
})
.load("customer.tbl")
df.write.parquet("parquet/customer")
本文仅展示将customer
表转换为Parquet格式的示例语句,其他表与本示例相似。
将TPC-H测试数据集上传至OSS存储空间。本文示例为:oss://testBucketName/adb/Spark
。
登录云原生数据仓库AnalyticDB MySQL控制台,在左上角选择集群所在地域。在左侧导航栏,单击集群列表,在湖仓版页签下,单击目标集群ID。
在左侧导航栏,单击
。在SQLConsole窗口,选择Spark引擎和Job型资源组。
创建数据库。
SET spark.driver.resourceSpec=2xlarge;
SET spark.executor.instances=8;
SET spark.executor.resourceSpec=2xlarge;
CREATE DATABASE adb_plain_db;
创建外表,用于存储明文数据。LOCATION为TPC-H测试数据集所在的OSS路径。本文以customer表为例,OSS路径为oss://testBucketName/adb/Spark/customer
。
SET spark.driver.resourceSpec=2xlarge;
SET spark.executor.instances=8;
SET spark.executor.resourceSpec=2xlarge;
USE adb_plain_db;
CREATE TABLE IF NOT EXISTS customer
(
c_custkey long,
c_name string,
c_address string,
c_nationkey long,
c_phone string,
c_acctbal decimal(12, 2),
c_mktsegment string,
c_comment string
)
USING parquet
LOCATION 'oss://testBucketName/adb/Spark/customer';
执行SQL计算,并记录SQL执行耗时。
SET spark.driver.resourceSpec=2xlarge;
SET spark.executor.instances=8;
SET spark.executor.resourceSpec=2xlarge;
SELECT
l_returnflag,
l_linestatus,
sum(l_quantity) AS sum_qty,
sum(l_extendedprice) AS sum_base_price,
sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
avg(l_quantity) AS avg_qty,
avg(l_extendedprice) AS avg_price,
avg(l_discount) AS avg_disc,
count(*) AS count_order
FROM
lineitem
WHERE
l_shipdate <= date'1998-09-02' - interval 1 day
GROUP BY
l_returnflag,
l_linestatus
ORDER BY
l_returnflag,
l_linestatus;
SELECT
s_acctbal,
s_name,
n_name,
p_partkey,
p_mfgr,
s_address,
s_phone,
s_comment
FROM
part,
supplier,
partsupp,
nation,
region
WHERE
p_partkey = ps_partkey
AND s_suppkey = ps_suppkey
AND p_size = 15
AND p_type LIKE '%BRASS'
AND s_nationkey = n_nationkey
AND n_regionkey = r_regionkey
AND r_name = 'EUROPE'
AND ps_supplycost = (
SELECT
min(ps_supplycost)
FROM
partsupp,
supplier,
nation,
region
WHERE
p_partkey = ps_partkey
AND s_suppkey = ps_suppkey
AND s_nationkey = n_nationkey
AND n_regionkey = r_regionkey
AND r_name = 'EUROPE')
ORDER BY
s_acctbal DESC,
n_name,
s_name,
p_partkey
LIMIT 100;
SELECT
l_orderkey,
sum(l_extendedprice * (1 - l_discount)) AS revenue,
o_orderdate,
o_shippriority
FROM
customer,
orders,
lineitem
WHERE
c_mktsegment = 'BUILDING'
AND c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND o_orderdate < date'1995-03-15'
AND l_shipdate > date'1995-03-15'
GROUP BY
l_orderkey,
o_orderdate,
o_shippriority
ORDER BY
revenue DESC,
o_orderdate
LIMIT 10;
SELECT
o_orderpriority,
count(*) AS order_count
FROM
orders
WHERE
o_orderdate >= date'1993-07-01'
AND o_orderdate < date'1993-07-01' + interval 3 month
AND EXISTS (
SELECT
*
FROM
lineitem
WHERE
l_orderkey = o_orderkey
AND l_commitdate < l_receiptdate)
GROUP BY
o_orderpriority
ORDER BY
o_orderpriority;
SELECT
n_name,
sum(l_extendedprice * (1 - l_discount)) AS revenue
FROM
customer,
orders,
lineitem,
supplier,
nation,
region
WHERE
c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND l_suppkey = s_suppkey
AND c_nationkey = s_nationkey
AND s_nationkey = n_nationkey
AND n_regionkey = r_regionkey
AND r_name = 'ASIA'
AND o_orderdate >= date'1994-01-01'
AND o_orderdate < date'1994-01-01' + interval 1 year
GROUP BY
n_name
ORDER BY
revenue DESC;
SELECT
sum(l_extendedprice * l_discount) AS revenue
FROM
lineitem
WHERE
l_shipdate >= date'1994-01-01'
AND l_shipdate < date'1994-01-01' + interval 1 year
AND l_discount BETWEEN 0.06 - 0.01 AND 0.06 + 0.01
AND l_quantity < 24;
SELECT
supp_nation,
cust_nation,
l_year,
sum(volume) AS revenue
FROM (
SELECT
n1.n_name AS supp_nation,
n2.n_name AS cust_nation,
extract(year FROM l_shipdate) AS l_year,
l_extendedprice * (1 - l_discount) AS volume
FROM
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2
WHERE
s_suppkey = l_suppkey
AND o_orderkey = l_orderkey
AND c_custkey = o_custkey
AND s_nationkey = n1.n_nationkey
AND c_nationkey = n2.n_nationkey
AND ((n1.n_name = 'FRANCE'
AND n2.n_name = 'GERMANY')
OR (n1.n_name = 'GERMANY'
AND n2.n_name = 'FRANCE'))
AND l_shipdate BETWEEN date'1995-01-01' AND date'1996-12-31') AS shipping
GROUP BY
supp_nation,
cust_nation,
l_year
ORDER BY
supp_nation,
cust_nation,
l_year;
SELECT
o_year,
sum(
CASE WHEN nation = 'BRAZIL' THEN
volume
ELSE
0
END) / sum(volume) AS mkt_share
FROM (
SELECT
extract(year FROM o_orderdate) AS o_year,
l_extendedprice * (1 - l_discount) AS volume,
n2.n_name AS nation
FROM
part,
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2,
region
WHERE
p_partkey = l_partkey
AND s_suppkey = l_suppkey
AND l_orderkey = o_orderkey
AND o_custkey = c_custkey
AND c_nationkey = n1.n_nationkey
AND n1.n_regionkey = r_regionkey
AND r_name = 'AMERICA'
AND s_nationkey = n2.n_nationkey
AND o_orderdate BETWEEN date'1995-01-01' AND date'1996-12-31'
AND p_type = 'ECONOMY ANODIZED STEEL') AS all_nations
GROUP BY
o_year
ORDER BY
o_year;
SELECT
nation,
o_year,
sum(amount) AS sum_profit
FROM (
SELECT
n_name AS nation,
extract(year FROM o_orderdate) AS o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity AS amount
FROM
part,
supplier,
lineitem,
partsupp,
orders,
nation
WHERE
s_suppkey = l_suppkey
AND ps_suppkey = l_suppkey
AND ps_partkey = l_partkey
AND p_partkey = l_partkey
AND o_orderkey = l_orderkey
AND s_nationkey = n_nationkey
AND p_name LIKE '%green%') AS profit
GROUP BY
nation,
o_year
ORDER BY
nation,
o_year DESC;
SELECT
c_custkey,
c_name,
sum(l_extendedprice * (1 - l_discount)) AS revenue,
c_acctbal,
n_name,
c_address,
c_phone,
c_comment
FROM
customer,
orders,
lineitem,
nation
WHERE
c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND o_orderdate >= date'1993-10-01' AND o_orderdate < date'1993-10-01' + interval 3 month
AND l_returnflag = 'R'
AND c_nationkey = n_nationkey
GROUP BY
c_custkey,
c_name,
c_acctbal,
c_phone,
n_name,
c_address,
c_comment
ORDER BY
revenue DESC
LIMIT 20;
SELECT
ps_partkey,
sum(ps_supplycost * ps_availqty) AS value
FROM
partsupp,
supplier,
nation
WHERE
ps_suppkey = s_suppkey
AND s_nationkey = n_nationkey
AND n_name = 'GERMANY'
GROUP BY
ps_partkey
HAVING
sum(ps_supplycost * ps_availqty) > (
SELECT
sum(ps_supplycost * ps_availqty) * 0.0001000000
FROM
partsupp,
supplier,
nation
WHERE
ps_suppkey = s_suppkey
AND s_nationkey = n_nationkey
AND n_name = 'GERMANY')
ORDER BY
value DESC;
SELECT
l_shipmode,
sum(
CASE WHEN o_orderpriority = '1-URGENT'
OR o_orderpriority = '2-HIGH' THEN
1
ELSE
0
END) AS high_line_count,
sum(
CASE WHEN o_orderpriority <> '1-URGENT'
AND o_orderpriority <> '2-HIGH' THEN
1
ELSE
0
END) AS low_line_count
FROM
orders,
lineitem
WHERE
o_orderkey = l_orderkey
AND l_shipmode IN ('MAIL', 'SHIP')
AND l_commitdate < l_receiptdate
AND l_shipdate < l_commitdate
AND l_receiptdate >= date'1994-01-01' AND l_receiptdate < date'1994-01-01' + interval 1 year
GROUP BY
l_shipmode
ORDER BY
l_shipmode;
SELECT
c_count,
count(*) AS custdist
FROM (
SELECT
c_custkey,
count(o_orderkey)
FROM
customer
LEFT OUTER JOIN orders ON c_custkey = o_custkey
AND o_comment NOT LIKE '%special%requests%'
GROUP BY
c_custkey) AS c_orders (c_custkey,
c_count)
GROUP BY
c_count
ORDER BY
custdist DESC,
c_count DESC;
SELECT
100.00 * sum(
CASE WHEN p_type LIKE 'PROMO%' THEN
l_extendedprice * (1 - l_discount)
ELSE
0
END) / sum(l_extendedprice * (1 - l_discount)) AS promo_revenue
FROM
lineitem,
part
WHERE
l_partkey = p_partkey
AND l_shipdate >= date'1995-09-01'
AND l_shipdate < date'1995-09-01' + interval 1 month;
SELECT
s_suppkey,
s_name,
s_address,
s_phone,
total_revenue
FROM
supplier,
(
SELECT
l_suppkey AS supplier_no,
sum(l_extendedprice * (1 - l_discount)) AS total_revenue
FROM
lineitem
WHERE
l_shipdate >= date'1996-01-01' AND l_shipdate < date'1996-01-01' + interval 3 month
GROUP BY
supplier_no) revenue0
WHERE
s_suppkey = supplier_no
AND total_revenue = (
SELECT
max(total_revenue)
FROM (
SELECT
l_suppkey AS supplier_no,
sum(l_extendedprice * (1 - l_discount)) AS total_revenue
FROM
lineitem
WHERE
l_shipdate >= date'1996-01-01' AND l_shipdate < date'1996-01-01' + interval 3 month
GROUP BY
supplier_no) revenue1)
ORDER BY
s_suppkey;
SELECT
p_brand,
p_type,
p_size,
count(DISTINCT ps_suppkey) AS supplier_cnt
FROM
partsupp,
part
WHERE
p_partkey = ps_partkey
AND p_brand <> 'Brand#45'
AND p_type NOT LIKE 'MEDIUM POLISHED%'
AND p_size IN (49, 14, 23, 45, 19, 3, 36, 9)
AND ps_suppkey NOT IN (
SELECT
s_suppkey
FROM
supplier
WHERE
s_comment LIKE '%Customer%Complaints%')
GROUP BY
p_brand,
p_type,
p_size
ORDER BY
supplier_cnt DESC,
p_brand,
p_type,
p_size;
SELECT
sum(l_extendedprice) / 7.0 AS avg_yearly
FROM
lineitem,
part
WHERE
p_partkey = l_partkey
AND p_brand = 'Brand#23'
AND p_container = 'MED BOX'
AND l_quantity < (
SELECT
0.2 * avg(l_quantity)
FROM
lineitem
WHERE
l_partkey = p_partkey);
SELECT
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice,
sum(l_quantity)
FROM
customer,
orders,
lineitem
WHERE
o_orderkey IN (
SELECT
l_orderkey
FROM
lineitem
GROUP BY
l_orderkey
HAVING
sum(l_quantity) > 300)
AND c_custkey = o_custkey
AND o_orderkey = l_orderkey
GROUP BY
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice
ORDER BY
o_totalprice DESC,
o_orderdate
LIMIT 100;
SELECT
sum(l_extendedprice * (1 - l_discount)) AS revenue
FROM
lineitem,
part
WHERE (p_partkey = l_partkey
AND p_brand = 'Brand#12'
AND p_container IN ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
AND l_quantity >= 1
AND l_quantity <= 1 + 10
AND p_size BETWEEN 1 AND 5
AND l_shipmode IN ('AIR', 'AIR REG')
AND l_shipinstruct = 'DELIVER IN PERSON')
OR (p_partkey = l_partkey
AND p_brand = 'Brand#23'
AND p_container IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
AND l_quantity >= 10
AND l_quantity <= 10 + 10
AND p_size BETWEEN 1 AND 10
AND l_shipmode IN ('AIR', 'AIR REG')
AND l_shipinstruct = 'DELIVER IN PERSON')
OR (p_partkey = l_partkey
AND p_brand = 'Brand#34'
AND p_container IN ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
AND l_quantity >= 20
AND l_quantity <= 20 + 10
AND p_size BETWEEN 1 AND 15
AND l_shipmode IN ('AIR', 'AIR REG')
AND l_shipinstruct = 'DELIVER IN PERSON');
SELECT
s_name,
s_address
FROM
supplier,
nation
WHERE
s_suppkey IN (
SELECT
ps_suppkey
FROM
partsupp
WHERE
ps_partkey IN (
SELECT
p_partkey
FROM
part
WHERE
p_name LIKE 'forest%')
AND ps_availqty > (
SELECT
0.5 * sum(l_quantity)
FROM
lineitem
WHERE
l_partkey = ps_partkey
AND l_suppkey = ps_suppkey
AND l_shipdate >= date'1994-01-01'
AND l_shipdate < date'1994-01-01' + interval 1 year))
AND s_nationkey = n_nationkey
AND n_name = 'CANADA'
ORDER BY
s_name;
SELECT
s_name,
count(*) AS numwait
FROM
supplier,
lineitem l1,
orders,
nation
WHERE
s_suppkey = l1.l_suppkey
AND o_orderkey = l1.l_orderkey
AND o_orderstatus = 'F'
AND l1.l_receiptdate > l1.l_commitdate
AND EXISTS (
SELECT
*
FROM
lineitem l2
WHERE
l2.l_orderkey = l1.l_orderkey
AND l2.l_suppkey <> l1.l_suppkey)
AND NOT EXISTS (
SELECT
*
FROM
lineitem l3
WHERE
l3.l_orderkey = l1.l_orderkey
AND l3.l_suppkey <> l1.l_suppkey
AND l3.l_receiptdate > l3.l_commitdate)
AND s_nationkey = n_nationkey
AND n_name = 'SAUDI ARABIA'
GROUP BY
s_name
ORDER BY
numwait DESC,
s_name
LIMIT 100;
SELECT
cntrycode,
count(*) AS numcust,
sum(c_acctbal) AS totacctbal
FROM (
SELECT
substring(c_phone FROM 1 FOR 2) AS cntrycode,
c_acctbal
FROM
customer
WHERE
substring(c_phone FROM 1 FOR 2) IN ('13', '31', '23', '29', '30', '18', '17')
AND c_acctbal > (
SELECT
avg(c_acctbal)
FROM
customer
WHERE
c_acctbal > 0.00
AND substring(c_phone FROM 1 FOR 2) IN ('13', '31', '23', '29', '30', '18', '17'))
AND NOT EXISTS (
SELECT
*
FROM
orders
WHERE
o_custkey = c_custkey)) AS custsale
GROUP BY
cntrycode
ORDER BY
cntrycode;
登录云原生数据仓库AnalyticDB MySQL控制台,在左上角选择集群所在地域。在左侧导航栏,单击集群列表,在湖仓版页签下,单击目标集群ID。
在左侧导航栏,单击
。在SQLConsole窗口,选择Spark引擎和Job型资源组。
开启密态计算,并创建数据库。
-- 开启密态计算
SET spark.adb.native.enabled=true;
-- 配置Spark资源规格
SET spark.driver.resourceSpec=2xlarge;
SET spark.executor.instances=8;
SET spark.executor.resourceSpec=2xlarge;
-- 创建数据库
CREATE DATABASE adb_external_db;
创建OSS外表,用于存储明文数据。LOCATION为TPC-H测试数据集所在的OSS路径。本文以customer表为例,OSS路径为oss://testBucketName/adb/Spark/customer
。
SET spark.adb.native.enabled=true;
SET spark.driver.resourceSpec=2xlarge;
SET spark.executor.instances=8;
SET spark.executor.resourceSpec=2xlarge;
USE adb_external_db;
CREATE TABLE IF NOT EXISTS customer
(
c_custkey long,
c_name string,
c_address string,
c_nationkey long,
c_phone string,
c_acctbal decimal(12, 2),
c_mktsegment string,
c_comment string
)
USING parquet
LOCATION 'oss://testBucketName/adb/Spark/customer';
执行SQL计算,并记录SQL执行耗时。
SET spark.driver.resourceSpec=2xlarge;
SET spark.executor.instances=8;
SET spark.executor.resourceSpec=2xlarge;
SELECT
l_returnflag,
l_linestatus,
sum(l_quantity) AS sum_qty,
sum(l_extendedprice) AS sum_base_price,
sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
avg(l_quantity) AS avg_qty,
avg(l_extendedprice) AS avg_price,
avg(l_discount) AS avg_disc,
count(*) AS count_order
FROM
lineitem
WHERE
l_shipdate <= date'1998-09-02' - interval 1 day
GROUP BY
l_returnflag,
l_linestatus
ORDER BY
l_returnflag,
l_linestatus;
SELECT
s_acctbal,
s_name,
n_name,
p_partkey,
p_mfgr,
s_address,
s_phone,
s_comment
FROM
part,
supplier,
partsupp,
nation,
region
WHERE
p_partkey = ps_partkey
AND s_suppkey = ps_suppkey
AND p_size = 15
AND p_type LIKE '%BRASS'
AND s_nationkey = n_nationkey
AND n_regionkey = r_regionkey
AND r_name = 'EUROPE'
AND ps_supplycost = (
SELECT
min(ps_supplycost)
FROM
partsupp,
supplier,
nation,
region
WHERE
p_partkey = ps_partkey
AND s_suppkey = ps_suppkey
AND s_nationkey = n_nationkey
AND n_regionkey = r_regionkey
AND r_name = 'EUROPE')
ORDER BY
s_acctbal DESC,
n_name,
s_name,
p_partkey
LIMIT 100;
SELECT
l_orderkey,
sum(l_extendedprice * (1 - l_discount)) AS revenue,
o_orderdate,
o_shippriority
FROM
customer,
orders,
lineitem
WHERE
c_mktsegment = 'BUILDING'
AND c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND o_orderdate < date'1995-03-15'
AND l_shipdate > date'1995-03-15'
GROUP BY
l_orderkey,
o_orderdate,
o_shippriority
ORDER BY
revenue DESC,
o_orderdate
LIMIT 10;
SELECT
o_orderpriority,
count(*) AS order_count
FROM
orders
WHERE
o_orderdate >= date'1993-07-01'
AND o_orderdate < date'1993-07-01' + interval 3 month
AND EXISTS (
SELECT
*
FROM
lineitem
WHERE
l_orderkey = o_orderkey
AND l_commitdate < l_receiptdate)
GROUP BY
o_orderpriority
ORDER BY
o_orderpriority;
SELECT
n_name,
sum(l_extendedprice * (1 - l_discount)) AS revenue
FROM
customer,
orders,
lineitem,
supplier,
nation,
region
WHERE
c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND l_suppkey = s_suppkey
AND c_nationkey = s_nationkey
AND s_nationkey = n_nationkey
AND n_regionkey = r_regionkey
AND r_name = 'ASIA'
AND o_orderdate >= date'1994-01-01'
AND o_orderdate < date'1994-01-01' + interval 1 year
GROUP BY
n_name
ORDER BY
revenue DESC;
SELECT
sum(l_extendedprice * l_discount) AS revenue
FROM
lineitem
WHERE
l_shipdate >= date'1994-01-01'
AND l_shipdate < date'1994-01-01' + interval 1 year
AND l_discount BETWEEN 0.06 - 0.01 AND 0.06 + 0.01
AND l_quantity < 24;
SELECT
supp_nation,
cust_nation,
l_year,
sum(volume) AS revenue
FROM (
SELECT
n1.n_name AS supp_nation,
n2.n_name AS cust_nation,
extract(year FROM l_shipdate) AS l_year,
l_extendedprice * (1 - l_discount) AS volume
FROM
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2
WHERE
s_suppkey = l_suppkey
AND o_orderkey = l_orderkey
AND c_custkey = o_custkey
AND s_nationkey = n1.n_nationkey
AND c_nationkey = n2.n_nationkey
AND ((n1.n_name = 'FRANCE'
AND n2.n_name = 'GERMANY')
OR (n1.n_name = 'GERMANY'
AND n2.n_name = 'FRANCE'))
AND l_shipdate BETWEEN date'1995-01-01' AND date'1996-12-31') AS shipping
GROUP BY
supp_nation,
cust_nation,
l_year
ORDER BY
supp_nation,
cust_nation,
l_year;
SELECT
o_year,
sum(
CASE WHEN nation = 'BRAZIL' THEN
volume
ELSE
0
END) / sum(volume) AS mkt_share
FROM (
SELECT
extract(year FROM o_orderdate) AS o_year,
l_extendedprice * (1 - l_discount) AS volume,
n2.n_name AS nation
FROM
part,
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2,
region
WHERE
p_partkey = l_partkey
AND s_suppkey = l_suppkey
AND l_orderkey = o_orderkey
AND o_custkey = c_custkey
AND c_nationkey = n1.n_nationkey
AND n1.n_regionkey = r_regionkey
AND r_name = 'AMERICA'
AND s_nationkey = n2.n_nationkey
AND o_orderdate BETWEEN date'1995-01-01' AND date'1996-12-31'
AND p_type = 'ECONOMY ANODIZED STEEL') AS all_nations
GROUP BY
o_year
ORDER BY
o_year;
SELECT
nation,
o_year,
sum(amount) AS sum_profit
FROM (
SELECT
n_name AS nation,
extract(year FROM o_orderdate) AS o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity AS amount
FROM
part,
supplier,
lineitem,
partsupp,
orders,
nation
WHERE
s_suppkey = l_suppkey
AND ps_suppkey = l_suppkey
AND ps_partkey = l_partkey
AND p_partkey = l_partkey
AND o_orderkey = l_orderkey
AND s_nationkey = n_nationkey
AND p_name LIKE '%green%') AS profit
GROUP BY
nation,
o_year
ORDER BY
nation,
o_year DESC;
SELECT
c_custkey,
c_name,
sum(l_extendedprice * (1 - l_discount)) AS revenue,
c_acctbal,
n_name,
c_address,
c_phone,
c_comment
FROM
customer,
orders,
lineitem,
nation
WHERE
c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND o_orderdate >= date'1993-10-01' AND o_orderdate < date'1993-10-01' + interval 3 month
AND l_returnflag = 'R'
AND c_nationkey = n_nationkey
GROUP BY
c_custkey,
c_name,
c_acctbal,
c_phone,
n_name,
c_address,
c_comment
ORDER BY
revenue DESC
LIMIT 20;
SELECT
ps_partkey,
sum(ps_supplycost * ps_availqty) AS value
FROM
partsupp,
supplier,
nation
WHERE
ps_suppkey = s_suppkey
AND s_nationkey = n_nationkey
AND n_name = 'GERMANY'
GROUP BY
ps_partkey
HAVING
sum(ps_supplycost * ps_availqty) > (
SELECT
sum(ps_supplycost * ps_availqty) * 0.0001000000
FROM
partsupp,
supplier,
nation
WHERE
ps_suppkey = s_suppkey
AND s_nationkey = n_nationkey
AND n_name = 'GERMANY')
ORDER BY
value DESC;
SELECT
l_shipmode,
sum(
CASE WHEN o_orderpriority = '1-URGENT'
OR o_orderpriority = '2-HIGH' THEN
1
ELSE
0
END) AS high_line_count,
sum(
CASE WHEN o_orderpriority <> '1-URGENT'
AND o_orderpriority <> '2-HIGH' THEN
1
ELSE
0
END) AS low_line_count
FROM
orders,
lineitem
WHERE
o_orderkey = l_orderkey
AND l_shipmode IN ('MAIL', 'SHIP')
AND l_commitdate < l_receiptdate
AND l_shipdate < l_commitdate
AND l_receiptdate >= date'1994-01-01' AND l_receiptdate < date'1994-01-01' + interval 1 year
GROUP BY
l_shipmode
ORDER BY
l_shipmode;
SELECT
c_count,
count(*) AS custdist
FROM (
SELECT
c_custkey,
count(o_orderkey)
FROM
customer
LEFT OUTER JOIN orders ON c_custkey = o_custkey
AND o_comment NOT LIKE '%special%requests%'
GROUP BY
c_custkey) AS c_orders (c_custkey,
c_count)
GROUP BY
c_count
ORDER BY
custdist DESC,
c_count DESC;
SELECT
100.00 * sum(
CASE WHEN p_type LIKE 'PROMO%' THEN
l_extendedprice * (1 - l_discount)
ELSE
0
END) / sum(l_extendedprice * (1 - l_discount)) AS promo_revenue
FROM
lineitem,
part
WHERE
l_partkey = p_partkey
AND l_shipdate >= date'1995-09-01'
AND l_shipdate < date'1995-09-01' + interval 1 month;
SELECT
s_suppkey,
s_name,
s_address,
s_phone,
total_revenue
FROM
supplier,
(
SELECT
l_suppkey AS supplier_no,
sum(l_extendedprice * (1 - l_discount)) AS total_revenue
FROM
lineitem
WHERE
l_shipdate >= date'1996-01-01' AND l_shipdate < date'1996-01-01' + interval 3 month
GROUP BY
supplier_no) revenue0
WHERE
s_suppkey = supplier_no
AND total_revenue = (
SELECT
max(total_revenue)
FROM (
SELECT
l_suppkey AS supplier_no,
sum(l_extendedprice * (1 - l_discount)) AS total_revenue
FROM
lineitem
WHERE
l_shipdate >= date'1996-01-01' AND l_shipdate < date'1996-01-01' + interval 3 month
GROUP BY
supplier_no) revenue1)
ORDER BY
s_suppkey;
SELECT
p_brand,
p_type,
p_size,
count(DISTINCT ps_suppkey) AS supplier_cnt
FROM
partsupp,
part
WHERE
p_partkey = ps_partkey
AND p_brand <> 'Brand#45'
AND p_type NOT LIKE 'MEDIUM POLISHED%'
AND p_size IN (49, 14, 23, 45, 19, 3, 36, 9)
AND ps_suppkey NOT IN (
SELECT
s_suppkey
FROM
supplier
WHERE
s_comment LIKE '%Customer%Complaints%')
GROUP BY
p_brand,
p_type,
p_size
ORDER BY
supplier_cnt DESC,
p_brand,
p_type,
p_size;
SELECT
sum(l_extendedprice) / 7.0 AS avg_yearly
FROM
lineitem,
part
WHERE
p_partkey = l_partkey
AND p_brand = 'Brand#23'
AND p_container = 'MED BOX'
AND l_quantity < (
SELECT
0.2 * avg(l_quantity)
FROM
lineitem
WHERE
l_partkey = p_partkey);
SELECT
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice,
sum(l_quantity)
FROM
customer,
orders,
lineitem
WHERE
o_orderkey IN (
SELECT
l_orderkey
FROM
lineitem
GROUP BY
l_orderkey
HAVING
sum(l_quantity) > 300)
AND c_custkey = o_custkey
AND o_orderkey = l_orderkey
GROUP BY
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice
ORDER BY
o_totalprice DESC,
o_orderdate
LIMIT 100;
SELECT
sum(l_extendedprice * (1 - l_discount)) AS revenue
FROM
lineitem,
part
WHERE (p_partkey = l_partkey
AND p_brand = 'Brand#12'
AND p_container IN ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
AND l_quantity >= 1
AND l_quantity <= 1 + 10
AND p_size BETWEEN 1 AND 5
AND l_shipmode IN ('AIR', 'AIR REG')
AND l_shipinstruct = 'DELIVER IN PERSON')
OR (p_partkey = l_partkey
AND p_brand = 'Brand#23'
AND p_container IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
AND l_quantity >= 10
AND l_quantity <= 10 + 10
AND p_size BETWEEN 1 AND 10
AND l_shipmode IN ('AIR', 'AIR REG')
AND l_shipinstruct = 'DELIVER IN PERSON')
OR (p_partkey = l_partkey
AND p_brand = 'Brand#34'
AND p_container IN ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
AND l_quantity >= 20
AND l_quantity <= 20 + 10
AND p_size BETWEEN 1 AND 15
AND l_shipmode IN ('AIR', 'AIR REG')
AND l_shipinstruct = 'DELIVER IN PERSON');
SELECT
s_name,
s_address
FROM
supplier,
nation
WHERE
s_suppkey IN (
SELECT
ps_suppkey
FROM
partsupp
WHERE
ps_partkey IN (
SELECT
p_partkey
FROM
part
WHERE
p_name LIKE 'forest%')
AND ps_availqty > (
SELECT
0.5 * sum(l_quantity)
FROM
lineitem
WHERE
l_partkey = ps_partkey
AND l_suppkey = ps_suppkey
AND l_shipdate >= date'1994-01-01'
AND l_shipdate < date'1994-01-01' + interval 1 year))
AND s_nationkey = n_nationkey
AND n_name = 'CANADA'
ORDER BY
s_name;
SELECT
s_name,
count(*) AS numwait
FROM
supplier,
lineitem l1,
orders,
nation
WHERE
s_suppkey = l1.l_suppkey
AND o_orderkey = l1.l_orderkey
AND o_orderstatus = 'F'
AND l1.l_receiptdate > l1.l_commitdate
AND EXISTS (
SELECT
*
FROM
lineitem l2
WHERE
l2.l_orderkey = l1.l_orderkey
AND l2.l_suppkey <> l1.l_suppkey)
AND NOT EXISTS (
SELECT
*
FROM
lineitem l3
WHERE
l3.l_orderkey = l1.l_orderkey
AND l3.l_suppkey <> l1.l_suppkey
AND l3.l_receiptdate > l3.l_commitdate)
AND s_nationkey = n_nationkey
AND n_name = 'SAUDI ARABIA'
GROUP BY
s_name
ORDER BY
numwait DESC,
s_name
LIMIT 100;
SELECT
cntrycode,
count(*) AS numcust,
sum(c_acctbal) AS totacctbal
FROM (
SELECT
substring(c_phone FROM 1 FOR 2) AS cntrycode,
c_acctbal
FROM
customer
WHERE
substring(c_phone FROM 1 FOR 2) IN ('13', '31', '23', '29', '30', '18', '17')
AND c_acctbal > (
SELECT
avg(c_acctbal)
FROM
customer
WHERE
c_acctbal > 0.00
AND substring(c_phone FROM 1 FOR 2) IN ('13', '31', '23', '29', '30', '18', '17'))
AND NOT EXISTS (
SELECT
*
FROM
orders
WHERE
o_custkey = c_custkey)) AS custsale
GROUP BY
cntrycode
ORDER BY
cntrycode;
登录云原生数据仓库AnalyticDB MySQL控制台,在左上角选择集群所在地域。在左侧导航栏,单击集群列表,在湖仓版页签下,单击目标集群ID。
在左侧导航栏,单击
。在SQLConsole窗口,选择Spark引擎和Job型资源组。
开启密态计算,设置用户的主密钥,并创建数据库。
-- 开启密态计算
SET spark.adb.native.enabled=true;
-- 开启密文读写支持并设置主密钥列表,KMS Client 以及 CryptoFactory(开启后引擎可同时支持明文和密文)
SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
-- 配置Spark资源规格
SET spark.driver.resourceSpec=2xlarge;
SET spark.executor.instances=8;
SET spark.executor.resourceSpec=2xlarge;
--创建数据库
CREATE database IF NOT EXISTS adb_encryption_db;
创建OSS外表,用于存储明文数据。LOCATION为TPC-H测试数据集所在的OSS路径。本文以customer_tmp
表为例,OSS路径为oss://testBucketName/adb/Spark/customer
。
SET spark.adb.native.enabled=true;
SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
SET spark.driver.resourceSpec=2xlarge;
SET spark.executor.instances=8;
SET spark.executor.resourceSpec=2xlarge;
USE adb_encryption_db;
CREATE TABLE IF NOT EXISTS customer_tmp
(
c_custkey long,
c_name string,
c_address string,
c_nationkey long,
c_phone string,
c_acctbal decimal(12, 2),
c_mktsegment string,
c_comment string
)
USING parquet
LOCATION 'oss://testBucketName/adb/Spark/customer';
创建OSS外表,用于存储密文数据。本文示例将customer
外表的数据指定存储在oss://testBucketName/adb/Spark/enc_customer
。
SET spark.adb.native.enabled=true;
SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
SET spark.driver.resourceSpec=2xlarge;
SET spark.executor.instances=8;
SET spark.executor.resourceSpec=2xlarge;
USE adb_encryption_db;
CREATE TABLE IF NOT EXISTS customer
USING Parquet
OPTIONS (
'parquet.encryption.column.keys'='kc1:c_name;kc2:c_phone',
'parquet.encryption.footer.key'='kf'
)
LOCATION 'oss://testBucketName/adb/Spark/enc_customer'
AS
SELECT *
FROM customer_tmp;
执行SQL计算,并记录SQL执行耗时。
SET spark.driver.resourceSpec=2xlarge;
SET spark.executor.instances=8;
SET spark.executor.resourceSpec=2xlarge;
SELECT
l_returnflag,
l_linestatus,
sum(l_quantity) AS sum_qty,
sum(l_extendedprice) AS sum_base_price,
sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
avg(l_quantity) AS avg_qty,
avg(l_extendedprice) AS avg_price,
avg(l_discount) AS avg_disc,
count(*) AS count_order
FROM
lineitem
WHERE
l_shipdate <= date'1998-09-02' - interval 1 day
GROUP BY
l_returnflag,
l_linestatus
ORDER BY
l_returnflag,
l_linestatus;
SELECT
s_acctbal,
s_name,
n_name,
p_partkey,
p_mfgr,
s_address,
s_phone,
s_comment
FROM
part,
supplier,
partsupp,
nation,
region
WHERE
p_partkey = ps_partkey
AND s_suppkey = ps_suppkey
AND p_size = 15
AND p_type LIKE '%BRASS'
AND s_nationkey = n_nationkey
AND n_regionkey = r_regionkey
AND r_name = 'EUROPE'
AND ps_supplycost = (
SELECT
min(ps_supplycost)
FROM
partsupp,
supplier,
nation,
region
WHERE
p_partkey = ps_partkey
AND s_suppkey = ps_suppkey
AND s_nationkey = n_nationkey
AND n_regionkey = r_regionkey
AND r_name = 'EUROPE')
ORDER BY
s_acctbal DESC,
n_name,
s_name,
p_partkey
LIMIT 100;
SELECT
l_orderkey,
sum(l_extendedprice * (1 - l_discount)) AS revenue,
o_orderdate,
o_shippriority
FROM
customer,
orders,
lineitem
WHERE
c_mktsegment = 'BUILDING'
AND c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND o_orderdate < date'1995-03-15'
AND l_shipdate > date'1995-03-15'
GROUP BY
l_orderkey,
o_orderdate,
o_shippriority
ORDER BY
revenue DESC,
o_orderdate
LIMIT 10;
SELECT
o_orderpriority,
count(*) AS order_count
FROM
orders
WHERE
o_orderdate >= date'1993-07-01'
AND o_orderdate < date'1993-07-01' + interval 3 month
AND EXISTS (
SELECT
*
FROM
lineitem
WHERE
l_orderkey = o_orderkey
AND l_commitdate < l_receiptdate)
GROUP BY
o_orderpriority
ORDER BY
o_orderpriority;
SELECT
n_name,
sum(l_extendedprice * (1 - l_discount)) AS revenue
FROM
customer,
orders,
lineitem,
supplier,
nation,
region
WHERE
c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND l_suppkey = s_suppkey
AND c_nationkey = s_nationkey
AND s_nationkey = n_nationkey
AND n_regionkey = r_regionkey
AND r_name = 'ASIA'
AND o_orderdate >= date'1994-01-01'
AND o_orderdate < date'1994-01-01' + interval 1 year
GROUP BY
n_name
ORDER BY
revenue DESC;
SELECT
sum(l_extendedprice * l_discount) AS revenue
FROM
lineitem
WHERE
l_shipdate >= date'1994-01-01'
AND l_shipdate < date'1994-01-01' + interval 1 year
AND l_discount BETWEEN 0.06 - 0.01 AND 0.06 + 0.01
AND l_quantity < 24;
SELECT
supp_nation,
cust_nation,
l_year,
sum(volume) AS revenue
FROM (
SELECT
n1.n_name AS supp_nation,
n2.n_name AS cust_nation,
extract(year FROM l_shipdate) AS l_year,
l_extendedprice * (1 - l_discount) AS volume
FROM
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2
WHERE
s_suppkey = l_suppkey
AND o_orderkey = l_orderkey
AND c_custkey = o_custkey
AND s_nationkey = n1.n_nationkey
AND c_nationkey = n2.n_nationkey
AND ((n1.n_name = 'FRANCE'
AND n2.n_name = 'GERMANY')
OR (n1.n_name = 'GERMANY'
AND n2.n_name = 'FRANCE'))
AND l_shipdate BETWEEN date'1995-01-01' AND date'1996-12-31') AS shipping
GROUP BY
supp_nation,
cust_nation,
l_year
ORDER BY
supp_nation,
cust_nation,
l_year;
SELECT
o_year,
sum(
CASE WHEN nation = 'BRAZIL' THEN
volume
ELSE
0
END) / sum(volume) AS mkt_share
FROM (
SELECT
extract(year FROM o_orderdate) AS o_year,
l_extendedprice * (1 - l_discount) AS volume,
n2.n_name AS nation
FROM
part,
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2,
region
WHERE
p_partkey = l_partkey
AND s_suppkey = l_suppkey
AND l_orderkey = o_orderkey
AND o_custkey = c_custkey
AND c_nationkey = n1.n_nationkey
AND n1.n_regionkey = r_regionkey
AND r_name = 'AMERICA'
AND s_nationkey = n2.n_nationkey
AND o_orderdate BETWEEN date'1995-01-01' AND date'1996-12-31'
AND p_type = 'ECONOMY ANODIZED STEEL') AS all_nations
GROUP BY
o_year
ORDER BY
o_year;
SELECT
nation,
o_year,
sum(amount) AS sum_profit
FROM (
SELECT
n_name AS nation,
extract(year FROM o_orderdate) AS o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity AS amount
FROM
part,
supplier,
lineitem,
partsupp,
orders,
nation
WHERE
s_suppkey = l_suppkey
AND ps_suppkey = l_suppkey
AND ps_partkey = l_partkey
AND p_partkey = l_partkey
AND o_orderkey = l_orderkey
AND s_nationkey = n_nationkey
AND p_name LIKE '%green%') AS profit
GROUP BY
nation,
o_year
ORDER BY
nation,
o_year DESC;
SELECT
c_custkey,
c_name,
sum(l_extendedprice * (1 - l_discount)) AS revenue,
c_acctbal,
n_name,
c_address,
c_phone,
c_comment
FROM
customer,
orders,
lineitem,
nation
WHERE
c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND o_orderdate >= date'1993-10-01' AND o_orderdate < date'1993-10-01' + interval 3 month
AND l_returnflag = 'R'
AND c_nationkey = n_nationkey
GROUP BY
c_custkey,
c_name,
c_acctbal,
c_phone,
n_name,
c_address,
c_comment
ORDER BY
revenue DESC
LIMIT 20;
SELECT
ps_partkey,
sum(ps_supplycost * ps_availqty) AS value
FROM
partsupp,
supplier,
nation
WHERE
ps_suppkey = s_suppkey
AND s_nationkey = n_nationkey
AND n_name = 'GERMANY'
GROUP BY
ps_partkey
HAVING
sum(ps_supplycost * ps_availqty) > (
SELECT
sum(ps_supplycost * ps_availqty) * 0.0001000000
FROM
partsupp,
supplier,
nation
WHERE
ps_suppkey = s_suppkey
AND s_nationkey = n_nationkey
AND n_name = 'GERMANY')
ORDER BY
value DESC;
SELECT
l_shipmode,
sum(
CASE WHEN o_orderpriority = '1-URGENT'
OR o_orderpriority = '2-HIGH' THEN
1
ELSE
0
END) AS high_line_count,
sum(
CASE WHEN o_orderpriority <> '1-URGENT'
AND o_orderpriority <> '2-HIGH' THEN
1
ELSE
0
END) AS low_line_count
FROM
orders,
lineitem
WHERE
o_orderkey = l_orderkey
AND l_shipmode IN ('MAIL', 'SHIP')
AND l_commitdate < l_receiptdate
AND l_shipdate < l_commitdate
AND l_receiptdate >= date'1994-01-01' AND l_receiptdate < date'1994-01-01' + interval 1 year
GROUP BY
l_shipmode
ORDER BY
l_shipmode;
SELECT
c_count,
count(*) AS custdist
FROM (
SELECT
c_custkey,
count(o_orderkey)
FROM
customer
LEFT OUTER JOIN orders ON c_custkey = o_custkey
AND o_comment NOT LIKE '%special%requests%'
GROUP BY
c_custkey) AS c_orders (c_custkey,
c_count)
GROUP BY
c_count
ORDER BY
custdist DESC,
c_count DESC;
SELECT
100.00 * sum(
CASE WHEN p_type LIKE 'PROMO%' THEN
l_extendedprice * (1 - l_discount)
ELSE
0
END) / sum(l_extendedprice * (1 - l_discount)) AS promo_revenue
FROM
lineitem,
part
WHERE
l_partkey = p_partkey
AND l_shipdate >= date'1995-09-01'
AND l_shipdate < date'1995-09-01' + interval 1 month;
SELECT
s_suppkey,
s_name,
s_address,
s_phone,
total_revenue
FROM
supplier,
(
SELECT
l_suppkey AS supplier_no,
sum(l_extendedprice * (1 - l_discount)) AS total_revenue
FROM
lineitem
WHERE
l_shipdate >= date'1996-01-01' AND l_shipdate < date'1996-01-01' + interval 3 month
GROUP BY
supplier_no) revenue0
WHERE
s_suppkey = supplier_no
AND total_revenue = (
SELECT
max(total_revenue)
FROM (
SELECT
l_suppkey AS supplier_no,
sum(l_extendedprice * (1 - l_discount)) AS total_revenue
FROM
lineitem
WHERE
l_shipdate >= date'1996-01-01' AND l_shipdate < date'1996-01-01' + interval 3 month
GROUP BY
supplier_no) revenue1)
ORDER BY
s_suppkey;
SELECT
p_brand,
p_type,
p_size,
count(DISTINCT ps_suppkey) AS supplier_cnt
FROM
partsupp,
part
WHERE
p_partkey = ps_partkey
AND p_brand <> 'Brand#45'
AND p_type NOT LIKE 'MEDIUM POLISHED%'
AND p_size IN (49, 14, 23, 45, 19, 3, 36, 9)
AND ps_suppkey NOT IN (
SELECT
s_suppkey
FROM
supplier
WHERE
s_comment LIKE '%Customer%Complaints%')
GROUP BY
p_brand,
p_type,
p_size
ORDER BY
supplier_cnt DESC,
p_brand,
p_type,
p_size;
SELECT
sum(l_extendedprice) / 7.0 AS avg_yearly
FROM
lineitem,
part
WHERE
p_partkey = l_partkey
AND p_brand = 'Brand#23'
AND p_container = 'MED BOX'
AND l_quantity < (
SELECT
0.2 * avg(l_quantity)
FROM
lineitem
WHERE
l_partkey = p_partkey);
SELECT
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice,
sum(l_quantity)
FROM
customer,
orders,
lineitem
WHERE
o_orderkey IN (
SELECT
l_orderkey
FROM
lineitem
GROUP BY
l_orderkey
HAVING
sum(l_quantity) > 300)
AND c_custkey = o_custkey
AND o_orderkey = l_orderkey
GROUP BY
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice
ORDER BY
o_totalprice DESC,
o_orderdate
LIMIT 100;
SELECT
sum(l_extendedprice * (1 - l_discount)) AS revenue
FROM
lineitem,
part
WHERE (p_partkey = l_partkey
AND p_brand = 'Brand#12'
AND p_container IN ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
AND l_quantity >= 1
AND l_quantity <= 1 + 10
AND p_size BETWEEN 1 AND 5
AND l_shipmode IN ('AIR', 'AIR REG')
AND l_shipinstruct = 'DELIVER IN PERSON')
OR (p_partkey = l_partkey
AND p_brand = 'Brand#23'
AND p_container IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
AND l_quantity >= 10
AND l_quantity <= 10 + 10
AND p_size BETWEEN 1 AND 10
AND l_shipmode IN ('AIR', 'AIR REG')
AND l_shipinstruct = 'DELIVER IN PERSON')
OR (p_partkey = l_partkey
AND p_brand = 'Brand#34'
AND p_container IN ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
AND l_quantity >= 20
AND l_quantity <= 20 + 10
AND p_size BETWEEN 1 AND 15
AND l_shipmode IN ('AIR', 'AIR REG')
AND l_shipinstruct = 'DELIVER IN PERSON');
SELECT
s_name,
s_address
FROM
supplier,
nation
WHERE
s_suppkey IN (
SELECT
ps_suppkey
FROM
partsupp
WHERE
ps_partkey IN (
SELECT
p_partkey
FROM
part
WHERE
p_name LIKE 'forest%')
AND ps_availqty > (
SELECT
0.5 * sum(l_quantity)
FROM
lineitem
WHERE
l_partkey = ps_partkey
AND l_suppkey = ps_suppkey
AND l_shipdate >= date'1994-01-01'
AND l_shipdate < date'1994-01-01' + interval 1 year))
AND s_nationkey = n_nationkey
AND n_name = 'CANADA'
ORDER BY
s_name;
SELECT
s_name,
count(*) AS numwait
FROM
supplier,
lineitem l1,
orders,
nation
WHERE
s_suppkey = l1.l_suppkey
AND o_orderkey = l1.l_orderkey
AND o_orderstatus = 'F'
AND l1.l_receiptdate > l1.l_commitdate
AND EXISTS (
SELECT
*
FROM
lineitem l2
WHERE
l2.l_orderkey = l1.l_orderkey
AND l2.l_suppkey <> l1.l_suppkey)
AND NOT EXISTS (
SELECT
*
FROM
lineitem l3
WHERE
l3.l_orderkey = l1.l_orderkey
AND l3.l_suppkey <> l1.l_suppkey
AND l3.l_receiptdate > l3.l_commitdate)
AND s_nationkey = n_nationkey
AND n_name = 'SAUDI ARABIA'
GROUP BY
s_name
ORDER BY
numwait DESC,
s_name
LIMIT 100;
SELECT
cntrycode,
count(*) AS numcust,
sum(c_acctbal) AS totacctbal
FROM (
SELECT
substring(c_phone FROM 1 FOR 2) AS cntrycode,
c_acctbal
FROM
customer
WHERE
substring(c_phone FROM 1 FOR 2) IN ('13', '31', '23', '29', '30', '18', '17')
AND c_acctbal > (
SELECT
avg(c_acctbal)
FROM
customer
WHERE
c_acctbal > 0.00
AND substring(c_phone FROM 1 FOR 2) IN ('13', '31', '23', '29', '30', '18', '17'))
AND NOT EXISTS (
SELECT
*
FROM
orders
WHERE
o_custkey = c_custkey)) AS custsale
GROUP BY
cntrycode
ORDER BY
cntrycode;
各场景下的测试结果对比如下:
测试场景 | SQL执行总耗时(分钟) |
测试场景 | SQL执行总耗时(分钟) |
AnalyticDB for MySQL Spark 3.2.0版本处理明文数据 | |
高性能版Spark全密态计算引擎处理明文数据 | 34 |
高性能版Spark全密态计算引擎处理密文数据 | 37 |
通过上述对比可以得知,在处理大规模数据集时,高性能版Spark全密态计算引擎相较于AnalyticDB for MySQL Spark 3.2.0版本性能提升了0.9倍。高性能版Spark全密态计算引擎在保证数据传输与存储过程安全的同时,也提升了数据处理效率。