本文介绍如何在DLA中为JSON类型的数据创建表。
前提条件
创建文件数据表之前需要先创建OSS Schema,本文示例中所有文件数据表均使用以下OSS Schema。
CREATE SCHEMA dla_oss_db with DBPROPERTIES(
catalog='oss',
location 'oss://dlaossfile1/dla/'
)
JSON文件必须满足以下条件,否则无法在DLA中为JSON文件创建表:
无论JSON数据是否为嵌套格式,必须将一条完整的JSON数据放在一行中,否则无法在DLA中为JSON文件创建表。
JSON文件以纯文本格式存储,例如TXT等。
JSON文件中数据的编码方式是UTF-8。
JSON文件中每一行必须是一个完整的JSON对象。
例如,DLA无法为以下文件创建表:
{"id": 123, "name": "jack",
"c3": "2001-02-03 12:34:56"}
{"id": 456, "name": "rose", "c3": "1906-04-18 05:12:00"}
{"id": 789, "name": "tom", "c3": "2001-02-03 12:34:56"}
{"id": 234, "name": "alice", "c3": "1906-04-18 05:12:00"}
可改写成:
{"id": 123, "name": "jack", "c3": "2001-02-03 12:34:56"}
{"id": 456, "name": "rose", "c3": "1906-04-18 05:12:00"}
{"id": 789, "name": "tom", "c3": "2001-02-03 12:34:56"}
{"id": 234, "name": "alice", "c3": "1906-04-18 05:12:00"}
不含嵌套的JSON数据
例如,OSS中有一个json_table_1.txt文件:oss://dlaossfile1/dla/json_table_1.txt
,存储的数据如下所示。
{"id": 123, "name": "jack", "c3": "2001-02-03 12:34:56"}
{"id": 456, "name": "rose", "c3": "1906-04-18 05:12:00"}
{"id": 789, "name": "tom", "c3": "2001-02-03 12:34:56"}
{"id": 234, "name": "alice", "c3": "1906-04-18 05:12:00"}
DLA中的建表语句为:
CREATE EXTERNAL TABLE json_table_1
(id int,
name string,
c3 timestamp
)STORED AS JSON
LOCATION 'oss://dlaossfile1/dla/';
查询结果为:
SELECT * FROM dla_oss_db.json_table_1
|----|------|---------------------|
| id | name | c3 |
| 123| jack | 2001-02-03 12:34:560|
| 456| rose | 1906-04-18 05:12:00 |
| 789| tome | 2001-02-03 12:34:56 |
| 234| alice|1906-04-18 05:12:00 |
含有嵌套的JSON数据
含有嵌套的JSON数据是指使用struct和array定义的嵌套JSON数据。
例如,以下一条原始JSON数据。
{ "DocId": "Alibaba", "User_1": { "Id": 1234, "Username": "bob1234", "Name": "Bob", "ShippingAddress": { "Address1": "969 Wenyi West St.", "Address2": null, "City": "Hangzhou", "Province": "Zhejiang" }, "Orders": [{ "ItemId": 6789, "OrderDate": "11/11/2017" }, { "ItemId": 4352, "OrderDate": "12/12/2017" } ] } }
为了便于您查看JSON数据,使用在线JSON格式化工具处理后,数据内容为:
{
"DocId": "Alibaba",
"User_1": {
"Id": 1234,
"Username": "bob1234",
"Name": "Bob",
"ShippingAddress": {
"Address1": "969 Wenyi West St.",
"Address2": null,
"City": "Hangzhou",
"Province": "Zhejiang"
},
"Orders": [
{
"ItemId": 6789,
"OrderDate": "11/11/2017"
},
{
"ItemId": 4352,
"OrderDate": "12/12/2017"
}
]
}
}
建表语句为:
CREATE EXTERNAL TABLE json_table_2 (
docid string,
user_1 struct<
id:INT,
username:string,
name:string,
shippingaddress:struct<
address1:string,
address2:string,
city:string,
province:string
>,
orders:array<
struct<
itemid:INT,
orderdate:string
>
>
>
)
STORED AS JSON
LOCATION 'oss://dlaossfile1/dla/json_table_2.txt';
查询结果为:
select * from json_table_2;
+---------+----------------------------------------------------------------------------------------------------------------+
| docid | user_1 |
+---------+----------------------------------------------------------------------------------------------------------------+
| Alibaba | [1234, bob1234, Bob, [969 Wenyi West St., null, Hangzhou, Zhejiang], [[6789, 11/11/2017], [4352, 12/12/2017]]] |
+---------+----------------------------------------------------------------------------------------------------------------+
查询数据时,对于使用struct定义的嵌套结构,可以通过.
进行层次对象引用;对于array定义的数组结构,可以通过[数组下标]
(数组下标从1开始)进行对象引用。
select DocId,
User_1.Id,
User_1.ShippingAddress.Address1,
User_1.Orders[1].ItemId
from json_table_2
where User_1.Username = 'bob1234'
and User_1.Orders[2].OrderDate = '12/12/2017';
+---------+------+--------------------+-------+
| DocId | id | address1 | _col3 |
+---------+------+--------------------+-------+
| Alibaba | 1234 | 969 Wenyi West St. | 6789 |
使用JSON函数处理JSON数据
以下JSON数据中,keyvalue_string
对应的value为嵌套字符。
{"data_key":"com.taobao.vipserver.domains.meta.biz.alibaba.com","ts":1524550275112,"value_string":"{\"appName\":\"\",\"apps\":[],\"checksum\":\"50fa0540b430904ee78dff07c7350e1c\",\"clusterMap\":{\"DEFAULT\":{\"defCkport\":80,\"defIPPort\":80,\"healthCheckTask\":null,\"healthChecker\":{\"checkCode\":200,\"curlHost\":\"\",\"curlPath\":\"/status.taobao\",\"type\":\"HTTP\"},\"name\":\"DEFAULT\",\"nodegroup\":\"\",\"sitegroup\":\"\",\"submask\":\"0.0.0.0/0\",\"syncConfig\":{\"appName\":\"trade-ma\",\"nodegroup\":\"tradema\",\"pubLevel\":\"publish\",\"role\":\"\",\"site\":\"\"},\"useIPPort4Check\":true}},\"disabledSites\":[],\"enableArmoryUnit\":false,\"enableClientBeat\":false,\"enableHealthCheck\":true,\"enabled\":true,\"envAndSites\":\"\",\"invalidThreshold\":0.6,\"ipDeleteTimeout\":1800000,\"lastModifiedMillis\":1524550275107,\"localSiteCall\":true,\"localSiteThreshold\":0.8,\"name\":\"biz.alibaba.com\",\"nodegroup\":\"\",\"owners\":[\"junlan.zx\",\"张三\",\"李四\",\"cui.yuanc\"],\"protectThreshold\":0,\"requireSameEnv\":false,\"resetWeight\":false,\"symmetricCallType\":null,\"symmetricType\":\"warehouse\",\"tagName\":\"ipGroup\",\"tenantId\":\"\",\"tenants\":[],\"token\":\"1cf0ec0c771321bb4177182757a67fb0\",\"useSpecifiedURL\":false}"}
为了便于您查看JSON数据,使用在线JSON格式化工具处理后,数据内容为:
{
"data_key": "com.taobao.vipserver.domains.meta.biz.alibaba.com",
"ts": 1524550275112,
"value_string": "{\"appName\":\"\",\"apps\":[],\"checksum\":\"50fa0540b430904ee78dff07c7350e1c\",\"clusterMap\":{\"DEFAULT\":{\"defCkport\":80,\"defIPPort\":80,\"healthCheckTask\":null,\"healthChecker\":{\"checkCode\":200,\"curlHost\":\"\",\"curlPath\":\"/status.taobao\",\"type\":\"HTTP\"},\"name\":\"DEFAULT\",\"nodegroup\":\"\",\"sitegroup\":\"\",\"submask\":\"0.0.0.0/0\",\"syncConfig\":{\"appName\":\"trade-ma\",\"nodegroup\":\"tradema\",\"pubLevel\":\"publish\",\"role\":\"\",\"site\":\"\"},\"useIPPort4Check\":true}},\"disabledSites\":[],\"enableArmoryUnit\":false,\"enableClientBeat\":false,\"enableHealthCheck\":true,\"enabled\":true,\"envAndSites\":\"\",\"invalidThreshold\":0.6,\"ipDeleteTimeout\":1800000,\"lastModifiedMillis\":1524550275107,\"localSiteCall\":true,\"localSiteThreshold\":0.8,\"name\":\"biz.alibaba.com\",\"nodegroup\":\"\",\"owners\":[\"junlan.zx\",\"张三\",\"李四\",\"cui.yuanc\"],\"protectThreshold\":0,\"requireSameEnv\":false,\"resetWeight\":false,\"symmetricCallType\":null,\"symmetricType\":\"warehouse\",\"tagName\":\"ipGroup\",\"tenantId\":\"\",\"tenants\":[],\"token\":\"1cf0ec0c771321bb4177182757a67fb0\",\"useSpecifiedURL\":false}"
}
建表语句为:
CREATE external TABLE json_table_3 (
data_key string,
ts bigint,
value_string string
)
STORED AS JSON
LOCATION 'oss://dlaossfile1/dla/json_table_3.txt';
查询结果为:
查询数据时,可以使用json_parse()
,json_extract_scalar()
,json_extract()
等常用JSON函数对数据进行处理。
select json_extract_scalar(json_parse(value_string),'$.owners[1]') from json_table_3
+--------+
| _col0 |
+--------+
| 张三 |
select json_extract_scalar(json_obj.json_col, '$.DEFAULT.submask')
from (
select json_extract(json_parse(value_string), '$.clusterMap') as json_col from json_table_3
) json_obj
where json_extract_scalar(json_obj.json_col, '$.DEFAULT.healthChecker.curlPath') = '/status.taobao';
+-----------+
| _col0 |
+-----------+
| 0.0.0.0/0 |
with json_obj as (select json_extract(json_parse(value_string), '$.clusterMap') as json_col from json_table_3)
select json_extract_scalar(json_obj.json_col, '$.DEFAULT.submask')
from json_obj
where json_extract_scalar(json_obj.json_col, '$.DEFAULT.healthChecker.curlPath') = '/status.taobao';
+-----------+
| _col0 |
+-----------+
| 0.0.0.0/0 |