This topic describes how to create tables for JSON data in DLA.
Prerequisites
Before creating tables, you must create an OSS schema first. In the example, all tables use the following OSS schema:
CREATE SCHEMA dla_oss_db with DBPROPERTIES(
catalog='oss',
location 'oss://dlaossfile1/dla/';
);
DLA allows you to create tables for JSON files that meet the following conditions:
DLA can only process JSON files in plain text format.
The JSON files are UTF-8-encoded.
In JSON files, each line must contain a complete JSON record.
For example, DLA does not allow you to create a table for a JSON file that contains the following data:
{"id": 123, "name": "jack",
"c3": "2001-02-03 12:34:56"}
{"id": 456, "name": "rose", "c3": "1906-04-18 05:12:00"}
{"id": 789, "name": "tom", "c3": "2001-02-03 12:34:56"}
{"id": 234, "name": "alice", "c3": "1906-04-18 05:12:00"}
You can rewrite the data as follows:
{"id": 123, "name": "jack", "c3": "2001-02-03 12:34:56"}
{"id": 456, "name": "rose", "c3": "1906-04-18 05:12:00"}
{"id": 789, "name": "tom", "c3": "2001-02-03 12:34:56"}
{"id": 234, "name": "alice", "c3": "1906-04-18 05:12:00"}
JSON file without nested data
For example, the OSS instance has a file oss://dlaossfile1/dla/json_table_1.txt
that stores the following data:
{"id": 123, "name": "jack", "c3": "2001-02-03 12:34:56"}
{"id": 456, "name": "rose", "c3": "1906-04-18 05:12:00"}
{"id": 789, "name": "tom", "c3": "2001-02-03 12:34:56"}
{"id": 234, "name": "alice", "c3": "1906-04-18 05:12:00"}
The following is a sample statement for creating a table based on the file:
CREATE EXTERNAL TABLE json_table_1
(id int,
name string,
c3 timestamp
)STORED AS JSON
LOCATION 'oss://dlaossfile1/dla/';
The query result is as follows:
SELECT * FROM dla_oss_db.json_table_1
|----|------|---------------------|
| id | name | c3 |
| 123| jack | 2001-02-03 12:34:560|
| 456| rose | 1906-04-18 05:12:00 |
| 789| tome | 2001-02-03 12:34:56 |
| 234| alice|1906-04-18 05:12:00 |
JSON file with nested data
For example, the JSON file contains the following data:
{
"DocId": "Alibaba",
"User_1": {
"Id": 1234,
"Username": "bob1234",
"Name": "Bob",
"ShippingAddress": {
"Address1": "969 Wenyi West St.",
"Address2": null,
"City": "Hangzhou",
"Province": "Zhejiang"
},
"Orders": [
{
"ItemId": 6789,
"OrderDate": "11/11/2017"
},
{
"ItemId": 4352,
"OrderDate": "12/12/2017"
}
]
}
}
The following is a sample statement for creating a table based on the file:
CREATE EXTERNAL TABLE json_table_2 (
docid string,
user_1 struct<
id:INT,
username:string,
name:string,
shippingaddress:struct<
address1:string,
address2:string,
city:string,
province:string
>,
orders:array<
struct<
itemid:INT,
orderdate:string
>
>
>
)
STORED AS JSON
LOCATION 'oss://dlaossfile1/dla/json_table_2.txt';
The query result is as follows:
select * from json_table_2;
+---------+----------------------------------------------------------------------------------------------------------------+
| docid | user_1 |
+---------+----------------------------------------------------------------------------------------------------------------+
| Alibaba | [1234, bob1234, Bob, [969 Wenyi West St., null, Hangzhou, Zhejiang], [[6789, 11/11/2017], [4352, 12/12/2017]]] |
+---------+----------------------------------------------------------------------------------------------------------------+
Note: No matter whether the JSON file has nested data, a complete JSON record must be placed in one line and then it can be processed by DLA.