This topic describes how to use DataWorks Data Integration to migrate JSON data from Object Storage Service (OSS) to MaxCompute and use the GET_JSON_OBJECT function to extract JSON objects.
Prerequisites
A workflow is created in DataWorks. In this topic, a workflow is created in a workspace that is in basic mode. For more information, see Create a workflow.
A TXT file that contains JSON data is uploaded to an OSS bucket. In this example, the OSS bucket is in the China (Shanghai) region. The TXT file contains the following JSON data:
{
    "store": {
        "book": [
            {
                "category": "reference",
                "author": "Nigel Rees",
                "title": "Sayings of the Century",
                "price": 8.95
            },
            {
                "category": "fiction",
                "author": "Evelyn Waugh",
                "title": "Sword of Honour",
                "price": 12.99
            },
            {
                "category": "fiction",
                "author": "J. R. R. Tolkien",
                "title": "The Lord of the Rings",
                "isbn": "0-395-19395-8",
                "price": 22.99
            }
        ],
        "bicycle": {
            "color": "red",
            "price": 19.95
        }
    },
    "expensive": 10
}
Migrate JSON data from OSS to MaxCompute
Add an OSS data source. For more information, see Add an OSS data source.
Create a table in DataWorks to store the JSON data that you want to migrate from OSS.
Log on to the DataWorks console.
In the left-side navigation pane, click Workspaces.
In the top navigation bar, select the region where the desired workspace resides. On the Workspaces page, find the workspace and click DataStudio in the Actions column.
On the DataStudio page, move the pointer over the Create icon and choose Create Table > MaxCompute > Table.
In the Create Table dialog box, configure the Path and Name parameters and click Create.
Note: If multiple MaxCompute compute engine instances are associated with the current workspace, you must select one from the Engine Instance drop-down list in the Create Table dialog box.
On the table editing page, click DDL Statement.
In the DDL dialog box, enter the following CREATE TABLE statement and click Generate Table Schema:
create table mqdata (mq_data string);
In the Confirm message, click OK.
After the table schema is generated, configure the Display Name parameter in the General section and click Commit to Development Environment and Commit to Production Environment.
Note: If you use a workspace in basic mode, you only need to click Commit to Production Environment.
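Optionally, you can confirm the committed table from an ODPS SQL node or an ad hoc query. This is a minimal check; it assumes the table was committed to the environment that you query:
-- Expect a single STRING column named mq_data.
DESC mqdata;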
Create a batch synchronization node.
Go to the DataStudio page. Right-click the workflow and choose Create Node > Data Integration > Batch Synchronization.
In the Create Node dialog box, enter a name for the node and click Commit.
In the top toolbar of the configuration tab, click the icon to switch to script mode.
In script mode, click the Import Template icon in the toolbar.
In the Import Template dialog box, set the Source type, Data source, Target type, and Data source parameters, and click Confirm.
Modify the JSON code as needed, then click the Save icon and the Run icon in sequence to run the node.
Sample code:
{ "type": "job", "steps": [ { "stepType": "oss", "parameter": { "fieldDelimiterOrigin": "^", "nullFormat": "", "compress": "", "datasource": "OSS_userlog", "column": [ { "name": 0, "type": "string", "index": 0 } ], "skipHeader": "false", "encoding": "UTF-8", "fieldDelimiter": "^", "fileFormat": "binary", "object": [ "applog.txt" ] }, "name": "Reader", "category": "reader" }, { "stepType": "odps", "parameter": { "partition": "", "isCompress": false, "truncate": true, "datasource": "odps_first", "column": [ "mqdata" ], "emptyAsNull": false, "table": "mqdata" }, "name": "Writer", "category": "writer" } ], "version": "2.0", "order": { "hops": [ { "from": "Reader", "to": "Writer" } ] }, "setting": { "errorLimit": { "record": "" }, "speed": { "concurrent": 2, "throttle": false } } }
Check the result
Create an ODPS SQL node.
Right-click the workflow and choose Create Node > MaxCompute > ODPS SQL.
In the Create Node dialog box, enter a name for the node and click Commit.
On the configuration tab of the ODPS SQL node, enter the following statements:
-- Query data in the mqdata table.
SELECT * FROM mqdata;
-- Obtain the value of the expensive field.
SELECT GET_JSON_OBJECT(mqdata.mq_data, '$.expensive') FROM mqdata;
Click the Run icon to run the code.
You can view the results in the operational logs.
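GET_JSON_OBJECT accepts JSONPath expressions, so you can also extract nested objects and array elements. The following queries are a sketch based on the sample JSON data above; the exact formatting of returned string values can vary:
-- Title of the first book in the store.book array ("Sayings of the Century").
SELECT GET_JSON_OBJECT(mqdata.mq_data, '$.store.book[0].title') FROM mqdata;
-- Color of the bicycle object ("red").
SELECT GET_JSON_OBJECT(mqdata.mq_data, '$.store.bicycle.color') FROM mqdata;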