
How to Migrate JSON Data from MongoDB to MaxCompute

This article describes how to use Data Integration on the DataWorks console to extract JSON fields from ApsaraDB for MongoDB to MaxCompute.

In this article, we will learn how to use the Data Integration function in the Alibaba Cloud DataWorks console to extract JSON fields from MongoDB to MaxCompute.

Prepare Data and Account

First, upload the data to your MongoDB database. This example uses Alibaba Cloud's ApsaraDB for MongoDB. The network type of the instance is VPC; a public IP address is required for MongoDB to communicate with the default resource group of DataWorks. The test data is as follows:

{
    "store": {
        "book": [
             {
                "category": "reference",
                "author": "Nigel Rees",
                "title": "Sayings of the Century",
                "price": 8.95
             },
             {
                "category": "fiction",
                "author": "Evelyn Waugh",
                "title": "Sword of Honour",
                "price": 12.99
             },
             {
                 "category": "fiction",
                 "author": "J. R. R. Tolkien",
                 "title": "The Lord of the Rings",
                 "isbn": "0-395-19395-8",
                 "price": 22.99
             }
          ],
          "bicycle": {
              "color": "red",
              "price": 19.95
          }
    },
    "expensive": 10
}
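
One way to load this test document is from the mongo shell. The following is a minimal sketch; it assumes the shell is connected to the admin database and that your MongoDB version provides insertOne (3.2 or later).

// Insert the sample document above into the userlog collection.
db.userlog.insertOne({
    "store": {
        "book": [
            { "category": "reference", "author": "Nigel Rees", "title": "Sayings of the Century", "price": 8.95 },
            { "category": "fiction", "author": "Evelyn Waugh", "title": "Sword of Honour", "price": 12.99 },
            { "category": "fiction", "author": "J. R. R. Tolkien", "title": "The Lord of the Rings", "isbn": "0-395-19395-8", "price": 22.99 }
        ],
        "bicycle": { "color": "red", "price": 19.95 }
    },
    "expensive": 10
})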

Log on to the DMS for MongoDB console. In this example, the database is admin and the collection is userlog. You can run the db.userlog.find().limit(10) command in the Query Window to view the uploaded data, as shown in the following figure.

[Figure 1]

Create a database user in advance so that you can add the data source in DataWorks. In this example, run the db.createUser({user:"bookuser",pwd:"123456",roles:["root"]}) command to create a user named bookuser with the password 123456 and the root role.
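
To confirm that the new account works, you can optionally authenticate with it from the mongo shell. This is just a quick sanity check; it assumes you are still connected to the admin database where the user was created.

// Authenticate as the new user; db.auth() returns 1 on success.
db.auth("bookuser", "123456")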

Use DataWorks to Extract Data to MaxCompute

  1. Add a MongoDB data source

    In the DataWorks console, go to the Data Integration page and add a MongoDB data source.

    [Figure 2]

    For specific parameters, see the following figure. Click Finish after the data source connectivity test is successful. In this example, the MongoDB network type is VPC. Therefore, set the Data Source Type to Has Public IP Address.

    [Figure 3]

    To retrieve the endpoint and the port number, log on to the ApsaraDB for MongoDB console and click an instance, as shown in the following figure.

    [Figure 4]

  2. Create a data synchronization task

    In the DataWorks console, create a data synchronization node.

    [Figure 5]

    Then, create a table named mqdata in DataWorks to store the JSON data.

    [Figure 6]

    You can set the table parameters on the graphic interface. In this example, the mqdata table has only one column, mqdata, of the STRING type.

    [Figure 7]

    After creating the table, set the data synchronization task parameters on the graphic interface, as shown in the following figure. Set the source data source to MongoDB and select mongodb_userlog; set the target data source to odps_first and the target table to mqdata. After completing the preceding configuration, click Switch to Script Mode.

    [Figure 8]

    The following shows the example code in script mode:

    {
        "type": "job",
        "steps": [
            {
                "stepType": "mongodb",
                "parameter": {
                    "datasource": "mongodb_userlog",
     //Indicates the data source name.
                    "column": [
                        {
                            "name": "store.bicycle.color", //Indicates the JSON field path. In this example, the value of color is extracted.
                            "type": "document.document.string" //Indicates the number of fields in this line must be consistent with that in the preceding line (the name line). If the JSON field is a level 1 field, such as the "expensive" field in this example, enter "string" for this field.
                        }
                    ],
                    "collectionName // Collection name": "userlog"
                },
                "name": "Reader",
                "category": "reader"
            },
            {
                "stepType": "odps",
                "parameter": {
                    "partition": "",
                    "isCompress": false,
                    "truncate": true,
                    "datasource": "odps_first",
                    "column": [
         //Indicates the table column name in MaxCompute, namely "mqdata".
                    ],
                    "emptyAsNull": false,
                    "table": "mqdata"
                },
                "name": "Writer",
                "category": "writer"
            }
        ],
        "version": "2.0",
        "order": {
            "hops": [
                {
                    "from": "Reader",
                    "to": "Writer"
                }
            ]
        },
        "setting": {
            "errorLimit": {
                "record": ""
            },
            "speed": {
                "concurrent": 2,
                "throttle": false,
                "dmu": 1
            }
        }
    }
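
    For reference, if you also wanted to extract the top-level expensive field, the reader's column list might look like the following sketch (untested; it follows the name/type pairing rule described in the comments above). Note that the writer's column list and the MaxCompute table would then also need a matching second column.

    "column": [
        {
            "name": "store.bicycle.color",
            "type": "document.document.string"
        },
        {
            "name": "expensive",
            "type": "string" //A top-level field, so the type is simply "string".
        }
    ]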

    After completing the preceding configuration, click Run. If the operation is successful, the following log is displayed.

    [Figure 9]

Verify Results

Create an ODPS SQL node in your Business Flow.

[Figure 10]

Enter the statement SELECT * FROM mqdata; to view the data in the mqdata table. Alternatively, you can run the same statement on the MaxCompute client. With the sample document above, the query should return a single row containing red, the value extracted from store.bicycle.color.

[Figure 11]

To learn more about Data Migration on Alibaba Cloud MaxCompute, visit https://www.alibabacloud.com/help/doc-detail/98009.html
