DataWorks:RestAPI data source

Last Updated: Dec 27, 2024

DataWorks provides RestAPI Reader and RestAPI Writer for you to read data from and write data to RestAPI data sources. This topic describes the capabilities of synchronizing data from or to RestAPI data sources.

Limits

  • RestAPI data sources support only exclusive resource groups for Data Integration.

  • DataWorks does not allow you to configure a timeout period when you use this type of data source. The built-in timeout period for a request in DataWorks is 60 seconds. If the time required to return the result of your API call exceeds 60 seconds, your task may fail.
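Because the 60-second limit cannot be changed, it can help to measure how long your API takes to respond before you schedule a task. The following Python sketch times an arbitrary call and compares the result against the limit. The helper names `timed_call` and `fits_timeout` and the 80% safety margin are assumptions made for illustration. They are not part of DataWorks.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# Built-in request timeout stated in the Limits section.
DATAWORKS_TIMEOUT_SECONDS = 60.0


def timed_call(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.monotonic()
    result = fn()
    return result, time.monotonic() - start


def fits_timeout(elapsed_seconds: float, safety_margin: float = 0.8) -> bool:
    """Return True if the call finished within a fraction of the 60-second limit.

    safety_margin is a hypothetical buffer (80% by default), not a DataWorks setting.
    """
    return elapsed_seconds <= DATAWORKS_TIMEOUT_SECONDS * safety_margin
```

For example, `timed_call(lambda: urllib.request.urlopen(url, timeout=60).read())` measures one real request after `import urllib.request`, where `url` stands for your own API endpoint.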

Data type mappings

| Category | RestAPI data type |
| --- | --- |
| Integer | LONG and INT |
| String | STRING |
| Floating point | DOUBLE and FLOAT |
| Boolean value | BOOLEAN |
| Date and time | DATE |

Add a data source

Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Add and manage data sources. You can view the infotips of parameters in the DataWorks console to understand the meanings of the parameters when you add a data source.

Develop a data synchronization task

For information about the entry point for and the procedure of configuring a synchronization task, see the following configuration guides.

Configure a batch synchronization task to synchronize data of a single table

FAQ

  1. Is specifying the number of page flips the only pagination option for a response?

    Yes. You can specify only the number of page flips for a response.

  2. Can I configure automatic page flipping for a response?

    No. Automatic page flipping is not supported. With automatic page flipping, paging would stop as soon as the required data is returned, and the data could not be sharded.

  3. The specified number of page flips is greater than the actual number of pages in the response, so the additional pages contain no data. How does the system handle this?

    If a query returns no result, the additional pages contain no data, and the system continues with the next query.

  4. Can RestAPI Reader parse only one level of data in the JSON-formatted response?

    Yes, RestAPI Reader can parse only one level of data in the JSON-formatted response.

  5. How do I configure RestAPI Reader to read data of a non-array type?

    When you configure the parameter field for RestAPI Reader, set the dataPath parameter to the path of the non-array data so that RestAPI Reader can locate the fields that you want to read. For example, you can configure "dataPath": "data.list". In addition, set the dataMode parameter to multiData so that DataWorks processes the non-array data as multiple separate data records.

    Note

    If you set the dataMode parameter to multiData, the column parameter does not take effect. You must directly specify the path of data that you want to read in the dataPath parameter.

    The following code provides a configuration example:

    "reader": {
      "name": "restapi",
      "parameter": {
        "dataPath": "data.list",
        "dataMode": "multiData",
        // Other parameters
      }
    }

Appendix: Code and parameters

Configure a batch synchronization task by using the code editor

If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a batch synchronization task by using the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.

Code for RestAPI Reader

{
    "type":"job",
    "version":"2.0",
    "steps":[
        {
            "stepType":"restapi",
            "parameter":{
                "url":"http://127.0.0.1:5000/get_array5",
                "dataMode":"oneData",
                "responseType":"json",
                "column":[
                    {
                        "type":"long",
                        "name":"a.b"  // Query data in the a.b path.
                    },
                    {
                        "type":"string",
                        "name":"a.c"  // Query data in the a.c path.
                    }
                ],
                "dirtyData":"null",
                "method":"get",
                "defaultHeader":{
                    "X-Custom-Header":"test header"
                },
                "customHeader":{
                    "X-Custom-Header2":"test header2"
                },
                "parameters":"abc=1&def=1"
            },
            "name":"restapireader",
            "category":"reader"
        },
        {
            "stepType":"stream",
            "parameter":{

            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":""
        },
        "speed":{
            "throttle":true,  // Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true.
            "concurrent":1,  // The maximum number of parallel threads.
            "mbps":"12"  // The maximum transmission rate. Unit: MB/s.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}

Take note of the following information when you configure RestAPI Reader by using the code editor:

After RestAPI Reader sends an HTTP or HTTPS request, a JSON-formatted response is returned. The dataPath parameter is used to specify the path of the JSON-formatted data record or JSON array that is queried. Examples:


In the following sample response, the DATA field contains the business data as a JSON array.
{
    "HEADER": {
        "BUSID": "bid1",
        "RECID": "uuid",
        "SENDER": "dc",
        "RECEIVER": "pre",
        "DTSEND": "202201250000"
    },
    "DATA": [
        {
            "SERNR": "sernr1"
        },
        {
            "SERNR": "sernr2"
        }
    ]
}

To extract multiple data records from the JSON array and transfer them to a writer, set "column": [ "SERNR" ], "dataMode": "multiData", and "dataPath": "DATA".


In the following sample response, the content.DATA field contains the business data as a JSON object.
{
    "HEADER": {
        "BUSID": "bid1",
        "RECID": "uuid",
        "SENDER": "dc",
        "RECEIVER": "pre",
        "DTSEND": "202201250000"
    },
    "content": {
        "DATA": {
            "SERNR": "sernr2"
        }
    }
}

To extract the single data record from the JSON object and transfer it to a writer, set "column": [ "SERNR" ], "dataMode": "oneData", and "dataPath": "content.DATA".
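The two examples above can be checked locally with a short Python sketch that resolves a dot-separated dataPath and returns one row per record. This only illustrates the behavior described in this topic and is not how DataWorks implements it. The function names `resolve_path` and `extract_records` are hypothetical, the sample responses are trimmed to the relevant fields, and the sketch handles only a JSON array for multiData and a JSON object for oneData.

```python
from typing import Any, Dict, List


def resolve_path(document: Any, path: str) -> Any:
    """Walk a dot-separated path such as "content.DATA" through nested dicts."""
    node = document
    for key in path.split("."):
        node = node[key]
    return node


def extract_records(response: Dict[str, Any], data_path: str,
                    data_mode: str, columns: List[str]) -> List[List[Any]]:
    """Return one row per record, with one value per requested column."""
    data = resolve_path(response, data_path)
    if data_mode == "multiData":
        if not isinstance(data, list):
            # Limitation of this sketch, not a statement about DataWorks.
            raise ValueError("this sketch handles only JSON arrays for multiData")
        records = data
    elif data_mode == "oneData":
        records = [data]
    else:
        raise ValueError(f"unsupported dataMode: {data_mode}")
    return [[resolve_path(record, column) for column in columns]
            for record in records]


# Trimmed versions of the two sample responses shown above.
array_response = {
    "HEADER": {"BUSID": "bid1"},
    "DATA": [{"SERNR": "sernr1"}, {"SERNR": "sernr2"}],
}
object_response = {
    "HEADER": {"BUSID": "bid1"},
    "content": {"DATA": {"SERNR": "sernr2"}},
}

print(extract_records(array_response, "DATA", "multiData", ["SERNR"]))
# [['sernr1'], ['sernr2']]
print(extract_records(object_response, "content.DATA", "oneData", ["SERNR"]))
# [['sernr2']]
```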
                

Parameters in code for RestAPI Reader

Note

You need to configure the parameters that are described in the following table when you configure a batch synchronization task for a RestAPI data source.

Scheduling parameters are not supported for a batch synchronization task that uses RestAPI Reader.

| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| url | The URL of the RESTful API. | Yes | No default value |
| dataMode | The method that RestAPI Reader uses to read data from the JSON-formatted response returned by the RESTful API. Valid values: oneData (RestAPI Reader extracts one data record) and multiData (RestAPI Reader extracts a JSON array and transfers multiple data records to a writer). | Yes | No default value |
| responseType | The format of the response returned by the RESTful API. Only the JSON format is supported. | Yes | JSON |
| column | The fields from which you want to read data. For each field, the type parameter specifies the data type and the name parameter specifies the JSON path in which the field is located. You must configure both parameters for each field. Example: "column":[{"type":"long","name":"a.b"},{"type":"string","name":"a.c"}], which queries data in the a.b and a.c paths. | Yes | No default value |
| dataPath | The path of the JSON-formatted data record or JSON array that is queried. | No | No default value |
| method | The request method. Valid values: get and post. | Yes | No default value |
| customHeader | The header information transferred to the RESTful API. | No | No default value |
| parameters | The parameter information transferred to the RESTful API. If the method parameter is set to get, specify the value in a format such as abc=1&def=1. If the method parameter is set to post, configure JSON parameters. | No | No default value |
| dirtyData | The processing mechanism that is used when no data is found in the JSON path specified by the column parameter. Valid values: dirty (the data record is treated as a dirty data record) and null (the column value is set to null). | Yes | dirty |
| requestTimes | Specifies whether RestAPI Reader requests data from the RESTful API once or multiple times. Valid values: single (only once) and multiple (multiple times). | Yes | single |
| requestParam | The parameter that is repeatedly passed to the RESTful API in each request. You must configure this parameter if you set the requestTimes parameter to multiple. For example, if you configure the pageNumber parameter, RestAPI Reader passes the pageNumber parameter to the RESTful API based on the settings of the startIndex, endIndex, and step parameters. | No | No default value |
| startIndex | The start point of requests. The data at the start point is also requested. | No | No default value |
| endIndex | The end point of requests. The data at the end point is also requested. | No | No default value |
| step | The step at which requests are sent. | No | No default value |
| authType | The authentication method. Valid values: Basic Auth (basic authentication): if the data source supports username- and password-based authentication, select Basic Auth and configure the username and password, which are sent to the RESTful API for authentication during data integration. Token Auth (token-based authentication): if the data source supports token-based authentication, select Token Auth and configure a fixed token value, which is sent to the RESTful API in the request header, such as {"Authorization":"Bearer TokenXXXXXX"}, during data integration. Aliyun API Signature (Alibaba Cloud API signature-based authentication): if the data source is an Alibaba Cloud service whose API supports AccessKey pair-based authentication, select Aliyun API Signature and configure the AccessKey ID and AccessKey secret. For Basic Auth and Token Auth, the data source is connected only after the authentication is successful. | No | No default value |
| authUsername/authPassword | The username and password used for basic authentication. | No | No default value |
| authToken | The token used for token-based authentication. | No | No default value |
| accessKey/accessSecret | The AccessKey pair used for authentication based on Alibaba Cloud API signature. | No | No default value |
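The interaction between the requestParam, startIndex, endIndex, and step parameters described above can be sketched in Python. The sketch assumes integer index values, a GET request in which the parameter is appended as a query string, and an end point that is included when the step reaches it exactly. The function names `page_values` and `request_urls` are hypothetical, and the actual request construction in DataWorks may differ.

```python
from typing import List
from urllib.parse import urlencode


def page_values(start_index: int, end_index: int, step: int) -> List[int]:
    """List the values passed for requestParam; both endpoints are included."""
    if step <= 0:
        raise ValueError("step must be a positive integer")
    return list(range(start_index, end_index + 1, step))


def request_urls(url: str, request_param: str, start_index: int,
                 end_index: int, step: int) -> List[str]:
    """Build one GET URL per page value (illustrative only)."""
    return [f"{url}?{urlencode({request_param: value})}"
            for value in page_values(start_index, end_index, step)]


print(page_values(1, 5, 2))
# [1, 3, 5]
for page_url in request_urls("http://127.0.0.1:5000/get_array5", "pageNumber", 1, 3, 1):
    print(page_url)
```

Under these assumptions, a range of 1 to 3 with a step of 1 results in three requests, one each for pageNumber=1, pageNumber=2, and pageNumber=3.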

Code for RestAPI Writer

{
    "type":"job",
    "version":"2.0",
    "steps":[
        {
            "stepType":"stream",
            "parameter":{

            },
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"restapi",
            "parameter":{
                "url":"http://127.0.0.1:5000/writer1",
                "dataMode":"oneData",
                "responseType":"json",
                "column":[
                    {
                        "type":"long",
                        "name":"a.b"  // Store data in the a.b path.
                    },
                    {
                        "type":"string",
                        "name":"a.c"  // Store data in the a.c path.
                    }
                ],
                "method":"post",
                "defaultHeader":{
                    "X-Custom-Header":"test header"
                },
                "customHeader":{
                    "X-Custom-Header2":"test header2"
                },
                "parameters":"abc=1&def=1",
                "batchSize":256
            },
            "name":"restapiwriter",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0" // The maximum number of dirty data records allowed. 
        },
        "speed":{
            "throttle":true,  // Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true.
            "concurrent":1,  // The maximum number of parallel threads.
            "mbps":"12"  // The maximum transmission rate. Unit: MB/s.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}
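The comments in the script above indicate that each column name is a path in the generated JSON, such as a.b or a.c. The following Python sketch shows one way to picture that mapping by turning a flat source row into a nested JSON body. It is an assumption about the resulting shape, not the documented behavior of RestAPI Writer, and the function names `set_path` and `build_body` are hypothetical.

```python
import json
from typing import Any, Dict, List


def set_path(target: Dict[str, Any], path: str, value: Any) -> None:
    """Store value in target at a dot-separated path, creating objects as needed."""
    keys = path.split(".")
    node = target
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def build_body(columns: List[Dict[str, str]], row: List[Any]) -> Dict[str, Any]:
    """Map one source row onto the JSON paths named in the column parameter."""
    if len(columns) != len(row):
        raise ValueError("row must contain one value per configured column")
    body: Dict[str, Any] = {}
    for column, value in zip(columns, row):
        set_path(body, column["name"], value)
    return body


# Same column definitions as in the sample script above.
columns = [{"type": "long", "name": "a.b"}, {"type": "string", "name": "a.c"}]
print(json.dumps(build_body(columns, [1, "x"])))
# {"a": {"b": 1, "c": "x"}}
```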

Parameters in code for RestAPI Writer

| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| url | The URL of the RESTful API. | Yes | No default value |
| dataMode | The format in which RestAPI Writer transfers JSON-formatted data. Valid values: oneData (RestAPI Writer transfers one data record in each request) and multiData (RestAPI Writer transfers multiple data records in each request; the number of requests is determined by the number of tasks generated by the reader). | Yes | No default value |
| column | The columns that make up the generated JSON-formatted data. For each column, the type parameter specifies the data type and the name parameter specifies the JSON path in which the column is stored. You must configure both parameters for each column. Example: "column":[{"type":"long","name":"a.b"},{"type":"string","name":"a.c"}], which stores data in the a.b and a.c paths. | Yes | No default value |
| dataPath | The path that is used to store the JSON-formatted data. | No | No default value |
| method | The request method. Valid values: post and put. | Yes | No default value |
| customHeader | The header information transferred to the RESTful API. | No | No default value |
| authType | The authentication method. Valid values: Basic Auth (basic authentication): if the data source supports username- and password-based authentication, select Basic Auth and configure the username and password, which are sent to the RESTful API for authentication during data integration. Token Auth (token-based authentication): if the data source supports token-based authentication, select Token Auth and configure a fixed token value, which is sent to the RESTful API in the request header, such as {"Authorization":"Bearer TokenXXXXXX"}, during data integration. Aliyun API Signature (Alibaba Cloud API signature-based authentication): if the data source is an Alibaba Cloud service whose API supports AccessKey pair-based authentication, select Aliyun API Signature and configure the AccessKey ID and AccessKey secret. For Basic Auth and Token Auth, the data source is connected only after the authentication is successful. | No | No default value |
| authUsername/authPassword | The username and password used for basic authentication. | No | No default value |
| authToken | The token used for token-based authentication. | No | No default value |
| accessKey/accessSecret | The AccessKey pair used for authentication based on Alibaba Cloud API signature. | No | No default value |
| batchSize | The maximum number of data records that can be transferred in each request when the dataMode parameter is set to multiData. | Yes | 512 |
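To get a feel for how batchSize bounds the size of each request in multiData mode, the following Python sketch splits a list of records into groups of at most batchSize records. It only illustrates the upper bound described above. How RestAPI Writer actually groups records also depends on the tasks generated by the reader, and the function name `batches` is hypothetical.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(records: Sequence[T], batch_size: int = 512) -> Iterator[List[T]]:
    """Yield consecutive groups of at most batch_size records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(records), batch_size):
        yield list(records[start:start + batch_size])


# 1,000 records with the batchSize value 256 from the sample script.
sizes = [len(group) for group in batches(list(range(1000)), batch_size=256)]
print(sizes)
# [256, 256, 256, 232]
```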