DataWorks provides RestAPI Reader and RestAPI Writer for you to read data from and write data to RestAPI data sources. This topic describes the capabilities of synchronizing data from or to RestAPI data sources.
Limits
RestAPI data sources support only exclusive resource groups for Data Integration.
You cannot configure a request timeout for this type of data source. DataWorks applies a built-in timeout of 60 seconds to each request. If an API call takes longer than 60 seconds to return a result, the task may fail.
Data type mappings
Category | RestAPI data type |
Integer | LONG and INT |
String | STRING |
Floating point | DOUBLE and FLOAT |
Boolean | BOOLEAN |
Date and time | DATE |
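As an illustration only, the mapping in the preceding table can be expressed as a small lookup that checks the type values of column entries before a task is configured. The names TYPE_CATEGORY, category_of, and check_columns are hypothetical helpers and are not part of DataWorks. Ignoring letter case is an assumption made because the table lists types in uppercase while the sample code uses lowercase.

```python
# Hypothetical lookup built from the data type mapping table; not a DataWorks API.
TYPE_CATEGORY = {
    "long": "Integer",
    "int": "Integer",
    "string": "String",
    "double": "Floating point",
    "float": "Floating point",
    "boolean": "Boolean",
    "date": "Date and time",
}


def category_of(column_type: str) -> str:
    """Return the category of a RestAPI data type, ignoring letter case."""
    key = column_type.strip().lower()
    if key not in TYPE_CATEGORY:
        raise ValueError(f"unsupported RestAPI data type: {column_type!r}")
    return TYPE_CATEGORY[key]


def check_columns(columns: list[dict]) -> list[str]:
    """Return the category of each column entry and fail on unknown types."""
    return [category_of(column["type"]) for column in columns]
```

Calling check_columns with the two column entries from the sample reader code would return the categories Integer and String, and a type that is not in the table raises ValueError.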
Develop a data synchronization task
The following sections describe where to start and how to configure a data synchronization task. For information about the parameter settings, view the infotip of each parameter on the configuration tab of the task.
Add a data source
Before you configure a data synchronization task to synchronize data from or to a specific data source, you must add the data source to DataWorks. For more information, see Add and manage data sources.
Configure a batch synchronization task to synchronize data of a single table
For more information about the configuration procedure, see Configure a batch synchronization task by using the codeless UI and Configure a batch synchronization task by using the code editor.
For information about all parameters that are configured and the code that is run when you use the code editor to configure a batch synchronization task, see Appendix: Code and parameters.
FAQ
Is specifying the number of page flips the only way to page through a response?
Yes. You can only specify the number of page flips for a response.
Can I configure automatic page flipping for a response?
No. Automatic page flipping is not supported. If page flipping were automatic, it would stop only when the required data is returned, and in that case the data could not be sharded.
The specified number of times of page flipping for a response is greater than the actual number of pages for the response. As a result, additional pages do not contain data. How does the system resolve this issue?
If the request for a page returns no result, that page contains no data. In this case, the system continues with the next request.
Can RestAPI Reader parse only one level of data in the JSON-formatted response?
Yes, RestAPI Reader can parse only one level of data in the JSON-formatted response.
Appendix: Code and parameters
Appendix: Configure a batch synchronization task by using the code editor
If you use the code editor to configure a batch synchronization task, you must configure parameters for the reader and writer of the related data source based on the format requirements in the code editor. For more information about the format requirements, see Configure a batch synchronization task by using the code editor. The following information describes the configuration details of parameters for the reader and writer in the code editor.
Code for RestAPI Reader
{
    "type":"job",
    "version":"2.0",
    "steps":[
        {
            "stepType":"restapi",
            "parameter":{
                "url":"http://127.0.0.1:5000/get_array5",
                "dataMode":"oneData",
                "responseType":"json",
                "column":[
                    {
                        "type":"long",
                        "name":"a.b" // Query data in the a.b path.
                    },
                    {
                        "type":"string",
                        "name":"a.c" // Query data in the a.c path.
                    }
                ],
                "dirtyData":"null",
                "method":"get",
                "defaultHeader":{
                    "X-Custom-Header":"test header"
                },
                "customHeader":{
                    "X-Custom-Header2":"test header2"
                },
                "parameters":"abc=1&def=1"
            },
            "name":"restapireader",
            "category":"reader"
        },
        {
            "stepType":"stream",
            "parameter":{
            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":""
        },
        "speed":{
            "throttle":true, // Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true.
            "concurrent":1, // The maximum number of parallel threads.
            "mbps":"12" // The maximum transmission rate. Unit: MB/s.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}
Take note of the following information when you configure RestAPI Reader by using the code editor:
After RestAPI Reader sends an HTTP or HTTPS request, a JSON-formatted response is returned. The dataPath parameter specifies the path of the JSON-formatted data record or JSON array that is queried. Examples:
In the following sample response, a JSON array is returned for the DATA parameter that contains the business data.
{
    "HEADER": {
        "BUSID": "bid1",
        "RECID": "uuid",
        "SENDER": "dc",
        "RECEIVER": "pre",
        "DTSEND": "202201250000"
    },
    "DATA": [
        {
            "SERNR": "sernr1"
        },
        {
            "SERNR": "sernr2"
        }
    ]
}
To extract multiple data records from the JSON array and transfer the data records to a writer, you must configure the column parameter in the "column": [ "SERNR" ] format, the dataMode parameter in the "dataMode": "multiData" format, and the dataPath parameter in the "dataPath": "DATA" format.
In the following sample response, a JSON object is returned for the content.DATA parameter that contains the business data.
{
    "HEADER": {
        "BUSID": "bid1",
        "RECID": "uuid",
        "SENDER": "dc",
        "RECEIVER": "pre",
        "DTSEND": "202201250000"
    },
    "content": {
        "DATA": {
            "SERNR": "sernr2"
        }
    }
}
To extract one data record from the JSON object and transfer the data record to a writer, you must configure the column parameter in the "column": [ "SERNR" ] format, the dataMode parameter in the "dataMode": "oneData" format, and the dataPath parameter in the "dataPath": "content.DATA" format.
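The two cases above can be sketched in a few lines of Python. This is a simplified model of how dataPath, dataMode, and column relate to a response, not the actual RestAPI Reader implementation. The functions resolve_path and extract_records are hypothetical, the sample responses are abbreviated, and returning None for a missing path is an assumption that does not model the dirtyData setting.

```python
from typing import Any


def resolve_path(node: Any, path: str) -> Any:
    """Walk a dot-separated path such as "content.DATA" through nested objects."""
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None  # Assumption: a missing path yields None; dirtyData is not modeled.
        node = node[key]
    return node


def extract_records(response: dict, data_path: str, data_mode: str,
                    columns: list[str]) -> list[list[Any]]:
    """Return one row per data record, with one value per configured column."""
    target = resolve_path(response, data_path) if data_path else response
    if data_mode == "multiData":
        if not isinstance(target, list):
            raise ValueError("multiData expects dataPath to point to a JSON array")
        items = target
    elif data_mode == "oneData":
        items = [target]
    else:
        raise ValueError(f"unknown dataMode: {data_mode!r}")
    return [[resolve_path(item, column) for column in columns] for item in items]


# Abbreviated versions of the two sample responses.
array_response = {"HEADER": {"BUSID": "bid1"}, "DATA": [{"SERNR": "sernr1"}, {"SERNR": "sernr2"}]}
object_response = {"HEADER": {"BUSID": "bid1"}, "content": {"DATA": {"SERNR": "sernr2"}}}

rows_multi = extract_records(array_response, "DATA", "multiData", ["SERNR"])
rows_one = extract_records(object_response, "content.DATA", "oneData", ["SERNR"])
```

In this sketch, rows_multi holds two rows, one for each element of the DATA array, and rows_one holds a single row taken from the content.DATA object.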
Parameters in code for RestAPI Reader
You need to configure the parameters that are described in the following table when you configure a batch synchronization task for a RestAPI data source.
Scheduling parameters are not supported for a batch synchronization task that uses RestAPI Reader.
Parameter | Description | Required | Default value |
url | The URL of the RESTful API. | Yes | No default value |
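dataMode | The method that RestAPI Reader uses to read data from the JSON-formatted response returned by the RESTful API. Valid values: oneData, which reads one data record from a JSON object, and multiData, which reads multiple data records from a JSON array. | Yes | No default value |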
responseType | The format of the response returned by the RESTful API. Only the JSON format is supported. | Yes | JSON |
column | The names of the fields from which you want to read data. The type parameter specifies the data type of a field. The name parameter specifies the JSON-formatted path in which the field is located. You can configure the column parameter in the following format: "column":[{"type":"long","name":"a.b"},{"type":"string","name":"a.c"}]. In this example, data is queried in the a.b and a.c paths. You must configure the type and name parameters for each field. | Yes | No default value |
dataPath | The path of the JSON-formatted data record or JSON array that is queried. | No | No default value |
method | The request method. Valid values: get and post. | Yes | No default value |
customHeader | The header information transferred to the RESTful API. | No | No default value |
parameters | The parameter information transferred to the RESTful API. | No | No default value |
dirtyData | The processing mechanism that is used when no data is found in the JSON-formatted path specified by using the column parameter. Valid values: dirty and null. | Yes | dirty |
requestTimes | The number of times RestAPI Reader requests to read data from the response returned by the RESTful API. Valid values: single, which sends one request, and multiple, which sends multiple requests. | Yes | single |
requestParam | If you set the requestTimes parameter to multiple, you must configure a parameter that you want to repeatedly pass to the RESTful API in each request. For example, if you configure the pageNumber parameter, RestAPI Reader passes the pageNumber parameter to the RESTful API based on the settings of the startIndex, endIndex, and step parameters. | No | No default value |
startIndex | The start point of requests. The data at the start point is also requested. | No | No default value |
endIndex | The end point of requests. The data at the end point is also requested. | No | No default value |
step | The step at which requests are sent. | No | No default value |
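authType | The authentication method. The supported methods are basic authentication, token-based authentication, and authentication based on Alibaba Cloud API signature. Each method uses the credential parameters described in the following rows. | No | No default value |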
authUsername/authPassword | The username and password used for basic authentication. | No | No default value |
authToken | The token used for token-based authentication. | No | No default value |
accessKey/accessSecret | The AccessKey pair used for authentication based on Alibaba Cloud API signature. | No | No default value |
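The interaction of requestParam, startIndex, endIndex, and step can be sketched as follows. This is an illustrative model rather than the reader's actual code. The function names are hypothetical, and two points are assumptions: that step is a positive integer, and that the value of requestParam is appended to the query string given by the parameters setting. The inclusive handling of startIndex and endIndex follows the table above.

```python
from urllib.parse import parse_qsl, urlencode


def request_values(start_index: int, end_index: int, step: int) -> list[int]:
    """Values passed for requestParam; startIndex and endIndex are both included."""
    if step <= 0:
        raise ValueError("step must be a positive integer")  # Assumption of this sketch.
    return list(range(start_index, end_index + 1, step))


def build_query_strings(parameters: str, request_param: str,
                        start_index: int, end_index: int, step: int) -> list[str]:
    """Combine the fixed parameters with one requestParam value per request."""
    base = parse_qsl(parameters)
    return [
        urlencode(base + [(request_param, str(value))])
        for value in request_values(start_index, end_index, step)
    ]


# Example: pageNumber runs from 1 to 5 in steps of 2, so three requests are sent.
queries = build_query_strings("abc=1&def=1", "pageNumber", 1, 5, 2)
```

With these settings the sketch produces three query strings, for pageNumber values 1, 3, and 5. If endIndex were 4, only the values 1 and 3 would be requested, because 5 lies beyond the end point.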
Code for RestAPI Writer
{
    "type":"job",
    "version":"2.0",
    "steps":[
        {
            "stepType":"stream",
            "parameter":{
            },
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"restapi",
            "parameter":{
                "url":"http://127.0.0.1:5000/writer1",
                "dataMode":"oneData",
                "responseType":"json",
                "column":[
                    {
                        "type":"long",
                        "name":"a.b" // Store data in the a.b path.
                    },
                    {
                        "type":"string",
                        "name":"a.c" // Store data in the a.c path.
                    }
                ],
                "method":"post",
                "defaultHeader":{
                    "X-Custom-Header":"test header"
                },
                "customHeader":{
                    "X-Custom-Header2":"test header2"
                },
                "parameters":"abc=1&def=1",
                "batchSize":256
            },
            "name":"restapiwriter",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0" // The maximum number of dirty data records allowed.
        },
        "speed":{
            "throttle":true, // Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true.
            "concurrent":1, // The maximum number of parallel threads.
            "mbps":"12" // The maximum transmission rate. Unit: MB/s.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}
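The column entries in the preceding code name JSON paths such as a.b and a.c. The following sketch shows one plausible way a record could be turned into a nested JSON object from those paths. It is an assumption about the resulting shape, not a description of the RestAPI Writer internals, and set_path and build_payload are hypothetical names.

```python
from typing import Any


def set_path(target: dict, path: str, value: Any) -> None:
    """Store value at a dot-separated path, creating nested objects as needed."""
    keys = path.split(".")
    node = target
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def build_payload(columns: list[dict], record: list[Any]) -> dict:
    """Map the i-th value of a record to the path named by the i-th column."""
    if len(columns) != len(record):
        raise ValueError("record length must match the number of columns")
    payload: dict = {}
    for column, value in zip(columns, record):
        set_path(payload, column["name"], value)
    return payload


# The column entries from the sample writer configuration.
columns = [{"type": "long", "name": "a.b"}, {"type": "string", "name": "a.c"}]
payload = build_payload(columns, [42, "hello"])
```

Under these assumptions, the record with the values 42 and hello becomes an object in which both values sit under the shared key a, as fields b and c.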
Parameters in code for RestAPI Writer
Parameter | Description | Required | Default value |
url | The URL of the RESTful API. | Yes | No default value |
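dataMode | The format in which RestAPI Writer transfers JSON-formatted data. Valid values: oneData, which transfers one data record in each request, and multiData, which transfers multiple data records in each request. | Yes | No default value |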
column | The columns to which you want to write the generated JSON-formatted data. The type parameter specifies the data type of a column. The name parameter specifies the JSON-formatted path where the column is stored. You can configure the column parameter in the following format: "column":[{"type":"long","name":"a.b"},{"type":"string","name":"a.c"}]. In this example, data is stored in the a.b and a.c paths. You must configure the type and name parameters for each column. | Yes | No default value |
dataPath | The path that is used to store the JSON-formatted data. | No | No default value |
method | The request method. Valid values: post and put. | Yes | No default value |
customHeader | The header information transferred to the RESTful API. | No | No default value |
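authType | The authentication method. The supported methods are basic authentication, token-based authentication, and authentication based on Alibaba Cloud API signature. Each method uses the credential parameters described in the following rows. | No | No default value |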
authUsername/authPassword | The username and password used for basic authentication. | No | No default value |
authToken | The token used for token-based authentication. | No | No default value |
accessKey/accessSecret | The AccessKey pair used for authentication based on Alibaba Cloud API signature. | No | No default value |
batchSize | The maximum number of data records that can be transferred in each request when the dataMode parameter is set to multiData. | Yes | 512 |
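To make the effect of batchSize concrete, the following sketch groups records into requests of at most batchSize records each. The batches function is a hypothetical helper that illustrates the grouping only and does not send any request. Its default of 512 mirrors the default value in the table, and the value 256 in the example matches the sample writer code.

```python
from typing import Iterable, Iterator


def batches(records: Iterable[dict], batch_size: int = 512) -> Iterator[list[dict]]:
    """Yield lists of at most batch_size records, preserving the input order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: list[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # The last batch may hold fewer than batch_size records.


# 600 records with a batch size of 256 result in three requests.
sizes = [len(batch) for batch in batches(({"id": i} for i in range(600)), batch_size=256)]
```

In this example, 600 records are split into batches of 256, 256, and 88 records, so a larger batchSize means fewer requests that each carry more data.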