DataWorks Data Integration supports HttpFile data sources. You can download files over HTTP and synchronize the files to a destination data source.
Limits
HttpFile data sources support only exclusive resource groups for Data Integration.
Data type mappings
Category | Description |
STRING | Text. |
LONG | Integer. |
BYTES | Byte array. The text that is read is converted to a byte array. The encoding format is UTF-8. |
BOOL | Boolean. |
DOUBLE | Decimal. |
DATE | Date and time. The following date and time formats are supported:
|
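The mapping above can be pictured as a per-field conversion from the raw text of the file. The following Python sketch is illustrative only: the function name convert_field is hypothetical, the accepted Boolean spellings are an assumption, and DATE is left out because the supported formats are not listed here. It is not the conversion code that Data Integration runs.

```python
def convert_field(text: str, type_name: str):
    """Convert one raw text field to a Python value for the given type.

    Hypothetical helper for illustration; not part of DataWorks.
    """
    kind = type_name.upper()
    if kind == "STRING":
        return text
    if kind == "LONG":
        return int(text)
    if kind == "DOUBLE":
        return float(text)
    if kind == "BYTES":
        # The text that is read is converted to a byte array in UTF-8.
        return text.encode("utf-8")
    if kind == "BOOL":
        # Assumption: only "true" and "false" (in any case) are accepted.
        lowered = text.strip().lower()
        if lowered in ("true", "false"):
            return lowered == "true"
        raise ValueError(f"not a Boolean: {text!r}")
    raise ValueError(f"unsupported type in this sketch: {type_name}")
```

For example, convert_field("42", "LONG") yields the integer 42, and convert_field("abc", "BYTES") yields the UTF-8 bytes of the text.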
Develop a data synchronization task
For the entry point and the procedure for configuring a data synchronization task, see the following sections. For parameter settings, view the infotip of each parameter on the configuration tab of the task.
Add a data source
Before you configure a data synchronization task to synchronize data from a specific data source, you must add the data source to DataWorks. For more information, see Add and manage data sources.
Configure a batch synchronization task to synchronize data of a single table
For more information about the configuration procedure, see Configure a batch synchronization task by using the codeless UI and Configure a batch synchronization task by using the code editor.
For the full list of parameters and the sample code used when you configure a batch synchronization task in the code editor, see Appendix: Code and parameters.
Appendix: Code and parameters
Configure a batch synchronization task by using the code editor
If you use the code editor to configure a batch synchronization task, you must configure the reader parameters for the data source in the format that the code editor requires. For more information about the format requirements, see Configure a batch synchronization task by using the code editor. The following sections describe the reader parameters that you configure in the code editor.
Code for HttpFile Reader
The following code configures a synchronization task that reads data from an HttpFile data source:
{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "httpfile",
            "parameter": {
                "datasource": "",
                "fileName": "/f/z/1.csv",
                "requestMethod": "GET",
                "requestBody": "",
                "requestHeaders": {
                    "header1": "v1",
                    "header2": "v2"
                },
                "socketTimeoutSeconds": 3600,
                "connectTimeoutSeconds": 60,
                "bufferByteSizeInKB": 1024,
                "fileFormat": "csv",
                "encoding": "utf8/gbk/...",
                "fieldDelimiter": ",",
                "useMultiCharDelimiter": true,
                "lineDelimiter": "\n",
                "skipHeader": true,
                "compress": "zip/gzip",
                "column": [
                    {
                        "index": 0,
                        "type": "long"
                    },
                    {
                        "index": 1,
                        "type": "boolean"
                    },
                    {
                        "index": 2,
                        "type": "double"
                    },
                    {
                        "index": 3,
                        "type": "string"
                    },
                    {
                        "index": 4,
                        "type": "date"
                    }
                ]
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "concurrent": 1
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}
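Each entry in the column list of the sample pairs a type with an index. As a hedged illustration of the rule stated in the parameter table (type is required, together with either index or value, but not both), the following Python sketch checks a column list before it is pasted into the code editor. The helper name validate_columns and its messages are hypothetical, and this is not the validation that DataWorks itself performs.

```python
def validate_columns(columns):
    """Return a list of problems found in a reader "column" list.

    Hypothetical helper for illustration; not part of DataWorks.
    """
    # The default form ["*"] reads every column as a string.
    if columns == ["*"]:
        return []
    problems = []
    for position, col in enumerate(columns):
        if not isinstance(col, dict):
            problems.append(f"column {position}: expected an object")
            continue
        if "type" not in col:
            problems.append(f"column {position}: missing 'type'")
        has_index = "index" in col
        has_value = "value" in col
        if has_index and has_value:
            problems.append(f"column {position}: set 'index' or 'value', not both")
        elif not (has_index or has_value):
            problems.append(f"column {position}: set 'index' or 'value'")
    return problems
```

An empty result means that the list follows the rule. For instance, validate_columns([{"index": 0, "type": "long"}]) returns an empty list, while an entry that sets both index and value is reported.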
Parameters in code for HttpFile Reader
Parameter | Description | Required | Default value |
datasource | The name of the data source. It must be the same as the name of the added data source. | Yes | No default value |
fileName | The file path. If the file name contains special characters, enter the URL-escaped value. For example, escape a space as %20. | Yes | No default value |
bufferByteSizeInKB | The buffer size of the downloaded file. Unit: KB. | No | 1024 |
requestMethod | The request method. Valid values: GET, POST, and PUT. | No | GET |
requestParam | The request parameters. This parameter takes effect only when the requestMethod parameter is set to GET. If a parameter value contains special characters, the value must be URL-escaped. For example, a start parameter that specifies the start time of an operation when a GET request is initiated must be passed with its value escaped. | No | No default value |
requestBody | The content of the request. This parameter takes effect only when the requestMethod parameter is set to POST or PUT, and it must be used together with the Content-Type header in requestHeaders. | No | No default value |
requestHeaders | The request headers, specified as key-value pairs. Example, as used in the sample code: {"header1": "v1", "header2": "v2"} | No | |
fileFormat | The type of the source file. Valid values: csv and text. You can specify delimiters for the two types of files. | No | No default value |
column | The columns from which you want to read data. By default, the reader reads all data as strings based on the following configuration: "column": ["*"]. You can also configure each column explicitly, as in the sample code: {"index": 0, "type": "long"}. Note: For each column, you must configure the type parameter together with either the index parameter or the value parameter. You cannot configure index and value for the same column. | Yes | "column": ["*"] |
fieldDelimiter | The column delimiter that is used in the file from which you want to read data. Note: HttpFile Reader requires a column delimiter. If you do not specify one, the default delimiter, a comma (,), is used. If the delimiter is a non-printable character, enter its Unicode escape, such as \u001b or \u007c. | Yes | , |
lineDelimiter | The row delimiter that is used in the file from which you want to read data. Note This parameter takes effect only when the fileFormat parameter is set to text. | No | No default value |
compress | The format in which files are compressed. By default, this parameter is left empty, which indicates that files are not compressed. The following compression formats are supported: GZIP, BZIP2, and ZIP. | No | No default value |
encoding | The encoding format of the file from which you want to read data. | No | utf-8 |
nullFormat | The string that represents a null value. TXT files have no standard string for null, so you can use this parameter to define the string that is interpreted as null. | No | No default value |
skipHeader | Specifies whether to skip the header of a CSV-like file if the file has a header. Valid values: true (the header is skipped) and false (the header is not skipped). The skipHeader parameter is unavailable for compressed files. Common file compression formats are GZIP, BZIP2, and ZIP. | No | false |
connectTimeoutSeconds (advanced parameter, available only in the code editor) | The timeout period for HTTP requests. Unit: seconds. If the specified timeout period is exceeded, the task fails. | No | 60 |
socketTimeoutSeconds (advanced parameter, available only in the code editor) | The timeout period for HTTP responses. Unit: seconds. If the interval between two packets is greater than the specified timeout period, the task fails. | No | 3600 |
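The fileName, requestParam, and fieldDelimiter rows above all ask for special characters to be entered in an escaped form. The following Python sketch shows one way to produce such values with the standard library. The helper names and the sample inputs are hypothetical, and keeping the slash unescaped in a file path is an assumption about what the data source expects.

```python
import json
from urllib.parse import quote


def escape_file_name(path: str) -> str:
    """URL-escape a file path while keeping the slash separators."""
    return quote(path, safe="/")


def escape_param_value(value: str) -> str:
    """URL-escape one request parameter value, reserved characters included."""
    return quote(value, safe="")


def delimiter_as_json(delimiter: str) -> str:
    """Render a delimiter as a JSON string; control characters become \\uXXXX."""
    return json.dumps(delimiter)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the escaping.
    print(escape_file_name("/f/z/my file.csv"))
    print(escape_param_value("2023-01-01 00:00:00"))
    print(delimiter_as_json("\x1b"))
```

The first call turns the space into %20, as the fileName row requires. The last call prints the ESC character as the JSON string "\u001b", which is the form that the fieldDelimiter row describes for non-printable delimiters.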
References
For more information about the supported data sources, see Supported data source types and synchronization operations.
For more information about how to manage permissions on a data source, see RAM authorization mode.