DataWorks provides OpenSearch Writer for you to write data to OpenSearch data sources. This topic describes the capabilities of writing data to OpenSearch data sources in offline mode.
Supported OpenSearch versions
OpenSearch V3 is accessed by using a second-party SDK package. The POM dependency is groupId com.aliyun.opensearch, artifactId aliyun-sdk-opensearch, version 2.1.3.
To use OpenSearch Writer, you must install JDK 1.6-32 or later. You can run the java -version command to view the JDK version.
Limits
OpenSearch Writer supports only exclusive resource groups for Data Integration, but not custom resource groups for Data Integration.
Columns in OpenSearch are unordered, so OpenSearch Writer writes data strictly in the order of the columns that you specify. If you specify fewer columns than the OpenSearch table contains, the unspecified columns in the OpenSearch table are set to their default values or null.
For example, an OpenSearch table contains columns a, b, and c, and you want to write data only to columns c and b. You can set the column parameter to ["c","b"]. In this case, OpenSearch Writer writes the first and second columns of the source data that is obtained from the reader to columns c and b of the OpenSearch table, respectively. Column a in the OpenSearch table is set to its default value or null.
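The positional mapping in this example can be sketched in Python. This is a minimal illustration, not the implementation of OpenSearch Writer: the helper name map_record is hypothetical, and the sketch assumes that each source record carries exactly as many values as the column parameter lists.

```python
from typing import Any, Dict, Sequence


def map_record(columns: Sequence[str], record: Sequence[Any]) -> Dict[str, Any]:
    """Pair the i-th source value with the i-th configured OpenSearch column.

    Hypothetical helper for illustration only. Table columns that are not
    listed in `columns` are left out of the result, so OpenSearch fills them
    with their default value or null.
    """
    if len(record) != len(columns):
        # Assumption for this sketch: the value count must match the column count.
        raise ValueError(
            f"record has {len(record)} values, but {len(columns)} columns are configured"
        )
    return dict(zip(columns, record))


# The OpenSearch table has columns a, b, and c; only c and b are written.
doc = map_record(["c", "b"], ["first value", "second value"])
print(doc)  # {'c': 'first value', 'b': 'second value'}
```

Column a never appears in the resulting document, which is why OpenSearch falls back to its default value or null for it.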
You can use only the code editor to configure a batch synchronization node to write data to OpenSearch data sources.
Data type mappings
OpenSearch Writer supports most OpenSearch data types. Make sure that the data types of your source database are supported. The following table lists the data type mappings that OpenSearch Writer uses to convert data types.
Category | OpenSearch data type |
Integer | INT |
Floating point | DOUBLE and FLOAT |
String | TEXT, LITERAL, and SHORT_TEXT |
Date and time | INT |
Boolean | LITERAL |
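The following Python sketch illustrates how source values could be converted to the OpenSearch types in the table above. It is an assumption-based illustration, not documented behavior. The helper name to_opensearch_value is hypothetical, and two choices are made up for this sketch: dates are encoded as Unix epoch seconds, and booleans are written as the strings "true" and "false".

```python
from datetime import datetime, timezone
from typing import Union

SourceValue = Union[bool, int, float, str, datetime]
OpenSearchValue = Union[int, float, str]


def to_opensearch_value(value: SourceValue) -> OpenSearchValue:
    """Convert a source value to the OpenSearch type listed in the mapping table.

    Hypothetical helper. The epoch-seconds encoding for dates and the
    "true"/"false" spelling for booleans are assumptions of this sketch.
    """
    if isinstance(value, bool):
        # Check bool before int, because bool is a subclass of int in Python.
        return "true" if value else "false"  # Boolean -> LITERAL
    if isinstance(value, datetime):
        # Date and time -> INT (assumed epoch seconds). Naive datetimes are
        # interpreted in local time by timestamp(), so prefer aware values.
        return int(value.timestamp())
    if isinstance(value, (int, float, str)):
        # Integer -> INT, floating point -> DOUBLE/FLOAT, string -> TEXT/LITERAL/SHORT_TEXT
        return value
    raise TypeError(f"unsupported source type: {type(value).__name__}")


print(to_opensearch_value(datetime(2024, 1, 1, tzinfo=timezone.utc)))  # 1704067200
print(to_opensearch_value(True))  # true
```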
Develop a data synchronization node
For more information about the configuration procedure, see Configure a batch synchronization task by using the code editor.
For information about all parameters that are configured and the code that is run when you use the code editor to configure a batch synchronization node, see Appendix: Code and parameters.
Additional information
Handling column configuration errors
To prevent data loss caused by redundant columns and ensure high data reliability, OpenSearch Writer returns an error if the number of columns that are to be written is more than that in the destination table. For example, an OpenSearch table contains columns a, b, and c. If more than three columns need to be written to the table, OpenSearch Writer returns an error.
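A minimal Python sketch of this fail-fast rule is shown below. The helper name check_column_count is hypothetical, and the sketch compares only the number of columns, as the rule above describes. It does not validate column names.

```python
from typing import Sequence


def check_column_count(columns: Sequence[str], table_columns: Sequence[str]) -> None:
    """Raise an error instead of silently dropping data from redundant columns.

    Hypothetical helper that mirrors the rule described above.
    """
    if len(columns) > len(table_columns):
        raise ValueError(
            f"{len(columns)} columns configured, but the destination table has only "
            f"{len(table_columns)}: {list(table_columns)}"
        )


check_column_count(["c", "b"], ["a", "b", "c"])  # OK: fewer columns than the table
try:
    check_column_count(["a", "b", "c", "d"], ["a", "b", "c"])  # four columns, three in the table
except ValueError as err:
    print(err)
```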
Table configuration
OpenSearch Writer can write data to only one table at a time.
Node rerunning
After a node is rerun, data is overwritten based on IDs. Therefore, the data written to OpenSearch must contain an ID column. An ID is a unique identifier of a row in OpenSearch. The existing data that has the same ID as the new data is overwritten.
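The following Python sketch shows why overwriting by ID makes reruns safe. An in-memory dictionary stands in for the OpenSearch table, and the helper name write_by_id is hypothetical. The ID field is assumed to be named id, as in the sample column list in the appendix.

```python
from typing import Any, Dict, Iterable

Document = Dict[str, Any]


def write_by_id(store: Dict[Any, Document], docs: Iterable[Document], id_field: str = "id") -> None:
    """Write documents keyed by ID; a document with an existing ID replaces it."""
    for doc in docs:
        if id_field not in doc:
            # Without an ID, a rerun could not find and overwrite the earlier copy.
            raise ValueError(f"document is missing the '{id_field}' field: {doc}")
        store[doc[id_field]] = dict(doc)


store: Dict[Any, Document] = {}  # in-memory stand-in for the OpenSearch table
batch = [{"id": 1, "title": "first"}, {"id": 2, "title": "second"}]
write_by_id(store, batch)
write_by_id(store, batch)  # rerun: same IDs, so nothing is duplicated
write_by_id(store, [{"id": 2, "title": "second, revised"}])
print(len(store), store[2]["title"])  # 2 second, revised
```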
Appendix: Code and parameters
Appendix: Configure a batch synchronization node by using the code editor
If you use the code editor to configure a batch synchronization task, you must configure parameters for the reader and writer of the related data source based on the format requirements in the code editor. For more information about the format requirements, see Configure a batch synchronization task by using the code editor. The following information describes the configuration details of the parameters for OpenSearch Writer in the code editor.
Code for OpenSearch Writer
{
  "type": "job",
  "version": "1.0",
  "configuration": {
    "reader": {},
    "writer": {
      "plugin": "opensearch",
      "parameter": {
        "accessId": "*********",
        "accessKey": "********",
        "host": "http://yyyy.aliyuncs.com",
        "indexName": "datax_xxx",
        "table": "datax_yyy",
        "column": [
          "appkey",
          "id",
          "title",
          "gmt_create",
          "pic_default"
        ],
        "batchSize": 500,
        "writeMode": "add",
        "version": "v2",
        "ignoreWriteError": false
      }
    }
  }
}
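If you generate job configurations from a script, serializing a dictionary with json.dumps guarantees valid JSON. For example, string values such as "add" are always quoted. The following is a minimal sketch under that approach. The helper name build_opensearch_job is hypothetical. It checks only that the parameters marked as required in the following table are present, not that their values are valid, and it leaves the reader empty as in the sample. Replace the placeholder credentials with your own.

```python
import json
from typing import Any, Dict

# Parameters marked "Yes" in the Required column of the parameter table.
REQUIRED_PARAMETERS = ("accessId", "accessKey", "host", "indexName", "table", "column", "writeMode")


def build_opensearch_job(parameter: Dict[str, Any]) -> str:
    """Return code-editor JSON for a job that writes to OpenSearch (hypothetical helper)."""
    missing = [name for name in REQUIRED_PARAMETERS if name not in parameter]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    job = {
        "type": "job",
        "version": "1.0",
        "configuration": {
            "reader": {},  # configure the reader for your source data separately
            "writer": {"plugin": "opensearch", "parameter": parameter},
        },
    }
    return json.dumps(job, indent=2)


print(build_opensearch_job({
    "accessId": "<your-AccessKey-ID>",
    "accessKey": "<your-AccessKey-secret>",
    "host": "http://yyyy.aliyuncs.com",
    "indexName": "datax_xxx",
    "table": "datax_yyy",
    "column": ["appkey", "id", "title", "gmt_create", "pic_default"],
    "batchSize": 500,
    "writeMode": "add",
    "version": "v2",
    "ignoreWriteError": False,  # serialized as JSON false
}))
```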
Parameters in code for OpenSearch Writer
Parameter | Description | Required | Default value |
accessId | The AccessKey ID of the account that you use to connect to OpenSearch. | Yes | No default value |
accessKey | The AccessKey secret of the account that you use to connect to OpenSearch. | Yes | No default value |
host | The endpoint of OpenSearch. You can obtain the endpoint in the Alibaba Cloud Management Console. | Yes | No default value |
indexName | The name of the OpenSearch project. | Yes | No default value |
table | The name of the table to which you want to write data. You can specify only one table because Data Integration cannot import data to multiple tables at a time. | Yes | No default value |
column | The names of the columns to which you want to write data. To write data to all columns in the destination table, set this parameter to an asterisk (*), such as "column": ["*"]. OpenSearch Writer can filter columns and change their order. For example, if an OpenSearch table contains three columns a, b, and c and you want to write data only to columns c and b, set the column parameter to ["c","b"]. | Yes | No default value |
batchSize | The number of data records to write in each batch. OpenSearch Writer writes data records to OpenSearch in batches. OpenSearch is a data query service, and in most cases its write throughput in transactions per second (TPS) is not high. Set this parameter based on the resources available to the account that is used to connect to OpenSearch. In most cases, each data record must be smaller than 1 MB, and the total size of the data records in each batch must be smaller than 2 MB. | Required only for writing data to a partitioned table | 300 |
writeMode | The write mode. To ensure the idempotence of write operations, set this parameter to add/update. | Yes | No default value |
ignoreWriteError | Specifies whether to ignore write operations that fail. Valid values: true and false. | No | false |
version | The version of OpenSearch, such as v2. | No | v2 |
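The batchSize limits described above can be illustrated with a Python sketch that groups records so that no batch exceeds the record count or the 2 MB total, and any record of 1 MB or more is rejected. This is not how OpenSearch Writer is implemented. The helper names make_batches and record_bytes are hypothetical, and the sketch assumes that 1 MB equals 1024 × 1024 bytes and that size is measured on the UTF-8 JSON encoding of a record.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]

# Assumption: 1 MB = 1024 * 1024 bytes, measured on the UTF-8 JSON encoding.
MAX_RECORD_BYTES = 1024 * 1024
MAX_BATCH_BYTES = 2 * 1024 * 1024


def record_bytes(record: Record) -> int:
    """Approximate record size as the length of its UTF-8 JSON encoding."""
    return len(json.dumps(record, ensure_ascii=False).encode("utf-8"))


def make_batches(records: Iterable[Record], batch_size: int = 300) -> Iterator[List[Record]]:
    """Yield batches of at most `batch_size` records whose total size stays under 2 MB."""
    batch: List[Record] = []
    batch_total = 0
    for record in records:
        size = record_bytes(record)
        if size >= MAX_RECORD_BYTES:
            raise ValueError(f"record is {size} bytes; each record must be smaller than 1 MB")
        # Flush the current batch if adding this record would break either limit.
        if batch and (len(batch) >= batch_size or batch_total + size >= MAX_BATCH_BYTES):
            yield batch
            batch, batch_total = [], 0
        batch.append(record)
        batch_total += size
    if batch:
        yield batch


small = [{"id": i} for i in range(7)]
print([len(b) for b in make_batches(small, batch_size=3)])  # [3, 3, 1]
```

With large records, the 2 MB limit takes effect before the record-count limit. For example, records of about 600 KB each are grouped in threes even when batchSize is 300.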