OpenSearch: Use data processing plug-ins

Last Updated: Apr 09, 2024

OpenSearch allows you to import data by calling API operations, by using OpenSearch SDKs, or in the OpenSearch console. You can also synchronize data from an existing PolarDB or ApsaraDB RDS database to OpenSearch. To synchronize data from a cloud data source, you must configure the data source in the console. OpenSearch provides several data processing plug-ins that perform simple data conversions. You can apply a plug-in when you configure field mappings between an OpenSearch table and its source tables.

Data processing plug-ins are available only for data that is synchronized from a data source. If you upload data by using API operations or OpenSearch SDKs, you cannot use these plug-ins and must process the data yourself before you upload it. For more information about uploading data by using API operations or SDKs, see the relevant topics.

If an ApsaraDB RDS or PolarDB data source uses database and table sharding, you can associate an OpenSearch table with multiple source tables. However, you can associate an OpenSearch table with only one MaxCompute source table. To synchronize data from multiple MaxCompute source tables, join them into a single table and then use that table as the source.

Data processing plug-ins

Some search features and functions require fields of specific types. For example, to reference a source field as an array-type field, you must use one of the plug-ins described below to convert it to an array type. Otherwise, you cannot reference the field.

Note: You configure plug-ins when you configure a data source for an application, not when you define the application schema. Therefore, you can configure a plug-in only after a data source is configured.

The following plug-ins are available. Each entry gives the plug-in name, a description, and an example.

JsonKeyValueExtractor

This plug-in extracts the value of the specified key from a source field in JSON format and uses it as the value of the destination table field. Only the value of the specified key can be extracted.

For example, the value of the title key is extracted from {"title":"the content","body":"the content"}. If the extracted value is a JSON array, it is converted to a value of an array type. Make sure that the type of the extracted value matches the type of the destination table field. Otherwise, the extracted value is lost. Here, a JSON array means an array in the format that OpenSearch defines. Example of a LITERAL_ARRAY field: {"tags":["a","b","c"]}. Example of an INT_ARRAY field: {"tags":[1,2,3]}.
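The extraction and type-matching rules for JsonKeyValueExtractor can be illustrated with a small Python sketch. It only emulates the documented behavior and is not OpenSearch code: the function `extract_for_field`, the mapping of OpenSearch field types to Python types, and the choice to return `None` for a "lost" value are all assumptions made for illustration.

```python
import json
from typing import Any, Optional

# Hypothetical mapping from OpenSearch field types to Python types.
SCALAR_TYPES = {"LITERAL": str, "INT": int}
ARRAY_TYPES = {"LITERAL_ARRAY": str, "INT_ARRAY": int}


def _matches(value: Any, py_type: type) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, py_type) and not isinstance(value, bool)


def extract_for_field(source: str, key: str, field_type: str) -> Optional[Any]:
    """Emulate JsonKeyValueExtractor: return the value of `key`, or None if lost."""
    try:
        doc = json.loads(source)
    except json.JSONDecodeError:
        return None  # malformed JSON: nothing to extract
    if not isinstance(doc, dict):
        return None
    value = doc.get(key)
    if field_type in ARRAY_TYPES:
        # A JSON array becomes an array-type value only if every element fits.
        elem_type = ARRAY_TYPES[field_type]
        ok = isinstance(value, list) and all(_matches(v, elem_type) for v in value)
    else:
        ok = _matches(value, SCALAR_TYPES[field_type])
    return value if ok else None  # type mismatch: the value is dropped
```

With this sketch, extracting tags from {"tags":["a","b","c"]} into an INT_ARRAY field returns `None`, which mirrors the documented rule that a value whose type does not match the destination field is lost.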

MultiValueSpliter

This plug-in splits the value of a source field into multiple values by using a delimiter. The values are used as the array elements of the corresponding destination field, which must be of an array type. To use a non-printable character as the delimiter, specify it as a Unicode escape sequence, such as \u001D.

For example, if the value of a source field is 1,2,3, specify a comma (,) as the delimiter to split the value into 1, 2, and 3.
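The splitting step, including a delimiter entered as a Unicode escape such as \u001D, can be sketched in Python as follows. This is an illustration of the documented behavior only: `parse_delimiter` and `split_multi_value` are hypothetical helpers, and this sketch keeps empty elements (for example, from 1,,2) because this topic does not say how the real plug-in treats them.

```python
def parse_delimiter(text: str) -> str:
    """Turn a delimiter typed as a Unicode escape (e.g. \\u001D) into the character."""
    if text.startswith("\\u") and len(text) == 6:
        return chr(int(text[2:], 16))
    return text  # printable delimiters such as "," are used as-is


def split_multi_value(source: str, delimiter: str) -> list[str]:
    """Emulate MultiValueSpliter: split one field value into array elements."""
    if source == "":
        return []  # an empty source value yields an empty array
    return source.split(delimiter)


# Usage: a printable delimiter and a non-printable one (U+001D).
print(split_multi_value("1,2,3", parse_delimiter(",")))
print(split_multi_value("a\x1db", parse_delimiter("\\u001D")))
```

The elements stay strings here. Converting them for an INT_ARRAY destination field is left out because the topic does not describe that step.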

KeyValueExtractor

This plug-in extracts the value of the specified key from a source field that contains key-value pairs and uses it as the value of the destination table field. Only the value of the specified key can be extracted. You are not required to specify delimiters.

For example, in the field value key1:value1,value2;key2:value3, the keys are key1 and key2. Key-value pairs are separated by semicolons (;), each key is separated from its value by a colon (:), and multiple values are separated by commas (,). If you specify a delimiter to split the extracted value, the value is converted to a value of an array type. Make sure that the type of the extracted value matches the type of the destination table field. Otherwise, the extracted value is lost. If the same key appears twice, only the value of the second occurrence is extracted.
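The parsing rules in this example can be expressed as a short Python sketch. It emulates the documented behavior and is not the plug-in's implementation: the function name `extract_key_value` is hypothetical, and the default separators (`;` between pairs, `:` between key and value) are taken from the example above rather than from documented plug-in defaults.

```python
from typing import Optional, Union


def extract_key_value(
    source: str,
    key: str,
    pair_sep: str = ";",
    kv_sep: str = ":",
    value_sep: Optional[str] = None,
) -> Optional[Union[str, list[str]]]:
    """Emulate KeyValueExtractor on a string such as "key1:v1,v2;key2:v3"."""
    result = None
    for pair in source.split(pair_sep):
        k, sep, v = pair.partition(kv_sep)
        if sep and k == key:
            result = v  # a later duplicate key overwrites the earlier one
    if result is not None and value_sep is not None:
        return result.split(value_sep)  # split value becomes an array
    return result
```

Without `value_sep`, extracting key1 from key1:value1,value2;key2:value3 returns the single string value1,value2. With `value_sep=","`, it returns the two-element array that an array-type field would receive.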

StringCatenateExtractor

This plug-in concatenates the values of the specified fields into a string in the specified order. It cannot concatenate fields of the INT type, so we recommend that you use fields of the LITERAL type. Separate multiple field names with commas (,). The fields must be fields of the destination table.

For example, you can use this plug-in to concatenate the field1 and field2 fields into a new field, with an underscore (_) as the separator. You can also obtain the name of the current table from the system variable $table. The $table variable is displayed only when a table-sharding wildcard is configured.
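A Python sketch of the concatenation rules follows. It is illustrative only: the `row` dictionary, the `separator` and `table_name` parameters, and the choice to raise errors for INT values or for $table without a sharding wildcard are assumptions that model the documented constraints, not how OpenSearch reports them.

```python
from typing import Optional


def concatenate_fields(
    row: dict[str, object],
    fields: str,
    separator: str = "_",
    table_name: Optional[str] = None,
) -> str:
    """Emulate StringCatenateExtractor for one row of destination fields."""
    parts = []
    for name in (n.strip() for n in fields.split(",")):
        if name == "$table":
            # $table is only available when a table-sharding wildcard is set.
            if table_name is None:
                raise ValueError("$table requires a table-sharding wildcard")
            parts.append(table_name)
            continue
        value = row[name]
        if not isinstance(value, str):
            # Models the rule that INT fields cannot be concatenated.
            raise TypeError(f"{name} is not a LITERAL (string) field")
        parts.append(value)
    return separator.join(parts)


print(concatenate_fields({"field1": "a", "field2": "b"}, "field1,field2"))
print(concatenate_fields({"field1": "a"}, "$table,field1", table_name="orders_01"))
```

The table name orders_01 is a made-up example of a sharded table.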

HTMLTagRemover

This plug-in removes HTML tags from the value of a source field and replaces the field value with the result.

For example, if the value of a source field is <div id="copyright">OpenSearch</div>, the plug-in removes the HTML tags and the resulting field value is OpenSearch.
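Tag removal can be sketched in Python with the standard library's `html.parser.HTMLParser`, which keeps only the text between tags. This is a stand-in technique, not the plug-in's implementation: `remove_html_tags` is a hypothetical helper, and decoding character references such as &amp; (the effect of `convert_charrefs=True`) is an assumption about the desired output.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text content of an HTML fragment and ignores the tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode entities like &amp;
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def remove_html_tags(value: str) -> str:
    """Emulate HTMLTagRemover: return the field value without HTML tags."""
    parser = _TextCollector()
    parser.feed(value)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


print(remove_html_tags('<div id="copyright">OpenSearch</div>'))
```

A real parser is used instead of a regular expression so that attribute values containing > do not cut the text short.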

Note: For more information, see Configure the MultiValueSpliter plug-in.