Configure GaussDB (DWS) output component - Dataphin - Alibaba Cloud Documentation Center

The GaussDB (DWS) output component enables data writing to a GaussDB (DWS) data source. When synchronizing data from various sources to a GaussDB (DWS) data source, it is necessary to configure the target data source after setting up the source data. This topic describes the configuration process for the GaussDB (DWS) output component.

Prerequisites

A GaussDB (DWS) data source must be created. For more information, see create GaussDB (DWS) data source.
To configure the properties of the GaussDB (DWS) output component, the account must have write-through permission for the data source. If you lack the necessary permissions, you must obtain them from the data source. For more information, see how to request data source permissions

Procedure

On the Dataphin home page, navigate to the top menu bar and select Development > Data Integration.
At the top of the integration page, select Project (Dev-Prod mode requires selecting an environment).
In the left-side navigation pane, click Batch Pipeline. Then, from the Batch Pipeline list, select the offline pipeline you want to develop to access its configuration page.
Click Component Library in the upper right corner to open the Component Library panel.
In the Component Library panel's left-side navigation pane, select Output. Locate the GaussDB (DWS) component in the output list and drag it onto the canvas.
Connect the target input, transform, or flow component to the GaussDB (DWS) output component by clicking and dragging the icon.
Click the icon on the GaussDB (DWS) output component card to launch the GaussDB (DWS) Output Configuration dialog box.

Within the GaussDB (DWS) Output Configuration dialog box, set the necessary parameters.

Parameter		Description
Basic Information	Step Name	This is the name of the GaussDB (DWS) output component. Dataphin automatically generates the step name, but you can modify it according to your business scenario. The naming convention is as follows: Can only contain Chinese characters, letters, underscores (_), and numbers. Can be up to 64 characters in length.
	Datasource	In the data source drop-down list, all GaussDB (DWS) type data sources are displayed, including data sources for which you have write-through permission and those for which you do not. Click the icon to copy the current data source name. For data sources without write-through permission, you can click Request after the data source to request write-through permission for the data source. For more information, see request data source permission. If you do not have a GaussDB (DWS) type data source, click Create Data Source to create a data source. For more information, see create GaussDB (DWS) data source.
	Schema (optional)	Supports selecting tables across schemas. Please select the schema where the table is located. If not specified, the schema configured in the data source is used by default.
	Table	Select the target table for output data. You can enter a table name keyword to search, or enter the exact table name and then click Exact Search. After selecting a table, the system will automatically perform table status detection. Click the icon to copy the name of the currently selected table. If there is no target table for data synchronization in the GaussDB (DWS) data source, you can use the one-click generate target table feature to quickly generate a target table. The detailed procedure is as follows: Click One-click Create Table. Dataphin will automatically match the code for creating the target table for you, including the target table name (default is the source table name), field types (initial conversion based on Dataphin fields), and other information. You can modify the SQL script for creating the target table according to your business needs, and then click Create. After the target table is successfully created, Dataphin automatically uses the newly created target table as the target table for output data. One-click create table is used to create target tables for data synchronization in the development environment and production environment. Dataphin defaults to selecting the production environment for table creation. If there is already a table with the same name and structure in the production environment, you do not need to select the production environment for table creation. Note If there is a table with the same name in the development environment or production environment, Dataphin will report an error indicating that the table already exists after clicking Create.
	Loading Policy	Supports insert and copy policies. Insert policy: Executes the GaussDB (DWS) `insert into...values...` statement to write data into GaussDB (DWS). When a primary key or unique index conflict occurs, the data row to be synchronized fails to be written into GaussDB (DWS), and the current record row becomes dirty data. We recommend that you use the insert mode. Copy policy: GaussDB (DWS) provides the copy command for mutual replication between tables and files (standard output, standard input). Data Integration supports using `copy from` to load data into a table, and actions are taken based on the conflict resolution policy when conflicts occur. We recommend that you try this policy when you encounter performance issues. You also need to configure the Conflict Resolution Policy, including Error on Conflict and Overwrite on Conflict. Important The conflict resolution policy is only effective in the Copy mode when the AnalyticDB for PostgreSQL kernel version is higher than 4.3. Please choose carefully when the kernel is lower than 4.3 or unknown to avoid task failure.
	Batch Write Data Volume (optional)	The size of the data volume written at one time. You can also set the Batch Write Count. During writing, the system will write according to the limit reached first among the two configurations. The default is 32M.
	Batch Write Count (optional)	The default is 2048 rows. During data synchronization writing, a batch writing policy is adopted, and the set parameters include Batch Write Count and Batch Write Data Volume. When the accumulated data volume reaches any of the set limits (that is, the batch write data volume or count limit is reached), the system will consider a batch of data to be full and will immediately write this batch of data to the target end at one time. We recommend setting the batch write data volume to 32MB. For the upper limit of batch insert count, you can flexibly adjust it according to the actual size of a single record, usually set to a larger value to fully utilize the advantages of batch writing. For example, if the size of a single record is about 1KB, you can set the batch insert byte size to 16MB. Considering this condition, set the batch insert count to be greater than the result of 16MB divided by the size of a single record, 1KB (that is, greater than 16384 rows). Here, it is assumed to be set to 20000 rows. After such configuration, the system will trigger the batch writing operation based on the batch insert byte size. Whenever the accumulated data volume reaches 16MB, a write action will be executed.
	Preparation Statement (optional)	The SQL script executed on the database before data import. For example, to ensure the continuous availability of the service, before the current step writes data, create the target table Target_A, execute the write to the target table Target_A, and after the current step writes data, rename the table Service_B that continuously provides services in the database to Temp_C, then rename the table Target_A to Service_B, and finally delete Temp_C.
	End Statement (optional)	The SQL script executed on the database after data import.
Field Mapping	Input Field	The input fields are displayed for you based on the output of the upstream component.
	Output Field	The output fields are displayed for you. Click Field Management to select output fields. Click the icon to move the Selected Input Fields to the Unselected Input Fields. Click the icon to move the Unselected Input Fields to the Selected Input Fields.
	Mapping	Based on the upstream input and the target table fields, you can manually select field mapping. Mapping includes Row Mapping and Name Mapping. Name Mapping: Maps fields with the same field name. Row Mapping: The field names of the source table and target table are inconsistent, but the data in the corresponding rows of the fields need to be mapped. Only fields in the same row are mapped.

To finalize the configuration of the GaussDB (DWS) output component, click Confirm.