DataWorks:CreateDIJob

Last Updated: Oct 17, 2024

Creates a new-version synchronization task in Data Integration. Two types of synchronization tasks are supported: real-time synchronization of all data in a MySQL database to Hologres, and batch synchronization of all data in a MySQL database to Hive.

Debugging

You can call this operation directly in OpenAPI Explorer, which spares you from calculating signatures. After a successful call, OpenAPI Explorer can automatically generate SDK code samples.

Authorization information

No authorization information is currently disclosed for this operation.

Request parameters

Parameter | Type | Required | Description | Example
ProjectId | long | No

The DataWorks workspace ID. You can call the ListProjects operation to obtain the ID.

10000
JobName | string | No

The name of the synchronization task.

mysql_to_holo_sync_8772
Description | string | No

The description of the synchronization task.

MigrationType | string | No

The synchronization type. Valid values:

  • FullAndRealtimeIncremental (one-time full synchronization and real-time incremental synchronization)
  • RealtimeIncremental (real-time incremental synchronization)
  • Full (full synchronization)
  • OfflineIncremental (batch incremental synchronization)
  • FullAndOfflineIncremental (one-time full synchronization and batch incremental synchronization)
FullAndRealtimeIncremental
SourceDataSourceType | string | No

The type of the source. Set the value to MySQL.

MySQL
DestinationDataSourceType | string | No

The type of the destination. Valid values: Hologres and Hive.

Hologres
SourceDataSourceSettings | array<object> | No

The settings of the source. Only a single source is supported.

object | No
DataSourceName | string | No

The name of the data source.

mysql_datasource_1
DataSourceProperties | object | No

The properties of the data source.

string | No

The properties of the source. The properties of a MySQL data source include TimeZone and Encoding.

TimeZone
DestinationDataSourceSettings | array<object> | No

The settings of the destination. Only a single destination is supported.

object | No
DataSourceName | string | No

The name of the data source.

holo_datasource_1
DataSourceProperties | object | No

The properties of the data source.

string | No

The properties of the destination. No properties can be configured for a Hologres data source.

TimeZone
ResourceSettings | object | No

The resource settings.

OfflineResourceSettings | object | No

The resource used for batch synchronization.

ResourceGroupIdentifier | string | No

The identifier of the resource group for Data Integration used for batch synchronization.

S_res_group_111_222
RealtimeResourceSettings | object | No

The resource used for real-time synchronization.

ResourceGroupIdentifier | string | No

The identifier of the resource group for Data Integration used for real-time synchronization.

S_res_group_111_222
TransformationRules | array<object> | No

The list of transformation rules for objects involved in the synchronization task. Each entry in the list defines a transformation rule.

object | No
RuleName | string | No

The name of the rule. If the values of the RuleActionType parameter and the RuleTargetType parameter are the same for multiple transformation rules, you must make sure that the transformation rule names are unique.

rename_rule_1
RuleActionType | string | No

The type of the action. Valid values:

  • DefinePrimaryKey
  • Rename
  • AddColumn
  • HandleDml
  • DefineIncrementalCondition
  • DefineCycleScheduleSettings
  • DefineRuntimeSettings
  • DefinePartitionKey
Rename
RuleTargetType | string | No

The type of the object on which you want to perform the action. Valid values:

  • Table
  • Schema
Table
RuleExpression | string | No

The expression of the rule. An expression must be a JSON string.

Example of a renaming rule: {"expression":"${srcDatasourceName}_${srcDatabaseName}_0922","variables":[{"variableName":"srcDatabaseName","variableRules":[{"from":"fromdb","to":"todb"}]}]}

  • expression: the expression of the renaming rule. You can use the following variables in an expression: ${srcDatasourceName}, ${srcDatabaseName}, and ${srcTableName}. ${srcDatasourceName} indicates the name of the source. ${srcDatabaseName} indicates the name of a source database. ${srcTableName} indicates the name of a source table.
  • variables: the generation rules for the variables used in the expression of the renaming rule. The default value of a variable is the original value of the object indicated by the variable. You can define a group of string replacement rules to change the original values based on your business requirements.
      • variableName: the name of the variable. Do not enclose the variable name in ${}.
      • variableRules: the string replacement rules for the variable. The system runs the string replacement rules in sequence. from specifies the original string. to specifies the new string.
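
To make the substitution order concrete, the following Python sketch evaluates a renaming rule locally. It only illustrates the behavior described above and is not DataWorks code: the function name render_rename and the sample object names are hypothetical, and it assumes that each from string is replaced wherever it occurs in the value of the variable.

```python
import json
import re


def render_rename(rule_expression: str, originals: dict) -> str:
    """Evaluate a renaming rule against original object names (illustration only)."""
    rule = json.loads(rule_expression)
    values = dict(originals)  # each variable defaults to its original value
    for variable in rule.get("variables", []):
        name = variable["variableName"]  # written without ${}
        for step in variable.get("variableRules", []):
            # Replacement rules run in sequence, each on the previous result.
            values[name] = values[name].replace(step["from"], step["to"])
    # Substitute ${name} placeholders; unknown names are left untouched.
    return re.sub(
        r"\$\{(\w+)\}",
        lambda match: values.get(match.group(1), match.group(0)),
        rule["expression"],
    )


rule_expression = json.dumps({
    "expression": "${srcDatasourceName}_${srcDatabaseName}_0922",
    "variables": [{
        "variableName": "srcDatabaseName",
        "variableRules": [{"from": "fromdb", "to": "todb"}],
    }],
})

# Hypothetical source names, chosen only for this example.
renamed = render_rename(rule_expression, {
    "srcDatasourceName": "mysql_datasource_1",
    "srcDatabaseName": "fromdb",
    "srcTableName": "orders",
})
print(renamed)  # mysql_datasource_1_todb_0922
```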

Example of a rule used to add a specific field to the destination and assign a value to the field: {"columns":[{"columnName":"my_add_column","columnValueType":"Constant","columnValue":"123"}]}

  • If you do not configure such a rule, no fields are added to the destination and no values are assigned by default.
  • columnName: the name of the field that you want to add.
  • columnValueType: the type of the value of the field. Valid values: Constant and Variable.
  • columnValue: the value of the field that you want to add. If you set the columnValueType parameter to Constant, set the columnValue parameter to a custom constant of the STRING type. If you set the columnValueType parameter to Variable, set the columnValue parameter to one of the following built-in variables:
      • EXECUTE_TIME (LONG data type): the execution time.
      • DB_NAME_SRC (STRING data type): the name of a source database.
      • DATASOURCE_NAME_SRC (STRING data type): the name of the source.
      • TABLE_NAME_SRC (STRING data type): the name of a source table.
      • DB_NAME_DEST (STRING data type): the name of a destination database.
      • DATASOURCE_NAME_DEST (STRING data type): the name of the destination.
      • TABLE_NAME_DEST (STRING data type): the name of a destination table.
      • DB_NAME_SRC_TRANSED (STRING data type): the database name obtained after a transformation.

Example of a rule used to specify primary key fields for a destination table: {"columns":["ukcolumn1","ukcolumn2"]}

  • If you do not configure such a rule, the primary key fields in the mapped source table are used for the destination table by default.
  • If the destination table is an existing table, Data Integration does not modify the schema of the destination table. If the specified primary key fields do not exist in the destination table, an error is reported when the synchronization task starts to run.
  • If the destination table is automatically created by the system, Data Integration automatically creates the schema of the destination table. The schema contains the primary key fields that you specify. If the specified primary key fields do not exist in the destination table, an error is reported when the synchronization task starts to run.

Example of a rule used to process DML messages: {"dmlPolicies":[{"dmlType":"Delete","dmlAction":"Filter","filterCondition":"id > 1"}]}

  • If you do not configure such a rule, the default processing policy for messages generated for insert, update, and delete operations is Normal.
  • dmlType: the DML operation. Valid values: Insert, Update, and Delete.
  • dmlAction: the processing policy for DML messages. Valid values: Normal, Ignore, Filter, and LogicalDelete. Filter indicates conditional processing. You can set the dmlAction parameter to Filter only when the dmlType parameter is set to Update or Delete.
  • filterCondition: the condition used to filter DML messages. This parameter is required only when the dmlAction parameter is set to Filter.
{"expression":"${srcDatasourceName}_${srcDatabaseName}"}
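
Because RuleExpression must be a JSON string rather than a nested object, it is usually safer to build the expression as a dictionary and serialize it. The following Python sketch shows one way to do that for the four example rules above. The helper make_rule and every rule name except rename_rule_1 are hypothetical, and pairing each example expression with the AddColumn, DefinePrimaryKey, and HandleDml action types is an assumption based on the action names.

```python
import json


def make_rule(name: str, action: str, target: str, expression: dict) -> dict:
    """Build one TransformationRules entry with RuleExpression as a JSON string."""
    return {
        "RuleName": name,
        "RuleActionType": action,
        "RuleTargetType": target,
        "RuleExpression": json.dumps(expression),
    }


transformation_rules = [
    make_rule("rename_rule_1", "Rename", "Table",
              {"expression": "${srcDatasourceName}_${srcDatabaseName}"}),
    # Rule names below are placeholders invented for this example.
    make_rule("add_column_rule_1", "AddColumn", "Table",
              {"columns": [{"columnName": "my_add_column",
                            "columnValueType": "Constant",
                            "columnValue": "123"}]}),
    make_rule("primary_key_rule_1", "DefinePrimaryKey", "Table",
              {"columns": ["ukcolumn1", "ukcolumn2"]}),
    make_rule("dml_rule_1", "HandleDml", "Table",
              {"dmlPolicies": [{"dmlType": "Delete",
                                "dmlAction": "Filter",
                                "filterCondition": "id > 1"}]}),
]

for rule in transformation_rules:
    print(rule["RuleName"], rule["RuleExpression"])
```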
TableMappings | array<object> | No

The list of mappings between the rules used to select synchronization objects in the source and the transformation rules applied to the selected objects. Each entry in the list pairs a set of selection rules with the transformation rules applied to the objects that they select.

object | No
SourceObjectSelectionRules | array<object> | No

The rule used to select synchronization objects in the source. You can configure multiple rules.

object | No
ObjectType | string | No

The type of the object. Valid values:

  • Table
  • Database
Table
Expression | string | No

The expression used to select objects of the type specified by ObjectType.

mysql_table_1
TransformationRules | array<object> | No

The transformation rules applied to the selected synchronization objects.

object | No
RuleName | string | No

The name of the rule. If the values of the RuleActionType parameter and the RuleTargetType parameter are the same for multiple transformation rules, you must make sure that the transformation rule names are unique.

rename_rule_1
RuleActionType | string | No

The type of the action. Valid values:

  • DefinePrimaryKey
  • Rename
  • AddColumn
  • HandleDml
  • DefineIncrementalCondition
  • DefineCycleScheduleSettings
  • DefineRuntimeSettings
  • DefinePartitionKey
Rename
RuleTargetType | string | No

The type of the object on which you want to perform the action. Valid values:

  • Table
  • Schema
Table
JobSettings | object | No

The task-level settings of the synchronization task. The settings include processing policies for DDL messages, policies for data type mappings between source fields and destination fields, and runtime parameters of the synchronization task.

DdlHandlingSettings | array<object> | No

The processing settings for DDL messages.

object | No
Type | string | No

The type of the DDL operation. Valid values:

  • RenameColumn
  • ModifyColumn
  • CreateTable
  • TruncateTable
  • DropTable
  • DropColumn
  • AddColumn
AddColumn
Action | string | No

The processing policy for DDL messages. Valid values:

  • Ignore: ignores a DDL message.
  • Critical: reports an error for a DDL message.
  • Normal: normally processes a DDL message.
Critical
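
As an illustration of how the Type and Action values combine, the following Python sketch builds a DdlHandlingSettings array from a simple mapping and rejects values that are not listed above. The helper name build_ddl_handling_settings and the chosen policies are hypothetical, and the check is a local convenience, not the validation that the service performs.

```python
DDL_TYPES = {
    "RenameColumn", "ModifyColumn", "CreateTable", "TruncateTable",
    "DropTable", "DropColumn", "AddColumn",
}
DDL_ACTIONS = {"Ignore", "Critical", "Normal"}


def build_ddl_handling_settings(policies: dict) -> list:
    """Turn {ddl_type: action} into a DdlHandlingSettings array."""
    settings = []
    for ddl_type, action in policies.items():
        if ddl_type not in DDL_TYPES:
            raise ValueError(f"unsupported DDL type: {ddl_type}")
        if action not in DDL_ACTIONS:
            raise ValueError(f"unsupported DDL action: {action}")
        settings.append({"Type": ddl_type, "Action": action})
    return settings


# Example policy: apply added columns, fail on dropped tables, skip truncates.
ddl_handling_settings = build_ddl_handling_settings({
    "AddColumn": "Normal",
    "DropTable": "Critical",
    "TruncateTable": "Ignore",
})
print(ddl_handling_settings)
```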
ColumnDataTypeSettings | array<object> | No

The settings for data type mappings between source fields and destination fields. The value of this parameter must be an array.

object | No
SourceDataType | string | No

The data type of a source field.

bigint
DestinationDataType | string | No

The data type of a destination field.

text
RuntimeSettings | array<object> | No

The runtime settings. The value of this parameter must be an array.

object | No
Name | string | No

The name of the configuration item. Valid values:

  • runtime.offline.speed.limit.mb: indicates the maximum transmission rate that is allowed for a batch synchronization task. This configuration item takes effect only when runtime.offline.speed.limit.enable is set to true.
  • runtime.offline.speed.limit.enable: indicates whether throttling is enabled for a batch synchronization task.
  • dst.offline.connection.max: indicates the maximum number of connections that are allowed for writing data to the destination of a batch synchronization task.
  • runtime.offline.concurrent: indicates the maximum number of parallel threads that are allowed for a batch synchronization task.
  • dst.realtime.connection.max: indicates the maximum number of connections that are allowed for writing data to the destination of a real-time synchronization task.
  • runtime.enable.auto.create.schema: indicates whether schemas are automatically created in the destination of a synchronization task.
  • src.offline.datasource.max.connection: indicates the maximum number of connections that are allowed for reading data from the source of a batch synchronization task.
  • runtime.realtime.concurrent: indicates the maximum number of parallel threads that are allowed for a real-time synchronization task.
runtime.offline.concurrent
Value | string | No

The value of the configuration item.

1
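
Both Name and Value are strings, so numeric and Boolean settings have to be converted before they are sent. The following Python sketch shows one possible conversion. The helper name build_runtime_settings and the sample values are hypothetical, and writing Booleans as lowercase true and false is an assumption based on the wording of runtime.offline.speed.limit.mb above.

```python
def build_runtime_settings(options: dict) -> list:
    """Turn {name: value} into a RuntimeSettings array of string pairs."""

    def as_text(value) -> str:
        # bool must be tested first because bool is a subclass of int in Python.
        if isinstance(value, bool):
            return "true" if value else "false"
        return str(value)

    return [{"Name": name, "Value": as_text(value)}
            for name, value in options.items()]


# The speed limit only takes effect when throttling is enabled.
runtime_settings = build_runtime_settings({
    "runtime.offline.speed.limit.enable": True,
    "runtime.offline.speed.limit.mb": 10,
    "runtime.offline.concurrent": 1,
})
print(runtime_settings)
```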
CycleScheduleSettings | object | No

The settings for periodic scheduling.

CycleMigrationType | string | No

The synchronization type that requires periodic scheduling. Valid values:

  • Full: full synchronization
  • OfflineIncremental: batch incremental synchronization
Full
ScheduleParameters | string | No

The scheduling parameters.

bizdate=$bizdate
ChannelSettings | string | No

The channel control settings for the synchronization task. The value of this parameter must be a JSON string.

{"structInfo":"MANAGED","storageType":"TEXTFILE","writeMode":"APPEND","partitionColumns":[{"columnName":"pt","columnType":"STRING","comment":""}],"fieldDelimiter":""}
ImportRuleSettings | object | No

The import settings for the synchronization task.

Source | string | No

The import source of the task. Set the value to Datastudio, which indicates synchronization tasks created in DataStudio.

Datastudio
FileId | string | No

The ID of the task to be imported.

10000
SystemDebug | string | No

Specifies whether to perform system debugging. Valid values: true and false. Default value: false.

false
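
The following Python sketch assembles the example values from the table above into one request structure for a MySQL to Hologres task and runs a few local checks on it. It does not call DataWorks: the function check_request is hypothetical, the checks cover only constraints stated on this page, and how nested parameters are serialized on the wire depends on the SDK or client that you use.

```python
import json

MIGRATION_TYPES = {
    "FullAndRealtimeIncremental", "RealtimeIncremental", "Full",
    "OfflineIncremental", "FullAndOfflineIncremental",
}
ALLOWED_VALUES = {
    "MigrationType": MIGRATION_TYPES,
    "SourceDataSourceType": {"MySQL"},
    "DestinationDataSourceType": {"Hologres", "Hive"},
}


def check_request(request: dict) -> list:
    """Return problems found by local checks (not the service's validation)."""
    problems = []
    for key, allowed in ALLOWED_VALUES.items():
        if key in request and request[key] not in allowed:
            problems.append(f"{key}: {request[key]!r} is not a documented value")
    for key in ("SourceDataSourceSettings", "DestinationDataSourceSettings"):
        if len(request.get(key, [])) > 1:
            problems.append(f"{key}: only a single entry is supported")
    return problems


rename_rule = {"RuleName": "rename_rule_1", "RuleActionType": "Rename",
               "RuleTargetType": "Table"}
resource_group = {"ResourceGroupIdentifier": "S_res_group_111_222"}

request = {
    "ProjectId": 10000,
    "JobName": "mysql_to_holo_sync_8772",
    "MigrationType": "FullAndRealtimeIncremental",
    "SourceDataSourceType": "MySQL",
    "DestinationDataSourceType": "Hologres",
    "SourceDataSourceSettings": [{"DataSourceName": "mysql_datasource_1"}],
    "DestinationDataSourceSettings": [{"DataSourceName": "holo_datasource_1"}],
    "ResourceSettings": {
        "OfflineResourceSettings": resource_group,
        "RealtimeResourceSettings": resource_group,
    },
    "TransformationRules": [{
        **rename_rule,
        # RuleExpression is a JSON string, not a nested object.
        "RuleExpression": json.dumps(
            {"expression": "${srcDatasourceName}_${srcDatabaseName}"}),
    }],
    "TableMappings": [{
        "SourceObjectSelectionRules": [
            {"ObjectType": "Table", "Expression": "mysql_table_1"},
        ],
        "TransformationRules": [rename_rule],
    }],
    "JobSettings": {
        "DdlHandlingSettings": [{"Type": "AddColumn", "Action": "Critical"}],
        "RuntimeSettings": [{"Name": "runtime.offline.concurrent", "Value": "1"}],
    },
}

problems = check_request(request)
print(problems or "no problems found")
print(json.dumps(request, indent=2))
```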

Response parameters

Parameter | Type | Description | Example
object

The response parameters.

RequestId | string

The request ID. You can use the request ID to query logs and troubleshoot issues.

4F6AB6B3-41FB-5EBB-AFB2-0C98D49DA2BB
DIJobId | long

The synchronization task ID.

11792

Examples

Sample success responses

JSON format

{
  "RequestId": "4F6AB6B3-41FB-5EBB-AFB2-0C98D49DA2BB",
  "DIJobId": 11792
}
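
A caller typically keeps DIJobId for later operations on the task and logs RequestId for troubleshooting. The following Python sketch parses the sample response above. It assumes the response body is available as a JSON string, and the variable names are illustrative.

```python
import json

# The sample success response from this page, as a raw JSON string.
response_text = """
{
  "RequestId": "4F6AB6B3-41FB-5EBB-AFB2-0C98D49DA2BB",
  "DIJobId": 11792
}
"""

response = json.loads(response_text)
job_id = int(response["DIJobId"])   # ID of the created synchronization task
request_id = response["RequestId"]  # useful when querying logs

print(f"created synchronization task {job_id} (request {request_id})")
```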

Error codes

HTTP status code | Error code | Error message
429 | Throttling.Api | The request for this resource has exceeded your available limit.
429 | Throttling.System | The DataWorks system is busy. Try again later.
429 | Throttling.User | Your request is too frequent. Try again later.
500 | InternalError.System | An internal system error occurred. Try again later.
500 | InternalError.UserId.Missing | An internal system error occurred. Try again later.

For a complete list of error codes, see Service error codes.
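
Several of the error messages above ask the caller to try again later. The following Python sketch shows one way to do that with exponential backoff. The exception class ApiError, the function call_with_retry, and the delay values are hypothetical and not part of any DataWorks SDK. Treating the three Throttling codes and InternalError.System as retryable is a design choice made for this example.

```python
import time


class ApiError(Exception):
    """Hypothetical error type that carries the HTTP status and error code."""

    def __init__(self, status: int, code: str, message: str = ""):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code


RETRYABLE_CODES = {
    "Throttling.Api", "Throttling.System", "Throttling.User",
    "InternalError.System",
}


def call_with_retry(call, attempts: int = 4, base_delay: float = 1.0,
                    sleep=time.sleep):
    """Run call(), retrying retryable errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except ApiError as err:
            is_last_attempt = attempt == attempts - 1
            if err.code not in RETRYABLE_CODES or is_last_attempt:
                raise
            # Wait 1 s, 2 s, 4 s, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

To use the sketch, wrap the code that sends the CreateDIJob request in a function that raises ApiError on failure and pass that function to call_with_retry. Errors with other codes, and the final failed attempt, are raised to the caller unchanged.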

Change history

Change time | Summary of changes
2024-06-04 | The error codes have changed
2024-02-28 | The error codes have changed
2024-01-18 | The error codes have changed