DataWorks: Add and manage data sources in Data Integration

Last Updated: Nov 18, 2024

DataWorks allows you to use different types of data sources as the sources and destinations of synchronization tasks. You can add data sources in Data Integration and use the data sources when you configure synchronization tasks. This topic describes how to add data sources in Data Integration.

Permission management

Only workspace members that are assigned the O&M or Workspace Administrator role, and RAM users to which the AliyunDataWorksFullAccess or AdministratorAccess policy is attached, can add data sources. For information about the authorization, see Manage permissions on workspace-level services and Grant permissions to a RAM user.

In addition to the preceding permissions, other permissions may also be required for adding specific types of data sources. You can perform the authorization based on the instructions displayed in the DataWorks console.

Supported data source types

For information about data source types that are supported by DataWorks Data Integration, see Supported data source types and synchronization operations.

Note

The parameters that you must configure when you add a data source vary based on the data source type. The DataWorks console shows the required parameters for each type.

Add data sources in Data Integration

  1. Go to the Data Integration page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Data Integration. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.

  2. In the left-side navigation pane of the Data Integration page, click Data source to go to the Data Sources page.

  3. On the Data Sources page, click Add Data Source or Batch Add Data Sources based on your business requirements.

    Add a single data source

    1. Click Add Data Source. In the Add Data Source dialog box, click the desired data source type. In the dialog box that appears, configure the parameters for the selected type. The required parameters vary based on the data source type. Each parameter has an infotip on the configuration page of the related data source.

    2. Optional. Test network connectivity between the data source and a resource group.

      In the Connection Configuration section of the dialog box, find the resource group that is associated with the workspace and click Test Network Connectivity in the Connection Status column.

      Note

      For more information about resource groups, see Overview.

      • If Connected is displayed in the Connection Status column, click Complete.

      • If Connection failed is displayed in the Connection Status column, the resource group cannot be connected to the data source. In this case, tasks that use the data source cannot be run.

        You can click Self-service Troubleshoot to troubleshoot connectivity issues in the Network Connectivity Diagnostic Tool panel. If the diagnostic tool does not provide a solution, check the parameters that you configured, such as the account, password, and connection address, and make sure that the IP address of the resource group is added to the IP address whitelist of the data source. For more information, see the topics in the Network connectivity directory.

    Add multiple data sources at a time

    Click Batch Add Data Sources and perform the following operations. Batch adding is supported only for the following data source types: Hive, MySQL, PolarDB, SQL Server, and Oracle.

    1. In the Batch Add Data Sources dialog box, select the desired data source type and download the configuration template for this data source type.

      The information that you must configure in the template varies based on the value of the Data Source Type parameter. You can set the Data Source Type parameter to Connection Mode or Instance Mode. You can view the information that you must configure in the DataWorks console.

    2. Configure data source information in the template.

    3. After the data source information is configured, upload the template. The system then adds all data sources in the template to DataWorks in a single batch.

      When the system adds the data sources, you can view the progress and details in the Batch Add Data Sources dialog box. If specific data sources fail to be added, you can troubleshoot the issue based on the error message.
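If the connectivity test in the single-data-source procedure above reports Connection failed, a quick TCP reachability check can help rule out an incorrect connection address or port before you look at whitelists. The following is a minimal sketch, not part of DataWorks: the function name `can_reach` is hypothetical, and the check runs from the machine where you execute it, not from the DataWorks resource group. A successful result therefore does not prove that the resource group can reach the data source, and the check does not validate the account or password.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers refused connections, timeouts, and DNS failures.
        return False


# Example call (hypothetical address; use your data source's connection address):
# can_reach("db.example.internal", 3306)
```

If this check fails from a host in the same network as the resource group, the address, port, or a network rule is a likely cause. If it succeeds but DataWorks still reports a failure, the IP address whitelist and the credentials are the next things to check.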

Note
  • DataWorks allows you to add a data source in connection string mode or Alibaba Cloud instance mode. You can select a mode based on your business requirements. The parameters that you must configure vary based on the mode that you select.

    If you add a data source in connection string mode, DataWorks parses the JDBC URL of the data source. If the JDBC URL contains parameters that are not supported by DataWorks, DataWorks automatically removes the parameters. If you want to retain the unsupported parameters in the JDBC URL, submit a ticket to contact technical personnel.

  • You can configure different data source information for the development environment and production environment by using the same data source name. Data source configurations in different environments are independent of each other.

  • For information about data source types that are supported by Data Integration and how to add these types of data sources in Data Integration, see the topics in the Data sources directory.
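The note above says that, in connection string mode, DataWorks removes JDBC URL parameters that it does not support. The sketch below illustrates the idea so that you can preview which parameters of a URL might be dropped. It is an assumption-based illustration, not the parser that DataWorks uses: the allowlist `ASSUMED_SUPPORTED` and the function `strip_unsupported` are hypothetical, and this topic does not list the parameters that DataWorks actually supports. The sketch assumes a MySQL-style URL in which parameters follow a `?` and are separated by `&`.

```python
from typing import List, Set, Tuple
from urllib.parse import parse_qsl, urlencode

# Hypothetical allowlist for illustration only; the parameters that DataWorks
# actually supports are not listed in this topic.
ASSUMED_SUPPORTED = {"useSSL", "characterEncoding"}


def strip_unsupported(
    jdbc_url: str, supported: Set[str] = ASSUMED_SUPPORTED
) -> Tuple[str, List[str]]:
    """Keep only allowlisted query parameters of a MySQL-style JDBC URL.

    Returns the filtered URL and the names of the parameters that were dropped.
    """
    base, sep, query = jdbc_url.partition("?")
    if not sep:
        # No query string, so there is nothing to filter.
        return jdbc_url, []
    pairs = parse_qsl(query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key in supported]
    dropped = [key for key, _ in pairs if key not in supported]
    filtered = base + ("?" + urlencode(kept) if kept else "")
    return filtered, dropped
```

Running a URL through a check like this before you add the data source makes it easier to notice a parameter that your tasks depend on, in which case the ticket route described above applies.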

Manage data sources added in Data Integration

On the Data Sources page, you can configure the Data Source Type and Data Source Name parameters to filter data sources. You can also perform the following operations on data sources.

  • Modify Data Source: You can modify the configuration information of a data source based on your business requirements. You cannot change the name or environment of a data source.

  • Delete Data Source: You can delete a data source that is no longer required. If you want to delete a data source, take note of the following items:

    • You must check whether the data source is being used by synchronization tasks. If the data source is being used by synchronization tasks and you delete the data source, the synchronization tasks fail.

    • If you granted permissions on the data source to a member in another workspace and you delete the data source, the tasks that use the data source in that workspace fail.

  • Clone Data Source: You can use the cloning feature to quickly generate a new data source whose configuration information is the same as that of an existing data source.

    Note

    You must specify a name that is different from the name of the existing data source for the new data source.

  • Permission Management: You can use the permission management feature to grant permissions on a data source in the current workspace to a member in another workspace. After the permissions are granted to the member, the member can view and use the data source but cannot modify the data source. For more information, see Manage permissions on data sources.

    Note

    If you grant permissions on a data source to a workspace, all members in the workspace can view and use the data source.

Appendix: Description for adding data sources to workspaces in different modes

In a workspace in standard mode, the same data source has two independent sets of configurations: one for the development environment and one for the production environment. The two sets correspond to two underlying databases or data warehouses, so you can configure different data source information for each environment. This isolates the data source that is used for testing from the data source that is used for task scheduling in the production environment, which helps protect production data. For example, if you specify different databases for the development environment and production environment when you add a data source, a batch synchronization task that uses the data source accesses the development database when it runs in the development environment and the production database when it runs in the production environment.
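The isolation described above can be pictured as one data source name that maps to two independent configurations. The following sketch is a conceptual model only, not a DataWorks API: the class `ConnectionConfig`, the function `resolve`, the data source name, and all connection values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionConfig:
    jdbc_url: str
    username: str


# One data source name, two independent configurations (hypothetical values).
DATA_SOURCES = {
    "orders_mysql": {
        "development": ConnectionConfig(
            "jdbc:mysql://dev-db.example.internal:3306/orders_dev", "dev_user"
        ),
        "production": ConnectionConfig(
            "jdbc:mysql://prod-db.example.internal:3306/orders", "sync_user"
        ),
    }
}


def resolve(name: str, environment: str) -> ConnectionConfig:
    """Return the configuration a task would use in the given environment."""
    try:
        return DATA_SOURCES[name][environment]
    except KeyError:
        raise LookupError(
            f"no {environment!r} configuration for data source {name!r}"
        ) from None
```

A task refers to the data source only by name, so the same task definition reads from the development database during testing and from the production database when it is scheduled in production.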

| Workspace mode | Add a data source in Data Integration | Add a data source in Management Center |
| --- | --- | --- |
| Workspace in standard mode | You can add a data source only for the production environment. | You can add a data source for the development environment and production environment at the same time. |
| Workspace in basic mode | A workspace in basic mode provides only the production environment. Operations of adding a data source in Data Integration and operations of adding a data source in Management Center are the same. | Same as in Data Integration. |

Note

If you want to add a data source for the development environment of a workspace in standard mode, go to Management Center. For more information, see Add and manage data sources.