DataWorks: Add a ClickHouse data source

Last Updated: Dec 03, 2024

Before you can develop and manage ClickHouse tasks in DataWorks, you must add a ClickHouse cluster to the required DataWorks workspace as a data source. This way, you can use the ClickHouse data source in different services of DataWorks and perform operations such as data synchronization, data development, and data analysis based on the data source.

Prerequisites

  • A ClickHouse cluster is created. For more information, see Create an ApsaraDB for ClickHouse cluster.

    Note

    We recommend that you create a ClickHouse cluster in the same region as the workspace to which you want to add a ClickHouse data source. If the regions are different, you can add only a cross-region ClickHouse data source to the workspace. The cross-region ClickHouse data source cannot be associated with DataStudio. As a result, the ClickHouse data source cannot be used for computing tasks in DataStudio or Operation Center. The ClickHouse data source can be used only for data synchronization.

  • The required resource group is purchased and configured. A ClickHouse data source supports only exclusive resource groups for Data Integration, exclusive resource groups for scheduling, and exclusive resource groups for DataService Studio.

    After the ClickHouse data source is added, you can use the data source in scenarios such as data synchronization, development and scheduling of computing tasks, and generation of DataService Studio APIs. In these scenarios, a resource group for Data Integration, a resource group for scheduling, and a resource group for DataService Studio of DataWorks are separately required. You must purchase and configure the required resource group based on the use scenario of the ClickHouse data source and establish a network connection between the data source and resource group in advance. For information about resource groups provided by DataWorks, see Overview.

  • A DataWorks workspace is created, or the account that you use is added to the desired workspace as a member.

    You must add the ClickHouse cluster to the workspace as a data source. This way, you can use the data source to perform data development operations in the workspace. In addition, you must associate the purchased resource group with the workspace and establish a network connection between the resource group and the data source. For information about how to create a workspace, see Create and manage workspaces.

    Note

    You can add the same ClickHouse cluster to multiple workspaces as a data source.
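
The mapping between use scenarios and resource groups described in the prerequisites can be written down as a small lookup. The following is a minimal sketch, not a DataWorks API: the scenario keys and the function name are hypothetical labels chosen for illustration, and only the resource group types come from the text above.

```python
# Scenario -> resource group type, following the prerequisites above.
# The scenario keys are hypothetical labels, not DataWorks identifiers.
RESOURCE_GROUP_BY_SCENARIO = {
    "data_synchronization": "exclusive resource group for Data Integration",
    "task_development_and_scheduling": "exclusive resource group for scheduling",
    "api_generation": "exclusive resource group for DataService Studio",
}


def resource_groups_needed(scenarios):
    """Return the resource group types to purchase for the planned scenarios."""
    unknown = [s for s in scenarios if s not in RESOURCE_GROUP_BY_SCENARIO]
    if unknown:
        raise ValueError(f"unknown scenario(s): {unknown}")
    # dict.fromkeys keeps the first occurrence of each group and preserves order.
    return list(dict.fromkeys(RESOURCE_GROUP_BY_SCENARIO[s] for s in scenarios))


# Example: a workspace that only synchronizes data and schedules tasks.
for group in resource_groups_needed(
    ["data_synchronization", "task_development_and_scheduling"]
):
    print(group)
```

Each resource group returned here still needs a network connection to the ClickHouse cluster before the data source can be used in that scenario.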

Limits

Preparations: Permission description and configuration

If you want to add a data source as a RAM user or by using a RAM role, make sure that the RAM user or RAM role meets one of the following requirements:

Go to the Data Sources page

  1. Go to the Data Sources page.

    1. Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose More > Management Center. On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.

    2. In the left-side navigation pane of the SettingCenter page, choose Data Sources > Data Sources.

  2. On the Data Sources page, click Add Data Source. In the Add Data Source dialog box, click ClickHouse. On the page that appears, configure the parameters to add a ClickHouse data source.

    You can also go to the Data Sources page in Data Integration to add a ClickHouse data source. On that page, you can add a data source only to the production environment. After the data source is added, you must manage it on the Data Sources page in Management Center. You can go to Data Integration to view the types of data sources that you can add in this service.

Add a data source

  1. Configure information for the ClickHouse data source.

    Configure parameters such as Data Source Name in the Basic Information section. The following table describes the parameters that you must configure.

    Note

    If you use a workspace in standard mode, you must add a data source separately in the development environment and production environment. For information about the workspace modes, see Differences between workspaces in basic mode and workspaces in standard mode.

    | Parameter | Description |
    | --- | --- |
    | Data Source Name | The name of the data source in DataWorks. The name must be unique within the current tenant. |
    | Configuration Mode | You can add a ClickHouse data source in connection string mode. |
    | JDBC Connection String Preview | The JDBC URL that is used to connect to the ClickHouse database. The URL is automatically generated after you configure the Host Address/IP Address, Port, and Database Name parameters. You can log on to the ApsaraDB for ClickHouse console to obtain information about the JDBC URL, ClickHouse database, and the port number over which you can access the database. |
    | Username | The username that you use to access the ClickHouse cluster. |
    | Password | The password that you use to access the ClickHouse cluster. |
    | Authentication Method | Specifies whether to enable SSL authentication for the ClickHouse cluster. If you enable SSL authentication for the ClickHouse cluster, the ClickHouse data source that is added based on the cluster cannot be used for data development or periodic task scheduling. |

  2. Test the network connectivity between the data source and a resource group.

    Resource groups provided by DataWorks can be classified into resource groups for Data Integration, resource groups for scheduling, and resource groups for DataService Studio based on the use scenarios of the resource groups. For more information about resource groups, see Overview.

    You can find the resource group that you want to use in the Connection Configuration section based on the scenario in which you want to use the data source, and then test the network connectivity between the resource group and the data source. If the network connectivity test fails, tasks that use the data source cannot be run.
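
To make the two steps above concrete, the following sketch shows how a JDBC URL can be assembled from the Host Address/IP Address, Port, and Database Name values, and how basic TCP reachability of the cluster endpoint can be checked. It is an illustration under stated assumptions, not the logic that DataWorks runs: the function names are hypothetical, the host and database are placeholders, and port 8123 is only ClickHouse's default HTTP port (use the port shown in the ApsaraDB for ClickHouse console). The console test runs from the resource group's network, so a check from your own machine does not replace it.

```python
import socket


def build_jdbc_url(host: str, port: int, database: str) -> str:
    """Assemble a ClickHouse JDBC URL from host, port, and database name."""
    if not host or not database:
        raise ValueError("host and database must be non-empty")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"jdbc:clickhouse://{host}:{port}/{database}"


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False


# Placeholder values; replace them with the values from your cluster.
print(build_jdbc_url("clickhouse.example.internal", 8123, "default"))
```

Calling `can_reach(host, port)` with the same host and port gives a quick first signal when a connectivity test fails: a `False` result from inside the resource group's network usually points to a routing, whitelist, or port problem rather than to wrong credentials.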

What to do next

After the data source is added, you can perform the following operations based on your business requirements:

  • Develop and schedule computing tasks:

    DataStudio and Operation Center in DataWorks let you develop and schedule ClickHouse tasks. To develop ClickHouse tasks based on the ClickHouse data source or schedule them periodically, go to the DataStudio page in the DataWorks console and associate the ClickHouse data source with DataStudio.

    Note

    You can associate a ClickHouse data source with DataStudio only if the underlying ClickHouse cluster resides in the same region and belongs to the same Alibaba Cloud account as the workspace to which the data source is added.

  • Perform data synchronization:

    DataWorks Data Integration provides ClickHouse Reader and ClickHouse Writer for you to read data from and write data to the ClickHouse data source. You can configure a batch synchronization task for the ClickHouse data source to perform data synchronization.

  • Manage the data source: You can go to the Data Sources page in Management Center to perform management operations on the data source. For example, you can modify or remove the data source.
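
The conditions that this topic places on developing and scheduling tasks in DataStudio (same region, same Alibaba Cloud account, and SSL authentication not enabled) can be collected into a single pre-check. The following is a rough sketch with hypothetical class, field, and function names; it only restates the rules from this topic and does not call any DataWorks API. The account IDs are placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClickHouseSource:
    """Hypothetical description of a ClickHouse data source."""
    cluster_region: str
    cluster_account_id: str
    ssl_enabled: bool


def datastudio_blockers(
    source: ClickHouseSource, workspace_region: str, workspace_account_id: str
) -> List[str]:
    """List the reasons the source cannot be used for DataStudio tasks."""
    blockers = []
    if source.cluster_region != workspace_region:
        blockers.append("cluster and workspace are in different regions")
    if source.cluster_account_id != workspace_account_id:
        blockers.append("cluster and workspace belong to different accounts")
    if source.ssl_enabled:
        blockers.append("SSL authentication is enabled on the data source")
    return blockers


# Placeholder account IDs; cn-hangzhou and cn-shanghai are region IDs.
source = ClickHouseSource("cn-shanghai", "1000000000000001", ssl_enabled=False)
for reason in datastudio_blockers(source, "cn-hangzhou", "1000000000000001"):
    print(reason)
```

An empty list means that none of the restrictions described in this topic apply. A cross-region data source can still be used for data synchronization.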