Before you can develop and manage ClickHouse tasks in DataWorks, you must add a ClickHouse cluster to the required DataWorks workspace as a data source. This way, you can use the ClickHouse data source in different services of DataWorks and perform operations such as data synchronization, data development, and data analysis based on the data source.
Prerequisites
A ClickHouse cluster is created. For more information, see Create an ApsaraDB for ClickHouse cluster.
Note: We recommend that you create a ClickHouse cluster in the same region as the workspace to which you want to add a ClickHouse data source. If the regions are different, you can add only a cross-region ClickHouse data source to the workspace. A cross-region ClickHouse data source cannot be associated with DataStudio. As a result, it cannot be used for computing tasks in DataStudio or Operation Center and can be used only for data synchronization.
The required resource group is purchased and configured. A ClickHouse data source supports only exclusive resource groups for Data Integration, exclusive resource groups for scheduling, and exclusive resource groups for DataService Studio.
After the ClickHouse data source is added, you can use the data source in scenarios such as data synchronization, development and scheduling of computing tasks, and generation of DataService Studio APIs. In these scenarios, a resource group for Data Integration, a resource group for scheduling, and a resource group for DataService Studio of DataWorks are separately required. You must purchase and configure the required resource group based on the use scenario of the ClickHouse data source and establish a network connection between the data source and resource group in advance. For information about resource groups provided by DataWorks and how to select a resource group, see Overview.
A DataWorks workspace is created, or the account that you use is added to the desired workspace as a member.
You must add the ClickHouse cluster to the workspace as a data source. This way, you can use the data source to perform data development operations in the workspace. In addition, you must associate the purchased resource group with the workspace and establish a network connection between the resource group and data source. For information about how to create a workspace, see Create and manage workspaces.
Note: You can add the same ClickHouse cluster to multiple workspaces as a data source.
Limits
If you enable SSL authentication for a ClickHouse data source, you cannot use the data source for data development or auto triggered tasks.
You can add a ClickHouse data source only in connection string mode.
You can use only an exclusive resource group for Data Integration or exclusive resource group for scheduling to run a ClickHouse task. You can use only an exclusive resource group for DataService Studio to create DataService Studio APIs based on the data source. For more information, see Create and use an exclusive resource group for Data Integration, Create and use an exclusive resource group for scheduling, or Create and use an exclusive resource group for DataService Studio.
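The limits above, together with the cross-region note in the prerequisites, can be summarized as a small decision function. The following Python is only an illustrative sketch of those rules. The names `Scenario`, `REQUIRED_RESOURCE_GROUP`, and `allowed_scenarios` are hypothetical and are not part of any DataWorks API. The sketch also assumes that SSL affects only data development and scheduling, because that is the only restriction this topic states.

```python
from enum import Enum


class Scenario(Enum):
    DATA_SYNC = "data synchronization"
    DATA_DEVELOPMENT = "data development and scheduling"
    DATASERVICE_API = "DataService Studio APIs"


# Exclusive resource group type that each scenario requires,
# as listed in the prerequisites and limits of this topic.
REQUIRED_RESOURCE_GROUP = {
    Scenario.DATA_SYNC: "exclusive resource group for Data Integration",
    Scenario.DATA_DEVELOPMENT: "exclusive resource group for scheduling",
    Scenario.DATASERVICE_API: "exclusive resource group for DataService Studio",
}


def allowed_scenarios(ssl_enabled: bool, same_region: bool) -> set:
    """Return the scenarios in which a ClickHouse data source can be used."""
    if not same_region:
        # A cross-region data source can be used only for data synchronization.
        return {Scenario.DATA_SYNC}
    allowed = set(Scenario)
    if ssl_enabled:
        # SSL rules out data development and auto triggered tasks.
        allowed.discard(Scenario.DATA_DEVELOPMENT)
    return allowed


for scenario in sorted(allowed_scenarios(ssl_enabled=True, same_region=True),
                       key=lambda s: s.value):
    print(f"{scenario.value}: {REQUIRED_RESOURCE_GROUP[scenario]}")
```

A lookup like this can help when you decide which resource groups to purchase before you add the data source.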
Preparations: Permission description and configuration
If you want to add a data source as a RAM user or by using a RAM role, make sure that the RAM user or RAM role meets one of the following requirements:
The RAM user or RAM role is added to the desired workspace as a member and is assigned the Workspace Owner, Workspace Administrator, or O&M role. For more information, see Add a RAM user to a workspace as a member and assign roles to the member.
The AliyunDataWorksFullAccess or AdministratorAccess policy is attached to the RAM user or RAM role. For more information, see Grant permissions to a RAM user or Grant permissions to a RAM role.
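The "one of the following requirements" rule can be expressed as a simple either-or check. The sketch below is illustrative only: `can_add_data_source` and its two arguments are hypothetical names, and the function does not call any RAM or DataWorks API. It assumes you have already looked up the workspace roles and attached policies yourself.

```python
# Workspace roles and RAM policies named in the requirements above.
ELIGIBLE_WORKSPACE_ROLES = {"Workspace Owner", "Workspace Administrator", "O&M"}
ELIGIBLE_POLICIES = {"AliyunDataWorksFullAccess", "AdministratorAccess"}


def can_add_data_source(workspace_roles, attached_policies):
    """Return True if either requirement is met.

    workspace_roles: roles the RAM user or RAM role holds in the workspace.
    attached_policies: policy names attached to the RAM user or RAM role.
    """
    has_role = bool(ELIGIBLE_WORKSPACE_ROLES & set(workspace_roles))
    has_policy = bool(ELIGIBLE_POLICIES & set(attached_policies))
    return has_role or has_policy


# A member with only the O&M role qualifies; a plain member without a policy does not.
print(can_add_data_source(["O&M"], []))
print(can_add_data_source(["Development"], []))
```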
Go to the Data Sources page
Go to the Data Sources page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.
In the left-side navigation pane of the SettingCenter page, go to the Data Sources page.
On the Data Sources page, click Add Data Source. In the Add Data Source dialog box, click ClickHouse. On the page that appears, configure the parameters to add a ClickHouse data source.
You can also go to the Data Sources page in Data Integration to add a ClickHouse data source. On that page, you can add a data source only to the production environment. After the data source is added, you must manage it on the Data Sources page in Management Center. You can go to Data Integration to view the types of data sources that you can add in this service.
Add a data source
Configure information for the ClickHouse data source.
Configure parameters such as Data Source Name in the Basic Information section. The following table describes the parameters that you must configure.
Note: If you use a workspace in standard mode, you must separately add data sources in the development environment and production environment. For information about the workspace modes, see Differences between workspaces in basic mode and workspaces in standard mode.
Parameter: Description
Data Source Name: The name of the data source in DataWorks. The name must be unique within the current tenant.
Configuration Mode: This parameter can be set to only Connection String Mode.
JDBC URL: The JDBC URL that is used to connect to the ClickHouse database. You can log on to the ApsaraDB for ClickHouse console to obtain information about the ClickHouse database and the port number over which you can access the database.
Username: The username that you use to access the ClickHouse cluster.
Password: The password that you use to access the ClickHouse cluster.
Authentication Method: Specifies whether to enable SSL authentication for the ClickHouse cluster. If you enable SSL authentication for the ClickHouse cluster, the ClickHouse data source that is added based on the cluster cannot be used for data development or periodic task scheduling.
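Before you enter the JDBC URL parameter, it can help to check that the value is well formed. The following Python sketch assumes the common `jdbc:clickhouse://<host>:<port>/<database>` layout used by ClickHouse JDBC drivers. The function name `parse_clickhouse_jdbc_url` and the host in the example are hypothetical placeholders; use the endpoint and port shown in the ApsaraDB for ClickHouse console for your own cluster.

```python
from urllib.parse import urlsplit

JDBC_PREFIX = "jdbc:clickhouse://"


def parse_clickhouse_jdbc_url(jdbc_url):
    """Split a ClickHouse JDBC URL into (host, port, database).

    port and database are None when the URL does not specify them.
    """
    if not jdbc_url.startswith(JDBC_PREFIX):
        raise ValueError(f"expected a URL starting with {JDBC_PREFIX!r}")
    # Drop the "jdbc:" prefix so the rest parses like an ordinary URL.
    parts = urlsplit(jdbc_url[len("jdbc:"):])
    if not parts.hostname:
        raise ValueError("the JDBC URL has no host")
    database = parts.path.lstrip("/") or None
    return parts.hostname, parts.port, database


# Hypothetical endpoint, for illustration only.
print(parse_clickhouse_jdbc_url(
    "jdbc:clickhouse://clickhouse.example.internal:8123/default"))
```

The host and port returned here are the values that the resource group must be able to reach over the network.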
Test the network connectivity between the data source and a resource group.
Resource groups provided by DataWorks can be classified into resource groups for Data Integration, resource groups for scheduling, and resource groups for DataService Studio based on the use scenarios of the resource groups. For more information about resource groups, see Overview.
You can find the resource group that you want to use in the Connection Configuration section based on the scenario in which you want to use the data source, and then test the network connectivity between the resource group and the data source. If the network connectivity test fails, tasks that use the data source cannot be run.
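If the connectivity test fails, a basic TCP check from a machine in the same network as the cluster can help you determine whether the endpoint and port are reachable at all. The sketch below is a minimal, hypothetical helper (`can_reach` is not a DataWorks function) and only attempts a TCP connection. It runs on your own machine rather than on the DataWorks resource group, so a successful result here does not guarantee that the resource group can connect. The test in the Connection Configuration section remains the authoritative check.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `can_reach("clickhouse.example.internal", 8123)` would check the hypothetical host and port from a JDBC URL. Replace both values with the ones for your cluster.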
What to do next
After the data source is added, you can perform the following operations based on your business requirements:
Develop and schedule computing tasks:
DataWorks DataStudio and Operation Center provide the capabilities of developing and scheduling ClickHouse tasks. If you want to develop ClickHouse tasks based on the ClickHouse data source or periodically schedule ClickHouse tasks, you must go to the DataStudio page in the DataWorks console and associate the ClickHouse data source with DataStudio.
Note: You can associate a ClickHouse data source with DataStudio only if the ClickHouse cluster on which the data source is based resides in the same region and belongs to the same Alibaba Cloud account as the workspace to which the data source is added.
Synchronize data: DataWorks Data Integration provides ClickHouse Reader and ClickHouse Writer for you to read data from and write data to the ClickHouse data source. You can configure a batch synchronization task for the ClickHouse data source to perform data synchronization.
Manage the data source: You can go to the Data Sources page in Management Center to perform management operations on the data source. For example, you can modify or remove the data source.