
DataWorks: Add a data source or register a cluster to a workspace

Last Updated: Dec 05, 2024

After you create a workspace, add a data source or register a cluster to the workspace, depending on the database, data warehouse, or cluster that you want to use. You can then use the data source or cluster for operations such as data synchronization, data analysis and development, and data scheduling. This topic describes the environment preparations that you must complete in your workspace before you can develop data. These preparations are adding a data source or registering a cluster, and associating a data source with DataStudio for scheduling. This topic uses a formal development environment as an example.

Background information

In a DataWorks workspace, you can synchronize and develop data based on data sources or clusters.

  • Data sources

    • DataWorks allows you to add various types of data sources. After you add a data source to a DataWorks workspace, you can use the data source to synchronize data in the workspace. For more information about the data sources that you can use to synchronize data, see Data source list.

    • You can use only the following types of data sources for data development: MaxCompute, Hologres, AnalyticDB for PostgreSQL, AnalyticDB for MySQL 3.0, and ClickHouse. If you want to use one of the preceding types of data sources for data development, task scheduling, and data analysis, you must associate the data source with DataStudio after you add the data source to DataWorks.

  • Clusters: DataWorks allows you to register an E-MapReduce (EMR) cluster, a Cloudera's Distribution Including Apache Hadoop (CDH) cluster, or a Cloudera Data Platform (CDP) cluster to DataWorks. After the cluster is registered, you can perform operations, such as data development, task scheduling, and data analysis, in the current workspace based on the cluster. If you want to run a data synchronization task based on a component of a cluster, you must add the component to DataWorks as a data source. For more information, see Supported data source types and synchronization operations.

For more information about data sources or clusters, see Add and manage data sources.

Prerequisites

  • A workspace is created. For more information, see Create a workspace.

  • The Alibaba Cloud services to which the required compute engines belong are activated. For more information, see the documentation on the official website of each Alibaba Cloud service.

Permissions

You can add a data source or register a cluster to DataWorks only if you have the required permissions. If you do not have the required permissions, an error message appears when you add a data source or register a cluster. You must first apply for the permissions that are specified in the error message. The required permissions vary based on the type of the compute engine.

The following figure shows the permissions that are required for adding a MaxCompute data source to DataWorks.

Step 1: Add a data source or register a cluster

After you create a workspace, you must add a data source of a required engine type or register a cluster to the current workspace for subsequent development operations.

Add a data source

  1. Go to the Management Center page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose More > Management Center. On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.

  2. In the left-side navigation pane of the SettingCenter page, choose Data Sources > Data Sources.

  3. On the Data Sources page, click Add Data Source to add a data source based on your business requirements.

    You can use only the following types of data sources to develop data and schedule tasks: MaxCompute, Hologres, AnalyticDB for PostgreSQL, AnalyticDB for MySQL 3.0, and ClickHouse. To add these types of data sources, refer to the topics that are listed in the following table.

    Data source type             Reference topic
    MaxCompute                   Add a MaxCompute data source
    Hologres                     Add a Hologres data source
    AnalyticDB for PostgreSQL    Add an AnalyticDB for PostgreSQL data source
    AnalyticDB for MySQL 3.0     Add an AnalyticDB for MySQL V3.0 data source
    ClickHouse                   Add a ClickHouse data source

    After you add a data source, you can use the data source to synchronize data. For more information, see Overview.

    If you want to use a data source for data development, data analysis, or periodic task scheduling, proceed to Step 2: Associate the data source with DataStudio.

Register a cluster

  1. Go to the Management Center page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose More > Management Center. On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.

  2. In the left-side navigation pane of the SettingCenter page, click Cluster Management. On the Cluster Management page, click Register Cluster to register a cluster based on your business requirements. To register a cluster, refer to the topics that are listed in the following table.

    Cluster type    Reference topic
    E-MapReduce     Register an EMR cluster to DataWorks
    CDH/CDP         Register a CDH or CDP cluster to DataWorks

    After you register a cluster to DataWorks, you can use the cluster for operations such as data development, periodic task scheduling, and data analysis.

    If you want to run a data synchronization task based on a component of a cluster, you must add the component to DataWorks as a data source. For more information, see Supported data source types and synchronization operations.

Step 2: Associate the data source with DataStudio

After you add a data source to a DataWorks workspace, you must associate the data source with DataStudio in that workspace before you can use it for data development, data analysis, or periodic task scheduling in Operation Center. For more information, see Preparations before data development: Associate a data source or a cluster with DataStudio.

Note

After you register an EMR, CDH, or CDP cluster to DataWorks, DataWorks automatically associates the cluster with DataStudio. You can therefore use the cluster to develop tasks in the current workspace without associating it manually.
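The preparation rules in Step 1 and Step 2 can be condensed into a small decision helper. The following Python code is a minimal, hypothetical sketch. The function `setup_steps`, the `Operation` enum, and the resource name strings are illustrative assumptions. The code does not call any DataWorks API or console. It only encodes the rules stated in this topic:

- Only the five listed data source types can be used for data development.
- A data source must be associated with DataStudio manually.
- A registered EMR, CDH, or CDP cluster is associated automatically.
- Synchronizing a cluster component requires adding that component as a data source.

```python
from enum import Enum

# Data source types that this topic lists as usable for data development
# and task scheduling. The names are plain strings chosen for this sketch.
DEV_CAPABLE_SOURCES = {
    "MaxCompute",
    "Hologres",
    "AnalyticDB for PostgreSQL",
    "AnalyticDB for MySQL 3.0",
    "ClickHouse",
}

# Cluster types that can be registered; DataWorks associates them with
# DataStudio automatically after registration.
CLUSTER_TYPES = {"EMR", "CDH", "CDP"}


class Operation(Enum):
    SYNC = "data synchronization"
    DEVELOP = "data development, analysis, or scheduling"


def setup_steps(resource: str, operation: Operation) -> list[str]:
    """Return the preparation steps for using `resource` for `operation`.

    Hypothetical helper: it encodes the rules in this topic and does not
    call any DataWorks API.
    """
    if resource in CLUSTER_TYPES:
        if operation is Operation.SYNC:
            # Per this topic, synchronizing a cluster component requires
            # adding that component to DataWorks as a data source.
            return [f"Add the {resource} cluster component as a data source"]
        # Registration is enough: association with DataStudio is automatic.
        return [f"Register the {resource} cluster"]

    steps = [f"Add the {resource} data source"]
    if operation is Operation.DEVELOP:
        if resource not in DEV_CAPABLE_SOURCES:
            raise ValueError(f"{resource} cannot be used for data development")
        # Data sources need a manual association with DataStudio (Step 2).
        steps.append("Associate the data source with DataStudio")
    # Assumption: for SYNC, the type is taken to be in the Data source list;
    # this sketch does not check that list.
    return steps


if __name__ == "__main__":
    for res, op in [("Hologres", Operation.DEVELOP),
                    ("EMR", Operation.DEVELOP),
                    ("MaxCompute", Operation.SYNC)]:
        print(res, "|", op.value, "->", setup_steps(res, op))
```

The helper raises an error for unsupported types instead of silently returning steps. This reflects the statement in this topic that only the five listed data source types can be used for data development.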