DataWorks: Preparations before data development: Associate a data source or a cluster with DataStudio

Last Updated: Nov 13, 2024

Before you can perform data modeling or data development, or periodically schedule tasks in Operation Center in DataWorks, you must associate your data source or cluster with DataStudio. After the association, you can read data from the data source or cluster and perform data development operations.

Prerequisites

A data source or cluster of a specific type is created based on the type of tasks that you want to develop and schedule.

You can associate different types of data sources or clusters with DataStudio. For the supported types, see the Limits section of this topic.

Limits

  • A data source or a cluster may fail to be associated with DataStudio in the following scenarios:

    • The configuration of the data source or cluster does not support association with DataStudio. For example, you cannot associate a data source that was added by using an AccessKey pair. For more information about these limits, see the descriptions that are displayed in the DataWorks console when you associate a data source or a cluster with DataStudio.

    • The configurations in the development or production environment are missing.

    Note

    The reason why a data source or cluster cannot be associated with DataStudio varies based on the type of the data source or cluster. You can troubleshoot issues based on the reason that is displayed when you try to associate the data source or cluster with DataStudio.

  • Only the following types of data sources or clusters can be associated with DataStudio: MaxCompute, E-MapReduce (EMR), Hologres, AnalyticDB for MySQL, ClickHouse, Cloudera's Distribution Including Apache Hadoop (CDH), Cloudera Data Platform (CDP), and AnalyticDB for PostgreSQL.

  • The types and number of data sources or clusters that can be associated with DataStudio vary based on the DataWorks edition. For more information, see Feature comparison.
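The limits above can be read as a pre-check that runs before an association is attempted. The following is a minimal sketch of that reasoning, not a DataWorks API: the `DataSource` class, its fields, the `auth_mode` values, and the `association_blockers` function are all hypothetical names introduced for illustration. The actual rules vary by data source type and edition, and the reason displayed in the DataWorks console remains authoritative.

```python
from dataclasses import dataclass
from typing import List

# Types listed in the Limits section of this topic.
SUPPORTED_TYPES = {
    "MaxCompute",
    "EMR",
    "Hologres",
    "AnalyticDB for MySQL",
    "ClickHouse",
    "CDH",
    "CDP",
    "AnalyticDB for PostgreSQL",
}


@dataclass
class DataSource:
    """Hypothetical local model of a data source; not a DataWorks object."""
    name: str
    type: str
    auth_mode: str          # assumed values, for example "AccessKey" or "RAM role"
    has_dev_config: bool    # configuration exists for the development environment
    has_prod_config: bool   # configuration exists for the production environment


def association_blockers(ds: DataSource) -> List[str]:
    """Return the reasons why the association is likely to fail (empty if none)."""
    reasons = []
    if ds.type not in SUPPORTED_TYPES:
        reasons.append(f"type {ds.type!r} cannot be associated with DataStudio")
    if ds.auth_mode == "AccessKey":
        # Example from the Limits section: data sources added by using an
        # AccessKey pair cannot be associated.
        reasons.append("data source was added by using an AccessKey pair")
    if not (ds.has_dev_config and ds.has_prod_config):
        reasons.append("development or production environment configuration is missing")
    return reasons


if __name__ == "__main__":
    candidate = DataSource("orders_holo", "Hologres", "AccessKey", True, False)
    for reason in association_blockers(candidate):
        print(reason)
```

The sketch treats a missing configuration in either environment as a blocker, which is a simplification. It also does not model the edition-based limits on the types and number of data sources.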

Associate a data source or cluster

  1. Go to the DataStudio page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.

  2. In the left-side navigation pane, click Data Source. The Data Source page appears.

    If the Data Source module is not displayed in the left-side navigation pane, go to the Personal Settings tab and select Data Source in the DataStudio Modules section. The module then appears in the left-side navigation pane of the DataStudio page. For more information, see Configure settings in the DataStudio Modules section.

  3. Associate a data source or cluster.

    On the Data Source page, you can search for the desired data source or cluster by name and complete the association. After you associate the data source or cluster with DataStudio, you can read data from the data source or cluster based on the connection information and perform relevant data development operations.

    Note

    If data source information changes, but the data on the Data Source page is not updated in time, refresh the Data Source page to update the cached data.

What to do next

After you associate data sources with DataStudio, you can perform the following operations based on your business requirements:

  • Develop and periodically schedule computing tasks. For more information, see DataStudio overview and Operation Center overview.

  • Perform data modeling operations that depend on data sources, such as model development and publishing. For more information, see Overview.

  • Manage data sources:

    • Specify a default data source: If multiple data sources of the same type exist, you can specify a default data source that is preferentially used in subsequent task development.

      Note

      If only one data source of a specific type exists, the data source is used as the default data source.

    • Disassociate data sources: You can disassociate a data source if it is no longer required for data modeling, data development, or task scheduling. After you disassociate the data source, tasks that depend on it may fail to run.

    • Modify data sources: You can go to the data source management page to modify the information of a data source.
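The default data source rule described above can be sketched as a small function. This is an illustration only: `resolve_default_source` and its parameters are hypothetical names, not part of DataWorks. Returning `None` when several data sources exist and no default is specified is an assumption, because this topic does not describe that case.

```python
from typing import List, Optional


def resolve_default_source(
    sources_of_type: List[str],
    specified_default: Optional[str] = None,
) -> Optional[str]:
    """Pick the default data source among associated sources of one type."""
    if not sources_of_type:
        # No data source of this type is associated.
        return None
    if len(sources_of_type) == 1:
        # A single data source of a type is used as the default.
        return sources_of_type[0]
    if specified_default in sources_of_type:
        # Several data sources exist and one was specified as the default.
        return specified_default
    # Assumption: with several sources and no valid choice, nothing is selected.
    return None


if __name__ == "__main__":
    print(resolve_default_source(["holo_dev", "holo_reporting"], "holo_reporting"))
```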