DataWorks: Add a Hologres data source

Last Updated: Nov 14, 2024

Before you can develop and manage Hologres tasks in DataWorks, you must add a Hologres instance to the desired DataWorks workspace as a data source. You can then use the Hologres data source across DataWorks services for operations such as data synchronization, data development, and data analysis.

Prerequisites

  • A Hologres instance is created, and a database is created for the Hologres instance. For more information, see Purchase a Hologres instance and Create a database.

    Note

    We recommend that you create the Hologres instance in the same region as the workspace to which you want to add the Hologres data source. If the regions are different, you can add only a cross-region data source to the workspace, and that data source cannot be associated with DataWorks DataStudio. This means that the data source cannot be used to run computing tasks in DataStudio or Operation Center and can be used only for data synchronization.

  • The required resource group is purchased and configured.

    After the Hologres data source is added, you can use the data source in scenarios such as data synchronization, development and scheduling of computing tasks, and generation of DataService Studio APIs. A new-version general-purpose resource group meets the requirements of all these scenarios, and we recommend that you use one. Alternatively, you can purchase exclusive resource groups based on your business requirements, such as an exclusive resource group for Data Integration, an exclusive resource group for scheduling, and an exclusive resource group for DataService Studio. In either case, you must purchase and configure the resource group that matches the use scenario of the Hologres data source and establish a network connection between the data source and the resource group in advance. For more information about resource groups, see Resource group management.

    Note

    A Hologres data source supports only general-purpose resource groups and exclusive resource groups.

  • A DataWorks workspace is created, or the account that you use is added to the desired workspace as a member.

    You must add the desired Hologres instance to the workspace as a data source. This way, you can use the data source to perform data development operations in the workspace. In addition, you must associate the purchased resource group with the workspace and establish a network connection between the resource group and data source. For information about how to create a workspace, see Create and manage workspaces.

    Note

    You can add the same Hologres instance to multiple workspaces as a data source.
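
The resource group choice described in the prerequisites can be sketched as a small lookup. This is only an illustration: the scenario keys and the function name are hypothetical, and only the resource group names come from the text above.

```python
# Hypothetical scenario keys; the resource group names are taken from the text.
EXCLUSIVE_GROUP_FOR_SCENARIO = {
    "data_synchronization": "exclusive resource group for Data Integration",
    "task_development_and_scheduling": "exclusive resource group for scheduling",
    "dataservice_studio_api": "exclusive resource group for DataService Studio",
}

GENERAL_PURPOSE = "general-purpose resource group"


def resource_groups_for(scenarios, prefer_general_purpose=True):
    """Return the resource groups needed to cover the given scenarios."""
    wanted = set(scenarios)
    unknown = wanted - set(EXCLUSIVE_GROUP_FOR_SCENARIO)
    if unknown:
        raise ValueError(f"unknown scenarios: {sorted(unknown)}")
    if prefer_general_purpose:
        # A single general-purpose resource group covers every scenario.
        return [GENERAL_PURPOSE]
    # Otherwise one exclusive resource group per scenario, in a stable order.
    return [EXCLUSIVE_GROUP_FOR_SCENARIO[s] for s in sorted(wanted)]
```

With the default arguments the function always returns the general-purpose resource group, which mirrors the recommendation above.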

Limits

  • A Hologres data source can be associated with DataWorks DataStudio, and therefore used for data development or task scheduling, only if all of the following conditions are met: The Hologres instance on which the data source is based resides in the same region as the workspace, the instance belongs to the same Alibaba Cloud account as the workspace, and SSL encryption is not enabled for the Hologres data source.

    Important

    If transmission encryption is enabled for the Hologres instance that you want to add as a data source, you can enable SSL authentication when you add the Hologres instance as a data source. However, Hologres data sources for which SSL authentication is enabled cannot be used for data development or task scheduling.

  • If you add a Hologres instance that does not belong to the current Alibaba Cloud account to a workspace within the current Alibaba Cloud account as a data source, you can use only a RAM role to access the related Hologres instance. Hologres data sources that are added across accounts cannot be used for data development or task scheduling.

  • You can use only an exclusive resource group or a general-purpose resource group to run Hologres tasks that are configured for a Hologres data source. For more information, see Create and use a general-purpose resource group and Use old-version resource groups.
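
The limits above amount to a simple rule check, sketched below. The class and function names are hypothetical and are not part of any DataWorks API. The sketch also assumes that data synchronization stays available in every case, which the text states explicitly only for cross-region data sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HologresSource:
    # Illustrative field names, not DataWorks API fields.
    instance_region: str
    instance_account_uid: str
    ssl_enabled: bool


def can_associate_with_datastudio(source, workspace_region, workspace_account_uid):
    """True only if all three conditions from the Limits section hold."""
    return (
        source.instance_region == workspace_region
        and source.instance_account_uid == workspace_account_uid
        and not source.ssl_enabled
    )


def usable_scenarios(source, workspace_region, workspace_account_uid):
    """List the scenarios in which the data source can be used."""
    # Assumption: synchronization is always possible once the source is added.
    scenarios = ["data synchronization"]
    if can_associate_with_datastudio(source, workspace_region, workspace_account_uid):
        scenarios += ["data development", "task scheduling"]
    return scenarios
```

For example, a same-region, same-account data source with SSL enabled yields only data synchronization, because the SSL condition alone blocks the DataStudio association.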

Preparations: Permission description and configuration

  1. Configure the required permissions on the DataWorks side.

    Before you add a Hologres data source to DataWorks, you must make sure that your Alibaba Cloud account has the permissions required to add a data source. The account must meet one of the following conditions:

    Note

    If you want to add a data source as a RAM user, DataWorks determines whether the account must have other permissions based on the value of the Default Access Identity parameter.

    • If you set the Default Access Identity parameter to Executor, no additional permissions are required.

    • If you set the Default Access Identity parameter to Alibaba Cloud Account, Alibaba Cloud RAM Role, or Alibaba Cloud RAM User, you must attach the AdministratorAccess policy to the RAM user.

  2. Configure the required permissions on the Hologres side.

    After a Hologres data source is added, you must use the default access identity that is specified for the data source to access the related Hologres instance. You must make sure that the Alibaba Cloud account that corresponds to the default access identity has operation permissions on the Hologres instance. For information about permissions on a Hologres instance and how to grant a user the permissions on a Hologres instance, see Overview.

  3. Optional. Configure the required permissions if you want to add a Hologres data source across accounts.

    If you add a Hologres data source across Alibaba Cloud accounts, you can use only a RAM role to access the related Hologres instance, and you must grant the required permissions to the RAM role.

    • Example for adding a Hologres data source across Alibaba Cloud accounts:

      In this example, Alibaba Cloud Account A is used to log on to the DataWorks console and add a Hologres instance that belongs to Alibaba Cloud Account B to DataWorks as a data source.

      • Alibaba Cloud Account A: DataWorks is activated within Alibaba Cloud Account A and Alibaba Cloud Account A needs to access a Hologres instance within Alibaba Cloud Account B.

      • Alibaba Cloud Account B: A Hologres instance is created within Alibaba Cloud Account B and a Hologres database is created for the instance.

    • Requirements on a RAM role of Alibaba Cloud Account B and permission configuration for the RAM role:

      1. You must create a RAM role within Alibaba Cloud Account B and grant the RAM role the permissions to access a specified Hologres instance. You must add Alibaba Cloud Account A as the trusted cloud account of the RAM role to allow Alibaba Cloud Account A to assume the RAM role. For more information, see RAM role authorization mode.

      2. You must modify the trust policy of the RAM role to allow Alibaba Cloud Account A to assume the RAM role. For more information, see Edit the trust policy of a RAM role.

        The following code shows the trust policy document. Replace ID of Alibaba Cloud Account A with the actual ID of Alibaba Cloud Account A:

        {
            "Version": "1",
            "Statement": [
                {
                    "Action": [
                        "sts:AssumeRole",
                        "hologram:GetInstance",
                        "hologram:ListInstances",
                        "hologram:ListWarehouses"
                    ],
                    "Effect": "Allow",
                    "Principal": {
                        "Service": [
                            "ID of Alibaba Cloud Account A@engine.dataworks.aliyuncs.com"
                        ]
                    }
                }
            ]
        }
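
If you prefer to generate the trust policy rather than edit it by hand, the document above can be built from the account ID. The following is a minimal sketch: the function name is hypothetical, and the account ID in the usage note is a placeholder. The action list and the principal format are copied from the policy shown above.

```python
import json


def build_trust_policy(account_a_uid):
    """Build the trust policy shown above for the given ID of Account A."""
    uid = str(account_a_uid).strip()
    if not uid:
        raise ValueError("the ID of Alibaba Cloud Account A must not be empty")
    return {
        "Version": "1",
        "Statement": [
            {
                "Action": [
                    "sts:AssumeRole",
                    "hologram:GetInstance",
                    "hologram:ListInstances",
                    "hologram:ListWarehouses",
                ],
                "Effect": "Allow",
                "Principal": {
                    # Same principal format as in the policy document above.
                    "Service": [f"{uid}@engine.dataworks.aliyuncs.com"]
                },
            }
        ],
    }


def render_trust_policy(account_a_uid):
    """Return the policy as indented JSON text, ready to paste into the console."""
    return json.dumps(build_trust_policy(account_a_uid), indent=4)
```

Calling render_trust_policy("1234567890123456") with a placeholder ID returns JSON text that you can paste into the trust policy editor of the RAM role.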

Add a data source

  1. Go to the Data Sources page.

    1. Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose More > Management Center. On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.

    2. In the left-side navigation pane of the SettingCenter page, choose Data Sources > Data Sources.

  2. On the Data Sources page, click Add Data Source. In the Add Data Source dialog box, click Hologres. On the page that appears, configure the parameters.

  3. Configure information for the Hologres data source.

    Configure parameters such as Data Source Name in the Basic Information section. The following table describes the parameters that you must configure.

    Note

    If you use a workspace in standard mode, you must separately add data sources in the development environment and production environment. For information about the workspace modes, see Differences between workspaces in basic mode and workspaces in standard mode.

    Parameter

    Description

    Data Source Name

    The name of the data source in DataWorks. The name must be unique within the current tenant.

    Authentication Method

    For a new data source, the value of this parameter is fixed as Alibaba Cloud Account and Alibaba Cloud RAM Role.

    Note

    For an existing data source that is added by using an AccessKey pair, we recommend that you change the value of this parameter to Alibaba Cloud Account and Alibaba Cloud RAM Role for the data source.

    Alibaba Cloud Account

    Specifies whether the Hologres instance you want to use belongs to the current Alibaba Cloud account or another Alibaba Cloud account. Valid values:

    • Current Alibaba Cloud Account: The Hologres instance belongs to the current Alibaba Cloud account.

    • Another Alibaba Cloud Account: The Hologres instance belongs to another Alibaba Cloud account.

      Note

      If you set this parameter to Another Alibaba Cloud Account, you must add the Hologres data source across accounts. After the Hologres data source is added, you can use only a RAM role to access the related Hologres instance.

    Region

    The region in which the Hologres instance that you want to use resides.

    Note

    If the region that you select is different from the region in which the workspace resides, you cannot associate the data source with DataWorks DataStudio. This means that the data source cannot be used in DataStudio or Operation Center and can be used only in Data Integration for data synchronization.

    Other parameters such as Hologres Instance and Default Access Identity

    The other parameters that you must configure vary based on the value of the Alibaba Cloud Account parameter.

    Current Alibaba Cloud Account

    • Hologres Instance and Database Name: Select the Hologres instance that you want to add as a data source from the Hologres Instance drop-down list, and enter the name of the desired Hologres database in the Database Name field. You can obtain the instance and database information on the details page of the Hologres instance in the Hologres console.

    • Default Access Identity: The default access identity that is used to access the data source.

      • Development environment: The value of this parameter is fixed as Executor. Executor indicates the current logon account.

        For example, if you create and debug a Hologres task on the DataStudio page in the DataWorks console, the default access identity that is used to access Hologres is the Alibaba Cloud account used to log on to the DataWorks console.

      • Production environment: The value of this parameter can be Alibaba Cloud Account, Alibaba Cloud RAM User, or Alibaba Cloud RAM Role.

        Note

        For information about how to use a RAM role to perform operations, see (Advanced) Use a RAM role to log on to the DataWorks console and use DataWorks.

        The default access identities that are displayed in the Default Access Identity drop-down list vary based on the account that you use to add the Hologres data source.

        When a Hologres task is periodically scheduled in Operation Center, the default access identity that you selected is used to access the related Hologres instance.

    Another Alibaba Cloud Account

    Note

    If you want to add the Hologres data source across accounts, you must set this parameter to Another Alibaba Cloud Account. After the Hologres data source is added, you can use only a RAM role to access the related Hologres instance. Hologres data sources that are added across accounts cannot be used for data development or task scheduling.

    • UID Of Alibaba Cloud Account and RAM Role: In the UID Of Alibaba Cloud Account field, enter the UID of the Alibaba Cloud account to which the Hologres instance belongs. In the RAM Role field, enter the RAM role that you want to use to access the Hologres instance. You must use the RAM role to access the Hologres instance.

    • Hologres Instance and Database Name: In the Hologres Instance field, enter the ID of the Hologres instance that you want to add to the current workspace as a data source. In the Database Name field, enter the name of the desired Hologres database. You can obtain the instance and database information on the details page of the Hologres instance in the Hologres console.

    Authentication Method and SSL Encryption

    Specifies whether to enable SSL authentication for the Hologres data source and whether to implement transmission encryption for the Hologres data source.

    If you want to set the Authentication Method parameter to SSL Authentication, you must make sure that transmission encryption is enabled for the Hologres instance you use. If transmission encryption is not enabled for the Hologres instance and you set the Authentication Method parameter to SSL Authentication, an error is reported when you access the Hologres instance.

    Important

    If you enable SSL authentication for the Hologres data source, the Hologres data source cannot be used for data development or task scheduling after it is added.

  4. Test the network connectivity between the Hologres data source and a resource group.

    Resource groups provided by DataWorks can be classified into resource groups for Data Integration, resource groups for scheduling, and resource groups for DataService Studio based on the use scenarios of the resource groups. For more information about these resource groups, see Overview.

    You can find the resource group that you want to use in the Connection Configuration section and test the network connectivity between the data source and resource group. If the network connectivity test fails, tasks that use the data source cannot be run.
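
The parameter rules in step 3 can be checked before you submit the form. The sketch below is one way to express them, not a DataWorks API: the DataSourceForm class, its field names, and the validate function are hypothetical, and the values in the usage note are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional

PRODUCTION_IDENTITIES = {
    "Alibaba Cloud Account",
    "Alibaba Cloud RAM User",
    "Alibaba Cloud RAM Role",
}


@dataclass
class DataSourceForm:
    """Hypothetical stand-in for the fields of the Add Data Source dialog box."""
    name: str
    hologres_instance: str
    database_name: str
    account_scope: str = "current"      # "current" or "another"
    environment: str = "development"    # "development" or "production"
    default_access_identity: Optional[str] = "Executor"
    account_uid: Optional[str] = None   # cross-account only
    ram_role: Optional[str] = None      # cross-account only
    ssl_authentication: bool = False
    instance_encryption_enabled: bool = False


def validate(form: DataSourceForm) -> List[str]:
    """Return the rule violations found in the form (empty if none)."""
    problems = []
    if form.account_scope == "current":
        if form.environment == "development":
            if form.default_access_identity != "Executor":
                problems.append("development environment requires Executor")
        elif form.environment == "production":
            if form.default_access_identity not in PRODUCTION_IDENTITIES:
                problems.append("production environment requires an account, RAM user, or RAM role")
        else:
            problems.append("unknown environment")
    elif form.account_scope == "another":
        # Cross-account access works only through a RAM role.
        if not form.account_uid or not form.ram_role:
            problems.append("cross-account data source needs a UID and a RAM role")
    else:
        problems.append("unknown account scope")
    if form.ssl_authentication and not form.instance_encryption_enabled:
        problems.append("SSL authentication requires transmission encryption on the instance")
    return problems
```

For example, validate(DataSourceForm("holo_dev", "example-instance", "demo_db")) returns an empty list, whereas enabling ssl_authentication without instance_encryption_enabled reports the SSL rule.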

What to do next

Before you start data development, we recommend that you read Usage notes for development of Hologres tasks in DataWorks. The topic describes the procedure for using Hologres in DataWorks, fees for data development by using Hologres, environment preparation, and permission management.

After the data source is added, you can perform the following operations based on your business requirements:

  • Develop and schedule computing tasks:

    DataWorks DataStudio and Operation Center provide the capabilities of developing and scheduling Hologres tasks. If you want to develop Hologres tasks based on the Hologres data source or periodically schedule Hologres tasks, you must go to the DataStudio page in the DataWorks console and associate the Hologres data source with DataStudio.

    Note

    You can associate a Hologres data source with DataStudio only if the Hologres instance based on which the data source is added resides in the same region and belongs to the same Alibaba Cloud account as the workspace to which the data source is added.

  • Perform data synchronization:

    DataWorks Data Integration provides Hologres Reader and Hologres Writer for you to read data from and write data to the Hologres data source. Based on your business requirements, you can configure a batch or real-time synchronization task for the Hologres data source in DataStudio, or configure a synchronization task for the Hologres data source in Data Integration.

  • Manage the data source: You can go to the Data Sources page in SettingCenter to manage the data source. For example, you can modify or delete the data source.