Before you can develop and manage AnalyticDB for PostgreSQL tasks in DataWorks, you must add an AnalyticDB for PostgreSQL instance to the required DataWorks workspace as a data source. This way, you can use the AnalyticDB for PostgreSQL data source in different services of DataWorks and perform operations such as data synchronization, data development, and data analysis based on the data source.
Prerequisites
An AnalyticDB for PostgreSQL instance is purchased, and an AnalyticDB for PostgreSQL database is created in the instance. For more information, see Create an instance and Manage databases.
NoteWe recommend that you create an AnalyticDB for PostgreSQL instance in the same region as the workspace to which you want to add an AnalyticDB for PostgreSQL data source. If the regions are different, you can add only a cross-region data source to the workspace. The data source cannot be associated with DataStudio for data development or periodic task scheduling. The data source can be used only for data synchronization.
The required resource group is purchased and configured.
An AnalyticDB for PostgreSQL data source supports serverless resource groups and old-version exclusive resource groups. Old-version exclusive resource groups include exclusive resource groups for Data Integration, exclusive resource groups for scheduling, and exclusive resource groups for DataService Studio. We recommend that you use serverless resource groups. After the AnalyticDB for PostgreSQL data source is added, you can use the data source in scenarios such as data synchronization, development and scheduling of computing tasks, and generation of DataService Studio APIs.
NoteYou must purchase and configure the required resource group based on the use scenario of the AnalyticDB for PostgreSQL data source and establish a network connection between the data source and resource group in advance. For information about resource groups provided by DataWorks and how to configure a resource group, see Overview.
A DataWorks workspace is created, or the account that you use is added to the desired workspace as a member.
You must add the desired AnalyticDB for PostgreSQL instance to the workspace as a data source. This way, you can use the data source to perform data development operations in the workspace. In addition, you must associate the purchased resource group with the workspace and establish a network connection between the resource group and the data source. For information about how to create a workspace, see Create and manage workspaces.
NoteYou can add the same AnalyticDB for PostgreSQL instance to multiple workspaces as a data source.
Limits
An AnalyticDB for PostgreSQL data source can be associated with DataStudio only if the data source meets the following conditions: The AnalyticDB for PostgreSQL instance based on which the data source is added resides in the same region and belongs to the same Alibaba Cloud account as the workspace. This way, the AnalyticDB for PostgreSQL data source can be used for data development and periodic task scheduling.
You can add an AnalyticDB for PostgreSQL instance that does not belong to the current Alibaba Cloud account to a workspace within the current Alibaba Cloud account as a data source. After the data source is added, you can use only a RAM role to access the related AnalyticDB for PostgreSQL instance. AnalyticDB for PostgreSQL data sources that are added across accounts cannot be used for data development or periodic task scheduling.
AnalyticDB for PostgreSQL data sources that are added in connection string mode cannot be used for data development or periodic task scheduling. These data sources can be used only for data synchronization. For more information, see Add a data source.
You can use only a serverless resource group or an exclusive resource group for Data Integration or exclusive resource group for scheduling to run an AnalyticDB for PostgreSQL task. We recommend that you use a serverless resource group.
If you want to create an API in DataService Studio based on an AnalyticDB for PostgreSQL data source, we recommend that you use a serverless resource group.
Preparations: Permission description and configuration
If you want to add a data source as a RAM user or by using a RAM role, make sure that the RAM user or RAM role meets one of the following requirements:
The RAM user or RAM role is added to the desired workspace as a member and is assigned the Workspace Owner, Workspace Administrator, or O&M role. For more information, see Add a RAM user to a workspace as a member and assign roles to the member.
The RAM user or RAM role is attached the AliyunDataWorksFullAccess or AdministratorAccess policy. For more information, see Grant permissions to a RAM user or Grant permissions to a RAM role.
If you want to add a data source across accounts, you must also make the following permission configurations:
If you add an AnalyticDB for PostgreSQL data source across Alibaba Cloud accounts, you can use only a RAM role that is granted the required permissions to access the related AnalyticDB for PostgreSQL instance.
Sample scenario
In this sample scenario, an AnalyticDB for PostgreSQL instance that belongs to Alibaba Cloud Account B is added to a DataWorks workspace that belongs to Alibaba Cloud Account A as a data source, and Alibaba Cloud Account A needs to access the AnalyticDB for PostgreSQL instance.
Alibaba Cloud Account A: A DataWorks workspace is created within Alibaba Cloud Account A, and Alibaba Cloud Account A needs to access an AnalyticDB for PostgreSQL instance that is created within Alibaba Cloud Account B.
Alibaba Cloud Account B: An AnalyticDB for PostgreSQL instance is created within Alibaba Cloud Account B, and an AnalyticDB for PostgreSQL database is created in the instance. In addition, a RAM role is created within Alibaba Cloud Account B and is assigned to Alibaba Cloud Account A. This way, Alibaba Cloud Account A can access the AnalyticDB for PostgreSQL instance by assuming the RAM role.
After the preceding operations are performed, you can use Alibaba Cloud Account A to add the AnalyticDB for PostgreSQL instance to DataWorks as a data source and then access the AnalyticDB for PostgreSQL instance across Alibaba Cloud accounts.
Create and grant permissions to a RAM role
Requirements on a RAM role of Alibaba Cloud Account B and permission configuration for the RAM role:
You must create a RAM role within Alibaba Cloud Account B and grant the RAM role the permissions to access a specified AnalyticDB for PostgreSQL instance. You must add Alibaba Cloud Account A as the trusted cloud account of the RAM role to allow Alibaba Cloud Account A to assume the RAM role. For more information, see Create a RAM role for a trusted Alibaba Cloud account and Create custom policies.
You must modify the trust policy of the RAM role to allow Alibaba Cloud Account A to assume the RAM role to access the AnalyticDB for PostgreSQL instance. For more information, see Edit the trust policy of a RAM role. The following code shows the document of the trust policy:
{ "Version": "1", "Statement": [ { "Action": [ "sts:AssumeRole", "gpdb:DescribeDBInstanceAttribute", "gpdb:DescribeDBInstances", "gpdb:DescribeResourceUsage", "gpdb:DescribeDBInstanceIPArrayList", "gpdb:DescribeDBClusterIPArrayList", "gpdb:DescribeDBInstancePerformance", "gpdb:DescribeDBInstanceNetInfo", "gpdb:DescribeRegions", "gpdb:ModifySecurityIps" ], "Effect": "Allow", "Principal": { "Service": [ "ID of Alibaba Cloud Account A@engine.dataworks.aliyuncs.com" ] } } ] }
Entry points for adding a data source
Go to the Data Source page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. Then, click Management Center in the left-side navigation pane. On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.
In the left-side navigation pane of the SettingCenter page, choose
.
On the Data Sources page, click Add Data Source. In the Add Data Source dialog box, click AnalyticDB for PostgreSQL. On the page that appears, configure the parameters to add an AnalyticDB for PostgreSQL data source.
You can also go to the Data Sources page in Data Integration to add an AnalyticDB for PostgreSQL data source. You can add a data source only to the production environment on the Data Sources page in Data Integration. After the data source is added, you must manage the data source on the Data Sources page in Management Center. You can go to Data Integration to view the types of data sources that you can add in this service.
Add a data source
DataWorks allows you to add a data source in one of the following modes.
If you use a workspace in standard mode, you must separately add a data source in the development environment and production environment. For information about the modes of workspaces, see Differences between workspaces in basic mode and workspaces in standard mode.
Mode 1: Alibaba Cloud instance mode
Configure the parameters in the Basic Configuration section.
Parameter
Description
Data Source Name
The name of the data source in DataWorks. The name must be unique within the current tenant.
Configuration Mode
The mode in which you want to add the data source. Select Alibaba Cloud Instance Mode.
Alibaba Cloud Account
Specifies whether the AnalyticDB for PostgreSQL instance you want to use belongs to the current Alibaba Cloud account or another Alibaba Cloud account. Valid values:
Current Alibaba Cloud Account: The AnalyticDB for PostgreSQL instance belongs to the current Alibaba Cloud account.
Another Alibaba Cloud Account: The AnalyticDB for PostgreSQL instance belongs to another Alibaba Cloud account.
NoteIf you set this parameter to Another Alibaba Cloud Account, you must add the AnalyticDB for PostgreSQL data source across accounts. After the data source is added, you can use only a RAM role to access the related AnalyticDB for PostgreSQL instance. In addition, the data source cannot be associated with DataStudio for data development or periodic task scheduling. The data source can be used only in Data Integration for data synchronization.
Other parameters that you must configure vary based on the value of the Alibaba Cloud Account parameter.
Region
The region in which the AnalyticDB for PostgreSQL instance that you want to use resides.
NoteIf the region that you selected is different from the region in which the workspace resides, you cannot associate the data source with DataStudio after you add the data source. In this case, the data source cannot be used for data development or periodic task scheduling. The data source can be used only in Data Integration for data synchronization.
Other items such as the AnalyticDB for PostgreSQL instance and the access identity
Other parameters that you must configure vary based on the value of the Alibaba Cloud Account parameter.
Value of the Alibaba Cloud Account parameter: Current Alibaba Cloud Account
Instance: Select the AnalyticDB for PostgreSQL instance that you want to add as a data source from the Instance drop-down list.
Database Name, Username, Password: Enter the name of the desired AnalyticDB for PostgreSQL database in the Database Name field, and separately enter the username and password that are used to access the database in the Username and Password fields. You can log on to the AnalyticDB for PostgreSQL to obtain information about the AnalyticDB for PostgreSQL database.
Value of the Alibaba Cloud Account parameter: Another Alibaba Cloud Account
UID Of Alibaba Cloud Account, Instance ID, RAM Role: Enter the UID of the Alibaba Cloud account to which the AnalyticDB for PostgreSQL instance you want to add as a data source belongs in the UID Of Alibaba Cloud Account field, enter the ID of the instance in the Instance ID field, and then enter the RAM role that you want to use to access the instance in the RAM Role field. The AnalyticDB for PostgreSQL instance is added to DataWorks as a data source, and you must use the RAM role to access the instance.
NoteFor information about the permission configuration required by a RAM role to access a data source that is added across accounts, see the Preparations: Permission description and configuration section in this topic.
Database Name, Username, Password: Enter the name of the desired AnalyticDB for PostgreSQL database in the Database Name field, and separately enter the username and password that are used to access the database in the Username and Password fields. You can log on to the AnalyticDB for PostgreSQL to obtain information about the AnalyticDB for PostgreSQL database.
NoteIf you set this parameter to Another Alibaba Cloud Account, you must add the AnalyticDB for PostgreSQL data source across accounts. After the data source is added, you can use only a RAM role to access the related AnalyticDB for PostgreSQL instance. In addition, the data source cannot be associated with DataStudio for data development or periodic task scheduling. The data source can be used only in Data Integration for data synchronization.
Test the network connectivity between the data source and a resource group.
Resource groups provided by DataWorks can be classified into resource groups for Data Integration, resource groups for scheduling, and resource groups for DataService Studio based on the use scenarios of the resource groups. For more information about resource groups, see Overview.
You can find the resource group that you want to use in the Connection Configuration section based on the scenario in which you want to use the data source, and then test the network connectivity between the resource group and the data source. If the network connectivity test fails, tasks that use the data source cannot be run.
Mode 2: Connection string mode
You can add a data source that belongs only to the current Alibaba Cloud account in connection string mode.
Configure the parameters in the Basic Configuration section.
Parameter
Description
Data Source Name
The name of the data source in DataWorks. The name must be unique within the current tenant.
Configuration Mode
Set this parameter to Connection String Mode.
JDBC URL
The JDBC URL that is used to connect to the AnalyticDB for PostgreSQL database. You can log on to the AnalyticDB for PostgreSQL to obtain information about the AnalyticDB for PostgreSQL database and the port number over which you can access the database.
Username
The username that you can use to access the database.
Password
The password that you can use to access the database.
Test the network connectivity between the data source and a resource group.
Resource groups provided by DataWorks can be classified into resource groups for Data Integration, resource groups for scheduling, and resource groups for DataService Studio based on the use scenarios of the resource groups. For more information about resource groups, see Overview.
You can find the resource group that you want to use in the Connection Configuration section based on the scenario in which you want to use the data source, and then test the network connectivity between the resource group and the data source. If the network connectivity test fails, tasks that use the data source cannot be run.
What to do next
After the data source is added, you can perform the following operations based on your business requirements:
Develop and schedule computing tasks:
DataWorks DataStudio and Operation Center provide the capabilities of developing and scheduling AnalyticDB for PostgreSQL tasks. If you want to develop AnalyticDB for PostgreSQL tasks based on the AnalyticDB for PostgreSQL data source or periodically schedule AnalyticDB for PostgreSQL tasks, you must go to the DataStudio page in the DataWorks console and associate the AnalyticDB for PostgreSQL data source with DataStudio.
NoteYou can associate an AnalyticDB for PostgreSQL data source with DataStudio only if the AnalyticDB for PostgreSQL instance based on which the data source is added resides in the same region and belongs to the same Alibaba Cloud account as the workspace to which the data source is added.
DataWorks Data Integration provides AnalyticDB for PostgreSQL Reader and AnalyticDB for PostgreSQL Writer for you to read data from and write data to an AnalyticDB for PostgreSQL data source. You can configure a batch synchronization task for the AnalyticDB for PostgreSQL data source in DataStudio or configure a synchronization task for the AnalyticDB for PostgreSQL data source in Data Integration based on your business requirements to perform data synchronization.
Manage the data source: You can go to the Data Sources page in Management Center to perform management operations on the data source. For example, you can edit or delete the data source.