In DataWorks, you can add an SSH data source and configure the host access information to remotely access a host. Then, you can use an SSH node that is based on the data source to log on to the host and trigger scripts to run on it. For example, you can use this method to remotely access an Elastic Compute Service (ECS) instance from DataWorks and periodically schedule scripts on the ECS instance. This topic describes how to add an SSH data source.
Limits
You can add an SSH data source only in connection string mode.
You can use only an exclusive resource group for scheduling to run SSH tasks. Before you use an exclusive resource group for scheduling, submit a ticket to technical support to upgrade its configurations. Otherwise, tasks that run on the resource group may fail.
Precautions
If you use a workspace in standard mode, you must separately add data sources in the development environment and production environment. The data source in the development environment and the data source in the production environment must use the same authentication mode.
For information about workspace modes, see Differences between workspaces in basic mode and workspaces in standard mode.
For information about authentication modes of data sources, see the description of the Authentication Mode parameter in this topic.
Prerequisites
You have obtained the host address and port number of the SSH server that you want to access.
An exclusive resource group for scheduling is purchased and configured.
After you add an SSH data source, you can use only an exclusive resource group for scheduling to develop and schedule computing tasks that use the data source. Therefore, prepare a resource group that meets your business requirements in advance, and make sure that the resource group can connect to the host of the SSH data source. For more information, see Create and use an exclusive resource group for scheduling and Network connectivity solutions.
Preparations: Permission description and configuration
If you want to add a data source as a RAM user or by using a RAM role, make sure that the RAM user or RAM role meets one of the following requirements:
The RAM user or RAM role is added to the desired workspace as a member and is assigned the Workspace Owner, Workspace Administrator, or O&M role. For more information, see Add a RAM user to a workspace as a member and assign roles to the member.
The AliyunDataWorksFullAccess or AdministratorAccess policy is attached to the RAM user or RAM role. For more information, see Grant permissions to a RAM user or Grant permissions to a RAM role.
Go to the Data Source page
Go to the Data Sources page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.
In the left-side navigation pane of the SettingCenter page, choose .
On the Data Source page, click Add Data Source. In the Add Data Source dialog box, click SSH. On the Add SSH Data Source page, configure the parameters as prompted.
Add a data source
On the Add SSH Data Source page, you must configure the parameters in the Basic Information section and test the network connectivity in the Connection Configuration section.
Configure the parameters in the Basic Information section.
You can configure basic information such as the name of the data source as prompted.
Core parameters:
Configure mode: You can add an SSH data source only in connection string mode.
Authentication Mode:
Host Password Authentication
Host Address: The IP address or domain name of the SSH server.
Host Port: The port on which the SSH server listens. The default SSH port is 22.
User Name: The username that is used to log on to the SSH server.
Password: The password of the user that is used to log on to the SSH server.
Host SSH key authentication
Host Address: The IP address or domain name of the SSH server.
Host Port: The port on which the SSH server listens. The default SSH port is 22.
User Name: The username that is used to log on to the SSH server.
Private Key: The private key that is used to log on to the SSH server. Upload the private key file as an authentication file. For information about how to manage authentication files, see Manage third-party authentication files.
Private Key Password: The password of the private key. This parameter is required only if the uploaded private key file is encrypted.
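If you are not sure whether your private key file is encrypted, you can check it locally before you upload it. The following Python sketch, with a hypothetical function name is_private_key_encrypted, inspects the key text for three common formats: an OpenSSH-format key records its cipher name (none when unencrypted), a PKCS#8 key uses the BEGIN ENCRYPTED PRIVATE KEY header, and a traditional PEM key carries a Proc-Type: 4,ENCRYPTED header. It is a heuristic for common formats and not a DataWorks feature.

```python
import base64
import struct

# Markers used by the OpenSSH private key format (see OpenSSH PROTOCOL.key).
_OPENSSH_BEGIN = "-----BEGIN OPENSSH PRIVATE KEY-----"
_OPENSSH_END = "-----END OPENSSH PRIVATE KEY-----"
_OPENSSH_MAGIC = b"openssh-key-v1\x00"


def is_private_key_encrypted(pem_text: str) -> bool:
    """Hypothetical helper: return True if the private key appears passphrase-protected."""
    text = pem_text.strip()
    # PKCS#8 encrypted keys always use this header.
    if "-----BEGIN ENCRYPTED PRIVATE KEY-----" in text:
        return True
    if _OPENSSH_BEGIN in text:
        body = text.split(_OPENSSH_BEGIN, 1)[1].split(_OPENSSH_END, 1)[0]
        blob = base64.b64decode("".join(body.split()))  # drop line breaks before decoding
        if not blob.startswith(_OPENSSH_MAGIC):
            raise ValueError("not an OpenSSH private key")
        offset = len(_OPENSSH_MAGIC)
        # The cipher name is an SSH "string": a 4-byte big-endian length, then the bytes.
        (name_len,) = struct.unpack(">I", blob[offset:offset + 4])
        cipher = blob[offset + 4:offset + 4 + name_len].decode("ascii")
        return cipher != "none"
    # Traditional PEM keys such as "BEGIN RSA PRIVATE KEY" mark encryption in a header.
    if "PRIVATE KEY-----" in text:
        return "Proc-Type: 4,ENCRYPTED" in text
    raise ValueError("unrecognized private key format")
```

If the function returns True for your key file, enter the passphrase in Private Key Password; otherwise, you can leave that parameter empty.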
(Recommended) DataWorks SSH public key authentication
DataWorks generates a key pair for the SSH data source and provides you with the public key. After you add the public key to the SSH server, DataWorks can log on to the server. This authentication mode is recommended because you do not need to provide DataWorks with a password or your own private key.
Host Address: The IP address or domain name of the SSH server.
Host Port: The port on which the SSH server listens. The default SSH port is 22.
User Name: The username that is used to log on to the SSH server.
Public Key: After you click Generate Key Pair, DataWorks generates a random key pair based on the username that you specified and displays the public key. Before you test the network connectivity, add the public key to the .ssh/authorized_keys file in the home directory of this user on the SSH server. Otherwise, the connectivity test fails.
Note: Specific trusted certificates are stored in truststore files to authenticate servers. For example, when you access an SSL server, you must authenticate the server to ensure that the server is trusted.
The generated key pair takes effect only after the data source is added. Add the public key of the generated key pair to your host as soon as possible.
When you modify the data source, each click of Generate Key Pair generates a new key pair. After you save the modifications, the original key pair becomes invalid, and tasks that use the data source may fail until the new public key is added to the host. Exercise caution when you perform this operation.
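You can add the public key to the server manually or with a script. The following Python sketch shows the steps, assuming that it runs on the SSH server as the user specified in User Name and that you pass in the public key displayed by DataWorks. The function name install_public_key is hypothetical. It creates ~/.ssh with mode 700 if needed, appends the key only if the same key is not already present, and sets authorized_keys to mode 600, which satisfies the permission checks that OpenSSH performs when StrictModes is enabled (the default).

```python
import os
from pathlib import Path
from typing import Optional


def install_public_key(public_key: str, home: Optional[Path] = None) -> bool:
    """Hypothetical helper: append public_key to <home>/.ssh/authorized_keys.

    Returns True if the key was added, or False if it was already present.
    """
    key = public_key.strip()
    fields = key.split()
    # An authorized_keys entry is "<key type> <base64 key> [comment]" on one line.
    if "\n" in key or len(fields) < 2:
        raise ValueError("expected a single-line public key such as 'ssh-rsa AAAA... comment'")
    home_dir = Path(home) if home is not None else Path.home()
    ssh_dir = home_dir / ".ssh"
    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    os.chmod(ssh_dir, 0o700)  # mkdir's mode is masked by the umask, so set it explicitly
    auth_file = ssh_dir / "authorized_keys"
    text = auth_file.read_text() if auth_file.exists() else ""
    # Compare key type and key material only, so a different comment is not a new key.
    # Entries that start with options (for example from="...") are not matched here.
    for line in text.splitlines():
        if line.split()[:2] == fields[:2]:
            return False
    with open(auth_file, "a") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # keep the new key on its own line
        f.write(key + "\n")
    os.chmod(auth_file, 0o600)
    return True
```

For example, calling install_public_key with the key string copied from the Public Key field adds it for the current user. Run it again after you regenerate the key pair so that the host trusts the new public key.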
Test the network connectivity between the data source and a resource group.
In the Connection Configuration section, test the network connectivity between the data source and an exclusive resource group for scheduling. If the network connectivity test fails, tasks that use the data source cannot be run. Make sure that the exclusive resource group for scheduling can access your host as expected. For information about how to establish a network connection, see Network connectivity solutions.
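If the connectivity test fails, it can help to check whether the host port accepts TCP connections and responds as an SSH server. The following Python sketch assumes that you run it on a machine that uses the same network path as the resource group, for example a server in the same VPC; it cannot reproduce the test that DataWorks runs from the resource group itself. The function name check_ssh_port is hypothetical. According to RFC 4253, an SSH server sends an identification line that starts with SSH-, so the sketch reads a few lines and returns that banner.

```python
import socket


def check_ssh_port(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Hypothetical helper: connect to host:port and return the server's SSH banner.

    Raises OSError (including timeouts and refused connections) if the TCP
    connection fails, and ValueError if the server does not identify as SSH.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as reader:
            # RFC 4253 allows the server to send other lines before the version line.
            for _ in range(20):
                line = reader.readline(256)
                if not line:
                    break  # the server closed the connection
                text = line.decode("ascii", errors="replace").rstrip("\r\n")
                if text.startswith("SSH-"):
                    return text
    raise ValueError(f"{host}:{port} accepted the connection but sent no SSH banner")
```

A returned banner such as SSH-2.0-OpenSSH_8.0 indicates that the port is reachable from that machine. An OSError such as a timeout usually points to a routing or security group issue rather than to the data source settings.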
If the connection fails, first add the IP address of the resource group to an inbound rule of the security group of the server instance. Use the public or private IP address of the resource group, depending on whether the resource group connects to the server over the Internet or over an internal network, and then test the connection again.
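To check whether an inbound rule covers the resource group, you can compare its IP address with the authorization CIDR blocks and port ranges of the rules. The following Python sketch uses a simplified, hypothetical rule model (InboundRule with a source CIDR block and a port range) instead of the security group API, and it ignores rule priority and deny rules. The IP addresses in the example are placeholders.

```python
import ipaddress
from typing import Iterable, NamedTuple


class InboundRule(NamedTuple):
    """Hypothetical, simplified inbound rule: a source CIDR block and a port range."""
    source_cidr: str
    port_from: int
    port_to: int


def is_ip_allowed(ip: str, port: int, rules: Iterable[InboundRule]) -> bool:
    """Return True if any allow rule covers the given source IP address and port."""
    addr = ipaddress.ip_address(ip)
    for rule in rules:
        network = ipaddress.ip_network(rule.source_cidr, strict=False)
        # "in" is False when the address and network are different IP versions.
        if addr in network and rule.port_from <= port <= rule.port_to:
            return True
    return False
```

For example, with a rule InboundRule("192.168.0.0/24", 22, 22), is_ip_allowed("192.168.0.15", 22, rules) returns True, while an address outside that block or a different port returns False.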
What to do next
After the data source is added, you can perform the following operations based on your business requirements:
Develop and schedule computing tasks:
DataWorks DataStudio and Operation Center let you develop and schedule SSH tasks. On the DataStudio page, you can select an SSH data source in an SSH node to remotely access the host that the data source connects to, and then deploy the SSH node to the production environment to run its code periodically.
Manage the data source: You can go to the Data Source page in SettingCenter to perform management operations on the data source. For example, you can modify or remove the data source.