DataWorks: Add an SSH data source

Last Updated: Jul 23, 2024

In DataWorks, you can add an SSH data source and configure host access information to enable remote access to a host. You can then use an SSH node to access the host through the data source and run scripts on it. For example, you can remotely access an Elastic Compute Service (ECS) instance from DataWorks and run scripts on the ECS instance on a periodic schedule. This topic describes how to add an SSH data source.

Limits

  • You can add an SSH data source only in connection string mode.

  • You can run SSH tasks only on an exclusive resource group for scheduling. Before you use the resource group, submit a ticket to ask technical support to upgrade its configuration. Otherwise, tasks that run on the resource group may fail.

Precautions

If you use a workspace in standard mode, you must separately add data sources in the development environment and production environment. The data source in the development environment and the data source in the production environment must use the same authentication mode.

Prerequisites

  • The host address and port number of a desired server are obtained.

  • An exclusive resource group for scheduling is purchased and configured.

    After you add an SSH data source, you can use only an exclusive resource group for scheduling to develop and schedule computing tasks that use the SSH data source. Therefore, you must prepare a resource group that meets your business requirements in advance, and make sure that the SSH data source is connected to the resource group. For more information, see Create and use an exclusive resource group for scheduling and Network connectivity solutions.
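
Before you enter the host address and port in DataWorks, it can help to confirm that the SSH port accepts TCP connections at all. The following is a minimal sketch, not part of DataWorks: the function name `can_reach` and the sample address are hypothetical, and the check runs from whatever machine you execute it on. It therefore does not replace the connectivity test against the exclusive resource group for scheduling.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example call with placeholder values (replace with your own host and port):
#     can_reach("192.0.2.10", 22)
```

A result of False only indicates that the port is not reachable from the machine that ran the check. The resource group may still have a different network path to the host.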

Preparations: Permission description and configuration

If you want to add a data source as a RAM user or by using a RAM role, make sure that the RAM user or RAM role meets one of the following requirements:

Go to the Data Source page

  1. Go to the Data Source page.

    1. Log on to the DataWorks console. In the top navigation bar, select the desired region. Then, click Management Center in the left-side navigation pane. On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.

    2. In the left-side navigation pane of the SettingCenter page, choose Data Sources > Data Sources.

  2. On the Data Source page, click Add Data Source. In the Add Data Source dialog box, click SSH. On the Add SSH Data Source page, configure the parameters as prompted.

Add a data source

On the Add SSH Data Source page, you must configure the parameters in the Basic Information section and test the network connectivity in the Connection Configuration section.

  1. Configure the parameters in the Basic Information section.

    You can configure basic information such as the name of the data source as prompted.

    Core parameters:

    • Configure mode: You can add an SSH data source only in connection string mode.

    • Authentication Mode:

      Host Password Authentication

        • Host Address: The host address of the SSH server.

        • Host Port: The host port of the SSH server.

        • User Name: The username that is used to log on to the SSH server.

        • Password: The password that is used to log on to the SSH server.

      Host SSH key authentication

        • Host Address: The host address of the SSH server.

        • Host Port: The host port of the SSH server.

        • User Name: The username that is used to log on to the SSH server.

        • Private Key: The private key that is used to log on to the SSH server. You must upload relevant authentication files for identity authentication on users and services. For information about how to manage authentication files, see Manage third-party authentication files.

        • Private Key Password: If a private key file that is uploaded is encrypted, you must enter the private key password.

      (Recommended) DataWorks SSH public key authentication

      DataWorks can generate a key pair based on an SSH data source and provide the public key in the key pair to users for connection between DataWorks and the SSH server. This authentication mode is relatively secure.

        • Host Address: The host address of the SSH server.

        • Host Port: The host port of the SSH server.

        • User Name: The username that is used to log on to the SSH server.

        • Public Key: After you click Generate Key Pair, DataWorks generates a random key pair based on the username that you specified and displays the public key. Before you test the network connectivity, add the public key to the .ssh/authorized_keys file of that user on the host. Otherwise, the connection fails.

      Note
      • Specific trusted certificates are stored in truststore files to authenticate servers. For example, when you access an SSL server, you must authenticate the server to ensure that the server is trusted.

      • The generated key pair takes effect after the data source is added. Add the public key in the generated key pair to your host as soon as possible.

      • When you modify the data source, a new key pair is generated each time you click Generate Key Pair. After you save the modifications, the original key pair becomes invalid. This operation may cause running tasks to fail. Exercise caution when you perform this operation.

  2. Test the network connectivity between the data source and a resource group.

    In the Connection Configuration section, test the network connectivity between the data source and an exclusive resource group for scheduling. If the network connectivity test fails, tasks that use the data source cannot be run. Make sure that the exclusive resource group for scheduling can access your host as expected. For information about how to establish a network connection, see Network connectivity solutions.

    Note
    • You can use only an exclusive resource group for scheduling to run SSH tasks. You must submit a ticket to contact technical support to upgrade the configurations of the exclusive resource group for scheduling that you want to use. Otherwise, tasks that are run on the resource group may fail.

    • If a network connection cannot be established, we recommend that you first add the IP address of the resource group to an inbound rule of the security group of the desired server instance, and then use the public or private IP address of the resource group to connect to the server.
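
For DataWorks SSH public key authentication, the public key shown in the Public Key parameter must be added to the .ssh/authorized_keys file on the host before the connectivity test. The following is a minimal sketch of one way to do that, assuming it runs on the host as the login user. The function name `add_authorized_key` and the way the key is passed in are hypothetical and not a DataWorks feature.

```python
import os
from pathlib import Path


def add_authorized_key(public_key: str, ssh_dir: Path) -> bool:
    """Append public_key to ssh_dir/authorized_keys unless it is already there.

    Returns True if the key was added, False if it was already present.
    """
    key = public_key.strip()
    if not key:
        raise ValueError("public key is empty")

    # sshd's StrictModes check (on by default) rejects key files that other
    # users can write to; 0700 and 0600 are the conventional modes.
    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    os.chmod(ssh_dir, 0o700)

    auth_file = ssh_dir / "authorized_keys"
    text = auth_file.read_text() if auth_file.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False

    with auth_file.open("a") as f:
        # Make sure the new key starts on its own line.
        if text and not text.endswith("\n"):
            f.write("\n")
        f.write(key + "\n")
    os.chmod(auth_file, 0o600)
    return True


# Example call (paste the key copied from the Public Key parameter):
#     add_authorized_key("<public key from DataWorks>", Path.home() / ".ssh")
```

The function is idempotent, so running it again with the same key leaves the file unchanged. If you regenerate the key pair when you modify the data source, add the new public key in the same way.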

What to do next

After the data source is added, you can perform the following operations based on your business requirements:

  • Develop and schedule computing tasks:

    DataWorks DataStudio and Operation Center let you develop and schedule SSH tasks. On the DataStudio page, specify an SSH data source in an SSH node to remotely access the host that is connected to the data source. Then, deploy the SSH node to the production environment so that its code runs on a periodic schedule.

  • Manage the data source: You can go to the Data Source page in SettingCenter to perform management operations on the data source. For example, you can modify or remove the data source.