Integrate data

Updated at: 2024-12-09 02:02

Data Integration is a visual tool for importing external data into ApsaraDB for SelectDB instances and databases. This topic describes how to use the Data Integration tool of ApsaraDB for SelectDB.

Prerequisites

  • An ApsaraDB for SelectDB instance is created. For more information, see Create an instance.

  • The version of the instance is 3.0.7 or later.
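
The version prerequisite can be checked with a short script before you start. The following is a minimal sketch, assuming that instance versions are dot-separated integers such as 3.0.7; the function names `parse_version` and `supports_data_integration` are hypothetical and are not part of any ApsaraDB for SelectDB SDK.

```python
# Hypothetical helper: compares an instance version against the 3.0.7 minimum.
# Assumption: versions are dot-separated integers (for example "3.0.7").

def parse_version(text: str) -> tuple[int, ...]:
    """Turn a version string such as '3.0.10' into a tuple such as (3, 0, 10)."""
    return tuple(int(part) for part in text.strip().split("."))


MINIMUM_VERSION = parse_version("3.0.7")


def supports_data_integration(instance_version: str) -> bool:
    """Return True if the version meets the 3.0.7 prerequisite."""
    # Tuples compare element by element, so 3.0.10 correctly ranks above 3.0.7.
    return parse_version(instance_version) >= MINIMUM_VERSION


if __name__ == "__main__":
    for version in ("3.0.6", "3.0.7", "3.0.10"):
        print(version, supports_data_integration(version))
```

Comparing integer tuples instead of raw strings avoids the common mistake in which "3.0.10" sorts before "3.0.7" in a plain string comparison.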

Procedure

  1. Log on to the ApsaraDB for SelectDB console.

  2. In the top navigation bar, select the region in which the instance that you want to manage resides.

  3. In the left-side navigation pane, click Instances. On the Instances page, find the instance and click its ID to go to the Instance Details page.

  4. Click Data Development and Management in the upper-right corner.

    Note

    If you use the tools of Data Development and Management for the first time, a message appears and prompts you to add the public IP address of your machine to the IP address whitelist named webui_whitelist of the instance. Read the message and click OK.

  5. Select Data Integration from the drop-down list.

    Note

    If you use Data Integration for the first time and have not logged on to the WebUI system, the WebUI logon page appears.

    • You can use the admin account to log on to the WebUI system.

    • If you do not know or have forgotten the password of the admin account, you can reset the password. For more information, see Reset the password of an account.

  6. On the Integration page, perform the following operations based on your business requirements:

    If you have not created a data integration task, the Stage page appears after you perform the preceding steps. On the Stage page, you can only create data integration tasks.

    • Create a data integration task.

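      You can create a data integration task from sample data or from objects in Object Storage Service (OSS).

      Sample data

      Sample data consists of standard datasets that are used to benchmark the performance of analytical databases. You can perform the following steps to import sample data: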

      1. Select a sample data type.

        Click Create in the upper-left corner. On the New Integration page, select a sample data type in the Sample Data section.

        | Sample data | Description |
        | --- | --- |
        | ClickBench | The ClickBench datasets. |
        | TPC-H | The TPC-H datasets. |
        | Github Demo | The GitHub events. |
        | SSB-FLAT | The SSB-FLAT datasets. |

      2. On the New Integration page, configure the parameters that are described in the following table and click Create and Load.

        | Parameter | Description | Example |
        | --- | --- | --- |
        | Integration Name | The name of the data integration task. | test |
        | Comment | The description of the data integration task. | test comment |
        | Cluster | The cluster in which you want to run the data integration task. | new_cluster |
        | Sample Data Size | The size of the sample data. | 1GB |

      OSS

      You can perform the following steps to import objects from OSS:

      1. Select an integration type.

        Click Create in the upper-left corner. On the New Integration page, click Object Storage in the Stage section.

      2. Configure the parameters.

        On the New Integration - Object Storage OSS page, configure the parameters that are described in the following table and click Confirm.

        | Parameter | Description | Example |
        | --- | --- | --- |
        | Integration Name | The name of the data integration task. | test |
        | Comment | The description of the data integration task. | test comment |
        | Bucket | The name of the Object Storage Service (OSS) bucket. | test_bucket_name |
        | Default Data File Path | The default path of the file in the OSS bucket. | N/A |
        | Authentication | The authorization method used to access OSS. | Access Key |
        | Access Key | The AccessKey ID of your Alibaba Cloud account. | akdemo |
        | Secret Key | The AccessKey secret of your Alibaba Cloud account. | skdemo |
        | Advanced Settings | The default properties to be used when you integrate and import objects. | N/A |
        | File Configuration | The properties of the objects to be integrated. | N/A |
        | File Type | The file type of OSS objects. Valid values: JSON, ORC, CSV, Parquet, and Automatic Recognition. | JSON |
        | Compression Method | The compression method of OSS objects. | gz |
        | Column Separator | The column delimiter of data in OSS objects. | \t |
        | Line Delimiter | The row delimiter of data in OSS objects. | \n |
        | File Size | The limits on the size of OSS objects. | Unlimited |
        | Loading Configuration | The default operations to be performed to import objects. | N/A |
        | on Error | The policy to apply if an error occurs. Continue: continues to import objects. Abort: stops importing objects. Customized: uses the custom policy to import objects. | Abort |
        | Strict Mode | Open: filters out error data after column type conversion, subject to the rules that follow this table. Close: does not filter out error data after column type conversion. | Open |

        When Strict Mode is set to Open, the following rules apply:

        • Error data refers to values that are not NULL in the source column but become NULL in the destination column after column type conversion. The strict mode does not apply to destination columns whose NULL values are generated by functions.

        • If a destination column restricts values to a specific range and a source value can be converted to the column type but the converted value falls outside that range, the strict mode does not apply to the destination column. For example, if the source value is 10 and the destination column is of the DECIMAL(1,0) type, the value can be converted but the converted value is outside the range specified for the destination column. In this case, the strict mode does not apply to the destination column.
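
The first Strict Mode rule above can be illustrated with a small simulation. The following is a minimal sketch and not the actual import logic of ApsaraDB for SelectDB: it assumes an integer destination column, uses Python `None` to stand for NULL, and uses the hypothetical function names `convert_to_int` and `load_column`. The range rule for types such as DECIMAL(1,0) and NULL values generated by functions are not modeled.

```python
# Illustrative simulation of Strict Mode filtering (not the real loader).
# Assumptions: the destination column is an integer column, and None means NULL.

def convert_to_int(raw):
    """Convert a source value to int; return None (NULL) if conversion fails."""
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        return None


def load_column(source_values, strict):
    """Return (loaded, filtered) for one column.

    With strict=True (Open), a value that is not NULL in the source but
    becomes NULL after conversion is treated as error data and filtered out.
    With strict=False (Close), that value is loaded as NULL instead.
    """
    loaded, filtered = [], []
    for raw in source_values:
        converted = convert_to_int(raw)
        if strict and raw is not None and converted is None:
            filtered.append(raw)
        else:
            loaded.append(converted)
    return loaded, filtered


if __name__ == "__main__":
    values = ["1", "abc", None, "7"]
    print(load_column(values, strict=True))
    print(load_column(values, strict=False))
```

In this sketch a source NULL is loaded in both modes, because only values that change from not NULL to NULL during conversion count as error data.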

    • Query a data integration task: In the upper-right corner of the Integration page, click the Search icon and enter the name of a data integration task in the search box.

    • Delete a data integration task: In the task list on the Integration page, find the data integration task that you want to delete and click the Delete icon in the Actions column.

      Note
      • Deleting a data integration task does not affect data that has already been imported, but may affect data that is still being imported.

      • After a data integration task is deleted, it cannot be recovered.
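
Step 4 of the procedure asks you to add the public IP address of your machine to the webui_whitelist whitelist. If the WebUI remains unreachable, it can help to confirm that your address is covered by the whitelist entries. The following is a minimal sketch that uses the Python standard library `ipaddress` module. It assumes that whitelist entries are single IP addresses or CIDR blocks; the function name `is_whitelisted` and the sample addresses are hypothetical.

```python
import ipaddress

# Hypothetical local check; it does not call any ApsaraDB for SelectDB API.
# Assumption: each whitelist entry is a single IP address or a CIDR block.

def is_whitelisted(client_ip: str, whitelist_entries: list[str]) -> bool:
    """Return True if client_ip matches or falls inside any whitelist entry."""
    address = ipaddress.ip_address(client_ip)
    for entry in whitelist_entries:
        # strict=False accepts entries whose host bits are set, e.g. 203.0.113.5/24.
        network = ipaddress.ip_network(entry, strict=False)
        if address in network:
            return True
    return False


if __name__ == "__main__":
    entries = ["203.0.113.0/24", "192.0.2.10"]  # documentation-only sample ranges
    print(is_whitelisted("203.0.113.5", entries))
    print(is_whitelisted("198.51.100.7", entries))
```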

Related API operations

ResetAccountPassword

DescribeSecurityIPList

ModifySecurityIPList

References

Create an instance

Reset the password of an account

Configure an IP address whitelist
