All Products
Search
Document Center

ApsaraDB for ClickHouse:Use an external table to import data from MaxCompute to ApsaraDB for ClickHouse (V20.3 or later)

Last Updated:Dec 17, 2024

This topic describes how to import data from an external table in MaxCompute to ApsaraDB for ClickHouse.

Usage notes

  • The kernel version of your ApsaraDB for ClickHouse Community-compatible Edition cluster must be 20.3 or later.

  • You are charged for using MaxCompute. For more information about the billing of MaxCompute, see Billable items and billing methods.

  • To ensure a smooth connection between your ApsaraDB for ClickHouse cluster and MaxCompute project, you must access MaxCompute over a virtual private cloud (VPC), and the MaxCompute project must be deployed in the same region as the ApsaraDB for ClickHouse cluster.

Prerequisites

  • MaxCompute is activated. For more information, see Activate MaxCompute and DataWorks.

  • A MaxCompute project is created in the same region as the ApsaraDB for ClickHouse cluster. For more information, see Create a MaxCompute project.

  • If you want to use MaxCompute as a RAM user, you must use the Alibaba Cloud account to grant permissions to the RAM user. For more information, see Prepare a RAM user.

Procedure

  1. Create an external table in MaxCompute.

    1. Log on to the MaxCompute console .

    2. In the left-side navigation pane, choose DataWorks > Data Development.

    3. On the Data Development page, move the pointer over the 新建 icon, and choose Create Table > MaxCompute > Table.

    4. In the Create Table dialog box, configure the Name parameter. In this example, maxcompute is used.

    5. Click Create.

    6. In the General section, configure the parameters.

      基本属性

      The following table describes the parameters.

      Parameter

      Description

      Display Name

      The display name of the table.

      Theme

      The folders that are used to store and manage the table. You can specify the level-1 and level-2 folders to store the table. The Level-1 Theme and Level-2 Theme parameters can be used to categorize tables based on business categories. You can store tables of the same business category in the same folder.

      Note

      The level-1 and level-2 themes in the Workspace Tables pane of the DataStudio page help you better manage tables in folders. You can quickly find the current table in the Workspace Tables pane by theme. If no theme is available, you can create one. For information about how to create a theme, see the Create or manage folders for tables section of the "Manage settings for tables" topic.

    7. Click DDL in the toolbar.

    8. In the DDL dialog box, enter the following statements and click Generate Table Schema:

      CREATE TABLE IF NOT EXISTS maxcompute
      (    
      v1  INT,    
      v2  INT                
      )
      PARTITIONED BY
      (
          v3 STRING                   
      );
    9. Click Commit to Development Environment and Commit to Production Environment in sequence on the configuration page of the MaxCompute table.

  2. Write data to the MaxCompute external table.

    1. On the DataStudio page, click Ad Hoc Query in the left-side navigation pane.

    2. Move the pointer over the 新建 icon and choose Create > ODPS SQL.

    3. In the Create Node dialog box, configure the Path and Name parameters.

    4. Click Confirm.

    5. On the editing page of the node, enter the following statement to write data into the MaxCompute external table:

      insert into maxcompute PARTITION (v3='2021') values (1,2),(2,3);
    6. Click the 执行 icon in the toolbar.

    7. In the Parameters dialog box, select Common scheduler resource group and click Run.

    8. In the Estimate MaxCompute Computing Cost dialog box, check and confirm the cost in the Expense Estimate parameter, and then click Run.

  3. Create an ApsaraDB for ClickHouse table.

    1. Log on to the ApsaraDB for ClickHouse console.

    2. On the Clusters page, find the cluster that you want to manage and click the cluster ID.

    3. In the upper-right corner of the cluster details page, click Log On to Database.

    4. Enter the following statement and click Execute(F8).

      Use the following syntax to create a table:

      CREATE TABLE <table_name> [on cluster default]
      (
      'col_name1' col_type1,
      'col_name2' col_type2,
      ...
      )
      ENGINE = MaxCompute('<tunnel-endpoint>', '<project-name>', '<table-name>', '<partition-spec>', '<access-key-id>', '<access-key-secret>', <read-thread-num>, '<quota>');
      Note

      If you use the MaxComputeRaw table engine, replace tunnel-endpoint in the preceding syntax with odps-endpoint, which indicates the VPC endpoint. For more information about how to view the VPC endpoint, see Endpoints.

      The following table describes the parameters.

      Parameter

      Description

      table_name

      The name of the table.

      col_name1,col_name2

      The names of the columns.

      col_type1,col_type2

      The data types of the columns.

      Note

      The schema type of the ApsaraDB for ClickHouse table must map the schema type of the MaxCompute table. For more information about the mappings, see the Data type mappings section in this topic.

      tunnel-endpoint

      The Tunnel endpoint.

      To ensure a smooth connection between your ApsaraDB for ClickHouse cluster and MaxCompute project, you must access MaxCompute over a VPC, and the MaxCompute project must be deployed in the same region as the ApsaraDB for ClickHouse cluster.

      Note

      For more information about how to view the Tunnel endpoint of a VPC, see Endpoints.

      project-name

      The name of the MaxCompute project.

      table-name

      The name of the MaxCompute external table.

      partition-spec

      The partition of the MaxCompute external table.

      access-key-id

      The AccessKey ID that is used to access the MaxCompute project.

      access-key-secret

      The AccessKey secret that is used to access the MaxCompute project.

      read-thread-num (Optional)

      The number of concurrent reads for each partition of the MaxCompute table. Default value: 1.

      quota (Optional)

      The unit of computing resources in MaxCompute. By default, the quota is a shared quota.

      Execute the following statement to create a table:

      CREATE TABLE default.maxcomputetest ON CLUSTER default (
      v1 Int32, 
      v2 Int32
      ) ENGINE = MaxCompute('http://dt.cn-hangzhou.maxcompute.aliyun-inc.com', 'ckfwt', 'maxcompute', 'v3=2021', 'LTAI5tDVcUKu2CbcBwhr****', 'WAcroHBcL7IVOK8FIHzNJy91Lc****');
  4. Query the ApsaraDB for ClickHouse table.

    1. Log on to the ApsaraDB for ClickHouse console.

    2. On the Clusters page, find the cluster that you want to manage and click the cluster ID.

    3. In the upper-right corner of the cluster details page, click Log On to Database.

    4. Enter a query statement and click Execute(F8).

      SELECT * FROM maxcomputetest;

      The following result is returned.查询

Data type mappings

Data type in MaxCompute

Data type in ApsaraDB for ClickHouse

Boolean

UInt8

Tinyint

UInt8, Int8

Smalllint

UInt16, Int16

Int

UInt32, Int32

Bigint

UInt64, Int64

Float

Float32

Double

Float64

Char

String

Varchar

String

Binary

String

String

String

Date

Date

Datetime

Datetime

UUID

Not supported

INTERVAL

Not supported

Decimal

String

Timestamp

Not supported

Map

Nested(Nullable(keyType), Nullable(valueType)). Only unnested Map structures are supported.

For example, map<string, string> in MaxCompute maps Nested(k Nullable(String), v Nullable(String)) in ApsaraDB for ClickHouse.

Note

The Nested structure can contain only nullable values. Otherwise, an error occurs when you create a table.

Array

Array(Nullable(Type)). Only unnested Array structures are supported.

For example, array<int> in MaxCompute maps Array(Nullable(Int32)) in ApsaraDB for ClickHouse.

Note

The Array structure can contain only nullable values. Otherwise, an error occurs when you create a table.

Struct

Tuple(Nullable(Type)). Only unnested Struct structures are supported.

For example, struct<x:int, y:string> in MaxCompute maps Tuple(Nulable(Int32), Nullable(String)) in ApsaraDB for ClickHouse.

Note

The Tuple structure can contain only nullable values. Otherwise, an error occurs when you create a table.