All Products
Search
Document Center

ApsaraDB for ClickHouse:Access data in OSS

Last Updated:Dec 17, 2024

This topic describes how to use ApsaraDB for ClickHouse to query data in Object Storage Service (OSS) and how to write data to OSS.

Prerequisites

  • OSS is activated. For information about how to activate OSS, see Activate OSS.

  • An OSS bucket is created in the region where the ApsaraDB for ClickHouse cluster is deployed. For information about how to create an OSS bucket, see Create buckets.

  • A file that contains data is uploaded to OSS. For information about how to upload a file to OSS, see Upload objects.

    Note

    In this example, the following content is stored in a file named test.csv: 1,tick,32,shanghai,http://example.com.

  • The Alibaba Cloud account that you use to access the OSS bucket is granted the required permissions to read and write objects in the bucket. For information about how to grant permissions, see Access control.

Precautions

  • ApsaraDB for ClickHouse allows you to access OSS files in multiple formats except Protobuf and CapnProto. For more information, see Formats for Input and Output Data.

  • Make sure that your OSS bucket and ApsaraDB for ClickHouse cluster are deployed in the same region, and you can access OSS over a virtual private cloud (VPC).

  • If you use ApsaraDB for ClickHouse to write data to OSS, original data in OSS files is overwritten.

Step 1: Log on to the database

  1. Log on to the ApsaraDB for ClickHouse console.

  2. On the Clusters page, click the Clusters of Community-compatible Edition tab, and click the ID of the cluster that you want to manage.

  3. On the Cluster Information page, click Log On to Database in the upper-right corner.

  4. In the Log on to Database Instance dialog box, enter the username and password of the database account, and click Login.

Step 2: Create an OSS external table

  1. Create an OSS external table.

    • For a cluster of version 21.8 and earlier, use the following syntax to create an OSS external table:

      CREATE TABLE <table_name> [on cluster default]
      (
      'col_name1' col_type1,
      'col_name2' col_type2,
      ...
      )
      ENGINE = OSS('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', '<oss-file-path>', '<file-format-name>');
    • For a cluster of version 22.8, use the following syntax to create an OSS external table:

      CREATE TABLE <table_name> [on cluster default]
      (
      'col_name1' col_type1,
      'col_name2' col_type2,
      ...
      )
      ENGINE = OSS('https://<bucketName>.<oss-endpoint>/<file-name>', '<access-key-id>', '<access-key-secret>', '<file-format-name>');

      The following table describes the parameters.

      Parameter

      Description

      table_name

      The name of the table.

      col_name1,col_name2

      The names of the columns.

      col_type1,col_type2

      The data types of the columns.

      Important

      The schema of the OSS external table must correspond to the schema of the CSV file that contains the data in OSS.

      BucketName

      The name of the bucket.

      oss-endpoint

      The private endpoint that you want to use to access an Elastic Compute Service (ECS) instance over the VPC in which the OSS bucket is deployed. For more information, see Regions and endpoints.

      Important

      Make sure that the OSS bucket is deployed in the same region as the ApsaraDB for ClickHouse cluster.

      file-name

      The name of the file.

      access-key-id

      The AccessKey ID that you want to use to access the data in OSS.

      access-key-secret

      The AccessKey secret that you want to use to access the data in OSS.

      oss-file-path

      The storage path of the CSV file that you uploaded. The path is in the oss://<bucket-name>/<path-to-file> format.

      Note

      You can set the oss-file-path parameter to a value that includes wildcard characters to perform a fuzzy match. For more information, see Use wildcard characters to perform fuzzy match for storage paths in OSS.

      file-format-name

      The format of the file. In this topic, the CSV format is used.

    Sample statements:

    • For a cluster of version 21.8 and earlier, the following statements provide an example on how to create an OSS external table:

      CREATE TABLE oss_test_tbl on cluster default
      (
      id UInt8,
      user_name String,
      age UInt16,
      city String,
      access_url String
      )
      ENGINE = OSS('oss-cn-shanghai-internal.aliyuncs.com', 'LTAI5tDVcUKu2CbcBwhr****', 'WAcroHBcL7IVOK8FIHzNJy91Lc****', 'oss://testBucketName/test.csv', 'CSV');
    • For a cluster of version 22.8, the following statements provide an example on how to create an OSS external table:

      CREATE TABLE oss_test_tbl on cluster default
      (
      id UInt8,
      user_name String,
      age UInt16,
      city String,
      access_url String
      )
      ENGINE = OSS('http://testBucketName.oss-cn-shanghai-internal.aliyuncs.com/test.csv', 'LTAI5tDVcUKu2CbcBwhr****', 'WAcroHBcL7IVOK8FIHzNJy91Lc****','CSV')
  2. Query the data in the OSS external table.

    select * from oss_test_tbl;

    The following result is returned:

    ┌─id─┬─user_name─┬──age──┬───city─────┬─────access_url────────┐
    │  1 │  tick     │   32  │  shanghai  │   http://example.com  │
    └────┴───────────┴───────┴────────────┴───────────────────────┘

Step 3: Write data to OSS

insert into oss_test_tbl values(11, 'tick', 25, 'shanghai', 'http://example.com');

Step 4: Query the data in OSS

ApsaraDB for ClickHouse allows you to use a table engine or table function to query the data in OSS.

  • Use a table engine to query the data in OSS.

    Query the data in the OSS external table.

    select * from oss_test_tbl;

    The following result is returned:

    ┌─id─┬─user_name─┬──age──┬───city─────┬─────access_url────────┐
    │ 11 │  tick     │   25  │  shanghai  │   http://example.com  │
    └────┴───────────┴───────┴────────────┴───────────────────────┘
  • Use a table function to query the data in OSS.

    • For a cluster of version 21.8 and earlier, use the following syntax to query the data in OSS:

      SELECT * FROM oss('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', '<oss-file-path>', '<file-format-name>', '<col_name> <col_type>(,...)');
    • For a cluster of version 22.8, use the following syntax to query the data in OSS:

      SELECT * FROM oss('https://<bucketName>.<oss-endpoint>/<file-name>','<access-key-id>', '<access-key-secret>', '<file-format-name>', '<col_name> <col_type>(,...)');

      For more information about parameters, see the table that describes parameters.

    Sample statements:

    • For a cluster of version 21.8 and earlier, the following statements provide an example on how to query the data in OSS:

      SELECT * FROM oss('oss-cn-shanghai-internal.aliyuncs.com', 'LTAI5tDVcUKu2CbcBwhr****', 'WAcroHBcL7IVOK8FIHzNJy91Lc****', 'oss://testBucketName/test.csv', 'CSV', 'id UInt8, user_name String, age UInt16, city String, access_url String');
    • For a cluster of version 22.8, the following statements provide an example on how to query the data in OSS:

      SELECT * FROM oss('http://testBucketName.oss-cn-shanghai-internal.aliyuncs.com/test.csv', 'LTAI5tDVcUKu2CbcBwhr****', 'WAcroHBcL7IVOK8FIHzNJy91Lc****', 'CSV', 'id UInt8, user_name String, age UInt16, city String, access_url String')

      The following result is returned:

      ┌─id─┬─user_name─┬──age──┬───city─────┬─────access_url────────┐
      │ 11 │  tick     │   25  │  shanghai  │   http://example.com  │
      └────┴───────────┴───────┴────────────┴───────────────────────┘