This topic describes how to use ApsaraDB for ClickHouse to query data in Object Storage Service (OSS) and how to write data to OSS.
Prerequisites
OSS is activated. For information about how to activate OSS, see Activate OSS.
An OSS bucket is created in the region where the ApsaraDB for ClickHouse cluster is deployed. For information about how to create an OSS bucket, see Create buckets.
A file that contains data is uploaded to OSS. For information about how to upload a file to OSS, see Upload objects.
NoteIn this example, the following content is stored in a file named test.csv:
1,tick,32,shanghai,http://example.com
.The Alibaba Cloud account that you use to access the OSS bucket is granted the required permissions to read and write objects in the bucket. For information about how to grant permissions, see Access control.
Precautions
ApsaraDB for ClickHouse allows you to access OSS files in multiple formats except Protobuf and CapnProto. For more information, see Formats for Input and Output Data.
Make sure that your OSS bucket and ApsaraDB for ClickHouse cluster are deployed in the same region, and you can access OSS over a virtual private cloud (VPC).
If you use ApsaraDB for ClickHouse to write data to OSS, original data in OSS files is overwritten.
Step 1: Log on to the database
Log on to the ApsaraDB for ClickHouse console.
On the Clusters page, click the Clusters of Community-compatible Edition tab, and click the ID of the cluster that you want to manage.
On the Cluster Information page, click Log On to Database in the upper-right corner.
In the Log on to Database Instance dialog box, enter the username and password of the database account, and click Login.
Step 2: Create an OSS external table
Create an OSS external table.
For a cluster of version 21.8 and earlier, use the following syntax to create an OSS external table:
CREATE TABLE <table_name> [on cluster default] ( 'col_name1' col_type1, 'col_name2' col_type2, ... ) ENGINE = OSS('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', '<oss-file-path>', '<file-format-name>');
For a cluster of version 22.8, use the following syntax to create an OSS external table:
CREATE TABLE <table_name> [on cluster default] ( 'col_name1' col_type1, 'col_name2' col_type2, ... ) ENGINE = OSS('https://<bucketName>.<oss-endpoint>/<file-name>', '<access-key-id>', '<access-key-secret>', '<file-format-name>');
The following table describes the parameters.
Parameter
Description
table_name
The name of the table.
col_name1,col_name2
The names of the columns.
col_type1,col_type2
The data types of the columns.
ImportantThe schema of the OSS external table must correspond to the schema of the CSV file that contains the data in OSS.
BucketName
The name of the bucket.
oss-endpoint
The private endpoint that you want to use to access an Elastic Compute Service (ECS) instance over the VPC in which the OSS bucket is deployed. For more information, see Regions and endpoints.
ImportantMake sure that the OSS bucket is deployed in the same region as the ApsaraDB for ClickHouse cluster.
file-name
The name of the file.
access-key-id
The AccessKey ID that you want to use to access the data in OSS.
access-key-secret
The AccessKey secret that you want to use to access the data in OSS.
oss-file-path
The storage path of the CSV file that you uploaded. The path is in the
oss://<bucket-name>/<path-to-file>
format.NoteYou can set the
oss-file-path
parameter to a value that includes wildcard characters to perform a fuzzy match. For more information, see Use wildcard characters to perform fuzzy match for storage paths in OSS.file-format-name
The format of the file. In this topic, the CSV format is used.
Sample statements:
For a cluster of version 21.8 and earlier, the following statements provide an example on how to create an OSS external table:
CREATE TABLE oss_test_tbl on cluster default ( id UInt8, user_name String, age UInt16, city String, access_url String ) ENGINE = OSS('oss-cn-shanghai-internal.aliyuncs.com', 'LTAI5tDVcUKu2CbcBwhr****', 'WAcroHBcL7IVOK8FIHzNJy91Lc****', 'oss://testBucketName/test.csv', 'CSV');
For a cluster of version 22.8, the following statements provide an example on how to create an OSS external table:
CREATE TABLE oss_test_tbl on cluster default ( id UInt8, user_name String, age UInt16, city String, access_url String ) ENGINE = OSS('http://testBucketName.oss-cn-shanghai-internal.aliyuncs.com/test.csv', 'LTAI5tDVcUKu2CbcBwhr****', 'WAcroHBcL7IVOK8FIHzNJy91Lc****','CSV')
Query the data in the OSS external table.
select * from oss_test_tbl;
The following result is returned:
┌─id─┬─user_name─┬──age──┬───city─────┬─────access_url────────┐ │ 1 │ tick │ 32 │ shanghai │ http://example.com │ └────┴───────────┴───────┴────────────┴───────────────────────┘
Step 3: Write data to OSS
insert into oss_test_tbl values(11, 'tick', 25, 'shanghai', 'http://example.com');
Step 4: Query the data in OSS
ApsaraDB for ClickHouse allows you to use a table engine or table function to query the data in OSS.
Use a table engine to query the data in OSS.
Query the data in the OSS external table.
select * from oss_test_tbl;
The following result is returned:
┌─id─┬─user_name─┬──age──┬───city─────┬─────access_url────────┐ │ 11 │ tick │ 25 │ shanghai │ http://example.com │ └────┴───────────┴───────┴────────────┴───────────────────────┘
Use a table function to query the data in OSS.
For a cluster of version 21.8 and earlier, use the following syntax to query the data in OSS:
SELECT * FROM oss('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', '<oss-file-path>', '<file-format-name>', '<col_name> <col_type>(,...)');
For a cluster of version 22.8, use the following syntax to query the data in OSS:
SELECT * FROM oss('https://<bucketName>.<oss-endpoint>/<file-name>','<access-key-id>', '<access-key-secret>', '<file-format-name>', '<col_name> <col_type>(,...)');
For more information about parameters, see the table that describes parameters.
Sample statements:
For a cluster of version 21.8 and earlier, the following statements provide an example on how to query the data in OSS:
SELECT * FROM oss('oss-cn-shanghai-internal.aliyuncs.com', 'LTAI5tDVcUKu2CbcBwhr****', 'WAcroHBcL7IVOK8FIHzNJy91Lc****', 'oss://testBucketName/test.csv', 'CSV', 'id UInt8, user_name String, age UInt16, city String, access_url String');
For a cluster of version 22.8, the following statements provide an example on how to query the data in OSS:
SELECT * FROM oss('http://testBucketName.oss-cn-shanghai-internal.aliyuncs.com/test.csv', 'LTAI5tDVcUKu2CbcBwhr****', 'WAcroHBcL7IVOK8FIHzNJy91Lc****', 'CSV', 'id UInt8, user_name String, age UInt16, city String, access_url String')
The following result is returned:
┌─id─┬─user_name─┬──age──┬───city─────┬─────access_url────────┐ │ 11 │ tick │ 25 │ shanghai │ http://example.com │ └────┴───────────┴───────┴────────────┴───────────────────────┘