This topic describes how to query data in Alibaba Cloud Object Storage Service (OSS) using ApsaraDB for ClickHouse. This topic also describes how to write data to OSS or export data from ClickHouse to OSS.
Prerequisites
OSS requirements:
The OSS service is activated. For more information, see Activate OSS.
The Resource Access Management (RAM) user that you use to access OSS has read and write permissions on OSS objects. For more information, see Overview of permissions and access control.
Cluster requirements:
You have created a database account. For more information, see Account Management.
Preparations
Create an OSS bucket. For more information, see Create buckets in the console.
ImportantMake sure that the OSS bucket is in the same region as your ApsaraDB for ClickHouse cluster.
This topic uses a bucket named ck-test-oss as an example.
Upload a file.
This topic uses the test.csv file as an example. Before you start, upload this file to OSS. Files that are uploaded to OSS are called objects. For more information, see Upload objects using the console.
If you already have data in OSS, ensure that the data is in a file format that ApsaraDB for ClickHouse can access. All formats are supported except for Protobuf and CapnProto. For more information, see File formats supported by ClickHouse.
Query OSS data
Method 1: Use an OSS external table
Enterprise Edition or Community-compatible Edition v22.8 or later
Log on to the database.
Log on to the ApsaraDB for ClickHouse console.
On the Clusters page, select the Clusters of Community-compatible Edition tab or the Enterprise Edition Clusters tab, and then click the ID of your cluster.
On the Cluster Information page, click Log On to Database in the upper-right corner.
In the Log On to Instance dialog box, enter the database account and password, and click Log On.
Create an OSS external table.
An OSS external table is a special table type in ApsaraDB for ClickHouse that lets you directly query files stored in OSS without importing data into local storage. This feature separates storage from computing.
The following is the syntax for creating a table.
CREATE TABLE <table_name> [ON cluster default] ( 'col_name1' col_type1, 'col_name2' col_type2, ... ) ENGINE = OSS('https://<bucket-name>.<oss-endpoint>/<file-name>', '<access-key-id>', '<access-key-secret>', '<file-format-name>');Parameters:
Parameter name
Description
Example
table_name
The table name.
oss_test_tb
col_name1,col_name2
The column name.
user_name
col_type1,col_type2
The column type.
ImportantThe structure of the OSS external table must match the OSS data.
String
bucket-name
The name of the OSS bucket.
ck-test-oss
oss-endpoint
This endpoint provides private network access to an ECS instance in a VPC. For more information, see Regions and Endpoints.
ImportantMake sure that the OSS bucket and your ApsaraDB for ClickHouse cluster are in the same region.
oss-cn-hangzhou-internal.aliyuncs.com
file-name
The name of the object in the OSS bucket.
test.csv
access-key-id
The AccessKey ID of the RAM user that is used to access OSS data.
yourAccessKeyID
access-key-secret
The AccessKey secret of the RAM user that is used to access OSS data.
Use an existing AccessKey or create a new one. For more information, see Create an AccessKey.
ImportantTo reduce security threats, the AccessKey secret is shown only when you create it. You cannot retrieve it later. Make sure to store it securely.
yourAccessKeySecret
file-format-name
File format
ImportantThis parameter must be specified exactly as listed in File formats supported by ClickHouse. The names are case-sensitive.
CSV
Example:
CREATE TABLE oss_test_tb ON cluster default ( id UInt8, user_name String, age UInt16, city String, access_url String ) ENGINE = OSS('http://ck-test-oss.oss-cn-hangzhou-internal.aliyuncs.com/test.csv', 'LTAI****************', 'yourAccessKeySecret','CSV')Query the OSS data.
SELECT * FROM oss_test_tb;The result is as follows.
+--------------+---------------------+---------------+----------------+----------------------+ | id | user_name | age | city | access_url | +--------------+---------------------+---------------+----------------+----------------------+ | 1 | tick | 32 | shanghai | http://example.com | +--------------+---------------------+---------------+----------------+----------------------+ Returned rows: [1], Time elapsed: [183ms]
Community-compatible Edition v21.8 or earlier
Log on to the database.
Log on to the ApsaraDB for ClickHouse console.
On the Clusters page, click the Clusters of Community-compatible Edition tab, and then click the ID of your cluster.
On the Cluster Information page, click Log On to Database in the upper-right corner.
In the Log On to Instance dialog box, enter the database account and password, and click Log On.
Create an OSS external table.
The following is the syntax for creating a table.
CREATE TABLE <table_name> [ON cluster default] ( 'col_name1' col_type1, 'col_name2' col_type2, ... ) ENGINE = OSS('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', '<oss-file-path>', '<file-format-name>');Parameters:
Parameter Name
Description
Example
table_name
The table name.
oss_test_tb
col_name1,col_name2
The column name.
user_name
col_type1,col_type2
The column type.
ImportantThe structure of the OSS external table must match the OSS data.
String
oss-endpoint
The Endpoint used to access an ECS instance over the VPC private network. For more information, see Regions and Endpoints.
ImportantMake sure that the OSS bucket and your ApsaraDB for ClickHouse cluster are in the same region.
oss-cn-hangzhou-internal.aliyuncs.com
access-key-id
The AccessKey ID of the RAM user that is used to access OSS data.
yourAccessKeyID
access-key-secret
The AccessKey secret of the RAM user that is used to access OSS data.
Use an existing AccessKey or create a new one. For more information, see Create an AccessKey.
ImportantTo reduce security threats, the AccessKey secret is shown only when you create it. You cannot retrieve it later. Make sure to store it securely.
yourAccessKeySecret
oss-file-path
The storage path of the object. The format is usually
oss://<bucket-name>/<path-to-file>.NoteThe
oss-file-pathparameter supports fuzzy matching with wildcard characters. For more information, see Use wildcard characters for fuzzy matching of OSS storage paths.oss://ck-test-oss/test.csv
file-format-name
File format
ImportantThis parameter must be specified exactly as listed in File formats supported by ClickHouse. The names are case-sensitive.
CSV
Example:
CREATE TABLE oss_test_tb ON cluster default ( id UInt8, user_name String, age UInt16, city String, access_url String ) ENGINE = OSS('oss-cn-hangzhou-internal.aliyuncs.com', 'LTAI****************', 'yourAccessKeySecret', 'oss://ck-test-oss/test.csv', 'CSV');Query the OSS data.
SELECT * FROM oss_test_tb;The result is as follows.
+--------------+---------------------+---------------+----------------+----------------------+ | id | user_name | age | city | access_url | +--------------+---------------------+---------------+----------------+----------------------+ | 1 | tick | 32 | shanghai | http://example.com | +--------------+---------------------+---------------+----------------+----------------------+ Returned rows: [1], Time elapsed: [183ms]
Method 2: Use a table function
Enterprise Edition or Community-compatible Edition v22.8 or later
Log on to the database.
Log on to the ApsaraDB for ClickHouse console.
On the Clusters page, select the Clusters of Community-compatible Edition tab or the Enterprise Edition Clusters tab, and then click the ID of your cluster.
On the Cluster Information page, click Log On to Database in the upper-right corner.
In the Log On to Instance dialog box, enter the database account and password, and click Log On.
Use a table function to query data.
Syntax:
SELECT * FROM oss('https://<bucket-name>.<oss-endpoint>/<file-name>','<access-key-id>', '<access-key-secret>', '<file-format-name>', '<col_name> <col_type>(,...)');The parameters are the same as those for creating an OSS external table. For more information, see Parameters.
Example:
SELECT * FROM oss('http://testBucketName.oss-cn-shanghai-internal.aliyuncs.com/test.csv', 'LTAI****************', 'yourAccessKeySecret', 'CSV', 'id UInt8, user_name String, age UInt16, city String, access_url String')The result is as follows.
+--------------+---------------------+---------------+----------------+----------------------+ | id | user_name | age | city | access_url | +--------------+---------------------+---------------+----------------+----------------------+ | 1 | tick | 32 | shanghai | http://example.com | +--------------+---------------------+---------------+----------------+----------------------+ Returned rows: [1], Time elapsed: [183ms]
Community-compatible Edition v21.8 or earlier
Log on to the database.
Log on to the ApsaraDB for ClickHouse console.
On the Clusters page, click the Clusters of Community-compatible Edition tab, and then click the ID of your cluster.
On the Cluster Information page, click Log On to Database in the upper-right corner.
In the Log On to Instance dialog box, enter the database account and password, and click Log On.
Use a table function to query data.
Syntax:
SELECT * FROM oss('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', '<oss-file-path>', '<file-format-name>', '<col_name> <col_type>(,...)');The parameters are the same as those for creating an OSS external table. For more information, see Parameters.
Example:
SELECT * FROM oss('oss-cn-hangzhou-internal.aliyuncs.com', 'LTAI****************', 'yourAccessKeySecret', 'oss://ck-test-oss/test.csv', 'CSV', 'id UInt8, user_name String, age UInt16, city String, access_url String');The result is as follows.
+--------------+---------------------+---------------+----------------+----------------------+ | id | user_name | age | city | access_url | +--------------+---------------------+---------------+----------------+----------------------+ | 1 | tick | 32 | shanghai | http://example.com | +--------------+---------------------+---------------+----------------+----------------------+ Returned rows: [1], Time elapsed: [183ms]
Export ClickHouse data or write data to OSS
Usage notes
Enterprise Edition or Community-compatible Edition v22.8 or later
When you export data from ClickHouse to OSS or write data directly to OSS, you must set either `s3_truncate_on_insert=1` or `s3_create_new_file_on_insert=1`. If you do not set these parameters, or if you set them to 0, the operation fails if an object already exists at the destination path.
s3_truncate_on_insert=1: ApsaraDB for ClickHouse checks if an object exists at the destination path before exporting or writing data.
If the file already exists, ClickHouse will overwrite the file with new data.
This is suitable for full overwrite scenarios. Use this with caution, because this operation deletes the original data.
s3_create_new_file_on_insert=1: ClickHouse checks if an object exists at the destination path before exporting or writing data.
If a file with the specified name already exists, ClickHouse creates a new file in the same folder and writes the data to it. The new file is named using the following format:
object file name + an incremental number that starts from 0 + file format name.For example, if the destination object specified when you create the OSS external table is test.csv, the new object is named test0.csv after the first operation. After the second operation, the new object is named test1.csv.
This method is suitable for incremental export or write scenarios and avoids overwriting existing data.
Community-compatible Edition v21.8 or earlier
When you export data from ClickHouse to OSS or write data directly to OSS, only one OSS object is used. The exported or written data overwrites the content of the original object. Use this with caution.
Step 1: Log on to the database
Log on to the ApsaraDB for ClickHouse console.
On the Clusters page, select the Clusters of Community-compatible Edition tab or the Enterprise Edition Clusters tab, and then click the ID of your cluster.
On the Cluster Information page, click Log On to Database in the upper-right corner.
In the Log On to Instance dialog box, enter the database account and password, and click Log On.
Step 2: (Optional) Prepare the data to export
If you are exporting data from ClickHouse to OSS and do not have data to export, follow these steps to prepare the data.
If you already have data to export, you can skip this step.
Create a table based on the cluster edition.
The following example creates a table on a single-replica cluster of the Community-compatible Edition. To create a table on a dual-replica cluster of the Enterprise Edition or Community-compatible Edition, see Create a table.
CREATE TABLE test_tb_local ON cluster default ( id UInt8, user_name String, age UInt16, city String, access_url String ) ENGINE = MergeTree() ORDER BY id;Write sample data.
INSERT INTO test_tb_local VALUES (13, 'tick', 25, 'shanghai', 'http://example.com');(Optional) View the data.
You can run the following statement to verify that the data was written successfully.
SELECT * FROM test_tb_local;
Step 3: Create an OSS external table
Enterprise Edition or Community-compatible Edition v22.8 or later
The following is the syntax for creating a table.
CREATE TABLE <table_name> [ON cluster default]
(
'col_name1' col_type1,
'col_name2' col_type2,
...
)
ENGINE = OSS('https://<bucket-name>.<oss-endpoint>/<file-name>', '<access-key-id>', '<access-key-secret>', '<file-format-name>');Parameters:
Parameter name | Description | Example |
table_name | The table name. | oss_test_tb |
col_name1,col_name2 | The column name. | user_name |
col_type1,col_type2 | The column type. Important The structure of the OSS external table must match the OSS data. | String |
bucket-name | The name of the OSS bucket. | ck-test-oss |
oss-endpoint | This endpoint provides private network access to an ECS instance in a VPC. For more information, see Regions and Endpoints. Important Make sure that the OSS bucket and your ApsaraDB for ClickHouse cluster are in the same region. | oss-cn-hangzhou-internal.aliyuncs.com |
file-name | The name of the object in the OSS bucket. | test.csv |
access-key-id | The AccessKey ID of the RAM user that is used to access OSS data. | yourAccessKeyID |
access-key-secret | The AccessKey secret of the RAM user that is used to access OSS data. Use an existing AccessKey or create a new one. For more information, see Create an AccessKey. Important To reduce security threats, the AccessKey secret is shown only when you create it. You cannot retrieve it later. Make sure to store it securely. | yourAccessKeySecret |
file-format-name | File format Important This parameter must be specified exactly as listed in File formats supported by ClickHouse. The names are case-sensitive. | CSV |
Example:
CREATE TABLE oss_test_tb ON cluster default
(
id UInt8,
user_name String,
age UInt16,
city String,
access_url String
)
ENGINE = OSS('http://ck-test-oss.oss-cn-hangzhou-internal.aliyuncs.com/test.csv', 'LTAI****************', 'yourAccessKeySecret','CSV')Community-compatible Edition v21.8 or earlier
The following is the syntax for creating a table.
CREATE TABLE <table_name> [ON cluster default]
(
'col_name1' col_type1,
'col_name2' col_type2,
...
)
ENGINE = OSS('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', '<oss-file-path>', '<file-format-name>');Parameters:
Parameter Name | Description | Example |
table_name | The table name. | oss_test_tb |
col_name1,col_name2 | The column name. | user_name |
col_type1,col_type2 | The column type. Important The structure of the OSS external table must match the OSS data. | String |
oss-endpoint | The Endpoint used to access an ECS instance over the VPC private network. For more information, see Regions and Endpoints. Important Make sure that the OSS bucket and your ApsaraDB for ClickHouse cluster are in the same region. | oss-cn-hangzhou-internal.aliyuncs.com |
access-key-id | The AccessKey ID of the RAM user that is used to access OSS data. | yourAccessKeyID |
access-key-secret | The AccessKey secret of the RAM user that is used to access OSS data. Use an existing AccessKey or create a new one. For more information, see Create an AccessKey. Important To reduce security threats, the AccessKey secret is shown only when you create it. You cannot retrieve it later. Make sure to store it securely. | yourAccessKeySecret |
oss-file-path | The storage path of the object. The format is usually Note The | oss://ck-test-oss/test.csv |
file-format-name | File format Important This parameter must be specified exactly as listed in File formats supported by ClickHouse. The names are case-sensitive. | CSV |
Example:
CREATE TABLE oss_test_tb ON cluster default
(
id UInt8,
user_name String,
age UInt16,
city String,
access_url String
)
ENGINE = OSS('oss-cn-hangzhou-internal.aliyuncs.com', 'LTAI****************', 'yourAccessKeySecret', 'oss://ck-test-oss/test.csv', 'CSV');Step 4: Export or write data to OSS
To export or write data to OSS, use an OSS external table and an INSERT INTO statement. For more information about the syntax, see INSERT INTO. The following examples provide the statements for different scenarios.
Export ClickHouse data
Enterprise Edition or Community-compatible Edition v22.8 or later
Full overwrite export (export data to OSS and overwrite existing data).
INSERT INTO oss_test_tb SETTINGS s3_truncate_on_insert = 1 SELECT * FROM test_tb_local;Incremental export (export data to OSS and do not overwrite existing data).
INSERT INTO oss_test_tb SETTINGS s3_create_new_file_on_insert = 1 SELECT * FROM test_tb_local;After this operation is complete, a new object is created in the OSS bucket. The new object is named as follows:
destination object name + a number starting from 0 and incrementing by 1 + object format name.For example, if the destination object specified when you create the OSS external table is test.csv, the new object is named test0.csv. If you run the operation again, the new object is named test1.csv. You can view the new object in the OSS console.
Community-compatible Edition v21.8 or earlier
Exporting ClickHouse data to OSS overwrites existing data. Use this with caution.
INSERT INTO oss_test_tb SELECT * FROM test_tb_local;Write data
Enterprise Edition or Community-compatible Edition v22.8 or later
Write new data to OSS and overwrite existing data.
INSERT INTO oss_test_tb SETTINGS s3_truncate_on_insert = 1 VALUES (14, 'tick', 25, 'shanghai', 'http://example.com');Write new data to OSS and do not overwrite existing data.
INSERT INTO oss_test_tb SETTINGS s3_create_new_file_on_insert = 1 VALUES (14, 'tick', 25, 'shanghai', 'http://example.com');After this operation is complete, a new object is created in the OSS bucket. The new object is named as follows:
destination object name + a number starting from 0 and incrementing by 1 + object format name.For example, if the destination object specified when you create the OSS external table is test.csv, the new object is named test0.csv. If you run the operation again, the new object is named test1.csv. You can view the new object in the OSS console.
Community-compatible Edition v21.8 or earlier
Writing data to OSS from ClickHouse overwrites existing data. Use this with caution.
INSERT INTO oss_test_tb SETTINGS s3_truncate_on_insert = 1 VALUES (14, 'tick', 25, 'shanghai', 'http://example.com');More operations: Import OSS data to ClickHouse
If you use OSS to stage and migrate source data to ClickHouse, you must also import the data from OSS into the destination ClickHouse cluster. For more information, see Import data from OSS.