You can use table engines or table functions to import data from Object Storage Service (OSS) to ApsaraDB for ClickHouse, and then query, analyze, and process the data, such as logs. This topic describes how to import data from OSS to ApsaraDB for ClickHouse.
Prerequisites
OSS is activated. For more information about how to activate OSS, see the Activate OSS section of the Get started by using the OSS console topic.
A bucket is created. The bucket and the ApsaraDB for ClickHouse cluster are deployed in the same region. For more information about how to create a bucket, see the Create a bucket section of the Get started by using the OSS console topic.
The Alibaba Cloud account that is used to access the OSS bucket is granted read permissions on the objects in the bucket. For more information about how to grant permissions, see Overview.
Data preparations
Store the following test data in a CSV file named test.csv and upload the file to OSS. When you import data, the default column delimiter supported by the ApsaraDB for ClickHouse cluster is a comma (,). For more information about how to upload data, see the Upload an object section of the Get started by using the OSS console topic.
1,yang,32,shanghai,http://example1.com
2,wang,22,beijing,http://example2.com
3,xiao,23,shenzhen,http://example3.com
4,jess,45,hangzhou,http://example4.com
5,jack,14,shanghai,http://example5.com
6,tomy,25,hangzhou,http://example6.com
7,lucy,45,shanghai,http://example7.com
8,tengyin,26,shanghai,http://example8.com
9,wangli,27,shenzhen,http://example9.com
10,xiaohua,37,shanghai,http://example10.com
Procedure
Connect to the ApsaraDB for ClickHouse cluster. For more information, see Connect to an ApsaraDB for ClickHouse cluster.
Create a local table named oss_test_tbl_local.
Important The ApsaraDB for ClickHouse table must use the same schema as the OSS external table and must be compatible with the data formats in the OSS external table (especially in the case of null fields). This prevents cluster issues caused by data parsing failures.
You must execute a statement based on the edition of the ApsaraDB for ClickHouse cluster to create a local table. You can click Cluster Information in the left-side navigation pane and view the edition of the cluster in the Cluster Properties section.
You can select a table engine based on your business requirements. For more information about table engines, see Table engines.
Statement used to create a local table in a cluster of Single-replica Edition
CREATE TABLE oss_test_tbl_local ON CLUSTER default
(
    id UInt8,
    user_name String,
    age UInt16,
    city String,
    access_url String
)
ENGINE = MergeTree()
ORDER BY id;
Statement used to create a local table in a cluster of Double-replica Edition
CREATE TABLE oss_test_tbl_local ON CLUSTER default
(
    id UInt8,
    user_name String,
    age UInt16,
    city String,
    access_url String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{database}/{table}/{shard}', '{replica}')
ORDER BY id;
(Optional) Create a distributed table named oss_test_tbl_distributed.
Note If you want to distribute data across the local tables on the nodes of the cluster, you must create a distributed table. The rand() sharding key in the following statement distributes rows across the shards at random.
CREATE TABLE oss_test_tbl_distributed ON CLUSTER default
(
    id UInt8,
    user_name String,
    age UInt16,
    city String,
    access_url String
)
ENGINE = Distributed(default, default, oss_test_tbl_local, rand());
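If you are unsure which shards and replicas the distributed table writes to, you can inspect the cluster topology through the standard system.clusters table. The following query is a minimal sketch; the cluster name default matches the ON CLUSTER clause used above:
-- List the shards and replicas that make up the default cluster.
SELECT cluster, shard_num, replica_num, host_name
FROM system.clusters
WHERE cluster = 'default';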
Import data from OSS to ApsaraDB for ClickHouse.
You can use a table engine or a table function to import data from OSS to ApsaraDB for ClickHouse.
Important You must execute a statement based on the kernel version of the cluster to create an OSS external table. You can click Cluster Information in the left-side navigation pane and view the kernel version of the cluster in the Cluster Properties section.
Method 1: Use a table engine to import data from OSS
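The following statements are a minimal sketch of Method 1, assuming an OSS table engine that takes the endpoint, AccessKey pair, object path, and format as arguments. The endpoint, AccessKey pair, and bucket name are placeholders that you must replace with your own values, and the exact parameter list can differ by kernel version, so verify the syntax for your cluster:
-- Create an OSS external table that maps to the test.csv object.
-- '<oss-endpoint>', '<access-key-id>', '<access-key-secret>', and
-- '<bucket-name>' are placeholders.
CREATE TABLE oss_test_tbl_oss_engine ON CLUSTER default
(
    id UInt8,
    user_name String,
    age UInt16,
    city String,
    access_url String
)
ENGINE = OSS('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', 'oss://<bucket-name>/test.csv', 'CSV');

-- Import the data into the distributed table through the external table.
INSERT INTO oss_test_tbl_distributed SELECT * FROM oss_test_tbl_oss_engine;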
Method 2: Use a table function to import data from OSS
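The following statement is a minimal sketch of Method 2, assuming an oss table function that takes the endpoint, AccessKey pair, object path, format, and column structure as arguments. The function signature can also differ by kernel version, and all quoted connection values are placeholders:
-- Read test.csv from OSS and write the rows into the distributed table.
INSERT INTO oss_test_tbl_distributed
SELECT *
FROM oss('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', 'oss://<bucket-name>/test.csv', 'CSV', 'id UInt8, user_name String, age UInt16, city String, access_url String');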
Query data in the distributed table oss_test_tbl_distributed and check whether the data in OSS is imported to ApsaraDB for ClickHouse.
SELECT * FROM oss_test_tbl_distributed;
The following result is returned:
┌─id─┬─user_name─┬─age─┬─city─────┬─access_url───────────┐
│  1 │ yang      │  32 │ shanghai │ http://example1.com  │
│  2 │ wang      │  22 │ beijing  │ http://example2.com  │
│  3 │ xiao      │  23 │ shenzhen │ http://example3.com  │
│  4 │ jess      │  45 │ hangzhou │ http://example4.com  │
│  5 │ jack      │  14 │ shanghai │ http://example5.com  │
│  6 │ tomy      │  25 │ hangzhou │ http://example6.com  │
│  7 │ lucy      │  45 │ shanghai │ http://example7.com  │
│  8 │ tengyin   │  26 │ shanghai │ http://example8.com  │
│  9 │ wangli    │  27 │ shenzhen │ http://example9.com  │
│ 10 │ xiaohua   │  37 │ shanghai │ http://example10.com │
└────┴───────────┴─────┴──────────┴──────────────────────┘
Use wildcard characters to perform fuzzy matching for storage paths in OSS
In most cases, multiple small files in OSS are created based on the same naming conventions. To simplify the analysis of small files, you can use the following wildcard characters when you set the oss-file-path parameter to perform fuzzy matching:
*: matches all files or directories. For example, /dir/* matches all files in the /dir directory.
{x, y, z}: matches one of the values that are enclosed in the braces {}. For example, file_{x, y, z} matches file_x, file_y, or file_z.
{num1..num2}: matches any value in the range from num1 to num2, inclusive. For example, file_{1..3} matches file_1, file_2, or file_3.
?: matches a single character. For example, file_? matches file_a, file_b, file_c, or other files whose names consist of file_ followed by one character.
Example
Assume that files are uploaded to a directory that has the following structure:
oss://testBucketName/
doc-data/
oss-import/
small_files/
access_log_csv_1.txt
access_log_csv_2.txt
access_log_csv_3.txt
The following list provides example values of the oss-file-path parameter for reference:
oss://testBucketName/doc-data/oss-import/small_files/*
oss://testBucketName/doc-data/oss-import/small_files/access*
oss://testBucketName/doc-data/oss-import/small_files/access_log_csv_{1,2,3}.txt
oss://testBucketName/doc-data/oss-import/*/access_log_csv_{1,2,3}.txt
oss://testBucketName/doc-data/oss-import/*/*
oss://testBucketName/doc-data/oss-import/*/access_log_csv_{1..3}.txt
oss://testBucketName/doc-data/oss-import/*/access_log_csv_?.txt
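As a sketch, a wildcard path such as the last value above can be passed to the oss table function in place of a single object path. The connection placeholders are the same hypothetical values as in the import step, and the files are assumed to match the table schema:
-- Import all three access_log_csv_*.txt files in one statement.
INSERT INTO oss_test_tbl_distributed
SELECT *
FROM oss('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', 'oss://testBucketName/doc-data/oss-import/small_files/access_log_csv_{1..3}.txt', 'CSV', 'id UInt8, user_name String, age UInt16, city String, access_url String');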