This topic describes how to use clickhouse-client to import data from a local file to ApsaraDB for ClickHouse. This method can help you quickly import data.
Usage notes
ApsaraDB for ClickHouse allows you to import data from files. The common file formats supported are TabSeparated, TabSeparatedWithNames, TabSeparatedWithNamesAndTypes, CSV, and CSVWithNames. For more information about common file formats, see Common file formats. If your file is in another format, the file must meet the standards of the ClickHouse community. For more information, see Formats for Input and Output Data.
Prerequisites
The IP address of the on-premises machine is added to the whitelist of the ApsaraDB for ClickHouse cluster. For more information, see Configure a whitelist.
clickhouse-client that corresponds to the version of the ApsaraDB for ClickHouse cluster is installed. For more information, see clickhouse-client.
Step 1: Prepare test data
In the directory in which clickhouse-client is installed, create a CSV file named testdata.csv and write the following data to the file:
1,yang,32,shanghai,http://example.com
2,wang,22,beijing,http://example.com
3,xiao,23,shenzhen,http://example.com
4,jess,45,hangzhou,http://example.com
5,jack,14,shanghai,http://example.com
6,tomy,25,hangzhou,http://example.com
7,lucy,45,shanghai,http://example.com
8,tengyin,26,shanghai,http://example.com
9,wangli,27,shenzhen,http://example.com
10,xiaohua,37,shanghai,http://example.com
Step 2: Create an ApsaraDB for ClickHouse table
Log on to the ApsaraDB for ClickHouse console.
In the top navigation bar, select the region where the cluster is deployed.
On the Clusters page, find the cluster that you want to manage and click the cluster ID.
In the upper-right corner of the cluster details page, click Log On to Database.
In the Log on to Database Instance dialog box, enter the username and password of the database account and click Login.
Create a table.
NoteThe schema of an ApsaraDB for ClickHouse table must map to the structure of a local file, and the data type of the ApsaraDB for ClickHouse table must be the same as that of the local file.
Create a local table
If you use a cluster of Single-replica Edition, execute the following statement to create a local table:
CREATE TABLE test_tbl_local ON cluster default ( id UInt8, user_name String, age UInt16, city String, access_url String ) ENGINE = MergeTree() ORDER BY id;
If you use a cluster of Double-replica Edition, execute the following statement to create a local table:
CREATE TABLE test_tbl_local ON cluster default ( id UInt8, user_name String, age UInt16, city String, access_url String ) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{database}/{table}/{shard}', '{replica}') order by id;
Create a distributed table.
NoteIf you want to import data only to the local table, skip this step.
CREATE TABLE test_tbl_distributed ON cluster default ( id UInt8, user_name String, age UInt16, city String, access_url String ) ENGINE = Distributed(default, default, test_tbl_local, rand());
Step 3: Import data from the local file
Run the following command in the directory in which clickhouse-client is installed:
cat <Local file name> | ./clickhouse-client --host=<Cluster endpoint> --port=<TCP port number> --user=<Database account username> --password=<Database account password> --query="INSERT INTO <ApsaraDB for ClickHouse table name> FORMAT <Local file format>";
Data import over the Internet is slow. If you want to import a large amount of data or test import performance, we recommend that you use one of the following methods:
Import data over an internal network: Purchase an Elastic Compute Service (ECS) instance that is deployed in the same virtual private cloud (VPC) as the ApsaraDB for ClickHouse cluster, use the VPC endpoint to connect to the ApsaraDB for ClickHouse cluster from the ECS instance, and then import data.
Import data in multiple concurrent processes: Divide the original file into multiple parts. Then, start multiple clients to concurrently import data from these parts.
Step 4: Query the ApsaraDB for ClickHouse table
Connect to the ApsaraDB for ClickHouse cluster by using Data Management (DMS). For more information, see Connect to an ApsaraDB for ClickHouse cluster.
Execute the following query statement:
SELECT * FROM test_tbl_distributed;
NoteIf you import data to a local table, replace the name of the distributed table in the query statement with the name of the local table.
The following result is returned.
Common file formats
The following table describes the common file formats that are supported by ApsaraDB for ClickHouse.
Common file format | Description |
TabSeparated | Data is written to a text file in this format by row. Columns are separated by tabs. Escape sequences are required to represent tabs, line feeds, and backslashes. The escape sequence for a tab is |
TabSeparatedWithNames | This file format is similar to the TabSeparated format, except that the values in the first row are the column names. The first row is ignored during parsing. Therefore, the column names in the first row cannot be used to determine the positions of columns. |
TabSeparatedWithNamesAndTypes | This file format is similar to the TabSeparated format, except that the values in the first row are the column names and the values in the second row are the data types. The first and second rows are ignored during parsing. |
CSV | Data is written to a file of this format by row. Strings are enclosed in a pair of double quotation marks (" "). Each double quotation mark (") that is contained in a string is escaped with a pair of double quotation marks (" "). Numeric values are not enclosed in a pair of double quotation marks (" "). The default column delimiter is a
|
CSVWithNames | This format is similar to the CSV format, except that the values in the first row are the column names. The first row is ignored during parsing. Therefore, the column names in the first row cannot be used to determine the positions of columns. |