This topic describes how to use the LOAD DATA statement to import on-premises data to AnalyticDB for MySQL Data Warehouse Edition (V3.0).
Syntax
LOAD DATA LOCAL
INFILE 'file_name'
[REPLACE | IGNORE]
INTO TABLE table_name
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
]
[LINES
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[(column_name_or_user_var
[, column_name_or_user_var] ...)]
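The delimiter clauses above map onto a simple parsing model: the LINES TERMINATED BY string splits the file into rows, and the FIELDS TERMINATED BY string splits each row into columns. The following is a minimal Python sketch of that model only; the sample data and delimiters are illustrative, not AnalyticDB internals:

```python
# Sketch: LINES TERMINATED BY '\n' splits the file into rows;
# FIELDS TERMINATED BY '\t' splits each row into columns.
raw = "1\tbb\n2\tbb\n3\tbb\n"

rows = [line.split("\t") for line in raw.split("\n") if line]
print(rows)  # [['1', 'bb'], ['2', 'bb'], ['3', 'bb']]
```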
Parameters
Parameter | Description |
LOAD DATA LOCAL | Imports data from an on-premises file. All text files are supported. |
INFILE 'file_name' | The path of the on-premises file. The path consists of a directory and a file name. |
table_name | The name of the AnalyticDB for MySQL table to which the data is imported. |
REPLACE | Overwrites existing data with the data that you want to import when a duplicate primary key is used. |
IGNORE | Ignores rows that fail to be imported when a duplicate primary key is used or a data error occurs. Specific rows may fail to be imported. |
{FIELDS/COLUMNS} TERMINATED BY 'string' | The delimiter that is used to separate columns of data. The default value is \t. |
[OPTIONALLY] ENCLOSED BY 'char' | The character that is used to enclose the data of each column. |
LINES TERMINATED BY 'string' | The delimiter that is used to separate rows of data. The default value is \n. |
IGNORE number {LINES/ROWS} | The number of rows from the beginning that you want to ignore when you import the data. For example, IGNORE 1 LINES ignores the first row of the file. |
(column_name_or_user_var, ...) | The columns that you want to import. If you do not specify this parameter, data is imported based on the order of columns. |
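The ENCLOSED BY clause strips an optional quoting character from each column value. The following Python sketch shows the intended behavior under that assumption; the quote character and sample values are hypothetical:

```python
# Sketch of [OPTIONALLY] ENCLOSED BY '"': a value wrapped in the
# enclosure character is unwrapped; other values pass through unchanged.
def strip_enclosure(value: str, quote: str = '"') -> str:
    if len(value) >= 2 and value.startswith(quote) and value.endswith(quote):
        return value[1:-1]
    return value

print(strip_enclosure('"bb"'))  # bb
print(strip_enclosure('bb'))    # bb
```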
Usage notes
You must enable the local-infile feature for the client. For example, you must add the following configuration to the my.cnf file to enable the local-infile feature for a MySQL client:

cat ~/.my.cnf
[mysqld]
local-infile

[mysql]
local-infile

For more information about the my.cnf file, see the MySQL official documentation.

Data import cannot ensure the atomicity of operations.
In IGNORE mode, the rows of data that fail to be imported are ignored.
In REPLACE mode, a subsequent INSERT operation is terminated if a row of data fails to be imported. As a result, specific rows of data may not be imported.

You can execute the SHOW WARNINGS statement to query the error messages of failed rows.
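The difference between the two modes can be sketched as a small simulation. This is plain Python, not AnalyticDB code, and the "bad row" check stands in for any data error:

```python
# Simulates the documented failure semantics of the two import modes:
# IGNORE skips a bad row and continues; REPLACE terminates the rest of
# the import as soon as a bad row is encountered.
def import_rows(rows, mode):
    imported = []
    for a, b in rows:
        bad = not a.isdigit()  # stand-in for a data error
        if bad:
            if mode == "IGNORE":
                continue  # skip the row, keep importing
            raise ValueError("import terminated")  # REPLACE aborts
        imported.append((int(a), b))
    return imported

rows = [("1", "bb"), ("bb", "8"), ("3", "bb")]
print(len(import_rows(rows, "IGNORE")))  # 2
try:
    import_rows(rows, "REPLACE")
except ValueError:
    print("REPLACE mode aborted at the bad row")
```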
Example
In this example, the data of an on-premises file named out.bak is imported to an AnalyticDB for MySQL table named test. The out.bak file contains 5,000 rows of data. The column delimiter is \t, and the row delimiter is \n. The eighth row of data is invalid, as shown in the following data entries:
1 bb
2 bb
3 bb
4 bb
5 bb
6 bb
7 bb
bb 8
9 bb
...
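An input file of the same shape can be generated with a short script. The invalid eighth row and the 5,000-row count follow the example; the output path and the exact contents of the remaining rows are assumptions:

```python
import os
import tempfile

# Build 5,000 tab-separated rows, with the eighth row malformed
# ("bb\t8" instead of "8\tbb"), mirroring the example data.
lines = []
for i in range(1, 5001):
    lines.append("bb\t8" if i == 8 else f"{i}\tbb")

path = os.path.join(tempfile.gettempdir(), "out.bak")
with open(path, "w") as f:
    f.write("\n".join(lines) + "\n")

print(len(lines))  # 5000
```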
Connect to the AnalyticDB for MySQL cluster that you want to manage. Execute the CREATE DATABASE and CREATE TABLE statements to create a table named test in the database named adb_demo to store the data that you want to import from the on-premises file.

CREATE TABLE test (
    a int NOT NULL DEFAULT '0',
    b varchar NOT NULL,
    PRIMARY KEY (a)
) DISTRIBUTED BY HASH(a);
Execute the LOAD DATA statement on a MySQL client to import data from the on-premises file named out.bak to the AnalyticDB for MySQL table named test.

Execute the LOAD DATA statement in IGNORE mode. In this mode, specific rows may fail to be imported. Sample statement:

LOAD DATA LOCAL
INFILE '~/out.bak'
IGNORE INTO TABLE test
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';
Execute the following statement to query the import result:

SELECT COUNT(1) FROM test;

+----------+
| count(1) |
+----------+
|     4999 |
+----------+
1 row in set (0.14 sec)
Execute the LOAD DATA statement in REPLACE mode. In this mode, a subsequent import operation is immediately terminated if a row fails to be imported. Sample statement:

LOAD DATA LOCAL
INFILE '~/out.bak'
REPLACE INTO TABLE test
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';
Sample result:
ERROR 1064 (42000): [13000, 2019061210070703000511314203303000266] syntax error :syntax error => IDENTIFIER is not value type pos:34 row: 0 and ceil:0
Execute the LOAD DATA statement to skip the first 10 rows of data when you import the data. Sample statement:

LOAD DATA LOCAL
INFILE '~/out.bak'
REPLACE INTO TABLE test
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
IGNORE 10 LINES;
Sample result:
Query OK, 4990 rows affected (0.37 sec)
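The row count follows from the skip: the first 10 lines, which include the invalid eighth row, are discarded before parsing, leaving 5,000 - 10 = 4,990 importable rows. A quick Python check of that arithmetic on the example data (a simulation, not an AnalyticDB query):

```python
# Simulates IGNORE 10 LINES on the example file: the first ten lines,
# including the invalid eighth row, are skipped before parsing.
lines = ["bb\t8" if i == 8 else f"{i}\tbb" for i in range(1, 5001)]

imported = lines[10:]  # effect of IGNORE 10 LINES
assert all(line.split("\t")[0].isdigit() for line in imported)
print(len(imported))  # 4990
```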