You can export local tables to Object Storage Service (OSS) in parallel, which significantly improves the efficiency of data export. This topic describes how to export local tables to OSS in parallel.
Limits
For PolarDB for MySQ clusters, the product edition must be Enterprise Edition, and the version must meet the following requirements:
Version 8.0.1:Revision number must be 8.0.1.1.38 or higher.
Version 8.0.2:Revision number must be 8.0.2.2.25 or higher.
Only the results of single table queries can be exported in parallel. The results of multi-table queries and complex queries such as Join, Order By, and Group By cannot be exported in parallel.
We recommend that you perform export tasks on a read-only node. You can add the
/*FORCE_SALVE*/
hint to a SQL statement or directly connect to a read-only node for data export.
Parameters
Parameter | Level | Description |
loose_oss_outfile_buffer_size | Global | The amount of memory that can be used by a single oss outfile thread. In most cases, the export speed increases if a thread can use more memory. Valid values: 102400 to 536870912. Default value: 134217728. Unit: bytes. |
Usage
Syntax
SELECT * FROM table_name INTO OSSOUTFILE 'outfile_path' COLUMNS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' NULL_MARKER BY 'NULL' LINES TERMINATED BY '\n';
COLUMNS TERMINATED BY
: the delimiter between fields.OPTIONALLY ENCLOSED BY
: the identifiers at both ends of a field. After OPTIONALLY is added, identifiers are added only to both ends of a field of the string type. Otherwise, identifiers are added to both ends of a field of all types.NULL_MARKER BY
: the character indicating that the NULL value is exported as text. The output value of NULL can be defined in the following methods:NULL_MARKER: directly defines the output value of NULL.
NULL_MARKER
can be defined as any value. This method has the highest priority.ESCAPED BY: Only a single character can be used. The output value of NULL is the character followed by N. For example,
ESCAPED BY '\'
means that the output value of NULL is \N. This method has the second priority.NULL: If neither of the preceding two methods is defined, the output value of NULL is the string of NULL by default. This method has the lowest priority. NULL does not have
ENCLOSED
identifiers at both sides.
LINES TERMINATED BY: the delimiter between rows.
For more information, see MySQL documentation.
Parameters
Parameter | Description |
outfile_path | The OSS directory for the file. It contains the following parts:
The preceding three parts are separated with forward slashes ( |
table_name | The name of the table for which you want to create the mapping table. |
Example
You can add a hint before a SQL statement to export data in parallel. Alternatively, you can directly execute the SQL statement to export data after the elastic parallel query feature is enabled. For more information about how to enable the elastic parallel query feature, see Overview.
The following example describes how to export local tables to OSS by adding a hint to the SELECT * FROM lineitem;
statement:
Execute the following statement to check whether the current SQL statement can be executed in parallel:
EXPLAIN SELECT /*+parallel(4)*/ * FROM lineitem;
Sample result:
+----+-------------+-----------+------------+------+---------------+------+---------+------+---------+----------+---------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+------+---------+------+---------+----------+---------------------------+ | 1 | SIMPLE | <gather1> | NULL | ALL | NULL | NULL | NULL | NULL | 5392844 | 100.00 | NULL | | 1 | SIMPLE | lineitem | NULL | ALL | NULL | NULL | NULL | NULL | 1348211 | 100.00 | Parallel scan (4 workers) | +----+-------------+-----------+------------+------+---------------+------+---------+------+---------+----------+---------------------------+ 2 rows in set, 1 warning (0.05 sec)
If the Extra column contains the value "Parallel scan", the current SQL statement can be executed in parallel.
Create the OSS server to which the data is exported. In this example, a server named
outserver
is created.CREATE SERVER outserver FOREIGN DATA WRAPPER oss OPTIONS ( EXTRA_SERVER_INFO '{"oss_endpoint": "oss-cn-zhangjiakou.aliyuncs.com", "oss_bucket": "polardb", "oss_access_key_id": "*******", "oss_access_key_secret": "********", "oss_prefix":"B_outfile"}');
Execute the following statement to view the execution plan of SQL statements whose results are exported in parallel to determine whether local tables can be exported to OSS in parallel:
EXPLAIN SELECT /*+parallel(4)*/ * FROM lineitem INTO OSSOUTFILE 'outserver/t1.CSV' COLUMNS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' NULL_MARKER BY 'NULL' LINES TERMINATED BY '\n';
In this example, the following execution plan is returned:
+----+-------------+-----------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------------------+ | 1 | SIMPLE | <gather1> | NULL | ALL | NULL | NULL | NULL | NULL | 5805759 | 100.00 | Parallel export OSS outfile | | 1 | SIMPLE | lineitem | NULL | ALL | NULL | NULL | NULL | NULL | 1451439 | 100.00 | Parallel scan (4 workers); Parallel export OSS outfile | +----+-------------+-----------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------------------+ 2 rows in set, 1 warning (0.03 sec)
If the Extra column of the execution plan contains the value "
Parallel export OSS outfile
", local tables can be exported to OSS in parallel.If parallel export is not supported in the execution plan, you can use one of the following methods to identify the cause:
Set the
OPTIMIZER_TRACE
parameter to ON.SET optimizer_trace="enabled=on";
Execute the following statement to view the execution plans of SQL statements whose results are exported in parallel:
EXPLAIN SELECT /*+parallel(4)*/ * FROM lineitem WHERE l_orderkey < 100 INTO OSSOUTFILE 'default_oss_server/t1' COLUMNS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' NULL_MARKER BY 'NULL' LINES TERMINATED BY '\n';
Sample result:
+----+-------------+-----------+------------+-------+----------------------+---------+---------+------+------+----------+----------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+-------+----------------------+---------+---------+------+------+----------+----------------------------------------+ | 1 | SIMPLE | <gather1> | NULL | ALL | NULL | NULL | NULL | NULL | 105 | 100.00 | NULL | | 1 | SIMPLE | lineitem | NULL | range | PRIMARY,i_l_orderkey | PRIMARY | 4 | NULL | 26 | 100.00 | Parallel scan (4 workers); Using where | +----+-------------+-----------+------------+-------+----------------------+---------+---------+------+------+----------+----------------------------------------+ 2 rows in set, 1 warning (0.00 sec)
Execute the following statement to view the optimizer trace of SQL statements that are exported in parallel:
SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
Sample result:
``` "considered_parallel_outfile": { "choose": false, "storage": "OSS", "format": "CSV", "mean_outfile_record_length": "79 B", "estimate_single_worker_outfile_size": "8 KB", "cause": "The data written by each worker should be greater than 1024 KB (pq_oss_min_worker_write_size)." } } ```
You can view the reason why parallel export is not supported in the
cause
parameter of the execution result.
Execute the following statement to export local tables to OSS in parallel:
SELECT /*+parallel(4)*/ * FROM lineitem INTO OSSOUTFILE 'outserver/t1.CSV' COLUMNS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' NULL_MARKER BY 'NULL' LINES TERMINATED BY '\n';
During data export, you can execute the following statements to view the total amount of memory and the total number of threads occupied by the current export task in real time.
View the total amount of memory occupied by the export task in real time. Unit: bytes.
SHOW STATUS LIKE "%Oss_outfile_memory_used%";
View the total number of threads occupied by the export task in real time.
SHOW STATUS LIKE "%Oss_outfile_threads_running%";
NoteWhen you connect to a cluster endpoint to execute an export task, make sure that the
SHOW STATUS
and export commands are executed on the same node to query information about memory resources occupied by the export command on the current node.