All Products
Search
Document Center

Hologres:COPY

Last Updated:Sep 25, 2024

This topic describes how to use COPY statements to import data into or export data from Hologres.

Limits

When you use COPY statements to import or export data, take note of the following items:

  • To use COPY statements to import data from a partitioned table to Hologres, you can import the data only to child partitioned tables rather than parent partitioned tables.

  • In Hologres V1.1.43 and later, the COPY FROM STDIN statement supports tables that are created with the DEFAULT constraint and tables that contain columns of the SERIAL data type. The COPY FROM STDIN statement does not support these tables in earlier Hologres versions.

Introduction

You can use the COPY FROM statement to import data from the standard input of a client to Hologres and the COPY TO statement to export Hologres data.

Note
  • The COPY statements support the data types supported by Hologres. For more information, see Data types.

  • To import data to Hologres, you must execute the COPY FROM STDIN statement. To export data from Hologres, you must execute the COPY ( query ) TO STDOUT statement.

  • The COPY statements ensure the atomicity of data import and export operations. In fixed copy mode, a row-level lock is used instead of a table-level lock, and the atomicity is not ensured. Therefore, in fixed copy mode, if dirty data is generated, an error is reported only for the dirty data record, and the rest data may be partially written or completely fail to be written.

  • In versions earlier than Hologres V3.0, only one record is generated for a COPY operation in the hologres.hg_query_log table of the metadata warehouse. In Hologres V3.0 and later, two records are generated for a COPY operation in the metadata warehouse. One record logs the COPY operation, and the other record logs the INSERT statement that is executed in the COPY operation. The two records can be associated based on the transaction ID. Example:

    SELECT
        query_id,
        query,
        extended_info
    FROM
        hologres.hg_query_log
    WHERE
        extended_info ->> 'source_trx' = '<transaction_id>' -- You can obtain the transaction ID based on the value of the trans_id field in the record of the COPY operation.
    ORDER BY
        query_start
    ;

Syntax

Syntaxes of COPY statements:

COPY table_name [ ( column_name [, ...] ) ]
    FROM STDIN
    [ [ WITH ] ( option [, ...] ) ]

COPY { ( query ) }
    TO STDOUT
    [ [ WITH ] ( option [, ...] ) ]

where option can be one of:

    FORMAT format_name
    DELIMITER 'delimiter_character'
    NULL 'null_string'
    HEADER [ boolean ]
    QUOTE 'quote_character'
    ESCAPE 'escape_character'
    FORCE_QUOTE { ( column_name [, ...] ) | * }
    FORCE_NOT_NULL ( column_name [, ...] )
    ENCODING 'encoding_name'
    STREAM_MODE [ boolean]
    ON_CONFLICT 'none/ignore/update'

Parameters

Parameter

Description

table_name

The name of the Hologres table to which the data is imported.

query

The query statement.

STDIN

Data is imported from stdin of a specified client to Hologres.

STDOUT

Data is exported from Hologres to a specified client.

FORMAT

The file format. Valid values: TEXT, CSV, BINARY.

Default value: TEXT. You can export data of the BINARY data type. You can import data of the BINARY data type only in the fixed copy mode.

DELIMITER

The delimiter that is used to separate columns.

The default delimiter is a tab character in the text format or a comma (,) in the CSV format. Example: DELIMITER AS ','.

NULL

The string that you want to use to represent a null value.

  • TEXT data type: The default value is \N.

  • CSV data type: The default value is an unquoted empty string.

  • BINARY data type: This parameter is not supported.

HEADER

Specifies whether the file to be imported to or exported from Hologres contains a header row. Values in the header row are column names in the file.

Note

This parameter is valid only for data of the CSV data type.

QUOTE

A single-byte character that is used to reference a data value.

Note

This parameter is valid only for data of the CSV data type. Double quotation marks (") are used by default.

ESCAPE

The single-byte character that appears before a character that matches the value of the QUOTE parameter.

Note

This parameter is valid only for data of the CSV data type. By default, the value is the same as the value of the QUOTE parameter.

FORCE_QUOTE

Forcefully uses quotation marks (') for all non-NULL values in a specified column.

Note

This parameter is valid only if the file in the COPY TO statement uses the CSV format.

FORCE_NOT_NULL

In the specified column, the strings that represent NULL values are not matched. Instead, NULL values are read as zero-length strings.

Note

This parameter is valid only if the file in the COPY FROM statement uses the CSV format.

ENCODING

The encoding method that you want to use for the file to be generated. By default, the encoding method of the client is used.

STREAM_MODE

Specifies whether to use the fixed copy mode. Default value: FALSE. For more information, see the FIXED COPY section in Use COPY statements to import data to Hologres. Valid values:

  • TRUE: The fixed copy mode is used. You can use the fixed copy mode only when you import data to Hologres.

  • FALSE: The fixed copy mode is not used.

ON_CONFLICT

The policy that is used if primary keys conflict. This parameter takes effect only if the STREAM_MODE parameter is set to TRUE. The default value is NONE.

  • NONE: An error is reported if primary keys conflict.

  • IGNORE: The operation on the data is skipped if primary keys conflict.

  • UPDATE: The data is updated if primary keys conflict.

Use COPY statements to import data to Hologres

Import local files to Hologres

You can use COPY statements to import local files to Hologres or export data from Hologres to local files. For more information about the principle and usage of COPY statements, see COPY in PostgreSQL documentation.

Examples:

  • Import data from stdin to Hologres. Sample statements:

    -- Create a table in Hologres. 
    CREATE TABLE copy_test (
      id    int,
      age   int,
      name  text
    ) ;
    
    -- Import data to the created Hologres table. 
    COPY copy_test FROM STDIN WITH DELIMITER AS ',' NULL AS '';
    53444,24,wangming
    55444,38,ligang
    55444,38,luyong
    \.
    
    -- Query data in the Hologres table. 
    SELECT * FROM copy_test;
    Note

    If you use a PostgreSQL client, you can import data only from stdin. In the HoloWeb console, you cannot import data from stdin to Hologres by running commands.

  • Import a CSV file from stdin to Hologres. Sample statements:

    -- Create a table in Hologres. 
    CREATE TABLE partsupp ( ps_partkey     integer NOT NULL,
                            ps_suppkey     integer NOT NULL,
                            ps_availqty    integer NOT NULL,
                            ps_supplycost  float  NOT NULL,
                            ps_comment     text NOT NULL );
    
    -- Import a CSV file to the created Hologres table. 
    COPY partsupp FROM STDIN WITH DELIMITER '|' CSV;
    1|2|3325|771.64|final theodolites
    1|25002|8076|993.49|ven ideas
    \.
    
    -- Query data in the Hologres table. 
    SELECT * FROM partsupp;
    Note

    If you use a PostgreSQL client, you can import CSV files from stdin. In the HoloWeb console, you cannot import CSV files from stdin to Hologres by running commands.

  • Import a local file to Hologres. Sample statement:

    psql -U <username> -p <port> -h <endpoint> -d <databasename> -c "COPY <table> FROM STDIN WITH DELIMITER '|' CSV;" <<filename>;
    Note

    If you use a PostgreSQL client, you can import data only from stdin. Therefore, you need to convert the format of the data in the file to be imported into the standard input format. In the HoloWeb console, you cannot import local files from stdin to Hologres by running commands.

    The following table describes the parameters in the preceding syntax.

    Parameter

    Description

    username

    • Alibaba Cloud account: the AccessKey ID of your Alibaba Cloud account. You can obtain the AccessKey ID from the AccessKey Pair page.

      We recommend that you configure environment variables and obtain the AccessKey ID and AccessKey secret from the environment variables. This helps reduce the leak risk.

    • Custom account: the username of the custom account. Example: BASIC$abc.

    port

    The public port number of the Hologres instance.

    Example: 80.

    endpoint

    The public endpoint of the Hologres instance.

    Example: xxx-cn-hangzhou.hologres.aliyuncs.com.

    databasename

    The name of the Hologres database.

    For more information, see Use the PostgreSQL client to connect to Hologres.

    Example: mydb.

    table

    The name of the Hologres table to which you want to import data.

    filename

    The path of the local file that you want to import to Hologres.

    Example: D:\tmp\copy_test.csv.

    The following example describes how to execute a statement to import a local file from stdin to Hologres.

    • Execute the statement to import the on-premises file named copy_test from stdin to Hologres. 执行命令The imported standard file includes the following content:

      01,01,name1
      02,01,name2
      03,01,name3
      04,01,name4
    • After the statement is executed, go back to the PostgreSQL client. Then, you can query newly imported data, as shown in the following figure.查询结果

Use CopyManager to import a file from the JDBC client to Hologres

You can use CopyManager to copy data in files on the Java Database Connectivity (JDBC) client.

CopyManager is an API encapsulated by JDBC for COPY statements of PostgreSQL. Hologres is compatible with the PostgreSQL protocol. You can use CopyManager to import JDBC client files to Hologres or export data from Hologres to a JDBC client file. For more information about CopyManager, see CopyManager in JDBC documentation.

Example: Use CopyManager to import a file from the JDBC client to Hologres. Sample statements:

package com.aliyun.hologram.test.jdbc;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.sql.*;
import java.util.Properties;

import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;

public class jdbcCopyFile {

    public static void main(String args[]) throws Exception {
        System.out.println(copyFromFile(getConnection(), "/Users/feng/Workspace/region.tbl", "region"));
    }

    public static Connection getConnection() throws Exception {
        Class.forName("org.postgresql.Driver");
        String url = "jdbc:postgresql://endpoint:port/dbname";
        Properties props = new Properties();
    //set db user
        props.setProperty("user", "AAA");// The AccessKey ID of the current account. We recommend that you configure an environment variable and obtain the AccessKey ID from the environment variable to reduce the data leak risk. 
    //set db password
        props.setProperty("password", "BBB");// The AccessKey secret of the current account. We recommend that you configure an environment variable and obtain the AccessKey secret from the environment variable to reduce the data leak risk. 
        return DriverManager.getConnection(url, props);
    }

    /**
     * Import the file to the Hologres database. 
     *
     * @param connection
     * @param filePath
     * @param tableName
     * @return
     * @throws SQLException
     * @throws IOException
     */
    public static long copyFromFile(Connection connection, String filePath, String tableName)
            throws SQLException, IOException {
        long count = 0;
        FileInputStream fileInputStream = null;

        try {
            CopyManager copyManager = new CopyManager((BaseConnection) connection);
            fileInputStream = new FileInputStream(filePath);
            count = copyManager.copyIn("COPY " + tableName + " FROM STDIN delimiter '|' csv", fileInputStream);
        } finally {
            if (fileInputStream != null) {
                try {
                    fileInputStream.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return count;
    }
}

FIXED COPY

Fixed copy is a new mode that uses fixed plans to optimize the execution of COPY statements. Fixed copy is uniquely developed for Hologres execution engines and supports only data import to Hologres. For more information about the principle of fixed plans, see Accelerate the execution of SQL statements by using fixed plans.

In Hologres V1.3.17 and later, you can import data to Hologres by using the fixed copy mode. The following table compares the characteristics of COPY statements before and after fixed plans are used.

Item

FIXED COPY

COPY

Lock

Row-level lock

Table-level lock

Data latency

Data can be queried immediately after it is written.

Data can be queried after the execution of a COPY statement is complete.

Performance

Good

Excellent

Supported format

TEXT and BINARY

TEXT

Policy to handle primary key conflicts

NONE (an error returned if a conflict occurs), UPDATE, and IGNORE

NONE (an error returned if a conflict occurs)

If the COPY operation is performed only on some columns, the execution of the fixed copy mode varies.

  • A COPY statement is used to write data to some columns, and data is partially updated. Sample code:

    CREATE TABLE t0 (id int NOT NULL, name text, age int, primary key(id));
    
    COPY t0(id, name) FROM STDIN
    WITH (
      STREAM_MODE TRUE,
      ON_CONFLICT UPDATE);
    
    -- The COPY statement is equivalent to the following INSERT INTO statement:
    INSERT INTO t0(id, name) VALUES(?,?)
    ON CONFLICT(id) DO UPDATE SET
    id = excluded.id, name = excluded.name;
  • A COPY statement is used to write data to some columns, and the columns to which no data is written contain the default value. Sample code:

    CREATE TABLE t0 (id int not null, name text, age int DEFAULT 0, primary key(id));
    
    COPY t0(id, name) FROM STDIN
    WITH (
      STREAM_MODE TRUE,
      ON_CONFLICT UPDATE);
    
    -- The COPY statement is equivalent to the following INSERT INTO statement.
    -- For a row of data, if the id column of the row contains no data, the age column of the row is assigned the default value.
    -- If data is available in the id column, the age column is not updated.
    INSERT INTO t0(id, name, age) VALUES(?, ?, default)
    ON CONFLICT(id) DO UPDATE SET
    id = excluded.id, name = excluded.name;

Use the COPY statement to export data from Hologres

Export data from Hologres to a local file

Examples:

  • Execute the \copy statement to export Hologres data to an on-premises file.

    Note

    You can use this method to export Hologres data to local files only on the PostgreSQL client.

    -- Create a table.
    CREATE  TABLE copy_to_local (
      id    int,
      age   int,
      name  text
    ) ;
    
    -- Insert data into the created table.
    INSERT INTO copy_to_local VALUES
    (1,1,'a'),
    (1,2,'b'),
    (1,3,'c'),
    (1,4,'d');
    
    -- Query data in the table.
    SELECT * FROM copy_to_local;
    
    -- Export data to a local file.
    \COPY (SELECT * FROM copy_to_local) TO '/root/localfile.txt';
  • Use stdout to export Hologres data to a local file.

    Note

    You can use this method to export Hologres data to local files only on the PostgreSQL client.

    psql -U <username> -p <port> -h <endpoint> -d <databasename> -c "COPY (SELECT * FROM <tablename>) TO STDOUT WITH DELIMITER '|' CSV;" ><filename>;

Export data to OSS

Alibaba Cloud Object Storage Service (OSS) is a secure, cost-effective, and reliable cloud storage service. You can export data from Hologres to OSS by executing COPY statements.

  • Limits

    • Hologres instances allow only users who are granted the pg_execute_server_program permission and superusers to execute the hg_dump_to_oss statement to export data from Hologres to a specific OSS bucket. Superusers can execute one of the following statements to grant the pg_execute_server_program permission to users:

      -- If the current database uses the simple permission model (SPM), execute the following statement:
      CALL spm_grant('pg_execute_server_program','Alibaba Cloud account ID, Alibaba Mail address, or RAM user account');
      
      -- If the current database uses the standard PostgreSQL authorization model, execute the following statement:
      GRANT pg_execute_server_program TO Alibaba Cloud account ID, Alibaba Cloud email address, or RAM user account;
    • The amount of data that you can export from Hologres to OSS at a time is up to 5 GB.

  • Statement description

    • COPY TO statement:

      COPY ( query ) TO { PROGRAM 'command' | STDOUT }
          [ [ WITH ] ( option [, ...] ) ]

      The PROGRAM parameter specifies a statement that you want to execute. The data to be exported is written to the standard input of the statement. For more information about other parameters, see Parameters.

    • hg_dump_to_oss statement: This statement must be used with the COPY TO statement.

      COPY (query) TO PROGRAM 'hg_dump_to_oss --AccessKeyId <accessid> --AccessKeySecret <accesskey> --Endpoint <ossendpoint> --BucketName <bucketname> --DirName <dirname> --FileName <filename> --BatchSize <xxx> ' (DELIMITER ',', HEADER true, FORMAT CSV);
      Important

      The value of the DirName parameter cannot start with a character such as a forward slash (/) or a backslash (\).

      The following table describes the parameters in the preceding syntax.

      Parameter

      Description

      Sample value

      query

      The query statement.

      select * from dual;

      AccessKeyId

      The AccessKey ID of your Alibaba Cloud account.

      You can obtain the AccessKey ID on the AccessKey Pair page.

      We recommend that you configure environment variables and obtain the AccessKey ID and AccessKey secret from the environment variables to reduce the data leak risk.

      None

      AccessKeySecret

      The AccessKey secret that corresponds to the AccessKey ID.

      We recommend that you configure environment variables and obtain the AccessKey ID and AccessKey secret from the environment variables to reduce the data leak risk.

      None

      Endpoint

      The classic network endpoint of the OSS bucket. You can obtain the endpoint by using one of the following methods:

      • You can view the classic network endpoint on the details page of the bucket. To go to the details page of the bucket, log on to the OSS console and click the bucket name on the Buckets page.

      • You can obtain the endpoint from Regions and OSS endpoints in the public cloud based on the region of OSS.

      oss-cn-beijing-internal.aliyuncs.com

      BucketName

      The name of the OSS bucket.

      dummy_bucket

      DirName

      The OSS directory to which data is exported.

      testdemo/

      FileName

      Optional. The name of the file in the OSS bucket.

      file_name

      BatchSize

      The number of rows that are processed by the hg_dump_to_oss statement at a time. Default value: 1000.

      5000

      DELIMITER

      The delimiter that you want to use to separate fields in the exported file. The default delimiter is a tab character (\t).

      ,

    • Examples

      The following examples show how to execute the hg_dump_to_oss and COPY TO statements in Hologres.

      -- Dump data from a Hologres internal table to a specified OSS bucket.
      COPY (SELECT * FROM holo_test LIMIT 2) TO PROGRAM 'hg_dump_to_oss --AccessKeyId <access id> --AccessKeySecret <access key> --Endpoint oss-cn-hangzhou-internal.aliyuncs.com --BucketName hologres-demo --DirName holotest/ --FileName file_name --BatchSize 3000' DELIMITER ',';
      
      -- Dump data from a Hologres foreign table to a specified OSS bucket.
      COPY (SELECT * FROM foreign_holo_test LIMIT 20) TO PROGRAM 'hg_dump_to_oss --AccessKeyId <access id> --AccessKeySecret <access key> --Endpoint oss-cn-hangzhou-internal.aliyuncs.com --BucketName hologres-demo --DirName holotest/ --FileName file_name --BatchSize 3000' (DELIMITER ',', HEADER true);
      
      -- Dump data from a Hologres table to a specified OSS bucket in another region.
      COPY (SELECT * FROM holo_test_1 LIMIT 20) TO PROGRAM 'hg_dump_to_oss --AccessKeyId <access id> --AccessKeySecret <access key> --Endpoint oss-cn-beijing-internal.aliyuncs.com --BucketName hologres-demo --DirName holotest/ --FileName file_name --BatchSize 3000' (DELIMITER ',', HEADER true, FORMAT CSV);
      Note

      You can export data from Hologres to an OSS bucket in another region. For example, you can export data from a Hologres instance in the China (Hangzhou) region to an OSS bucket in the China (Beijing) region.

    • Common errors and troubleshooting

      The following table describes common errors and solutions.

      Error message

      Solution

      ERROR: syntax error at or near ")"LINE 1: COPY (select 1,2,3 from ) TO PROGRAM 'hg_dump_to_oss2 --Acce...

      This error message is reported because the query statement that is specified in the query parameter is invalid. Check and correct the query statement.

      DETAIL: child process exited with exit code 255

      This error message is reported because the network type that is specified for the OSS bucket is invalid. To access the OSS bucket in the Alibaba Cloud public cloud, use the classic network endpoint of the OSS bucket.

      DETAIL: command not found

      This error message is reported because the program parameter in the DUMP TO OSS statement is not set to hg_dump_to_oss

      ERROR: program "hg_dump_to_oss ..." failed DETAIL: child process exited with exit code 101

      This error message is reported because the specified AccessKeyId parameter is invalid. Use the AccessKey ID of your Alibaba Cloud account.

      ERROR: program "hg_dump_to_oss ..." failed DETAIL: child process exited with exit code 102

      This error message is reported because the specified AccessKeySecret parameter is invalid. Use the AccessKey secret of your Alibaba Cloud account.

      ERROR: program "hg_dump_to_oss ..." failed DETAIL: child process exited with exit code 103

      This error message is reported because the specified Endpoint parameter is invalid. Use the correct classic network endpoint of the OSS bucket.

      ERROR: program "hg_dump_to_oss ..." failed DETAIL: child process exited with exit code 104

      This error message is reported because the specified BucketName parameter is invalid. Use the correct bucket name.

      ERROR: program "hg_dump_to_oss ..." failed DETAIL: child process exited with exit code 105

      This error message is reported because a required parameter is left empty. Check whether all required parameters are correctly specified.

      ERROR: program "hg_dump_to_oss ..." failed DETAIL: child process exited with exit code 255

      This error message is reported because the server on which your Hologres instance resides cannot connect to the network on which the specified OSS bucket resides. Change the specified endpoint of the OSS bucket. For example, you can use the classic network endpoint of the OSS bucket. For more information about endpoints of OSS buckets, see Regions and endpoints.

Use CopyManager to export data from Hologres to a file on the JDBC client

Example: Use CopyManager to export data from Hologres to a file on the JDBC client. Sample code:

import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;

import java.io.FileOutputStream;
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class copy_to_local_file {

    public static void main(String args[]) throws Exception {
        System.out.println(copyToFile(getConnection(), "/Users/feng/Workspace/region.tbl", "select * from region"));
    }

    public static Connection getConnection() throws Exception {
        Class.forName("org.postgresql.Driver");
        String url = "jdbc:postgresql://endpoint:port/dbname";
        Properties props = new Properties();
    //set db user
        props.setProperty("user", "AAA");// The AccessKey ID of the current account. We recommend that you configure an environment variable and obtain the AccessKey ID from the environment variable to reduce the data leak risk. 
    //set db password
        props.setProperty("password", "BBB");// The AccessKey secret of the current account. We recommend that you configure an environment variable and obtain the AccessKey secret from the environment variable to reduce the data leak risk. 
        return DriverManager.getConnection(url, props);
    }

    /**
     * Import data into the JDBC client file. 
     *
     * @param connection
     * @param filePath
     * @param SQL_Query
     * @return
     * @throws SQLException
     * @throws IOException
     */
    public static String copyToFile(Connection connection, String filePath, String SQL_Query)
            throws SQLException, IOException {

        FileOutputStream fileOutputStream = null;

        try {
            CopyManager copyManager = new CopyManager((BaseConnection)connection);
            fileOutputStream = new FileOutputStream(filePath);
            copyManager.copyOut("COPY " + "(" + SQL_Query + ")" + " TO STDOUT DELIMITER '|' csv ", fileOutputStream);
        } finally {
            if (fileOutputStream != null) {
                try {
                    fileOutputStream.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return filePath;
    }
}