
MaxCompute:Usage notes

Last Updated: May 23, 2024

This topic describes the features of Tunnel commands and how to use Tunnel commands to upload or download data.

Supported platforms

MaxCompute allows you to use the MaxCompute client (odpscmd) and MaxCompute Studio to run Tunnel commands. For more information, see MaxCompute client (odpscmd) and What is MaxCompute Studio?

Features

The MaxCompute client provides Tunnel commands that implement the features of the Dship tool. You can use Tunnel commands to upload or download data. This section describes the features of the following Tunnel commands:

  • UPLOAD: uploads local data to a MaxCompute table. Each run uploads files to only one table or one partition of a table. For a partitioned table, you must specify the partition to which you want to upload data. For a multi-level partitioned table, you must specify the lowest-level partition. For more information, see Upload.

  • DOWNLOAD: downloads MaxCompute table data or the query results of a specific instance to a local directory. You can download data from only one table or partition to a single local file each time. For a partitioned table, you must specify the partition from which you want to download data. For a multi-level partitioned table, you must specify the lowest-level partition. For more information, see Download.

  • RESUME: resumes a file upload that failed because of a network disconnection or a faulty Tunnel service. You cannot use this command to resume data downloads. For more information, see Resume.

  • SHOW: displays historical task information. For more information, see Show.

  • PURGE: clears the session directory. By default, logs that are older than three days are cleared. For more information, see Purge.

  • HELP: queries help information. Command aliases are supported.

  • UPSERT: writes data by using combined UPDATE and INSERT semantics. This command can be used to write data only to Transaction Table 2.0 tables.

    If no matching data is found in the destination table, the new data is inserted. If the data to be written already exists, the existing data is updated.
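The UPSERT behavior described above can be modeled in a few lines of plain Python (an illustrative sketch of the semantics only, not the MaxCompute API): rows are keyed, an absent key is inserted, and an existing key is updated.

```python
def upsert(table, rows):
    """Model of UPSERT semantics: `table` maps a key to a row value.

    A key that is absent from the table is inserted; a key that
    already exists has its value updated in place.
    """
    for key, value in rows:
        table[key] = value  # insert when absent, update when present
    return table

# Start with one existing row, then upsert an update and an insert.
table = {"x_id": 100}
upsert(table, [("x_id", 150), ("y_id", 200)])
print(table)  # x_id updated to 150, y_id newly inserted
```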

Limits of Tunnel commands

  • Tunnel commands and Tunnel SDK cannot be used for external tables. You can use Tunnel to upload data to MaxCompute internal tables. You can also use OSS SDK for Python to upload data to OSS and map the data to external tables in MaxCompute. For more information about external tables, see Overview of external tables.

  • You cannot run Tunnel commands to upload or download data of the ARRAY, MAP, or STRUCT type.

  • On the server, a session remains valid for 24 hours after it is created. A session can be shared among processes and threads on the server, but you must make sure that each block ID is unique within the session.

    Note

    Tunnel session: the server creates a session for each upload or download task and generates a unique upload ID or download ID to identify the task. A session is valid on the server for 24 hours. After 24 hours, the session becomes unavailable.

  • If you download data by session, a session created by an Alibaba Cloud account can be accessed only by that account or its RAM users.
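The session limit above requires every block ID in a shared session to be unique. One way to satisfy this across threads is to hand out IDs from a single synchronized counter; the sketch below is illustrative Python only and is not part of the Tunnel SDK.

```python
import itertools
import threading

class BlockIdAllocator:
    """Hands out unique block IDs for an upload session shared by threads."""

    def __init__(self):
        self._counter = itertools.count(1)  # block IDs start at 1
        self._lock = threading.Lock()

    def next_id(self):
        # The lock guarantees that no two threads receive the same ID.
        with self._lock:
            return next(self._counter)

allocator = BlockIdAllocator()
ids = []
threads = [threading.Thread(target=lambda: ids.append(allocator.next_id()))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(ids))  # four distinct IDs, no duplicates
```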

Upload and download table data

Before you upload table data, prepare a data file named data.txt that contains the data you want to upload, and save it to d:\data.txt. The following example shows the content of the file.

shopx,x_id,100
shopy,y_id,200
shopz,z_id
Note

The third row of data in the data.txt file contains only two columns, whereas the partitioned table sale_detail that is created in the following example contains three non-partition columns. This row is dirty data that is used to demonstrate error handling.
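Dirty rows like the one above can be caught before an upload is attempted. The following Python sketch (illustrative only, not part of the Tunnel client) flags every row whose column count differs from the table's three non-partition columns, mimicking the check that strict schema mode performs:

```python
import csv
import io

EXPECTED_COLUMNS = 3  # sale_detail has three non-partition columns

data = "shopx,x_id,100\nshopy,y_id,200\nshopz,z_id\n"

def find_dirty_rows(text, expected):
    """Return the 1-based numbers of rows whose column count != expected."""
    reader = csv.reader(io.StringIO(text))
    return [i for i, row in enumerate(reader, start=1) if len(row) != expected]

print(find_dirty_rows(data, EXPECTED_COLUMNS))  # the third row has 2 columns
```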

To upload and download table data, perform the following steps:

  1. On the MaxCompute client, execute the following statements to create a partitioned table named sale_detail and add a partition to the table.

    CREATE TABLE IF NOT EXISTS sale_detail(
          shop_name     STRING,
          customer_id   STRING,
          total_price   DOUBLE)
    PARTITIONED BY (sale_date STRING,region STRING);
    
    ALTER TABLE sale_detail ADD PARTITION (sale_date='201312', region='hangzhou');
  2. Run the UPLOAD command to upload the data.txt file to the sale_detail table.

    • Sample statement

      tunnel upload d:\data.txt sale_detail/sale_date=201312,region=hangzhou -s false;
    • Returned result

      Upload session: 20230505xxxxxxxxxxxb0b02dbb6bd
      Start upload:d:\data.txt
      Using \r\n to split records
      Upload in strict schema mode: true
      Total bytes:42   Split input to 1 blocks
      2023-05-05 10:11:35     upload block: '1'
      ERROR: column mismatch -,expected 3 columns, 2 columns found, please check data or delimiter
      Note

      The data fails to be imported because the data.txt file contains dirty data. The system returns the session ID and an error message.

  3. Execute the following statement to verify the data upload.

    • Sample statement

      select * from sale_detail where sale_date='201312';
    • Returned result

      ID = 20230505xxxxxxxxxxxvc61z5
      +-----------+-------------+-------------+-----------+--------+
      | shop_name | customer_id | total_price | sale_date | region |
      +-----------+-------------+-------------+-----------+--------+
      +-----------+-------------+-------------+-----------+--------+
      Note

      The data fails to be imported because the data.txt file contains dirty data. As a result, the table is empty.

  4. Run the SHOW command to query the ID of the session in which the data upload failed.

    • Sample statement

      tunnel show history;
    • Returned result

      20230505xxxxxxxxxxxb0b02dbb6bd  failed  'upload d:\data.txt sale_detail/sale_date=201312,region=hangzhou -s false'
  5. Modify the data.txt file so that it contains the following content:

    shopx,x_id,100
    shopy,y_id,200
  6. Run the RESUME command to resume the data upload. In this example, 20230505xxxxxxxxxxxb0b02dbb6bd is the ID of the session in which the data upload failed.

    • Sample statement

      tunnel resume 20230505xxxxxxxxxxxb0b02dbb6bd --force;
    • Returned result

      start resume
      20230505xxxxxxxxxxxb0b02dbb6bd
      Upload session: 20230505xxxxxxxxxxxb0b02dbb6bd
      Start upload:d:\data.txt
      Using \r\n to split records
      Upload in strict schema mode: true
      Resume 1 blocks 
      2023-05-05 10:32:39     upload block: '1'
      2023-05-05 10:32:40     upload block complete, block id: 1
      upload complete, average speed is 0 bytes/s
      OK
  7. Execute the following statement to verify the data upload:

    • Sample statement

      select * from sale_detail where sale_date='201312';
    • Returned result

      ID = 20230505xxxxxxxxxxxx7afc9qcg
      +-----------+-------------+-------------+-----------+----------+
      | shop_name | customer_id | total_price | sale_date | region   |
      +-----------+-------------+-------------+-----------+----------+
      | shopx     | x_id        | 100.0       | 201312    | hangzhou |
      | shopy     | y_id        | 200.0       | 201312    | hangzhou |
      +-----------+-------------+-------------+-----------+----------+
  8. Run the DOWNLOAD command to download data from the sale_detail table to the result.txt file.

    Note

    The local path to which the data is downloaded is determined by the following rules:

    • To save the file to the bin directory of the MaxCompute client, set the path parameter to only a file name with its extension, such as result.txt.

    • To save the file to another directory, such as the test folder on drive D, set the path parameter to a full path, such as D:\test\result.txt.

    • If a file in the destination path has the same name as the file that you want to save, the existing file is overwritten by the new file.

    tunnel download sale_detail/sale_date=201312,region=hangzhou result.txt;
  9. Check the result.txt file. If it contains the following content, the data download succeeded.

    Note

    The downloaded file contains only the values of non-partition key columns.

    shopx,x_id,100.0
    shopy,y_id,200.0
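The verification in the last step can also be scripted. This Python sketch (the file content is inlined for illustration rather than read from result.txt) parses the downloaded rows and confirms that only the values of the non-partition key columns are present:

```python
import csv
import io

# Expected contents of result.txt after a successful download.
downloaded = "shopx,x_id,100.0\nshopy,y_id,200.0\n"

rows = list(csv.reader(io.StringIO(downloaded)))
assert rows == [["shopx", "x_id", "100.0"], ["shopy", "y_id", "200.0"]]
print(f"{len(rows)} rows, {len(rows[0])} non-partition columns each")
```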

Download data of an instance

  • Method 1: Run the Tunnel DOWNLOAD command to download the query results of a specific instance to a local file.

    1. Execute the SELECT statement to query the partitioned table sale_detail.

      select * from sale_detail;
      Note

      If you disable full table scan for the partitioned table sale_detail, the error message Table(xxxx) is full scan with all partitions, please specify partition predicates appears. For more information, see SQL errors (ODPS-01CCCCX).

    2. Run the following command to download the query results to a local file:

      -- Query the ID of the instance that executed the SELECT statement.
      show p;
      
      -- Run the Tunnel DOWNLOAD command.
      tunnel download instance://20170724071705393ge3csfb8 result.txt;
  • Method 2: Configure the required parameter to download the query results by using InstanceTunnel.

    After you enable use_instance_tunnel on the MaxCompute client, you can use InstanceTunnel to download the query results that are returned by SELECT statements. This helps you download query results of any size at any time. You can use one of the following methods to enable this feature:

    Note

    For more information about the limits of InstanceTunnel, see Limits.

    • Log on to the MaxCompute client of the latest version. Find the odps_config.ini file and make sure that use_instance_tunnel is set to true and instance_tunnel_max_record is set to 10000.

      # download sql results by instance tunnel
      use_instance_tunnel=true
      # the max records when download sql results by instance tunnel
      instance_tunnel_max_record=10000
      Note

      The instance_tunnel_max_record parameter specifies the maximum number of SQL result records that can be downloaded by using InstanceTunnel. If you do not specify this parameter, the number of query results that can be downloaded is unlimited.

    • Set console.sql.result.instancetunnel to true.

      • Enable InstanceTunnel.

        set console.sql.result.instancetunnel=true;
      • Execute the SELECT statement.

        select * from wc_in;

        The following result is returned:

        ID = 20170724081946458g14csfb8
        Log view:
        http://logview/xxxxx.....
        +------------+
        | key        |
        +------------+
        | slkdfj     |
        | hellp      |
        | apple      |
        | tea        |
        | peach      |
        | apple      |
        | tea        |
        | teaa       |
        +------------+
        A total of 8 records fetched by instance tunnel. Max record number: 10000
      Note

      If InstanceTunnel is used, a message appears at the end of the query results, indicating the total number of records that were fetched by the SELECT statement. In this example, eight records are fetched. To disable this feature, set console.sql.result.instancetunnel to false.
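The effect of instance_tunnel_max_record can be modeled in a few lines of Python (an illustrative sketch, not the client's implementation): at most max_record rows are fetched, and None models the unlimited default.

```python
def fetch_results(rows, max_record=None):
    """Return at most `max_record` rows; None means no limit,
    modeling the instance_tunnel_max_record setting."""
    return list(rows) if max_record is None else list(rows)[:max_record]

# Eight records, as in the wc_in example above.
results = ["slkdfj", "hellp", "apple", "tea", "peach", "apple", "tea", "teaa"]
print(len(fetch_results(results, max_record=10000)))  # 8: under the cap
print(len(fetch_results(results, max_record=5)))      # truncated to 5
```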