Tunnel commands - MaxCompute - Alibaba Cloud Documentation Center

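Features

The client implements Tunnel commands, which provide the functionality of the original Dship tool.

Tunnel commands are mainly used to upload and download data.

Upload: Uploads a file or a directory (one level deep). Each upload can target only one table or one table partition. For a partitioned table, you must specify the destination partition.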

tunnel upload log.txt test_project.test_table/p1="b1",p2="b2";
-- Uploads data in log.txt to the test_project project's test_table table, partitions: p1="b1",p2="b2".
tunnel upload log.txt test_table --scan=only;
-- Scans the data in log.txt against the test_table definition without uploading it. With --scan=only, the data is only checked for compliance with the table definition. If it does not comply, the system reports an error.

Download: Downloads data to a single file. Each download can write the data of only one table or one partition to one file. For a partitioned table, you must specify the source partition.
```
tunnel download test_project.test_table/p1="b1",p2="b2" test_table.txt;
-- Download data from the table to the test_table.txt file.
```
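Resume: If an error occurs in the network or the Tunnel service, you can resume the transfer of a file or directory after the interruption. This command can resume a previous upload operation only; download operations cannot be resumed.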
```
tunnel resume;
```

Show: Displays the history of the commands that were run.

tunnel show history -n 5;
-- Displays details for the last five data upload/download commands.
tunnel show log;
-- Displays the log for the last data upload/download.

Purge: Clears the session directory. By default, history from before the last three days is cleared.
```
tunnel purge 5;
-- Clears logs from the previous five days.
```

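Restrictions on Tunnel uploads and downloads

Tunnel commands do not support uploading or downloading data of the Array, Map, or Struct types.

Each session has a 24-hour lifecycle on the server. Within 24 hours of its creation, a session can be used and can be shared across processes and threads. Block IDs within a session must be unique.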
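
The two session rules above can be checked on the client side before a transfer is attempted. The following is a minimal sketch, not part of the Tunnel client: the function names `session_is_usable` and `find_duplicate_block_ids` are hypothetical, and treating a session as expired at exactly 24 hours is an assumption.

```python
from datetime import datetime, timedelta

# 24-hour server-side lifecycle, as stated in the restrictions above.
SESSION_LIFETIME = timedelta(hours=24)


def session_is_usable(created_at, now):
    """Return True while the session is still inside its 24-hour lifecycle.

    Assumption: the session counts as expired at exactly 24 hours.
    """
    return now - created_at < SESSION_LIFETIME


def find_duplicate_block_ids(block_ids):
    """Return the block IDs that occur more than once within one session."""
    seen, duplicates = set(), set()
    for block_id in block_ids:
        if block_id in seen:
            duplicates.add(block_id)
        else:
            seen.add(block_id)
    return sorted(duplicates)


created = datetime(2015, 6, 10, 16, 39, 22)
print(session_is_usable(created, created + timedelta(hours=23)))
print(find_duplicate_block_ids([1, 2, 2, 3, 1]))
```

A caller would typically create a new session when `session_is_usable` returns False, rather than reusing the old session ID.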

Using Tunnel commands

In the client, use the Help subcommand of the Tunnel command to get help information. Each command and option also has a short form.

odps@ project_name>tunnel help;
    Usage: tunnel <subcommand> [options] [args]
    Type 'tunnel help <subcommand>' for help on a specific subcommand.
Available subcommands:
    upload (u)
    download (d)
    resume (r)
    show (s)
    purge (p)
    help (h)
tunnel is a command for uploading data to / downloading data from MaxCompute.

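Parameters

upload: Uploads data to a MaxCompute table.
download: Downloads data from a MaxCompute table.
resume: If a data upload fails, the resume command continues the upload from the point where it was interrupted. Do not use this command for download operations. Each data upload or download operation is called a session. When you run the resume command, specify the ID of the session to resume.
show: Displays the history of the commands that were run.
purge: Clears the session directory. By default, history from before the last three days is cleared.
help: Displays help information for Tunnel commands.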

Upload

Data in a local file is imported into MaxCompute in append mode. The subcommand is used as follows.

odps@ project_name>tunnel help upload;
usage: tunnel upload [options] <path> <[project.]table[/partition]>
              upload data from local file
 -acp,-auto-create-partition <ARG> auto create target partition if not
                                     exists, default false
 -bs,-block-size <ARG> block size in MiB, default 100
 -c,-charset <ARG> specify file charset, default ignore.
                                     set ignore to download raw data
 -cp,-compress <ARG> compress, default true
 -dbr,-discard-bad-records <ARG> specify discard bad records
                                     action(true|false), default false
 -dfp,-date-format-pattern <ARG> specify date format pattern, default
                                     yyyy-MM-dd HH:mm:ss; 
 -fd,-field-delimiter <ARG> specify field delimiter, support
                                     unicode, eg \u0001. default ","
 -h,-header <ARG> if local file should have table
                                     header, default false
 -mbr,-max-bad-records <ARG> max bad records, default 1000
 -ni,-null-indicator <ARG> specify null indicator string,
                                     default ""(empty string)
 -rd,-record-delimiter <ARG> specify record delimiter, support
                                     unicode, eg \u0001. default "\r\n"
 -s,-scan <ARG> specify scan file
                                     action(true|false|only), default true
 -sd,-session-dir <ARG> set session dir, default
                                     D:\software\odpscmd_public\plugins\ds
                                     hip
 -ss,-strict-schema <ARG> specify strict schema mode. If false,
                                     extra data will be abandoned and
                                     insufficient field will be filled
                                     with null. Default true
 -te,-tunnel_endpoint <ARG> tunnel endpoint
    -threads <ARG> number of threads, default 1
 -tz,-time-zone <ARG> time zone, default local timezone:
                                     Asia/Shanghai
For example:
    tunnel upload log.txt test_project.test_table/p1="b1",p2="b2"
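
The `<[project.]table[/partition]>` argument in the usage line above follows a fixed shape, so it can be assembled programmatically when commands are generated by a script. The helper below is a hypothetical sketch (the name `format_target` is not part of the client); it quotes partition values the way the help example does.

```python
def format_target(table, project=None, partition=None):
    """Build the <[project.]table[/partition]> argument for tunnel upload/download."""
    target = f"{project}.{table}" if project else table
    if partition:
        # Partition columns are written as key="value" pairs joined by commas,
        # matching the example in the help output.
        spec = ",".join(f'{key}="{value}"' for key, value in partition.items())
        target = f"{target}/{spec}"
    return target


print(format_target("test_table", project="test_project",
                    partition={"p1": "b1", "p2": "b2"}))
```

The printed value can be appended to `tunnel upload log.txt` to reproduce the example from the help text.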

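Parameters

-acp: Specifies whether to create the destination partition automatically if it does not exist. This parameter is disabled by default.
-bs: Specifies the size of each data block uploaded by Tunnel. Default value: 100 MiB (1 MiB = 1024 * 1024 bytes).
-c: Specifies the encoding of the local data file. Default value: ignore, in which case the raw data is used without conversion.
-cp: Specifies whether to compress the local file before uploading it to reduce traffic usage. This parameter is enabled by default.
-dbr: Specifies whether to ignore bad data (such as extra columns, missing columns, or mismatched column data types).
- If the value is "true", all data that does not match the table definition is ignored.
- If the value is "false", bad data causes an error message to be displayed, and the existing data in the destination table is not affected.
-dfp: Specifies the format of DateTime data. Default value: yyyy-MM-dd HH:mm:ss. To specify a millisecond-level time format, use tunnel upload -dfp 'yyyy-MM-dd HH:mm:ss.SSS'. For details, see "Data types".
-fd: Specifies the column delimiter of the local file. Default value: comma (,).
-h: Specifies whether the data file contains a header. If the value is "true", the header is skipped and the upload starts from the next row.
-mbr: By default, the upload is terminated when the bad data exceeds 1000 rows. Use this parameter to adjust the number of bad records that is tolerated.
-ni: Specifies the NULL data indicator. Default value: "" (empty string).
-rd: Specifies the row delimiter of the local data. Default value: \r\n.
-s: Specifies whether to scan the local data file. Default value: "true", as shown in the help output.
- If the value is "true", the data is scanned first and then imported if the format is correct.
- If the value is "false", the data is imported directly without being scanned.
- If the value is "only", only the scan of the local data is performed. No data is imported after the scan.
-sd: Sets the session directory.
-te: Specifies the tunnel endpoint.
-threads: Specifies the number of threads. Default value: 1.
-tz: Specifies the time zone. Default value: the local time zone (Asia/Shanghai in the help output above).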
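
The three values of -s described in the list above amount to a small decision table. The sketch below restates it in code for illustration only; `plan_scan` is a hypothetical name, and the default of "true" is taken from the help output rather than from the client's source.

```python
def plan_scan(scan_option="true"):
    """Map a -s value to a (scan_first, import_data) pair."""
    plans = {
        "true": (True, True),    # scan first, then import
        "false": (False, True),  # import directly without scanning
        "only": (True, False),   # scan only, nothing is imported
    }
    if scan_option not in plans:
        raise ValueError(f"unsupported -s value: {scan_option!r}")
    return plans[scan_option]


for option in ("true", "false", "only"):
    scan_first, import_data = plan_scan(option)
    print(option, scan_first, import_data)
```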

Example

Create the destination table.

CREATE TABLE IF NOT EXISTS sale_detail(
      shop_name STRING,
      customer_id STRING,
      total_price DOUBLE)
PARTITIONED BY (sale_date STRING,region STRING);

Add a partition.

alter table sale_detail add partition (sale_date='201312', region='hangzhou');

Prepare the data file data.txt with the following content.
```
shop9,97,100
shop10,10,200
shop11,11
```
The data in the third row of this file does not conform to the definition of the sale_detail table. The sale_detail table defines three columns, but the third row has only two.
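
A mismatch like this can be found locally before uploading. The following sketch is a rough local approximation, not the scan that Tunnel itself performs: it only counts delimiter-separated fields per row, and `find_column_mismatches` is a hypothetical helper name.

```python
def find_column_mismatches(text, expected_columns, field_delimiter=","):
    """Return (line_number, column_count) for each row with the wrong column count."""
    mismatches = []
    for line_number, row in enumerate(text.splitlines(), start=1):
        count = len(row.split(field_delimiter))
        if count != expected_columns:
            mismatches.append((line_number, count))
    return mismatches


# Content of data.txt from the example; sale_detail has three data columns.
data = "shop9,97,100\nshop10,10,200\nshop11,11\n"
print(find_column_mismatches(data, expected_columns=3))  # [(3, 2)]
```

The check does not validate data types, so a row can pass here and still be rejected by the table definition.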

Import the data.

odps@ project_name>tunnel u d:\data.txt sale_detail/sale_date=201312,region=hangzhou -s false
Upload session: 201506101639224880870a002ec60c
Start upload:d:\data.txt
Total bytes:41 Split input to 1 blocks
2015-06-10 16:39:22 upload block: '1'
ERROR: column mismatch -,expected 3 columns, 2 columns found, please check data or delimiter

Because data.txt contains bad data, the import fails, and the session ID and an error message are displayed.

Verify the data.

odps@ odpstest_ay52c_ay52> select * from sale_detail where sale_date='201312';
ID = 20150610084135370gyvc61z5
+-----------+-------------+-------------+-----------+--------+
| shop_name | customer_id | total_price | sale_date | region |
+-----------+-------------+-------------+-----------+--------+
+-----------+-------------+-------------+-----------+--------+

Because of the bad data, the import failed and the table contains no data.

Show

Displays history records. The subcommand is used as follows.

odps@ project_name>tunnel help show;
usage: tunnel show history [options]
              show session information
 -n,-number <ARG> lines
For example:
    tunnel show history -n 5
    tunnel show log

Parameters

-n: Specifies the number of rows to display.

Example

odps@ project_name>tunnel show history;
201506101639224880870a002ec60c failed 'u --config-file /D:/console/conf/odps_config.ini --project odpstest_ay52c_ay52 --endpoint http://service.odps.aliyun.com/api --id UlVxOHuthHV1QrI1 --key 2m4r3WvTZbsNJjybVXj0InVke7UkvR d:\data.txt sale_detail/sale_date=201312,region=hangzhou -s false'

Note: In the example above, 201506101639224880870a002ec60c is the session ID of the import that failed in the previous section.
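
When the history is processed by a script, for example to collect the IDs of failed sessions, each line has to be split into its parts. The sketch below assumes that every line has the shape `<session_id> <status> '<command>'`, which is inferred from the single sample line above and is not a documented format; `parse_history_line` is a hypothetical helper.

```python
import re

# Assumed layout: session ID, status, then the command in single quotes.
HISTORY_LINE = re.compile(
    r"^(?P<session_id>\S+)\s+(?P<status>\S+)\s+'(?P<command>.*)'$"
)


def parse_history_line(line):
    """Split one 'tunnel show history' line into session_id, status, and command."""
    match = HISTORY_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized history line: {line!r}")
    return match.groupdict()


# Shortened version of the sample line (connection options omitted).
sample = (r"201506101639224880870a002ec60c failed "
          r"'u d:\data.txt sale_detail/sale_date=201312,region=hangzhou -s false'")
record = parse_history_line(sample)
print(record["session_id"], record["status"])
```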

Resume

Repairs and resumes a session from the history (valid for data uploads only). The subcommand is used as follows.

odps@ project_name>tunnel help resume;
usage: tunnel resume [session_id] [-force]
              resume an upload session
 -f,-force force resume
For example:
    tunnel resume

Example

Modify the data.txt file as follows.

shop9,97,100
shop10,10,200

Upload the corrected data again.

odps@ project_name>tunnel resume 201506101639224880870a002ec60c --force;
start resume
201506101639224880870a002ec60c
Upload session: 201506101639224880870a002ec60c
Start upload:d:\data.txt
Resume 1 blocks 
2015-06-10 16:46:42 upload block: '1'
2015-06-10 16:46:42 upload block complete, blockid=1
upload complete, average speed is 0 KB/s
OK

Note: In the example above, 201506101639224880870a002ec60c is the session ID.

Verify the data:

odps@ project_name>select * from sale_detail where sale_date='201312';
 ID = 20150610084801405g0a741z5
 +-----------+-------------+-------------+-----------+--------+
 | shop_name | customer_id | total_price | sale_date | region |
 +-----------+-------------+-------------+-----------+--------+
 | shop9 | 97 | 100.0 | 201312 | hangzhou |
 | shop10 | 10 | 200.0 | 201312 | hangzhou |
 +-----------+-------------+-------------+-----------+--------+

Download

The subcommand is used as follows.

odps@ project_name>tunnel help download;
usage: tunnel download [options] <[project.]table[/partition]> <path>
              download data to local file
 -c,-charset <ARG> specify file charset, default ignore.
                                   set ignore to download raw data
 -ci,-columns-index <ARG> specify the columns index(starts from
                                   0) to download, use comma to split each
                                   index
 -cn,-columns-name <ARG> specify the columns name to download,
                                   use comma to split each name
 -cp,-compress <ARG> compress, default true
 -dfp,-date-format-pattern <ARG> specify date format pattern, default
                                   yyyy-MM-dd HH:mm:ss
 -e,-exponential <ARG> When download double values, use
                                   exponential express if necessary.
                                   Otherwise at most 20 digits will be
                                   reserved. Default false
 -fd,-field-delimiter <ARG> specify field delimiter, support
                                   unicode, eg \u0001. default ","
 -h,-header <ARG> if local file should have table header,
                                   default false
    -limit <ARG> specify the number of records to
                                   download
 -ni,-null-indicator <ARG> specify null indicator string, default
                                   ""(empty string)
 -rd,-record-delimiter <ARG> specify record delimiter, support
                                   unicode, eg \u0001. default "\r\n"
 -sd,-session-dir <ARG> set session dir, default
                                   D:\software\odpscmd_public\plugins\dshi
                                   p
 -te,-tunnel_endpoint <ARG> tunnel endpoint
    -threads <ARG> number of threads, default 1
 -tz,-time-zone <ARG> time zone, default local timezone:
                                   Asia/Shanghai
usage: tunnel download [options] instance://<[project/]instance_id> <path>
              download instance result to local file
 -c,-charset <ARG> specify file charset, default ignore.
                                   set ignore to download raw data
 -ci,-columns-index <ARG> specify the columns index(starts from
                                   0) to download, use comma to split each
                                   index
 -cn,-columns-name <ARG> specify the columns name to download,
                                   use comma to split each name
 -cp,-compress <ARG> compress, default true
 -dfp,-date-format-pattern <ARG> specify date format pattern, default
                                   yyyy-MM-dd HH:mm:ss
 -e,-exponential <ARG> When download double values, use
                                   exponential express if necessary.
                                   Otherwise at most 20 digits will be
                                   reserved. Default false
 -fd,-field-delimiter <ARG> specify field delimiter, support
                                   unicode, eg \u0001. default ","
 -h,-header <ARG> if local file should have table header,
                                   default false
    -limit <ARG> specify the number of records to
                                   download
 -ni,-null-indicator <ARG> specify null indicator string, default
                                   ""(empty string)
 -rd,-record-delimiter <ARG> specify record delimiter, support
                                   unicode, eg \u0001. default "\r\n"
 -sd,-session-dir <ARG> set session dir, default
                                   D:\software\odpscmd_public\plugins\dshi
                                   p
 -te,-tunnel_endpoint <ARG> tunnel endpoint
    -threads <ARG> number of threads, default 1
 -tz,-time-zone <ARG> time zone, default local timezone:
                                   Asia/Shanghai
For example:
    tunnel download test_project.test_table/p1="b1",p2="b2" log.txt
    tunnel download instance://test_project/test_instance log.txt

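Parameters

-c: Specifies the encoding of the local data file. Default value: ignore.
-ci: Specifies the indexes (starting from 0) of the columns to download. Separate multiple indexes with commas (,).
-cn: Specifies the names of the columns to download. Separate multiple names with commas (,).
-cp: Specifies whether to compress the data before downloading it to reduce traffic usage. This parameter is enabled by default.
-dfp: Specifies the format of DateTime data. Default value: yyyy-MM-dd HH:mm:ss.
-e: Specifies whether Double values are written in exponential notation where necessary when they are downloaded. Otherwise, at most 20 digits are retained. Default value: false.
-fd: Specifies the column delimiter of the local data file. Default value: comma (,).
-h: Specifies whether the downloaded data file includes a table header. Default value: false.

Note: -h=true and threads>1 cannot be used together.
-limit: Specifies the number of records to download.
-ni: Specifies the NULL data indicator. Default value: "" (empty string).
-rd: Specifies the row delimiter of the local data. Default value: \r\n.
-sd: Sets the session directory.
-te: Specifies the tunnel endpoint.
-threads: Specifies the number of threads. Default value: 1.
-tz: Specifies the time zone. Default value: the local time zone (Asia/Shanghai in the help output above).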
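
The restriction on combining -h=true with more than one thread can be enforced before the command is run. The following is a minimal sketch under stated assumptions: `check_download_options` is a hypothetical helper, and rejecting thread counts below 1 is an assumption that the text does not state.

```python
def check_download_options(header=False, threads=1):
    """Validate -h and -threads together and return them as option tokens."""
    if threads < 1:
        # Assumption: a download needs at least one thread.
        raise ValueError("threads must be at least 1")
    if header and threads > 1:
        raise ValueError("-h true cannot be combined with -threads greater than 1")
    return ["-h", str(header).lower(), "-threads", str(threads)]


print(" ".join(check_download_options(header=True, threads=1)))
```

The returned tokens can be joined into a `tunnel download` command line once the check has passed.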

Example

Download the data to result.txt.

$ ./tunnel download sale_detail/sale_date=201312,region=hangzhou result.txt;
    Download session: 201506101658245283870a002ed0b9
    Total records: 2
    2015-06-10 16:58:24 download records: 2
    2015-06-10 16:58:24 file size: 30 bytes
    OK

Check the content of result.txt.

shop9,97,100.0
shop10,10,200.0

Purge

Clears the session directory. By default, sessions from before the last three days are cleared. The subcommand is used as follows.

odps@ project_name>tunnel help purge;
usage: tunnel purge [n]
              force session history to be purged.([n] days before, default
              3 days)
For example:
    tunnel purge 5

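Data types:

Data type	Requirements
STRING	String type. The length cannot exceed 8 MB.
BOOLEAN	For uploads, only "true", "false", "0", and "1" are accepted. For downloads, only "true" or "false" is used. Values are not case-sensitive.
BIGINT	Value range: [-9223372036854775807, 9223372036854775807]
DOUBLE	16-bit width; uploads support exponential notation; downloads support numeric notation only; maximum value: 1.7976931348623157E308; minimum value: 4.9E-324; positive infinity: Infinity; negative infinity: -Infinity
DATETIME	By default, Datetime data is treated as UTC+8 when it is uploaded. You can specify the date format pattern of the data on the command line.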
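
The BOOLEAN and BIGINT rows of the table translate directly into input checks. The sketch below is illustrative only: the helper names are hypothetical, and mapping "1" to true and "0" to false is an assumption based on the usual convention, since the table only lists the accepted spellings.

```python
# BIGINT bounds as listed in the table above.
BIGINT_MIN = -9223372036854775807
BIGINT_MAX = 9223372036854775807


def parse_upload_boolean(text):
    """Interpret a BOOLEAN field for upload; matching is case-insensitive."""
    value = text.lower()
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError(f"not an accepted BOOLEAN value: {text!r}")


def is_valid_bigint(value):
    """Return True if the integer lies inside the listed BIGINT range."""
    return BIGINT_MIN <= value <= BIGINT_MAX


print(parse_upload_boolean("TRUE"), parse_upload_boolean("0"))
print(is_valid_bigint(2 ** 63 - 1), is_valid_bigint(-2 ** 63))
```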

When you upload DATETIME data, specify the date and time format. For details about specific formats, see "SimpleDateFormat".

"yyyyMMddHHmmss": data format "20140209101000"
"yyyy-MM-dd HH:mm:ss" (default): data format "2014-02-09 10:10:00"
"MM/dd/yyyy": data format "09/01/2014"

Example

tunnel upload log.txt test_table -dfp "yyyy-MM-dd HH:mm:ss"
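
To check sample values against a pattern before uploading, the three patterns listed above can be mapped to Python `strptime` formats. This is a sketch covering only those three patterns, not a general SimpleDateFormat converter; `parse_with_pattern` is a hypothetical helper and time zones are ignored.

```python
from datetime import datetime

# Hand-written equivalents for the three patterns shown above.
STRPTIME_FORMATS = {
    "yyyyMMddHHmmss": "%Y%m%d%H%M%S",
    "yyyy-MM-dd HH:mm:ss": "%Y-%m-%d %H:%M:%S",
    "MM/dd/yyyy": "%m/%d/%Y",
}


def parse_with_pattern(value, pattern="yyyy-MM-dd HH:mm:ss"):
    """Parse value using one of the listed patterns; raises ValueError on mismatch."""
    return datetime.strptime(value, STRPTIME_FORMATS[pattern])


print(parse_with_pattern("20140209101000", "yyyyMMddHHmmss"))
print(parse_with_pattern("2014-02-09 10:10:00"))
print(parse_with_pattern("09/01/2014", "MM/dd/yyyy"))
```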

Null: Any data type can be Null.

By default, an empty string is treated as Null.
To specify a different Null string, use -null-indicator on the command line.

tunnel upload log.txt test_table -ni "NULL"
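
The effect of the null indicator on the fields of one row can be pictured as follows. This is a minimal sketch of the rule described above, with `apply_null_indicator` as a hypothetical name; it is not how the client is implemented.

```python
def apply_null_indicator(fields, null_indicator=""):
    """Replace every field equal to the null indicator with None."""
    return [None if field == null_indicator else field for field in fields]


# Default: an empty string is treated as Null.
print(apply_null_indicator(["shop9", "", "100"]))
# With -ni "NULL", only the literal string NULL is treated as Null.
print(apply_null_indicator(["shop9", "NULL", ""], null_indicator="NULL"))
```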

Character encoding: You can specify the character encoding of the file. The default value is UTF-8.

tunnel upload log.txt test_table -c "gbk"

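Delimiter: Tunnel commands support custom file delimiters. The row delimiter is set with -record-delimiter and the column delimiter with -field-delimiter.

Notes

Column and row delimiters can consist of multiple characters.
The column delimiter cannot contain the row delimiter.
On the command line, only \r, \n, and \t can be used as escape-character delimiters.

Example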

tunnel upload log.txt test_table -fd "||" -rd "\r\n"
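
To preview how a file will be split with the delimiters from this example, the same rules can be applied locally. The sketch below is a simplified stand-in for the client's parsing, with `split_records` as a hypothetical name; it supports multi-character delimiters and rejects a column delimiter that contains the row delimiter.

```python
def split_records(text, field_delimiter=",", record_delimiter="\r\n"):
    """Split text into rows and columns using multi-character delimiters."""
    if record_delimiter in field_delimiter:
        raise ValueError("the field delimiter must not contain the record delimiter")
    records = text.split(record_delimiter)
    if records and records[-1] == "":
        records.pop()  # drop the empty piece after a trailing record delimiter
    return [record.split(field_delimiter) for record in records]


rows = split_records("shop9||97||100\r\nshop10||10||200\r\n",
                     field_delimiter="||", record_delimiter="\r\n")
print(rows)
```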