This topic provides answers to some frequently asked questions about Tunnel SDK.
Can data be automatically distributed to each partition of a table when I upload the data by using Tunnel SDK?
No, data cannot be automatically distributed to each partition of a table when you upload the data by using Tunnel SDK. Data can be uploaded to a single non-partitioned table or a partition of a partitioned table at a time. To upload data to a partitioned table, you must specify the partition to which you want to upload the data. If the table has multi-level partitions, you must specify a last-level partition.
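The following minimal sketch shows how to specify a last-level partition when you create an upload session. The endpoint, project, table, partition, and credential values are placeholders for illustration only, and the sketch assumes that the first column of the table is of the STRING type.
import com.aliyun.odps.Odps;
import com.aliyun.odps.PartitionSpec;
import com.aliyun.odps.account.Account;
import com.aliyun.odps.account.AliyunAccount;
import com.aliyun.odps.data.Record;
import com.aliyun.odps.data.RecordWriter;
import com.aliyun.odps.tunnel.TableTunnel;

public class PartitionUploadSketch {
    public static void main(String[] args) throws Exception {
        Account account = new AliyunAccount("<accessId>", "<accessKey>");
        Odps odps = new Odps(account);
        odps.setEndpoint("<endpoint>");
        odps.setDefaultProject("<project>");
        TableTunnel tunnel = new TableTunnel(odps);
        // For a table with multi-level partitions, the last-level partition
        // must be fully specified. Specifying only pt would return an error.
        PartitionSpec spec = new PartitionSpec("pt='20240101',region='hangzhou'");
        TableTunnel.UploadSession session =
                tunnel.createUploadSession("<project>", "<table>", spec);
        RecordWriter writer = session.openRecordWriter(0); // block ID 0
        Record record = session.newRecord();
        record.setString(0, "value");
        writer.write(record);
        writer.close();
        session.commit(new Long[] {0L});
    }
}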
What is the maximum size of a UDF JAR file when I upload the file by using Tunnel SDK?
The maximum size of a UDF JAR file is 10 MB. If the file size exceeds 10 MB, we recommend that you upload the file by using MaxCompute Tunnel Upload commands. For more information about MaxCompute Tunnel Upload commands, see Tunnel commands.
Is the number of partitions in a table to which you upload data by using Tunnel SDK limited?
Yes, the number of partitions is limited. A maximum of 60,000 partitions are supported. If the number of partitions exceeds 60,000, the efficiency of data collection and analysis is low. MaxCompute has limits on the number of instances in a single job. The number of instances of a job is determined based on the amount of input data and the number of partitions. Therefore, we recommend that you evaluate your business requirements before you select a partitioning policy. This helps minimize the impact of excessive partitions on your business.
What do I do if the error message "StatusConflict" appears when I upload data by using Tunnel SDK?
- Problem description
When data is uploaded by using Tunnel SDK, the following error message appears:
RequestId=20170116xxxxxxx, ErrorCode=StatusConflict, ErrorMessage=You cannot complete the specified operation under the current upload or download status.
java.io.IOException: RequestId=20170116xxxxxxx, ErrorCode=StatusConflict, ErrorMessage=You cannot complete the specified operation under the current upload or download status.
at com.aliyun.odps.tunnel.io.TunnelRecordWriter.close(TunnelRecordWriter.java:93)
- Cause: This issue occurs when you close a RecordWriter in one of the following states.
- The RecordWriter is already closed.
- The session in which the RecordWriter is used is closed.
- The session in which the RecordWriter is used is committed.
- Solution: Fix the issue based on the preceding causes. To identify the cause, view the logs, or check the status of the RecordWriter and the session before you commit the session, as shown in the sketch below. After you fix the issue, upload the data again.
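The following minimal sketch illustrates these checks. It assumes the session and writer objects from the upload sketch earlier in this topic, and it assumes that the upload session exposes a getStatus() method; verify the method against the SDK version that you use.
writer.close();                           // close the writer exactly once
System.out.println(session.getStatus());  // inspect the session status before you commit
session.commit(new Long[] {0L});          // commit only after all writers are closed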
What do I do if the error message "Blocks not match" appears when I upload data by using Tunnel SDK?
- Problem description
When data is uploaded by using Tunnel SDK, the following error message appears:
ErrorCode=Local Error, ErrorMessage=Blocks not match, server: 0, tunnelServiceClient: 1
at com.aliyun.odps.tunnel.TableTunnel$UploadSession.commit(TableTunnel.java:814)
- Cause
The number of blocks that are obtained by the server is not the same as the number of blocks that are specified by the blocks parameter in the commit() method.
- Solution
- In the code, check the number of RecordWriters that are opened by calling
uploadSession.openRecordWriter(i)
and the value of the blocks parameter in the commit() method. Make sure that the number of opened RecordWriters is the same as the number of blocks that are passed to the commit() method.
- After the code is run, check whether the
recordWriter.close();
method is called before the commit() method is called. If it is not, the number of blocks that the server receives may not match the number of blocks that are specified in the commit() method. A sketch of a correctly matched upload follows this list.
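The following minimal sketch shows an upload in which the number of opened RecordWriters matches the block list that is passed to the commit() method. It assumes the odps and tunnel objects from the sketch earlier in this topic and a table whose first column is of the BIGINT type.
TableTunnel.UploadSession session = tunnel.createUploadSession("<project>", "<table>");
int blockCount = 3;
Long[] blocks = new Long[blockCount];
for (int i = 0; i < blockCount; i++) {
    RecordWriter writer = session.openRecordWriter(i); // one RecordWriter per unique block ID
    Record record = session.newRecord();
    record.setBigint(0, (long) i);
    writer.write(record);
    writer.close();                                    // close each writer before commit()
    blocks[i] = (long) i;
}
session.commit(blocks);                                // three writers opened, three blocks committed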
What do I do if the error message "StatusConflict" appears when the recordWriter.close() method of Tunnel SDK is called during an upload of 80 million data records at a time?
- Problem description
When the recordWriter.close() method of Tunnel SDK is called during an upload of 80 million data records at a time, the following error message appears:
ErrorCode=StatusConflict, ErrorMessage=You cannot complete the specified operation under the current upload or download status.
- Cause
The session status is invalid. The session has been closed or committed.
- Solution
We recommend that you create a new session and upload the data again. To upload data to different partitions, use a separate session for each partition. If this issue occurs because a session is committed repeatedly, check whether the data has been uploaded. If the upload failed, upload the data again. For more information, see Data upload in multi-threaded mode.
How do I use TunnelBufferedWriter of Tunnel SDK to ensure successful data uploads?
TunnelBufferedWriter is provided in versions of MaxCompute SDK for Java later than 0.21.3-public. TunnelBufferedWriter simplifies data uploads and provides fault tolerance. TunnelBufferedWriter caches data in a buffer on the MaxCompute client and establishes an HTTP connection to upload the data when the buffer is full.
This built-in fault tolerance helps ensure that data uploads succeed. For more information about how to use TunnelBufferedWriter, see TunnelBufferedWriter.
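The following minimal sketch shows the buffered upload path. It assumes the odps and tunnel objects from the sketch earlier in this topic and a table whose first column is of the BIGINT type. With a buffered writer, you do not manage block IDs, and the commit() method is called without a block list.
TableTunnel.UploadSession session = tunnel.createUploadSession("<project>", "<table>");
RecordWriter writer = session.openBufferedWriter(); // returns a TunnelBufferedWriter
for (long i = 0; i < 1000; i++) {
    Record record = session.newRecord();
    record.setBigint(0, i);
    writer.write(record); // data is buffered locally and flushed over HTTP when the buffer is full
}
writer.close();
session.commit(); // no block list is required for buffered uploads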
When I use Tunnel SDK to download data, the error message "You need to specify a partitionspec along with the specified table" appears. What do I do?
- Problem description
When data of a partitioned table is downloaded by using Tunnel SDK, the following error message appears:
ErrorCode=MissingPartitionSpec, ErrorMessage=You need to specify a partitionspec along with the specified table.
- Cause
If you use Tunnel SDK to download data from a partitioned table, you must specify the values of partition key columns in the table. Otherwise, an error is returned.
- Solution
- If you use Tunnel commands on the MaxCompute client, you can download all data of a partitioned table to a folder.
- Before you use Tunnel SDK to download data from a partitioned table, you can use the SDK to obtain the information about all partitions of the table. The following code shows an example. A fuller sketch that downloads each partition follows this list.
Table t = odps.tables().get(tableName);
List<Partition> partitions = t.getPartitions();
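The following minimal sketch expands the preceding example by creating a download session for each partition of the table. It assumes the odps and tunnel objects from the sketch earlier in this topic; the project and table names are placeholders.
Table t = odps.tables().get("<table>");
for (Partition p : t.getPartitions()) {
    TableTunnel.DownloadSession session =
            tunnel.createDownloadSession("<project>", "<table>", p.getPartitionSpec());
    TunnelRecordReader reader = session.openRecordReader(0, session.getRecordCount());
    Record record;
    while ((record = reader.read()) != null) {
        // process each record of this partition
    }
    reader.close();
}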
Which SDKs are supported by MaxCompute Tunnel?
MaxCompute Tunnel supports only the SDK for Java.
Are duplicate block IDs allowed in an upload session?
Each block ID in an upload session must be unique. You can use a block ID to open a RecordWriter in an upload session, use the RecordWriter to write data, and then call the close method to complete data writing. After the data is written, you cannot use the same block ID to open another RecordWriter to write data. The maximum number of blocks is 20,000. Block IDs range from 0 to 19999.
What is the maximum size of a block?
The default size of each block that is uploaded to Tunnel is 100 MiB. The maximum size of a block is 100 GB. We strongly recommend that you set the block size to a value larger than 64 MB. Each block corresponds to a file, and a file smaller than 64 MB is considered a small file. A large number of small files affects MaxCompute performance. If you continuously upload a large amount of data, set the size of each block to 64 MB to 256 MB. If you upload data once a day, set the size of each block to about 1 GB.
You can use TunnelBufferedWriter to upload files and prevent small files from being generated. For more information, see TunnelBufferedWriter.
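The following minimal sketch shows how the buffer size might be tuned to control block sizes. It assumes an existing upload session and the setBufferSize method of TunnelBufferedWriter, which accepts a size in bytes; verify the method against the SDK version that you use.
TunnelBufferedWriter writer = (TunnelBufferedWriter) session.openBufferedWriter();
writer.setBufferSize(64 * 1024 * 1024); // flush a block to the server roughly every 64 MB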
What do I do if a read or write operation times out or an I/O exception occurs?
When you upload data, a network action is triggered each time a RecordWriter writes 8 KB of data. If no network action is triggered within 120 seconds, the server closes the connection and the RecordWriter becomes unavailable. In this case, you must open a new RecordWriter to write data.
To address this issue, we recommend that you use TunnelBufferedWriter.
When you download data, a RecordReader works in a similar way to a RecordWriter. If no network action is triggered for a long period of time, the server closes the connection. We recommend that you read data continuously with a RecordReader and avoid calling other time-consuming interfaces between reads.
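The following minimal sketch shows such a continuous read loop. It assumes the odps and tunnel objects from the sketch earlier in this topic; the project and table names are placeholders.
TableTunnel.DownloadSession session = tunnel.createDownloadSession("<project>", "<table>");
TunnelRecordReader reader = session.openRecordReader(0, session.getRecordCount());
Record record;
while ((record = reader.read()) != null) {
    // Process each record quickly; defer time-consuming work until after
    // the loop so that the connection does not sit idle and get closed.
}
reader.close();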