
Tablestore:Use Tablestore SDKs to deliver Tablestore data to OSS

Last Updated:Oct 21, 2024

Before you use Tablestore SDKs to deliver data, familiarize yourself with the usage notes and operations. You can create a delivery task by using Tablestore SDKs to deliver data from a Tablestore data table to an Object Storage Service (OSS) bucket.

Usage notes

  • Data delivery is available in the China (Hangzhou), China (Shanghai), China (Beijing), and China (Zhangjiakou) regions.

  • Delete operations on Tablestore data are ignored during delivery. Data that has been delivered to OSS is not deleted when you delete the corresponding data in Tablestore.

  • Initialization takes up to one minute when you create a delivery task.

  • The synchronization latency is within 3 minutes when data is written at a steady rate, and the P99 latency is within 10 minutes.

    Note

    The P99 latency indicates the average latency of the slowest 1% of requests over the previous 10 seconds.
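The P99 definition above can be made concrete with a small sketch. The following standalone Java example (illustrative only, with made-up latency values; it is not part of the Tablestore SDK) computes the average latency of the slowest 1% of requests in a window:

```java
import java.util.Arrays;

public class P99Example {
    // Compute the P99 latency as defined above: the average latency of the
    // slowest 1% of requests in the observation window (at least one request).
    static double p99(long[] latenciesMs) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int slowest = Math.max(1, latenciesMs.length / 100);
        double sum = 0;
        for (int i = sorted.length - slowest; i < sorted.length; i++) {
            sum += sorted[i];
        }
        return sum / slowest;
    }

    public static void main(String[] args) {
        // 200 synthetic latencies: 198 fast requests and 2 slow outliers.
        long[] latencies = new long[200];
        Arrays.fill(latencies, 50);
        latencies[0] = 400;
        latencies[1] = 600;
        System.out.println(p99(latencies)); // average of the 2 slowest: 500.0
    }
}
```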

Prerequisites

  • The following operations are performed in the Object Storage Service (OSS) console:

    OSS is activated. A bucket is created in the region in which a Tablestore instance resides. For more information, see Activate OSS.

    Note

    Data delivery allows you to deliver data from a Tablestore instance to an OSS bucket that resides in the same region as the Tablestore instance. To deliver data to other warehouses, such as MaxCompute, submit a ticket.

  • The following operations are performed in the Tablestore console:

    • The endpoint of the Tablestore instance is obtained from the Instance Details tab of the Instance Management page. For more information, see Endpoints.

    • A data table is created. For more information, see Operations on a data table.

  • The following operations are performed in the Resource Access Management (RAM) console:

    • A RAM user is created and the AliyunOTSFullAccess policy is attached to the RAM user to grant the RAM user the permissions to manage Tablestore. For more information, see Create a RAM user and Grant permissions to a RAM user.

      Warning

      If the AccessKey pair of your Alibaba Cloud account is leaked, your resources are exposed to potential risks. We recommend that you use the AccessKey pair of a RAM user instead of the AccessKey pair of your Alibaba Cloud account.

    • An AccessKey pair is created for the RAM user. For more information, see Create an AccessKey pair.

  • Access credentials are configured. For more information, see Configure access credentials.

Operations

  • CreateDeliveryTask: creates a delivery task.

  • ListDeliveryTask: lists all delivery tasks that are created for a data table.

  • DescribeDeliveryTask: queries the details of a delivery task.

  • DeleteDeliveryTask: deletes a delivery task.

Parameters

  • tableName: the name of the data table.

  • taskName: the name of the delivery task.

    The name must be 3 to 16 characters in length and can contain only lowercase letters, digits, and hyphens (-). It must start and end with a lowercase letter or digit.

  • taskConfig: the configurations of the delivery task, which include the following content:

    • ossPrefix: the prefix of the directory in the OSS bucket to which the Tablestore data is delivered. The directory path supports the following time variables: $yyyy, $MM, $dd, $HH, and $mm.

      • If the path uses time variables, OSS directories are dynamically generated based on the time when data is written. Data is then partitioned in the Hive partition naming style, and objects in OSS are organized, partitioned, and distributed based on time.

      • If the path does not use time variables, all objects are delivered to an OSS directory whose name contains the specified prefix.

    • ossBucket: the name of the OSS bucket.

    • ossEndpoint: the endpoint of the region in which the OSS bucket is located.

    • ossStsRole: the Alibaba Cloud Resource Name (ARN) of the Tablestore service-linked role.

    • format: the format in which the delivered data is stored. Default value: Parquet. By default, PLAIN is used to encode all types of data for delivery. Only Parquet is supported. You do not need to specify this parameter.

    • eventTimeColumn: the event time column, which specifies that data is partitioned based on the time of the data in the column. The value of this parameter consists of a column name and a time format (EventTimeFormat). Valid values of EventTimeFormat: RFC822, RFC850, RFC1123, RFC3339, and Unix. Specify the format based on your requirements.

      If you do not specify this parameter, data is partitioned based on the time at which the data is written to Tablestore.

    • parquetSchema: the fields that you want to deliver. The value of this parameter consists of the source fields, destination fields, and destination field types. This parameter is required.

      You can specify the names of the source and destination fields and the order in which the source fields are delivered. After data is delivered to OSS, the data is distributed based on the order of the fields in the schema.

      Important

      The type of each source field must match the type of its destination field. Otherwise, the field is discarded as dirty data. For more information, see Data type mapping.

  • taskType: the type of the delivery task. Default value: BASE_INC. Valid values:

    • INC: incremental data delivery. Only incremental data is synchronized.

    • BASE: full data delivery. All data in the table is scanned and synchronized.

    • BASE_INC: differential data delivery. After full data is synchronized, Tablestore synchronizes incremental data.

      When Tablestore synchronizes incremental data, you can view the time when data was last delivered and the current status of the delivery task.
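To make the time-variable behavior of ossPrefix concrete, the following standalone sketch expands the $yyyy, $MM, $dd, $HH, and $mm variables for a given event time. The expansion logic is illustrative only; the actual expansion is performed by the Tablestore delivery service, not by your client code:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class OssPrefixExample {
    // Expand the supported time variables in an ossPrefix for a given event
    // time. Illustrative only: the delivery service performs this expansion.
    static String expandPrefix(String ossPrefix, LocalDateTime time) {
        return ossPrefix
                .replace("$yyyy", time.format(DateTimeFormatter.ofPattern("yyyy")))
                .replace("$MM", time.format(DateTimeFormatter.ofPattern("MM")))
                .replace("$dd", time.format(DateTimeFormatter.ofPattern("dd")))
                .replace("$HH", time.format(DateTimeFormatter.ofPattern("HH")))
                .replace("$mm", time.format(DateTimeFormatter.ofPattern("mm")));
    }

    public static void main(String[] args) {
        // Hive-style partition directories generated from the write time.
        String prefix = "sampledeliverytask/year=$yyyy/month=$MM/day=$dd";
        LocalDateTime writeTime = LocalDateTime.of(2024, 10, 21, 9, 30);
        System.out.println(expandPrefix(prefix, writeTime));
        // sampledeliverytask/year=2024/month=10/day=21
    }
}
```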

Use Tablestore SDKs

You can use Tablestore SDK for Java and Tablestore SDK for Go to deliver data to OSS. In this example, Tablestore SDK for Java is used.

The following sample code provides an example on how to create a delivery task for a data table:

import com.alicloud.openservices.tablestore.ClientException;
import com.alicloud.openservices.tablestore.SyncClient;
import com.alicloud.openservices.tablestore.TableStoreException;
import com.alicloud.openservices.tablestore.model.delivery.*;

public class DeliveryTask {

    public static void main(String[] args) {
        final String endPoint = "https://yourinstancename.cn-hangzhou.ots.aliyuncs.com";
        // Obtain the AccessKey ID and AccessKey secret from environment variables.
        final String accessKeyId = System.getenv("OTS_AK_ENV");
        final String accessKeySecret = System.getenv("OTS_SK_ENV");
        final String instanceName = "yourinstancename";

        SyncClient client = new SyncClient(endPoint, accessKeyId, accessKeySecret, instanceName);
        try {
            createDeliveryTask(client);
            System.out.println("end");
        } catch (TableStoreException e) {
            System.err.println("The operation failed. Details: " + e.getMessage() + " Error code: " + e.getErrorCode());
            System.err.println("Request ID: " + e.getRequestId());
        } catch (ClientException e) {
            System.err.println("The request failed. Details: " + e.getMessage());
        } finally {
            client.shutdown();
        }
    }

    private static void createDeliveryTask(SyncClient client) {
        String tableName = "sampleTable";
        String taskName = "sampledeliverytask";
        OSSTaskConfig taskConfig = new OSSTaskConfig();
        // Deliver data to a time-partitioned directory in the OSS bucket.
        taskConfig.setOssPrefix("sampledeliverytask/year=$yyyy/month=$MM");
        taskConfig.setOssBucket("datadeliverytest");
        taskConfig.setOssEndpoint("oss-cn-hangzhou.aliyuncs.com");
        taskConfig.setOssStsRole("acs:ram::17************45:role/aliyunserviceroleforotsdatadelivery");
        // The eventTimeColumn parameter is optional. If you specify this parameter, data is partitioned based on the time of the data in the specified column. Otherwise, data is partitioned based on the time at which the data is written to Tablestore.
        EventColumn eventColumn = new EventColumn("Col1", EventTimeFormat.RFC1123);
        taskConfig.setEventTimeColumn(eventColumn);
        // Specify the source fields, destination fields, and destination field types.
        taskConfig.addParquetSchema(new ParquetSchema("PK1", "PK1", DataType.UTF8));
        taskConfig.addParquetSchema(new ParquetSchema("PK2", "PK2", DataType.BOOL));
        taskConfig.addParquetSchema(new ParquetSchema("Col1", "Col1", DataType.UTF8));
        CreateDeliveryTaskRequest request = new CreateDeliveryTaskRequest();
        request.setTableName(tableName);
        request.setTaskName(taskName);
        request.setTaskConfig(taskConfig);
        // Create a differential delivery task: full data is synchronized first, followed by incremental data.
        request.setTaskType(DeliveryTaskType.BASE_INC);
        CreateDeliveryTaskResponse response = client.createDeliveryTask(request);
        System.out.println("requestID: " + response.getRequestId());
        System.out.println("traceID: " + response.getTraceId());
        System.out.println("create delivery task success");
    }
}
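The taskName that you pass to CreateDeliveryTask must satisfy the naming rules listed in the Parameters section. The following standalone helper (a convenience sketch based on those rules, not part of the Tablestore SDK) performs a client-side sanity check before you send the request:

```java
public class TaskNameCheck {
    // Check the delivery task naming rules from the Parameters section:
    // 3 to 16 characters; lowercase letters, digits, and hyphens only;
    // must start and end with a lowercase letter or digit.
    // Illustrative helper; the server performs the authoritative validation.
    static boolean isValidTaskName(String name) {
        return name.matches("[a-z0-9][a-z0-9-]{1,14}[a-z0-9]");
    }

    public static void main(String[] args) {
        System.out.println(isValidTaskName("sampletask")); // true
        System.out.println(isValidTaskName("-task"));      // false: starts with a hyphen
        System.out.println(isValidTaskName("ab"));         // false: shorter than 3 characters
    }
}
```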