This topic describes how to establish a seamless connection between Tablestore and MaxCompute that belong to the same Alibaba Cloud account.
Background information
MaxCompute is a cost-effective and fully managed platform for the warehousing of petabytes of data. You can use MaxCompute to process and analyze large amounts of data in a fast and efficient manner. You can execute a simple Data Definition Language (DDL) statement to create an external table in MaxCompute to associate the external table with an external data source. This allows you to access and export data in various formats. MaxCompute tables can contain only structured data. However, external tables can be used to process structured and non-structured data.
Tablestore and MaxCompute support different types of data. The following table describes the mappings between the data types supported in Tablestore and MaxCompute.
Tablestore | MaxCompute |
STRING | STRING |
INTEGER | BIGINT |
DOUBLE | DOUBLE |
BOOLEAN | BOOLEAN |
BINARY | BINARY |
Preparations
Before you use MaxCompute to access Tablestore, make sure that the following preparations are complete:
MaxCompute is activated.
An AccessKey pair is created.
MaxCompute is authorized to access Tablestore in the Resource Access Management (RAM) console. For more information, see Cross-account authorization.
An instance and a table are created in the Tablestore console. For more information, see Create instances and Create tables.
The following items describe the information about the created instance and table in this example:
Instance name: cap1
Data table name: vehicle_track
Primary key information: vid(integer) and gt(integer)
Endpoint:
https://cap1.cn-hangzhou.ots-internal.aliyuncs.com
NoteWhen you use MaxCompute to access Tablestore, we recommend that you use the private endpoint of Tablestore.
The network access control settings for the instance. By default, you can access a Tablestore instance from the Tablestore console and over VPCs, the classic network, and the Internet. In this example, the default settings in the Network Access Control section are used. The following figure shows the default settings in the Network Access Control section.
Step 1: Install and configure the client
Download and decompress the MaxCompute client installation package.
NoteMake sure that JRE 1.7 or later is installed on your machine.
Modify the conf/odps_config.ini file to configure the client. The following sample code provides an example on how to configure the client:
# Specify the AccessKey pair of your Alibaba Cloud account. An AccessKey pair consists of an AccessKey ID and an AccessKey secret. To obtain the AccessKey pair of your Alibaba Cloud account, log on to the Alibaba Cloud official website, go to the Alibaba Cloud Management Console, move the pointer over the profile picture in the upper-right corner, and then click AccessKey Management. access_id=******************* access_key=********************* # Specify the name of the project that you want to access. project_name=my_project # Specify the endpoint of MaxCompute. end_point=https://service.odps.aliyun.com/api # Specify the link to access the MaxCompute Tunnel service. tunnel_endpoint=https://dt.odps.aliyun.com # Specify the LogView URL that the client returns after a job is run. After you access the LogView URL, you can view detailed information about the job. log_view_host=http://logview.odps.aliyun.com # Specify whether to enable access over HTTPS. https_check=true
NoteIn the odps_config.ini file, a number sign (
#
) is used to add comments. The MaxCompute client uses--
to add comments.Run bin/odpscmd.bat. Enter
show tables;
.If the tables in the current MaxCompute project are displayed, the preceding configurations are valid.
Step 2: Create an external table
Create a MaxCompute data table named ots_vehicle_track to associate the table with a Tablestore table named vehicle_track.
The following items describe information about the associated Tablestore table:
Instance name: cap1
Data table name: vehicle_track
Primary key information: vid(int) and gt(int)
Endpoint:
https://cap1.cn-hangzhou.ots-internal.aliyuncs.com
CREATE EXTERNAL TABLE IF NOT EXISTS ots_vehicle_track
(
vid bigint,
gt bigint,
longitude double,
latitude double,
distance double ,
speed double,
oil_consumption double
)
STORED BY 'com.aliyun.odps.TableStoreStorageHandler' -- (1)
WITH SERDEPROPERTIES ( -- (2)
'tablestore.columns.mapping'=':vid, :gt, longitude, latitude, distance, speed, oil_consumption', -- (3)
'tablestore.table.name'='vehicle_track' -- (4)
)
LOCATION 'tablestore://cap1.cn-hangzhou.ots-internal.aliyuncs.com'; -- (5)
The following table describes the parameters.
Sequence number | Parameter | Description |
(1) | com.aliyun.odps.TableStoreStorageHandler | The MaxCompute built-in storage handler that processes Tablestore data and defines the interaction between MaxCompute and Tablestore. MaxCompute implements the related logic. |
(2) | SERDEPROPERITES | The operation that provides parameter options. When you use the storage handler that processes Tablestore data, you must configure the tablestore.columns.mapping and tablestore.table.name parameters. |
(3) | tablestore.columns.mapping | Required. The columns of the Tablestore table that you want MaxCompute to access. The columns include primary key columns and attribute columns. If a colon |
(4) | tablestore.table.name | The name of the Tablestore table that you want MaxCompute to access. If the specified name of the Tablestore table does not exist, an error occurs. In this case, MaxCompute does not create the Tablestore table. |
(5) | LOCATION | The information about the Tablestore instance to which the Tablestore table belongs. The information includes the name of the instance and the endpoint. |
Step 3: Use the external table to access Tablestore data
After you create an external table, Tablestore data is imported to MaxCompute. Then, you can use MaxCompute SQL commands to access Tablestore data.
-- Collect statistics about the average speed and fuel consumption of vehicles whose VIDs are less than 4 before the timestamp 1469171387.
select vid,count(*),avg(speed),avg(oil_consumption) from ots_vehicle_track where vid <4 and gt<1469171387 group by vid;
Sample command output: