As a hub for querying data in Alibaba Cloud, Data Lake Analytics (DLA) has been upgraded to query data in ApsaraDB for RDS instances (using MySQL
, SQLServer
, and Postgres
). This tutorial tells you how to use DLA to query data in ApsaraDB for RDS instances. We use MySQL
in the examples and point out the differences in SQLServer
and Postgres
at the end of the tutorial.
The following describes the syntax used to create an external table whose underlying layer is mapped to MySQL
in DLA:
CREATE SCHEMA hello_mysql_vpc_rds WITH DBPROPERTIES (
CATALOG = 'mysql',
LOCATION = 'jdbc:mysql://rm-2zer0vg58mfofake.mysql.rds.aliyuncs.com:3306/dla_test',
USER = 'dla_test',
PASSWORD = 'the-fake-password',
VPC_ID = 'vpc-2zeij924vxd303kwifake',
INSTANCE_ID = 'rm-2zer0vg58mfo5fake'
);
Different from creating a common table, an external table requires two more attributes: VPC_ID
and INSTANCE_ID. VPC_ID
indicates the ID of the VPC instance where your ApsaraDB for RDS instance is located, as shown in the following figure.
INSTANCE_ID
indicates the ID of your ApsaraDB for RDS instance, which is displayed on the details page of your ApsaraDB for RDS instance.
These attributes allow DLA to access the data in your ApsaraDB for RDS instance, which is inaccessible by default because your databases are stored in your VPC instance, by using Alibaba Cloud VPC reserve access technology.
Permission statements: If you create a database using the preceding method, you agree that Alibaba Cloud uses the VPC reserve access technology to read or write data from or to your ApsaraDB for RDS instance.
You also have to add the IP address segment 100.104.0.0/16
to your ApsaraDB for RDS whitelist. This IP address segment is used for VPC reserve access, as shown in the following figure.
After creating a database, create the table person in your ApsaraDB for RDS instance for test:
create table person (
id int,
name varchar(1023),
age int
);
Insert the following test data:
insert into person
values (1, 'james', 10),
(2, 'bond', 20),
(3, 'jack', 30),
(4, 'lucy', 40);
You can create the corresponding mapping table in the DLA database:
create external table person (
id int,
name varchar(1023),
age int
) tblproperties (
table_mapping = "person"
);
After connecting the MySQL client to the DLA database, you can query data in the MySQL database:
mysql> select * from person;
+------+-------+------+
| id | name | age |
+------+-------+------+
| 1 | james | 10 |
| 2 | bond | 20 |
| 3 | jack | 30 |
| 4 | lucy | 40 |
+------+-------+------+
4 rows in set (0.35 sec)
DLA is not typically used to read data from an ApsaraDB for RDS instance for analysis because an ApsaraDB for RDS instance contains a limited amount of data and is not applicable for massive data analysis. Instead, DLA is often used to analyze massive data in OSS and Table Store instances and write back the result data to ApsaraDB for RDS instances for frontend business. This is easily implemented in DLA. The table person
is used as an example in the following statement to convert 10 records of customer
in oss_db
and insert them into the table hello_mysql_vps_rds.person
:
mysql> insert into hello_mysql_vpc_rds.person
-> select c_custkey, c_name, c_custkey + 20 from oss_db.customer limit 10;
+------+
| rows |
+------+
| 10 |
+------+
1 row in set (4.57 sec)
mysql> select * from person;
+------+--------------------+------+
| id | name | age |
+------+--------------------+------+
| 1 | james | 10 |
| 2 | bond | 20 |
| 3 | jack | 30 |
| 4 | lucy | 40 |
| 1 | Customer#000000001 | 21 |
| 3 | Customer#000000003 | 23 |
| 5 | Customer#000000005 | 25 |
| 2 | Customer#000000002 | 22 |
| 4 | Customer#000000004 | 24 |
| 7 | Customer#000000007 | 27 |
| 6 | Customer#000000006 | 26 |
| 9 | Customer#000000009 | 29 |
| 8 | Customer#000000008 | 28 |
| 10 | Customer#000000010 | 30 |
+------+--------------------+------+
14 rows in set (0.26 sec)
Usage of SQLServer
and PostgreSQL
is similar to that of MySQL. As database
and schema
in MySQL
are equivalent, they are at the same layer. However, database
and schema
in SQLServer
and PostgreSQL
are at two layers. Therefore, the methods for creating a database and a table are different:
CREATE SCHEMA `hello_sqlserver_vpc_rds` WITH DBPROPERTIES
(
CATALOG = 'sqlserver',
LOCATION = 'jdbc:sqlserver://rm-bp15g1r5jf90hfake.sqlserver.rds.aliyuncs.com:3433;DatabaseName=dla_test',
USER='dla_test1',
PASSWORD='this-is-not-a-real-password',
INSTANCE_ID = 'rm-bp15g1r5jf90fake',
VPC_ID = 'vpc-bp1adypqlcn535yrdfake'
);
CATALOG
to sqlserver
.LOCATION
. Different from the database name for MySQL
, the database name for SQLServer is specified using DatabaseName=dla_test
. This is defined by the SQLServer JDBC URL rather than DLA.CREATE SCHEMA `hello_postgresql_vpc_rds` WITH DBPROPERTIES
(
CATALOG = 'postgresql',
LOCATION = 'jdbc:postgresql://rm-bp1oo49r6j3hvfake.pg.rds.aliyuncs.com:3433/dla_test',
USER='dla_test',
PASSWORD='this-is-not-a-real-password',
INSTANCE_ID = 'rm-bp1oo49r6j3hfake',
VPC_ID = 'vpc-bp1adypqlcn535yrfake'
);
The method for creating a database for PostgreSQL is similar to that for MySQL, except setting CATALOG
to postgresql
.
The difference for creating a table is in the table_mapping
field. table_mapping
of MySQL
only contains the table name (person
):
create external table person1 (
id int,
name varchar(1023),
age int
) tblproperties(
table_mapping = 'person'
);
While table_mapping
of SQLServer
or PostgreSQL
contains the names of the schema and table:
create external table person (
id int,
name varchar(1023),
age int
) tblproperties(
table_mapping = 'public.person'
);
That's it! To learn more about Alibaba Cloud Data Lake Analytics at www.alibabacloud.com/products/data-lake-analytics
Using Data Lake Analytics to Analyze Data in Table Store Instances
2,599 posts | 764 followers
FollowAlibaba Clouder - July 5, 2019
Alibaba Clouder - November 12, 2018
Alibaba EMR - October 12, 2021
Alibaba Clouder - July 4, 2019
Alibaba Clouder - February 7, 2018
Alibaba Cloud MaxCompute - December 22, 2021
2,599 posts | 764 followers
FollowA Big Data service that uses Apache Hadoop and Spark to process and analyze data
Learn MoreA fully managed NoSQL cloud database service that enables storage of massive amount of structured and semi-structured data
Learn MoreAn encrypted and secure cloud storage service which stores, processes and accesses massive amounts of data from anywhere in the world
Learn MoreMore Posts by Alibaba Clouder