AnalyticDB for MySQL allows you to use Spark SQL to access Lindorm data. This topic describes how to use Spark SQL to access data in Hive tables and wide tables of Lindorm.
Prerequisites
A single-zone Lindorm instance is created. For more information, see Create an instance.
Important: You must activate LindormTable, Lindorm Distributed Processing System (LDPS), and LindormDFS for the Lindorm instance. For information about how to activate LDPS, see Activate LDPS and modify the configurations.
Object Storage Service (OSS) is activated. A bucket and a directory are created in OSS. For more information, see Activate OSS, Create a bucket, and Create a directory.
An AnalyticDB for MySQL Data Lakehouse Edition cluster is created.
A job resource group is created for the AnalyticDB for MySQL Data Lakehouse Edition cluster. For more information, see Create a resource group.
A database account is created for the AnalyticDB for MySQL Data Lakehouse Edition cluster.
Preparations
Log on to the Lindorm console and go to the Instance Details page to view the vSwitch ID of the Lindorm instance.
Log on to the Lindorm console and go to the Access Control page. Click the Security Groups tab to view the ID of the security group to which the Lindorm instance belongs. For information about how to add a security group, see Add a security group.
Access the data of a Hive table in Lindorm
Obtain the hdfs-site configuration information of your Hadoop Distributed File System (HDFS) client.
Note: To obtain the hdfs-site configuration information, contact Lindorm technical support (DingTalk ID: s0s3eg3).
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition tab, find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, choose Job Development > SQL Development.
On the SQLConsole tab, select the Spark engine and a job resource group.
On the SQLConsole tab, enter the following code:
SET spark.adb.connectors = oss,external_hive;
SET spark.adb.eni.enabled=true;
SET spark.adb.eni.vswitchId=vsw-bp14pj8h0k5p0kwu3****;
SET spark.adb.eni.securityGroupId=sg-bp11m93k021tp4ca****;
SET spark.sql.catalog.lindorm_catalog = org.apache.spark.sql.hive.V2HiveExternalCatalog;
SET spark.adb.eni.adbHostAlias.enabled=true;
SET spark.sql.catalogImplementation = hive;
-- Configure HDFS-related parameters.
SET spark.sql.catalog.lindorm_catalog.spark.hadoop.hive.metastore.uris=thrift://ld-bp1ttz8833x9c****-proxy-ldps-hms.lindorm.aliyuncs.com:9083;
SET spark.hadoop.dfs.nameservices=ld-bp1ttz8833x9c****;
SET spark.hadoop.dfs.client.failover.proxy.provider.ld-bp1ttz8833x9c****=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider;
SET spark.hadoop.dfs.ha.automatic-failover.enabled=true;
SET spark.hadoop.dfs.ha.namenodes.ld-bp1ttz8833x9c****=nn1,nn2;
SET spark.hadoop.dfs.namenode.rpc-address.ld-bp1ttz8833x9c****.nn1=ld-bp1ttz8833x9c****-master1-001.lindorm.rds.aliyuncs.com:8020;
SET spark.hadoop.dfs.namenode.rpc-address.ld-bp1ttz8833x9c****.nn2=ld-bp1ttz8833x9c****-master2-001.lindorm.rds.aliyuncs.com:8020;
SET spark.hadoop.dfs.namenode.http-address.ld-bp1ttz8833x9c****.nn1=ld-bp1ttz8833x9c****-master1-001.lindorm.rds.aliyuncs.com:50071;
SET spark.hadoop.dfs.namenode.http-address.ld-bp1ttz8833x9c****.nn2=ld-bp1ttz8833x9c****-master2-001.lindorm.rds.aliyuncs.com:50071;
-- Create an external database and an external table in OSS.
CREATE DATABASE external_oss_db LOCATION 'oss://testBucketName/warehouse/db';
CREATE TABLE external_oss_db.test(id int, name string) USING parquet LOCATION 'oss://testBucketName/warehouse/db/test';
-- Insert the data of a Hive table into the OSS external table.
INSERT INTO external_oss_db.test SELECT * FROM lindorm_catalog.spark_test.test;
-- Read the data of the OSS external table.
SELECT * FROM external_oss_db.test;
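To confirm that the catalog and network settings are correct before you run the full statement block, you can execute only the SET statements together with a read-only query. The following is a minimal sanity check, assuming that the spark_test.test Hive table from the example exists:
-- List the tables that are visible through the Hive catalog.
SHOW TABLES IN lindorm_catalog.spark_test;
-- Preview a few rows of the source Hive table.
SELECT * FROM lindorm_catalog.spark_test.test LIMIT 10;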
The following table describes the parameters.

spark.adb.connectors
Required: Yes
Description: The names of the built-in connectors of AnalyticDB for MySQL Spark that you want to enable. Separate multiple names with commas (,). Valid values: oss, hudi, delta, adb, odps, external_hive, and jindo. In this example, oss and external_hive are used.

spark.adb.eni.enabled
Required: Yes
Description: Specifies whether to enable Elastic Network Interface (ENI). If you use external tables to access other external data sources, you must enable ENI. Valid values: true and false. In this example, set this parameter to true.

spark.adb.eni.vswitchId
Required: Yes
Description: The vSwitch ID of the Lindorm instance that you obtained during preparations.

spark.adb.eni.securityGroupId
Required: Yes
Description: The security group ID of the Lindorm instance that you obtained during preparations.

spark.sql.catalog.lindorm_catalog
Required: Yes
Description: The catalog implementation that Spark SQL uses to access the data source. In this example, this parameter is set to org.apache.spark.sql.hive.V2HiveExternalCatalog. Note: You can specify a custom catalog name in the spark.sql.catalog.<catalog_name> parameter. In this example, lindorm_catalog is used.

spark.adb.eni.adbHostAlias.enabled
Required: Yes
Description: Specifies whether to automatically write the domain name resolution information that AnalyticDB for MySQL requires to a mapping table of domain names and IP addresses. Valid values: true and false. If you use an ENI to read data from or write data to EMR Hive, you must set this parameter to true.

HDFS-related parameters
Required: Yes
Description: The hdfs-site configuration information that is obtained in Step 1. Format: SET spark.hadoop.<name>=<value>;. Note: Replace <name> and <value> with the actual values in the configuration information, as shown in the sketch after this table.

location
Required: Yes
Description: The OSS path of the external table. Format: oss://<bucketname>/<database>/<table>. In this example, oss://testBucketName/warehouse/db/test is used. Note: Replace the path with your actual OSS path.

Click Execute.
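For reference, the following sketch shows how a single hdfs-site property translates into a SET statement, as mentioned in the HDFS-related parameters row above. The property name and value here are hypothetical placeholders, not values from your instance:
-- Hypothetical hdfs-site property:
--   <property>
--     <name>dfs.client.use.datanode.hostname</name>
--     <value>true</value>
--   </property>
-- Corresponding Spark SQL statement:
SET spark.hadoop.dfs.client.use.datanode.hostname=true;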
After you run the Spark job, go to the Spark JAR Development page and click Logs in the Actions column on the Applications tab to view the table data. For more information, see Spark editor.
Access the data of a wide table in Lindorm
Obtain the Virtual Private Cloud (VPC) endpoint of LindormTable for access by using the HBase Java API. For more information, see the "View the endpoints of LindormTable" section of the View endpoints topic.
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition tab, find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, choose Job Development > SQL Development.
On the SQLConsole tab, select the Spark engine and a job resource group.
On the SQLConsole tab, enter the following code:
SET spark.adb.eni.enabled=true;
SET spark.adb.eni.vswitchId=vsw-bp14pj8h0k5p0kwu3****;
SET spark.adb.eni.securityGroupId=sg-bp11m93k021tp4ca****;
SET spark.sql.catalog.lindorm_table = com.alibaba.lindorm.ldspark.datasources.v2.LindormCatalog;
SET spark.sql.catalog.lindorm_table.url=ld-bp1ttz8833x9c****-proxy-lindorm.lindorm.rds.aliyuncs.com:30020;
SET spark.sql.catalog.lindorm_table.username=test;
SET spark.sql.catalog.lindorm_table.password=password;
ADD JAR "oss://<bucketname>/hbase_jars/*";
-- Create an external database and an external table in OSS.
CREATE DATABASE IF NOT EXISTS external_oss_db LOCATION 'oss://testBucketName/warehouse/db';
CREATE TABLE IF NOT EXISTS external_oss_db.lindorm_table_2(id string, name string) USING parquet LOCATION 'oss://testBucketName/warehouse/db/lindorm_table_2';
-- Insert the data of a wide table into the OSS external table.
INSERT INTO external_oss_db.lindorm_table_2 SELECT * FROM lindorm_table.spark_test.tb;
-- Read the data of the OSS external table.
SELECT * FROM external_oss_db.lindorm_table_2;
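To verify the result, you can run the following read-only queries after the job completes. This is a minimal check, assuming the example table names above; the row counts of the source wide table and the OSS external table should match:
-- Compare the row counts of the source and destination tables.
SELECT COUNT(*) FROM lindorm_table.spark_test.tb;
SELECT COUNT(*) FROM external_oss_db.lindorm_table_2;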
The following table describes the parameters.

spark.adb.eni.enabled
Required: Yes
Description: Specifies whether to enable Elastic Network Interface (ENI). If you use external tables to access other external data sources, you must enable ENI. Valid values: true and false. In this example, set this parameter to true.

spark.adb.eni.vswitchId
Required: Yes
Description: The vSwitch ID of the Lindorm instance that you obtained during preparations.

spark.adb.eni.securityGroupId
Required: Yes
Description: The security group ID of the Lindorm instance that you obtained during preparations.

spark.sql.catalog.lindorm_table
Required: Yes
Description: The catalog implementation that Spark SQL uses to access the data source. In this example, this parameter is set to com.alibaba.lindorm.ldspark.datasources.v2.LindormCatalog. Note: You can specify a custom catalog name in the spark.sql.catalog.<catalog_name> parameter. In this example, lindorm_table is used (see the sketch after this table).

spark.sql.catalog.lindorm_table.url
Required: Yes
Description: The VPC endpoint of LindormTable for access by using the HBase Java API that is obtained in Step 1.

spark.sql.catalog.lindorm_table.username
Required: Yes
Description: The username of LindormTable.

spark.sql.catalog.lindorm_table.password
Required: Yes
Description: The password of LindormTable. If you forget your password, you can change the password in the cluster management system of LindormTable.

jar
Required: Yes
Description: The OSS path to which the JAR packages that are required to access LindormTable are uploaded. In this example, oss://testBucketName/hbase_jars/* is used.

location
Required: Yes
Description: The OSS path of the external table. Format: oss://<bucketname>/<database>/<table>. In this example, oss://testBucketName/warehouse/db/lindorm_table_2 is used. Note: Replace the path with your actual OSS path.

Click Execute.
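As noted in the table, the catalog name is not fixed: the <catalog_name> part of spark.sql.catalog.<catalog_name> determines the name that your queries reference. The following is a minimal sketch that uses a hypothetical catalog name, my_lindorm, with the same example values:
SET spark.sql.catalog.my_lindorm = com.alibaba.lindorm.ldspark.datasources.v2.LindormCatalog;
SET spark.sql.catalog.my_lindorm.url=ld-bp1ttz8833x9c****-proxy-lindorm.lindorm.rds.aliyuncs.com:30020;
SET spark.sql.catalog.my_lindorm.username=test;
SET spark.sql.catalog.my_lindorm.password=password;
-- Queries reference the same wide table through the custom catalog name.
SELECT * FROM my_lindorm.spark_test.tb;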
After you run the Spark job, go to the Spark JAR Development page and click Logs in the Actions column on the Applications tab to view the table data. For more information, see Spark editor.