This topic describes how to query data in Alibaba Cloud Data Lake Formation (DLF) by using ApsaraDB for SelectDB. This helps you perform federated analytics on DLF data.
Overview
DLF provides centralized metadata management on Alibaba Cloud and is compatible with Hive Metastore. You can use ApsaraDB for SelectDB to connect to DLF and access data stored in DLF in the same way as you access the Hive metastore.
Connect to DLF
Create a basic DLF catalog
CREATE CATALOG dlf PROPERTIES (
"type"="hms",
"hive.metastore.type" = "dlf",
"dlf.proxy.mode" = "DLF_ONLY",
"dlf.endpoint" = "dlf-vpc.cn-beijing.aliyuncs.com",
"dlf.region" = "cn-beijing",
"dlf.uid" = "uid",
"dlf.access_key" = "ak",
"dlf.secret_key" = "sk"
);
The "type" = "hms", "hive.metastore.type" = "dlf", and "dlf.proxy.mode" = "DLF_ONLY" settings are fixed values that you do not need to modify.
The following table describes other key parameters.
Parameter | Description |
dlf.endpoint | The endpoint that ApsaraDB for SelectDB uses to access DLF. For more information, see Supported regions and endpoints. |
dlf.region | The region in which DLF resides. For more information, see Supported regions and endpoints. |
dlf.uid | The ID of the Alibaba Cloud account that is used to log on to DLF. You can view the account ID in the user information section in the upper-right corner of the Alibaba Cloud console. |
dlf.access_key | The AccessKey ID. For more information, see Obtain an AccessKey pair. |
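dlf.secret_key | The AccessKey secret. For more information, see Obtain an AccessKey pair. |
dlf.access.public | Specifies whether to enable Internet access to data stored in Object Storage Service (OSS). This parameter is not part of the basic catalog. To enable Internet access, add "dlf.access.public" = "true" to the catalog properties. |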
After you complete the configurations, you can access metadata stored in DLF in the same way as you access the Hive metastore.
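The following statements are a minimal sketch of how the catalog might be browsed and queried after it is created. The catalog name dlf comes from the preceding example. The database name demo_db and the table name orders are hypothetical placeholders that you must replace with objects that exist in your DLF metadata.

```sql
-- Switch the session to the DLF catalog created above.
SWITCH dlf;

-- List the databases registered in DLF.
SHOW DATABASES;

-- demo_db and orders are hypothetical names used only for illustration.
USE demo_db;
SHOW TABLES;
SELECT * FROM orders LIMIT 10;

-- A table can also be referenced by its fully qualified name
-- (catalog.database.table) without switching catalogs.
SELECT COUNT(*) FROM dlf.demo_db.orders;

-- Return to the built-in catalog of ApsaraDB for SelectDB.
SWITCH internal;
```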
Make sure that the Alibaba Cloud account that you use to create the DLF catalog has the permissions to access DLF.
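If ApsaraDB for SelectDB must read the underlying OSS data over the Internet, the dlf.access.public parameter described in the preceding table can be added to the catalog definition. The following statement is a sketch under that assumption. The catalog name dlf_public is hypothetical, and the endpoint is copied unchanged from the basic example. Check Supported regions and endpoints for the endpoint that applies to your network type.

```sql
-- dlf_public is a hypothetical catalog name.
-- uid, ak, and sk are placeholders, as in the basic example.
CREATE CATALOG dlf_public PROPERTIES (
    "type" = "hms",
    "hive.metastore.type" = "dlf",
    "dlf.proxy.mode" = "DLF_ONLY",
    "dlf.endpoint" = "dlf-vpc.cn-beijing.aliyuncs.com",
    "dlf.region" = "cn-beijing",
    "dlf.uid" = "uid",
    "dlf.access_key" = "ak",
    "dlf.secret_key" = "sk",
    -- Enable Internet access to data stored in OSS.
    "dlf.access.public" = "true"
);
```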
Data type mappings
The data type mappings between ApsaraDB for SelectDB and DLF are the same as those between ApsaraDB for SelectDB and Apache Hive. For more information, see Hive data source.
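To check how the columns of a specific table are mapped, you can inspect the table schema through the catalog. The following statement is a small sketch. The names demo_db and orders are hypothetical placeholders.

```sql
-- Show the column names and the mapped ApsaraDB for SelectDB data types
-- of a table that is managed by DLF.
DESC dlf.demo_db.orders;
```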