This topic describes the parameters that you must configure to use Data Lake Formation (DLF) as the metadata service for Iceberg tables.
The following compute engines are supported:
Spark
Alibaba Cloud Object Storage Service (OSS) is used as the file system. The default name of the catalog and the parameters that you must configure vary based on the version of your cluster.
EMR V3.40 or a later minor version, and EMR V5.6.0 or later
Note: The default name of the catalog is iceberg.
| Parameter | Description | Remarks |
| --- | --- | --- |
| spark.sql.extensions | The SQL extension module of Spark. | Set the value to org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions. Note: This parameter is introduced in Apache Iceberg 0.11.0. Only Apache Spark 3.x supports this parameter. |
| spark.sql.catalog.<catalog-name> | The catalog to register with Spark. Replace <catalog-name> with the name of the catalog. | Set the value to org.apache.iceberg.spark.SparkCatalog. |
| spark.sql.catalog.<catalog-name>.catalog-impl | The class that implements the catalog. | Set the value to org.apache.iceberg.aliyun.dlf.hive.DlfCatalog. |
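As a rough illustration, the three properties for this version range can be assembled into `--conf` flags for `spark-sql` or `spark-submit`. This is a minimal sketch, not an official tool: the function name `spark_conf_flags` is hypothetical, and only the property keys and values come from the parameter list above.

```python
from typing import List


def spark_conf_flags(catalog_name: str = "iceberg") -> List[str]:
    """Build --conf flags for a DLF-backed Iceberg catalog (hypothetical helper)."""
    props = {
        # Enables the Iceberg SQL extensions (Spark 3.x only).
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the catalog under the given name.
        f"spark.sql.catalog.{catalog_name}":
            "org.apache.iceberg.spark.SparkCatalog",
        # Points the catalog at the DLF implementation class.
        f"spark.sql.catalog.{catalog_name}.catalog-impl":
            "org.apache.iceberg.aliyun.dlf.hive.DlfCatalog",
    }
    flags: List[str] = []
    for key, value in props.items():
        flags += ["--conf", f"{key}={value}"]
    return flags


if __name__ == "__main__":
    # Prints one spark-sql command line that carries all three properties.
    print(" ".join(["spark-sql", *spark_conf_flags()]))
```

The default argument matches the default catalog name for this version range. Passing a different name, such as `spark_conf_flags("dlf")`, only changes the `<catalog-name>` part of the property keys.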
EMR V3.39.X and EMR V5.5.X
Note: The default name of the catalog is dlf.
| Parameter | Description | Remarks |
| --- | --- | --- |
| spark.sql.extensions | The SQL extension module of Spark. | Set the value to org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions. Note: This parameter is introduced in Apache Iceberg 0.11.0. Only Apache Spark 3.x supports this parameter. |
| spark.sql.catalog.<catalog-name> | The catalog to register with Spark. Replace <catalog-name> with the name of the catalog. | Set the value to org.apache.iceberg.spark.SparkCatalog. |
| spark.sql.catalog.<catalog-name>.catalog-impl | The class that implements the catalog. | Set the value to org.apache.iceberg.aliyun.dlf.hive.DlfCatalog. |
EMR V3.38.X, EMR V5.3.X, and EMR V5.4.X
Note: The default name of the catalog is dlf_catalog.
| Parameter | Description | Remarks |
| --- | --- | --- |
| spark.sql.extensions | The SQL extension module of Spark. | Set the value to org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions. Note: This parameter is introduced in Apache Iceberg 0.11.0. Only Apache Spark 3.x supports this parameter. |
| spark.sql.catalog.<catalog-name> | The catalog to register with Spark. Replace <catalog-name> with the name of the catalog. | Set the value to org.apache.iceberg.spark.SparkCatalog. |
| spark.sql.catalog.<catalog-name>.catalog-impl | The class that implements the catalog. | Set the value to org.apache.iceberg.aliyun.dlf.DlfCatalog. |
| spark.sql.catalog.<catalog-name>.io-impl | The FileIO implementation class that the catalog uses for I/O operations. | Set the value to org.apache.iceberg.hadoop.HadoopFileIO. |
| spark.sql.catalog.<catalog-name>.oss.endpoint | The endpoint of your OSS bucket. | For more information, see Regions and endpoints. We recommend that you set this parameter to the virtual private cloud (VPC) endpoint of the OSS bucket. For example, if you select the China (Hangzhou) region, set this parameter to oss-cn-hangzhou-internal.aliyuncs.com. Note: If you want to access OSS across VPCs, set this parameter to the public endpoint of the OSS bucket. |
| spark.sql.catalog.<catalog-name>.warehouse | The OSS path in which table data is stored. | None. |
| spark.sql.catalog.<catalog-name>.access.key.id | The AccessKey ID of your Alibaba Cloud account. | For more information about how to obtain the AccessKey ID of an Alibaba Cloud account, see Obtain an AccessKey pair. |
| spark.sql.catalog.<catalog-name>.access.key.secret | The AccessKey secret of your Alibaba Cloud account. | For more information about how to obtain the AccessKey secret of an Alibaba Cloud account, see Obtain an AccessKey pair. |
| spark.sql.catalog.<catalog-name>.dlf.catalog-id | The ID of your Alibaba Cloud account. | To obtain the ID of your Alibaba Cloud account, go to the Security Settings page. |
| spark.sql.catalog.<catalog-name>.dlf.endpoint | The endpoint of DLF. | We recommend that you set this parameter to the VPC endpoint of DLF. For example, if you select the China (Hangzhou) region, set this parameter to dlf-vpc.cn-hangzhou.aliyuncs.com. Note: You can also set this parameter to the public endpoint of DLF. For the China (Hangzhou) region, the public endpoint is dlf.cn-hangzhou.aliyuncs.com. |
| spark.sql.catalog.<catalog-name>.dlf.region-id | The ID of the region in which DLF is activated. | Make sure that this region matches the endpoint that you specify in the spark.sql.catalog.<catalog-name>.dlf.endpoint parameter. |
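The full property set for these older versions, together with the region and endpoint consistency requirement, can be sketched as follows. Treat this as an illustration under stated assumptions:

- The helper names are hypothetical.
- The environment variable names `ACCESS_KEY_ID` and `ACCESS_KEY_SECRET` are placeholders chosen for this example.
- The region check is a heuristic inferred from the two China (Hangzhou) endpoint examples above. It only tests whether the region ID appears as a dot-separated label of the DLF endpoint.

```python
import os
from typing import Dict, Mapping


def dlf_endpoint_matches_region(endpoint: str, region_id: str) -> bool:
    """Heuristic: the region ID must be one dot-separated label of the endpoint."""
    return region_id in endpoint.split(".")


def legacy_spark_properties(
    catalog_name: str,
    warehouse: str,
    oss_endpoint: str,
    dlf_endpoint: str,
    region_id: str,
    account_id: str,
    env: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Assemble the Spark properties listed above (hypothetical helper)."""
    if not dlf_endpoint_matches_region(dlf_endpoint, region_id):
        raise ValueError(
            f"DLF endpoint {dlf_endpoint!r} does not match region {region_id!r}"
        )
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.aliyun.dlf.DlfCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.hadoop.HadoopFileIO",
        f"{prefix}.oss.endpoint": oss_endpoint,
        f"{prefix}.warehouse": warehouse,
        # Credentials are read from the environment (assumed variable names)
        # so that they are not hard-coded in the script.
        f"{prefix}.access.key.id": env["ACCESS_KEY_ID"],
        f"{prefix}.access.key.secret": env["ACCESS_KEY_SECRET"],
        f"{prefix}.dlf.catalog-id": account_id,
        f"{prefix}.dlf.endpoint": dlf_endpoint,
        f"{prefix}.dlf.region-id": region_id,
    }


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    demo_env = {
        "ACCESS_KEY_ID": "<your-access-key-id>",
        "ACCESS_KEY_SECRET": "<your-access-key-secret>",
    }
    props = legacy_spark_properties(
        catalog_name="dlf_catalog",
        warehouse="oss://<your-bucket>/warehouse",
        oss_endpoint="oss-cn-hangzhou-internal.aliyuncs.com",
        dlf_endpoint="dlf-vpc.cn-hangzhou.aliyuncs.com",
        region_id="cn-hangzhou",
        account_id="<your-account-id>",
        env=demo_env,
    )
    # One "key value" pair per line, the layout used by spark-defaults.conf.
    for key, value in props.items():
        print(f"{key} {value}")
```

Failing fast on a region and endpoint mismatch surfaces the misconfiguration before a Spark job starts, rather than as a connection error at query time.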
Hive
You can configure the parameters described in the following tables based on the version of your cluster.
EMR V3.39.0 or a later minor version, and EMR V5.5.0 or later
Note: The default name of the catalog is dlf.
| Parameter | Description | Remarks |
| --- | --- | --- |
| iceberg.catalog.<catalog-name>.catalog-impl | The class that implements the catalog. | Set the value to org.apache.iceberg.aliyun.dlf.hive.DlfCatalog. |
EMR V3.38.X, EMR V5.3.X, and EMR V5.4.X
Note: The default name of the catalog is dlf_catalog.
| Parameter | Description | Remarks |
| --- | --- | --- |
| iceberg.catalog | The name of the catalog. | Set the value to a custom name. |
| iceberg.catalog.<catalog-name>.type | The type of the catalog. | Set the value to custom. |
| iceberg.catalog.<catalog-name>.catalog-impl | The class that implements the catalog. | Set the value to org.apache.iceberg.aliyun.dlf.DlfCatalog. |
| iceberg.catalog.<catalog-name>.io-impl | The FileIO implementation class that the catalog uses for I/O operations. | Set the value to org.apache.iceberg.hadoop.HadoopFileIO. |
| iceberg.catalog.<catalog-name>.warehouse | The warehouse path in which table data is stored. | Table data can be stored in Hadoop Distributed File System (HDFS) or OSS. |
| iceberg.catalog.<catalog-name>.access.key.id | The AccessKey ID of your Alibaba Cloud account. | For more information about how to obtain the AccessKey ID of an Alibaba Cloud account, see Obtain an AccessKey pair. |
| iceberg.catalog.<catalog-name>.access.key.secret | The AccessKey secret of your Alibaba Cloud account. | For more information about how to obtain the AccessKey secret of an Alibaba Cloud account, see Obtain an AccessKey pair. |
| iceberg.catalog.<catalog-name>.dlf.catalog-id | The ID of your Alibaba Cloud account. | To obtain the ID of your Alibaba Cloud account, go to the Security Settings page. |
| iceberg.catalog.<catalog-name>.dlf.endpoint | The endpoint of DLF. | We recommend that you set this parameter to the VPC endpoint of DLF. For example, if you select the China (Hangzhou) region, set this parameter to dlf-vpc.cn-hangzhou.aliyuncs.com. Note: You can also set this parameter to the public endpoint of DLF. For the China (Hangzhou) region, the public endpoint is dlf.cn-hangzhou.aliyuncs.com. |
| iceberg.catalog.<catalog-name>.dlf.region-id | The ID of the region in which DLF is activated. | Make sure that this region matches the endpoint that you specify in the iceberg.catalog.<catalog-name>.dlf.endpoint parameter. |
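The Hive properties for these older versions can be sketched in the same way. This example assumes that the properties are applied per session with Hive `SET` statements. This topic does not state where they are configured, so adapt the output to your deployment. The helper names are hypothetical, and the two AccessKey properties are deliberately left out so that secrets do not end up in a generated script.

```python
from typing import Dict, List


def hive_catalog_properties(
    catalog_name: str,
    warehouse: str,
    account_id: str,
    dlf_endpoint: str,
    region_id: str,
) -> Dict[str, str]:
    """Collect the non-secret Hive properties listed above (hypothetical helper)."""
    prefix = f"iceberg.catalog.{catalog_name}"
    return {
        "iceberg.catalog": catalog_name,
        f"{prefix}.type": "custom",
        f"{prefix}.catalog-impl": "org.apache.iceberg.aliyun.dlf.DlfCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.hadoop.HadoopFileIO",
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.dlf.catalog-id": account_id,
        f"{prefix}.dlf.endpoint": dlf_endpoint,
        f"{prefix}.dlf.region-id": region_id,
    }


def hive_set_statements(properties: Dict[str, str]) -> List[str]:
    """Render each property as a Hive 'SET key=value;' statement."""
    return [f"SET {key}={value};" for key, value in properties.items()]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    props = hive_catalog_properties(
        catalog_name="dlf_catalog",
        warehouse="oss://<your-bucket>/warehouse",
        account_id="<your-account-id>",
        dlf_endpoint="dlf-vpc.cn-hangzhou.aliyuncs.com",
        region_id="cn-hangzhou",
    )
    print("\n".join(hive_set_statements(props)))
```

The `access.key.id` and `access.key.secret` properties follow the same key pattern and can be added from a secure source at run time.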