E-MapReduce (EMR) provides MetaService, a special ECS application role. In EMR V3.32.0 and earlier V3.X.X versions, and in EMR V4.5.0 and earlier V4.X.X versions, this role is automatically bound to your cluster when you create the cluster. Applications that run on your EMR cluster use this role to access other Alibaba Cloud resources without an AccessKey pair, which avoids disclosing the AccessKey pair in a configuration file.
Prerequisites
This role is authorized. For more information, see Assign roles to an Alibaba Cloud account.
Background information
MetaService allows you to access only Object Storage Service (OSS), Log Service, and Message Service (MNS) without an AccessKey pair.
Permissions
| Permission (Action) | Description |
| --- | --- |
oss:PutObject | Uploads a file or folder. |
oss:GetObject | Queries a file or folder. |
oss:ListObjects | Queries files. |
oss:DeleteObject | Deletes a file. |
oss:ListBuckets | Queries buckets. |
oss:AbortMultipartUpload | Terminates a multipart upload event. |
oss:ListMultipartUploads | Queries all ongoing multipart upload events. |
oss:RestoreObject | Restores an Archive or Cold Archive object. |
oss:GetBucketInfo | Queries the information about a bucket. |
oss:ListObjectVersions | Queries the versions of all objects in a bucket, including delete markers. |
oss:DeleteObjectVersion | Deletes a specific version of an object. |
oss:PostDataLakeStorageFileOperation | Accesses OSS-HDFS. |
ots:CreateTable | Creates a table based on the specified table schema. |
ots:DeleteTable | Deletes a specific table from the current instance. |
ots:GetRow | Reads data in a single row based on a specific primary key. |
ots:PutRow | Inserts data into a specific row. |
ots:UpdateRow | Updates data in a specific row. |
ots:DeleteRow | Deletes a row of data. |
ots:GetRange | Reads data within a specific value range of the primary key. |
ots:BatchWriteRow | Inserts, modifies, or deletes multiple rows of data from one or more tables at a time. |
ots:BatchGetRow | Reads multiple rows of data from one or more tables at a time. |
ots:ComputeSplitPointsBySize | Logically splits the data in a table into shards whose sizes are close to the specified size, and returns the split points between the shards together with information about the hosts where the partitions reside. |
ots:StartLocalTransaction | Creates a local transaction based on a specified partition key value and queries the ID of the local transaction. |
ots:CommitTransaction | Commits a local transaction. |
ots:AbortTransaction | Aborts a local transaction. |
dlf:BatchCreatePartitions | Creates multiple partitions at a time. |
dlf:BatchCreateTables | Creates multiple tables at a time. |
dlf:BatchDeletePartitions | Deletes multiple partitions at a time. |
dlf:BatchDeleteTables | Deletes multiple tables at a time. |
dlf:BatchGetPartitions | Queries information about multiple partitions at a time. |
dlf:BatchGetTables | Queries information about multiple tables at a time. |
dlf:BatchUpdatePartitions | Updates multiple partitions at a time. |
dlf:BatchUpdateTables | Updates multiple tables at a time. |
dlf:CreateDatabase | Creates a database. |
dlf:CreateFunction | Creates a function. |
dlf:CreatePartition | Creates a partition. |
dlf:CreateTable | Creates a table. |
dlf:DeleteDatabase | Deletes a database. |
dlf:DeleteFunction | Deletes a function. |
dlf:DeletePartition | Deletes a partition. |
dlf:DeleteTable | Deletes a table. |
dlf:GetDatabase | Queries information about a database. |
dlf:GetFunction | Queries information about a function. |
dlf:GetPartition | Queries information about a partition. |
dlf:GetTable | Queries information about a table. |
dlf:ListCatalogs | Queries catalogs. |
dlf:ListDatabases | Queries databases. |
dlf:ListFunctionNames | Queries the names of the functions. |
dlf:ListFunctions | Queries functions. |
dlf:ListPartitionNames | Queries the names of the partitions. |
dlf:ListPartitions | Queries partitions. |
dlf:ListPartitionsByExpr | Queries metadata table partitions by conditions. |
dlf:ListPartitionsByFilter | Queries metadata table partitions by conditions. |
dlf:ListTableNames | Queries the names of tables. |
dlf:ListTables | Queries tables. |
dlf:RenamePartition | Renames a partition. |
dlf:RenameTable | Renames a table. |
dlf:UpdateDatabase | Updates a database. |
dlf:UpdateFunction | Updates a function. |
dlf:UpdateTable | Updates a table. |
dlf:UpdateTableColumnStatistics | Updates the statistics of a metadata table. |
dlf:GetTableColumnStatistics | Queries the statistics of a metadata table. |
dlf:DeleteTableColumnStatistics | Deletes the statistics of a metadata table. |
dlf:UpdatePartitionColumnStatistics | Updates the statistics of a partition. |
dlf:GetPartitionColumnStatistics | Queries the statistics of a partition. |
dlf:DeletePartitionColumnStatistics | Deletes the statistics of a partition. |
dlf:BatchGetPartitionColumnStatistics | Queries the statistics of multiple partitions at a time. |
dlf:CreateLock | Creates a metadata lock. |
dlf:UnLock | Unlocks a specific metadata lock. |
dlf:AbortLock | Aborts a metadata lock. |
dlf:RefreshLock | Refreshes a metadata lock. |
dlf:GetLock | Queries information about a metadata lock. |
dlf:GetAsyncTaskStatus | Queries the status of an asynchronous task. |
dlf:DeltaGetPermissions | Queries permissions. |
dlf:GetPermissions | Queries information about data permissions. |
dlf:GetServiceInfo | Queries information about a service. |
dlf:GetRoles | Queries information about roles in data permissions. |
dlf:CheckPermissions | Verifies data permissions. |
Data sources that support MetaService
MetaService allows you to access OSS, Log Service, and MNS. You can use an EMR SDK in your EMR cluster to read data from and write data to the preceding data sources without an AccessKey pair.
By default, only access to OSS is enabled. If you want to read data from and write data to Log Service and MNS, log on to the RAM console and grant the required permissions to the AliyunEmrEcsDefaultRole role. For more information, see RAM console.
For more information about how to authorize a RAM role, see Grant permissions to a RAM role.
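If you adjust the permissions of the AliyunEmrEcsDefaultRole role, follow the principle of least privilege. The following RAM policy is a minimal sketch, not the full policy that EMR provisions; the bucket name examplebucket and the exact action list are illustrative assumptions, so replace them with the actions your jobs actually need:

```json
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "oss:GetObject",
        "oss:PutObject",
        "oss:ListObjects"
      ],
      "Resource": [
        "acs:oss:*:*:examplebucket",
        "acs:oss:*:*:examplebucket/*"
      ]
    }
  ]
}
```

Scoping the Resource element to a specific bucket, rather than `acs:oss:*:*:*`, limits the blast radius if the temporary credentials issued through this role are ever misused.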
Use MetaService
- Reduces the risk of AccessKey pair leaks. To minimize security risks, grant permissions to roles in the RAM console based on the principle of least privilege.
- Improves user experience. MetaService shortens the OSS path that you need to enter during interactive access to OSS resources.
- Brings the following benefits for services in your EMR cluster:
The jobs that you run in the services can access Alibaba Cloud resources (OSS, Log Service, and MNS) without an AccessKey pair.
Comparison of operations before and after MetaService is used:
- Run the hadoop fs -ls command to view OSS data.
  - MetaService is not used:
    hadoop fs -ls oss://ZaH******As1s:Ba23N**************sdaBj2@bucket.oss-cn-hangzhou-internal.aliyuncs.com/a/b/c
  - MetaService is used:
    hadoop fs -ls oss://bucket/a/b/c
- Create an external table in Hive.
  - MetaService is not used:
    CREATE EXTERNAL TABLE test_table(id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION 'oss://ZaH******As1s:Ba23N**************sdaBj2@bucket.oss-cn-hangzhou-internal.aliyuncs.com/a/b/c';
  - MetaService is used:
    CREATE EXTERNAL TABLE test_table(id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION 'oss://bucket/a/b/c';
- Use Spark to view OSS data.
  - MetaService is not used:
    val data = sc.textFile("oss://ZaH******As1s:Ba23N**************sdaBj2@bucket.oss-cn-hangzhou-internal.aliyuncs.com/a/b/c")
  - MetaService is used:
    val data = sc.textFile("oss://bucket/a/b/c")
- Brings the following benefits for self-managed services: MetaService is an HTTP service. You can send a request to the URL of this HTTP service to obtain a Security Token Service (STS) temporary credential. Then, you can use the STS temporary credential to access Alibaba Cloud resources from self-managed systems without an AccessKey pair.
  Important: A new STS temporary credential is generated 30 minutes before the current one expires. Both STS credentials can be used within those 30 minutes.
For example, you can run curl http://localhost:10011/cluster-region to obtain the region where your cluster resides.
You can use MetaService to obtain the following information:
- Region: /cluster-region
- Role name: /cluster-role-name
- AccessKey ID: /role-access-key-id
- AccessKey secret: /role-access-key-secret
- Security token: /role-security-token
- Network type: /cluster-network-type
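The endpoints above can be combined into a small credential fetcher. The following Python sketch assumes it runs on a node where MetaService listens on localhost port 10011, as in the curl example above; the endpoint paths come from this document, while the function names are illustrative:

```python
# Minimal sketch: read STS temporary credentials from the local MetaService
# HTTP endpoint (port 10011 per this document). Only runs on a host where
# MetaService is available; uses only the Python standard library.
import urllib.request

META_BASE = "http://localhost:10011"

def meta_url(item: str) -> str:
    """Build the MetaService URL for one item, e.g. 'cluster-region'."""
    return f"{META_BASE}/{item}"

def fetch(item: str) -> str:
    """GET one MetaService item and return its body as stripped text."""
    with urllib.request.urlopen(meta_url(item), timeout=5) as resp:
        return resp.read().decode("utf-8").strip()

def sts_credentials() -> dict:
    """Collect the three parts of an STS temporary credential."""
    return {
        "access_key_id": fetch("role-access-key-id"),
        "access_key_secret": fetch("role-access-key-secret"),
        "security_token": fetch("role-security-token"),
    }
```

Because a fresh credential is issued 30 minutes before the current one expires, a long-running self-managed service should call sts_credentials() again shortly before each use instead of caching the result indefinitely.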