Object Storage Service: Connect EMR clusters to OSS-HDFS

Last Updated: Aug 28, 2024

OSS-HDFS is integrated into specific versions of Alibaba Cloud E-MapReduce (EMR) clusters. This topic describes how to connect EMR clusters to OSS-HDFS and perform common operations.

Note

If you use a self-managed Hadoop cluster, connect it to OSS-HDFS in the same manner as a non-EMR cluster. For more information, see Connect non-EMR clusters to OSS-HDFS.

Prerequisites

  • OSS-HDFS is enabled for a bucket and permissions are granted to a RAM role to access OSS-HDFS. For more information, see Enable OSS-HDFS and grant access permissions.

  • By default, an Alibaba Cloud account has the permissions to connect EMR clusters to OSS-HDFS and perform common operations related to OSS-HDFS. If you want to use a RAM user instead, the RAM user must first be granted the required permissions. For more information, see Grant a RAM user permissions to connect EMR clusters to OSS-HDFS.

Procedure

  1. Log on to the E-MapReduce console. In the left-side navigation pane, click EMR on ECS and create an EMR cluster.

    When you create the EMR cluster, make sure that you set Product Version to EMR-3.46.2 or later, or EMR-5.12.2 or later, and set Root Storage Directory of Cluster to a bucket for which OSS-HDFS is enabled. Use the default values for the other parameters. For more information, see Create a cluster.
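
    Paths in OSS-HDFS use the oss:// prefix together with the bucket name and the region-specific oss-dls endpoint, as in the commands later in this topic. As a minimal sketch, assuming a bucket named examplebucket in the cn-hangzhou region, the path takes the following form:

    oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/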

  2. Log on to the EMR cluster.

    1. Click the created EMR cluster.

    2. Click the Nodes tab, and then click + on the left side of the node group.

    3. Click the ID of the Elastic Compute Service (ECS) instance. On the Instances page, click Connect next to the instance ID to log on to the cluster by using Workbench.

      For more information about how to log on to a cluster in Windows or Linux by using an SSH key pair or an SSH password, see Log on to a cluster.
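
      If you connect over SSH instead of Workbench, the following is a minimal sketch of the logon command, assuming that you connect as the root user and replace the placeholder with the public IP address of your master node:

      ssh root@<master-node-public-IP>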

  3. Run HDFS Shell commands to perform common operations that are related to OSS-HDFS.

    • Upload local files

      Run the following command to upload a local file named examplefile.txt from the root directory of the local file system to a bucket named examplebucket:

      hdfs dfs -put examplefile.txt oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/

    • Download objects

      Run the following command to download an object named exampleobject.txt from a bucket named examplebucket to the /tmp directory on your computer:

      hdfs dfs -get oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampleobject.txt /tmp/
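
    • Other common operations

      The following commands are a minimal sketch of a few other routine operations: creating a directory, listing the contents of a bucket, and deleting an object. The directory name exampledir is a placeholder used only for illustration.

      hdfs dfs -mkdir oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampledir/
      hdfs dfs -ls oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/
      hdfs dfs -rm oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampleobject.txt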

    For more information, see Use Hadoop Shell commands to access OSS-HDFS.