By Shantanu Kaushik
In Part 1 of this 2-part series, we discussed how Alibaba Cloud E-MapReduce provides extensive elasticity and operational superiority in analyzing and processing data. We also discussed the prominent features and benefits of Alibaba Cloud E-MapReduce. In this article, we will discuss E-MapReduce cluster management and explain how it works in real-world scenarios. We will also discuss the various usage scenarios related to Alibaba Cloud EMR and the primary benefits of using EMR over the open-source big data ecosystem.
Alibaba Cloud E-MapReduce provides support for multiple components:
Now that we know Alibaba Cloud E-MapReduce supports component-level integration, let’s take a look at the advantages it offers over the open-source big data ecosystems that are widely available for enterprises to deploy and apply.
The image above shows all the steps required to run an application on a self-built open-source cluster. Only the last three steps concern your application logic. The first seven steps are preparations, which are complex and time-consuming, and Alibaba Cloud EMR takes them over by providing highly integrated, seamless cluster management.
Alibaba Cloud EMR integrates a plethora of features that are required for cluster management. Some of the prime features are:
Alibaba Cloud EMR is a highly efficient and self-sustained solution that frees you from all the tedious procurement, preparation, and O&M work required to build clusters. You only need to work on processing the logic of your applications.
Alibaba Cloud EMR functions with different combinations of cluster services to help you meet your business requirements. After running the Hadoop service for Alibaba Cloud EMR, you can perform:
Here, if you include Spark, you can have added functionality to perform functions, such as:
EMR clusters are the core user-facing components. They are based on Hadoop or Spark and are deployed on one or more ECS instances. Consider a Hadoop cluster: it consists of processes that run on the ECS instances of the cluster, and each ECS instance corresponds to a node. Alibaba Cloud EMR distributes these processes after determining whether each one needs to run on the master node or on the core and task nodes, depending on the priority and resource requirements of a task. Let’s take a look at how EMR clusters are managed on the chart below. Here, one master node has three slave nodes (core and task nodes) that handle multiple tasks simultaneously, depending on their priority and required resources.
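The node layout described above can be sketched in a few lines of Python. This is a simplified illustration, not EMR's actual scheduler: the `Role`, `Node`, `PLACEMENT`, and `place` names, the process names, and the least-loaded rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    MASTER = "master"
    CORE = "core"
    TASK = "task"


@dataclass
class Node:
    name: str  # one ECS instance corresponds to one node
    role: Role
    processes: list = field(default_factory=list)


# Hypothetical placement rules: which node roles may host which kind of process.
PLACEMENT = {
    "coordination": {Role.MASTER},      # cluster-wide management processes
    "storage": {Role.CORE},             # processes that hold data
    "compute": {Role.CORE, Role.TASK},  # processes that execute job tasks
}


def place(process: str, kind: str, nodes: list) -> Node:
    """Assign a process to the least-loaded node whose role allows this kind."""
    candidates = [n for n in nodes if n.role in PLACEMENT[kind]]
    target = min(candidates, key=lambda n: len(n.processes))
    target.processes.append(process)
    return target


# One master node and three slave nodes (two core, one task).
cluster = [
    Node("node-1", Role.MASTER),
    Node("node-2", Role.CORE),
    Node("node-3", Role.CORE),
    Node("node-4", Role.TASK),
]

place("resource-manager", "coordination", cluster)
place("data-store", "storage", cluster)
for i in range(3):
    place(f"job-task-{i}", "compute", cluster)

for node in cluster:
    print(node.name, node.role.value, node.processes)
```

The point of the sketch is the separation of roles: management processes stay on the master node, while compute work spreads across the core and task nodes.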
Let’s take a look at an architectural overlay of this scenario on the chart below:
Alibaba Cloud E-MapReduce supports multiple data integration points using:
If we look at the architectural flow, we can notice that Alibaba Cloud EMR uses multiple services, such as Alibaba Cloud MaxCompute, for data integration.
EMR also uses Data Transmission Service (DTS) to accept data from database clusters that run on various database platforms. It uses Object Storage Service (OSS) in the role of the Hadoop Distributed File System (HDFS) and Log Service for logging.
Cost-effectiveness is a big factor when working with Big Data technologies. Alibaba Cloud EMR supports multiple compute engines that include:
EMR supports reading data from multiple data sources, such as Object Storage Service (OSS), MaxCompute, Kafka, and HDFS, for offline data processing. It writes the compute results to downstream systems in various formats.
Alibaba Cloud is a highly flexible and scalable platform. It creates Hadoop clusters quickly to enable flexible and fast data analysis, and it automatically releases the clusters after the data processing finishes. This elasticity matters when processing huge amounts of data because it keeps costs down: resources are not left running once the work is done. You can also adjust the number of compute nodes within a cluster to match the processing priority of a task.
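The create, process, and release lifecycle can be modeled with a Python context manager. This is a minimal sketch under stated assumptions: `transient_cluster` and `resize` are hypothetical helpers that only record what would happen and do not call any Alibaba Cloud API.

```python
from contextlib import contextmanager

events = []  # records lifecycle steps so their order can be inspected


@contextmanager
def transient_cluster(name: str, compute_nodes: int):
    """Hypothetical stand-in for creating a cluster and releasing it afterwards."""
    cluster = {"name": name, "compute_nodes": compute_nodes, "released": False}
    events.append(f"created {name} with {compute_nodes} compute nodes")
    try:
        yield cluster
    finally:
        # Release runs even if the job raises, so no idle cluster is left behind.
        cluster["released"] = True
        events.append(f"released {name}")


def resize(cluster: dict, compute_nodes: int) -> None:
    """Adjust the number of compute nodes while the cluster is running."""
    if cluster["released"]:
        raise RuntimeError("cannot resize a released cluster")
    cluster["compute_nodes"] = compute_nodes
    events.append(f"resized {cluster['name']} to {compute_nodes} compute nodes")


with transient_cluster("nightly-report", compute_nodes=3) as c:
    resize(c, 6)  # give a higher-priority job more capacity
    events.append("job finished")

print("\n".join(events))
```

Tying the release to the end of the `with` block mirrors the behavior described above: the cluster exists only for as long as the data processing needs it.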
In the real-time computing scenario, Alibaba Cloud EMR provides a flexible, reliable, and stable system. You can ingest data from multiple real-time data sources and analyze and process it efficiently with compute engines like Spark Streaming, Flink, and Storm.
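A common operation in such engines is grouping incoming events into fixed time windows and aggregating each window. The plain-Python sketch below illustrates that idea on a small in-memory list. It is a stand-in for the concept, not Spark Streaming, Flink, or Storm code, and the function name and sample data are assumptions. Real engines apply the same logic incrementally to unbounded streams.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_seconds, key) pairs.
    Returns {window_start: {key: count}}.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for timestamp, key in events:
        # Integer division maps every timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


# Hypothetical click events: (seconds since start, page name).
clicks = [(0, "home"), (3, "cart"), (9, "home"), (10, "home"), (14, "cart"), (21, "home")]
result = tumbling_window_counts(clicks, window_seconds=10)
print(result)
```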
Let’s take a look at the chart below to understand this scenario:
Alibaba Cloud solutions are based on the core computing practices, with an alignment towards flexibility, availability, and high performance. These basic requirements enable a solution to work optimally and provide productive results.