EMR Serverless Spark is a high-performance lakehouse product for data and AI. It provides a fully managed, serverless Spark runtime that handles data processing and AI workloads without requiring you to manage clusters or infrastructure. The product is 100% compatible with the open-source Spark ecosystem -- run existing jobs directly with spark-submit and spark-sql, no code changes required.
Resources scale within seconds at a granularity as fine as one core and are released immediately after each job completes. Billing is based on actual resource consumption.
Use cases
- Data warehousing and BI analytics: Run SQL queries and build reports through the built-in SQL editor. Compatible with traditional data warehouse workflows.
- ETL and data engineering: Orchestrate batch processing, stream computing, and data transformation pipelines in a single workflow.
- Machine learning and data science: Develop and train models interactively in the built-in Notebook with Python environment management and SparkML.
- Lakehouse analytics: Query and manage data stored in open lakehouse formats such as Apache Paimon, Apache Iceberg, Delta Lake, and Apache Hudi.
Architecture
The architecture has four layers:
Application scenario layer
The SQL editor supports data queries and report development. The Notebook supports interactive Python development and machine learning. Both tools are part of a unified platform, so you can move from data analytics to model training without switching tools.
Platform capability layer
Workflow orchestration enables mixed scheduling for batch processing, stream computing, and AI jobs in the same pipeline. Resource Access Management (RAM) provides fine-grained access control over resources, data, and features. The Notebook, Apache Kyuubi, and Apache Livy services provide developers with flexible programming interfaces and task submission capabilities.
Core engine layer
Two built-in engines accelerate query execution:

| Engine | Description |
| --- | --- |
| Fusion Engine (Spark Native Engine) | A C++-based vectorized SQL engine that leverages single instruction multiple data (SIMD) instructions. Compared to the Java Virtual Machine (JVM), Fusion Engine improves CPU utilization and reduces memory overhead, delivering a 300% performance improvement over open-source Spark. |
| Celeborn (Remote Shuffle Service) | An enterprise-grade shuffle service for I/O-intensive scenarios that handles petabyte-scale shuffle data with multi-tenant data isolation and elastic resource scaling. Celeborn eliminates the need for large disks on compute nodes and fully utilizes Spark's dynamic resource scaling, reducing total computing costs by up to 30%. |
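To make the shuffle workload concrete: a shuffle redistributes rows across partitions by key, which is what a remote shuffle service stores off the compute nodes. The sketch below is a minimal, pure-Python illustration of the partitioning step only, not Celeborn's implementation.

```python
from collections import defaultdict

def shuffle(rows, num_partitions, key_fn):
    """Redistribute rows into partitions by hashing each row's key.

    In Spark, this repartitioning is the "shuffle": mappers write
    intermediate data that reducers later fetch. A remote shuffle
    service such as Celeborn keeps that intermediate data off the
    compute nodes, so executors can scale down mid-job without
    losing shuffle files.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(key_fn(row)) % num_partitions].append(row)
    return dict(partitions)

# Example: distribute sales records across 2 partitions by region.
rows = [("east", 10), ("west", 5), ("east", 7), ("north", 3)]
parts = shuffle(rows, 2, key_fn=lambda r: r[0])
# All rows that share a key land in the same partition, so a
# downstream aggregation can process each key locally.
```

Because all rows for a key end up in one partition, I/O grows with the data volume being regrouped, which is why large joins and aggregations are shuffle-bound.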
Lakehouse storage layer
Built on open data lake formats -- Apache Paimon and Apache Iceberg -- this layer combines data lake flexibility with traditional data warehouse capabilities: ACID transactions, efficient data upserts, and complete data lineage tracking.
Elastic scaling and cost efficiency
- Compute-storage decoupled architecture: Computing resources scale within seconds at a minimum granularity of one core. Storage uses pay-as-you-go pricing.
- Task-level metering: Resources are metered at the task or queue level, with built-in cost estimation per task.
- HDFS-compatible cloud storage: Integrates with OSS-HDFS for a smooth migration path from on-premises HDFS. Uses Data Lake Formation (DLF) for unified lakehouse metadata management and consistent data access permissions.
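Task-level metering means a task's cost tracks the resource-time it actually consumed. The sketch below shows the general shape of such an estimate; the rates and units are hypothetical, not EMR Serverless Spark's actual pricing.

```python
def task_cost(cores, memory_gib, runtime_s, cpu_rate, mem_rate):
    """Estimate one task's cost from the resource-time it consumed.

    Hypothetical units: cpu_rate is a price per core-second and
    mem_rate a price per GiB-second. The real service's billing
    units and rates may differ; this only illustrates that cost
    scales with actual consumption, not with provisioned capacity.
    """
    return cores * runtime_s * cpu_rate + memory_gib * runtime_s * mem_rate

# A 4-core, 16 GiB task that runs for 300 seconds:
cost = task_cost(4, 16, 300, cpu_rate=0.00005, mem_rate=0.000005)
```

Because resources are released as soon as the task finishes, `runtime_s` is the task's real duration rather than a cluster's uptime.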
Ecosystem compatibility
- Spark compatibility: Fully compatible with open-source Spark. Run jobs without code modification using spark-submit and spark-sql.
- Lakehouse formats: Supports Apache Paimon, Apache Iceberg, Delta Lake, and Apache Hudi.
- Scheduling integration: Works with Apache Airflow and Apache DolphinScheduler.
- Security: Connects to Kerberos or LDAP for authentication. Uses Apache Ranger for data authorization.
- Machine learning: Includes a built-in SparkML environment and Notebook with full lifecycle management for third-party Python libraries.
Development platform
- End-to-end workflow: Covers task development, debugging, publishing, and scheduling in a single platform.
- Version management: Records complete release history with source code and configuration diff comparisons. All changes are traceable.
- Environment isolation: Development and production environments are strictly separated.
Serverless operations
- No infrastructure setup: Start developing immediately. No cluster management or infrastructure configuration required.
- Automatic resource management: Resources are pulled and pods are started dynamically based on each Spark task's requirements. Resources are released immediately after computation finishes.
- Pay for what you use: Billing is based on actual resource consumption.
Billing
EMR Serverless Spark supports two billing methods:
| Billing method | Description |
| --- | --- |
| Subscription | Purchase resources for a specific period. Pay before you use the resources. |
| Pay-as-you-go | Activate and release resources as needed. Pay after you use the resources. |
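Choosing between the two methods comes down to utilization: pay-as-you-go favors bursty or unpredictable workloads, while a prepaid period becomes cheaper once usage is steady and high. A minimal sketch with hypothetical prices (not the service's actual rates):

```python
def subscription_cost(period_price):
    """Prepaid: a fixed price for the period, regardless of usage."""
    return period_price

def payg_cost(core_seconds, rate_per_core_second):
    """Postpaid: cost is proportional to actual consumption."""
    return core_seconds * rate_per_core_second

# Hypothetical comparison for one month at an assumed rate:
rate = 0.00005                       # price per core-second (assumed)
light = payg_cost(200_000, rate)     # bursty, low-utilization workload
heavy = payg_cost(4_000_000, rate)   # sustained, high-utilization workload
monthly_subscription = 100.0         # assumed prepaid price for the period
```

Under these assumed numbers, the light workload is far cheaper on pay-as-you-go, while the heavy workload crosses the point where the prepaid period wins.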
Get started
- Console: EMR Serverless Spark console
- API: RPC-style API operations using GET and POST requests. See the API reference.
- OpenAPI Developer Portal: Try API calls online and generate SDK code.
- SDK: Available for Java, Python, PHP, and other languages. Download the SDK.