
E-MapReduce: What is EMR Serverless Spark?

Last Updated: Sep 12, 2024

E-MapReduce (EMR) Serverless Spark is a cloud-native, fully managed serverless service designed for large-scale data processing and analysis. It provides end-to-end data platform services for enterprises, covering job development, debugging, scheduling, and O&M, which simplifies data processing workflows throughout their lifecycle. EMR Serverless Spark helps enterprises improve efficiency by letting them focus on data analysis and on extracting value from their data.

Features

Fully managed data platform services for enterprises

  • Ease of use

    We are committed to providing an optimal user experience. You can start developing jobs without first building complex infrastructure.

  • High performance

    Built on Fusion Engine (formerly Spark Native Engine), EMR Serverless Spark delivers up to three times the performance of open source Spark.

  • High scalability

    Built on the serverless computing capabilities of Alibaba Cloud, EMR Serverless Spark provides highly scalable resources that absorb traffic spikes in extract, transform, and load (ETL) jobs while reducing the costs of computing resources.

  • Resource observability

    Monitoring metrics and alerts are supported for both resources and job runs.

  • High security

    EMR Serverless Spark is deployed in Alibaba Cloud Virtual Private Cloud (VPC) and can be accessed over VPCs. This enables fine-grained access control and improves security.

Ecosystem integration based on an open architecture

EMR Serverless Spark is seamlessly integrated with Alibaba Cloud Object Storage Service (OSS), OSS-Hadoop Distributed File System (HDFS), Data Lake Formation (DLF), and DataWorks, which provides a smoother experience when you use these services together.

Architecture

[Figure: EMR Serverless Spark architecture]

Benefits

Ultra-high speed cloud native compute engine

  • Built-in Fusion Engine (formerly Spark Native Engine) delivers up to 200% higher performance than open source Spark.

  • Built-in Celeborn (formerly Remote Shuffle Service) can handle petabytes of shuffle data and reduces the total costs of computing resources by up to 30%.
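The two phrasings of the Fusion Engine speedup used in this topic, "up to three times the performance" and "up to 200% higher performance", describe the same ratio. The relation below is a short worked check. It assumes that performance means throughput on the same workload; the symbols P (throughput), S (speedup), and C (total computing cost) are notation introduced here for illustration only.

```latex
\begin{aligned}
S &= \frac{P_{\text{Fusion}}}{P_{\text{Spark}}} \le 3, \\
\text{relative gain} &= (S - 1) \times 100\% \le (3 - 1) \times 100\% = 200\%, \\
C_{\text{with Celeborn}} &\ge (1 - 0.30)\, C_{\text{without Celeborn}} = 0.70\, C_{\text{without Celeborn}}.
\end{aligned}
```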

Open data lake architecture

  • Compute-storage separation, scalable computing resources, and pay-as-you-go storage are supported.

  • The service is integrated with OSS-HDFS, a cloud storage service that is fully compatible with HDFS, so you can migrate existing workloads to the cloud seamlessly.

  • DLF provides a centralized metadata service that unifies metadata across data lakes and data warehouses.

End-to-end development

  • End-to-end data development is supported, covering the development, debugging, publishing, and scheduling of jobs.

  • Built-in version management and isolation between the development and production environments are supported to meet enterprise standards for development and publishing.

Serverless resource platform

  • The out-of-the-box service frees you from the need to manually manage and maintain cloud infrastructure.

  • Resources are automatically scaled and provisioned within seconds.

  • Computing resources are billed on a pay-as-you-go basis, which reduces the total costs of resources.
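Why automatic scaling combined with pay-as-you-go billing lowers cost for spiky workloads can be illustrated with a small sketch. It is a hypothetical model and does not describe how EMR Serverless Spark actually schedules or bills resources. The executor size, the hourly load profile, and the helper names (`executors_needed`, `pay_as_you_go_core_hours`, `peak_provisioned_core_hours`) are assumptions made for illustration. The per-interval target is modeled loosely on the executor target used by open source Spark dynamic allocation. The sketch compares the core-hours consumed when capacity follows demand each hour with those of a fixed cluster sized for the peak hour.

```python
import math
from typing import Sequence

# Hypothetical sizing for illustration only; not EMR Serverless Spark defaults.
CORES_PER_EXECUTOR = 4
TASKS_PER_EXECUTOR = 4  # assumes one task slot per core


def executors_needed(outstanding_tasks: int, max_executors: int) -> int:
    """Target executor count for one interval, modeled loosely on open source
    Spark dynamic allocation: enough executors to run all outstanding tasks,
    capped at max_executors."""
    return min(max_executors, math.ceil(outstanding_tasks / TASKS_PER_EXECUTOR))


def pay_as_you_go_core_hours(hourly_tasks: Sequence[int], max_executors: int) -> int:
    """Core-hours billed when capacity tracks demand every hour."""
    return sum(
        executors_needed(tasks, max_executors) * CORES_PER_EXECUTOR
        for tasks in hourly_tasks
    )


def peak_provisioned_core_hours(hourly_tasks: Sequence[int], max_executors: int) -> int:
    """Core-hours billed when a fixed cluster is sized for the peak hour."""
    peak = max(executors_needed(tasks, max_executors) for tasks in hourly_tasks)
    return peak * CORES_PER_EXECUTOR * len(hourly_tasks)


if __name__ == "__main__":
    # Hypothetical six-hour ETL window with one traffic spike in hour 4.
    load = [8, 8, 8, 200, 8, 8]
    elastic = pay_as_you_go_core_hours(load, max_executors=100)
    fixed = peak_provisioned_core_hours(load, max_executors=100)
    # prints: pay-as-you-go: 240 core-hours, peak-provisioned: 1200 core-hours
    print(f"pay-as-you-go: {elastic} core-hours, peak-provisioned: {fixed} core-hours")
```

With this made-up load, capacity that tracks demand consumes 240 core-hours, compared with 1,200 for a cluster sized for the spike. The gap narrows as the load flattens, because the savings come from not paying for idle peak capacity.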