EMR on ECS lets you deploy E-MapReduce (EMR) clusters on Elastic Compute Service (ECS) instances. It combines the big data processing capabilities of EMR with the flexible deployment of ECS instances, so you can configure and manage EMR clusters to suit complex data processing and analytics scenarios. With EMR on ECS, you can quickly create, manage, and maintain EMR clusters and make efficient use of computing and storage resources.
Benefits
EMR allows you to easily deploy enterprise-level open source big data services, such as Hadoop, Spark, Flink, Kafka, and HBase.
All components in EMR are open source. EMR adapts and optimizes these components to provide higher performance than their open source versions.
Time-based auto scaling can be combined with preemptible instances to reduce costs.
Computing and storage are decoupled to support the elastic use of resources.
You can create or scale out a cluster within minutes. You do not need to manually deploy or start services.
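The time-based auto scaling mentioned in the list above can be pictured as a set of time windows, each with a target node count. The following is a minimal sketch of that idea only. The `ScalingRule` class, the `target_task_nodes` function, and the node counts are hypothetical names and values chosen for illustration, not the EMR API or its actual rule format.

```python
from dataclasses import dataclass
from datetime import time
from typing import Sequence


@dataclass(frozen=True)
class ScalingRule:
    """Hypothetical rule: keep `task_nodes` nodes during [start, end)."""
    start: time
    end: time
    task_nodes: int


def target_task_nodes(now: time, rules: Sequence[ScalingRule], baseline: int) -> int:
    """Return the node count of the first rule whose window contains `now`.

    Falls back to `baseline` when no window matches. Windows that cross
    midnight are not handled in this sketch.
    """
    for rule in rules:
        if rule.start <= now < rule.end:
            return rule.task_nodes
    return baseline


# Assumed example: scale out for a nightly batch window, scale in afterward.
rules = [ScalingRule(start=time(1, 0), end=time(5, 0), task_nodes=20)]

print(target_task_nodes(time(2, 30), rules, baseline=2))  # inside the window
print(target_task_nodes(time(12, 0), rules, baseline=2))  # outside the window
```

In this sketch, the extra nodes exist only during the batch window, which is where short-lived, lower-cost capacity such as preemptible instances would typically be used.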
Billing
EMR on ECS supports the following billing methods:
Subscription: You pay upfront for a fixed subscription duration and can use the resources only after you pay.
Pay-as-you-go: You use resources first and pay for them afterward. You can purchase and release resources as your business requires.
For more information about the billing rules, see Billing overview.
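To compare the two billing methods for a given workload, you can estimate the number of usage hours at which a subscription becomes cheaper than pay-as-you-go. The following sketch shows the arithmetic only. The prices are hypothetical placeholders, not actual EMR or ECS prices, and the function names are illustrative. The actual billing rules may include items that this sketch ignores.

```python
def break_even_hours(subscription_price: float, hourly_price: float) -> float:
    """Hours of use per billing period at which both methods cost the same."""
    if hourly_price <= 0:
        raise ValueError("hourly_price must be positive")
    return subscription_price / hourly_price


def cheaper_method(subscription_price: float, hourly_price: float,
                   hours_used: float) -> str:
    """Return the billing method with the lower cost for `hours_used` hours."""
    pay_as_you_go_cost = hourly_price * hours_used
    if subscription_price < pay_as_you_go_cost:
        return "subscription"
    return "pay-as-you-go"


# Hypothetical prices for one node over one month (not real price data).
SUBSCRIPTION_PRICE = 300.0  # assumed monthly subscription price
HOURLY_PRICE = 0.5          # assumed pay-as-you-go price per hour

print(break_even_hours(SUBSCRIPTION_PRICE, HOURLY_PRICE))
print(cheaper_method(SUBSCRIPTION_PRICE, HOURLY_PRICE, hours_used=720))
print(cheaper_method(SUBSCRIPTION_PRICE, HOURLY_PRICE, hours_used=100))
```

Under these assumed prices, a node that runs all month favors a subscription, while a node that runs only for occasional jobs favors pay-as-you-go.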
Comparison between Alibaba Cloud EMR clusters and self-managed Hadoop clusters
The following table compares Alibaba Cloud EMR clusters and self-managed Hadoop clusters.
Item | EMR cluster | Self-managed Hadoop cluster |
Cost | You are charged for resources on a subscription or pay-as-you-go basis. You can flexibly adjust the resources in an EMR cluster and store data in different tiers. Resource utilization is high, and no additional software license fees are incurred. | Resources must be estimated in advance and are relatively fixed, so resource utilization is low. Because a Hadoop distribution is used, additional license fees are incurred. |
Performance | EMR adapts and optimizes the open source components, so performance is higher than that of the open source versions. | Open source component versions are used. You must optimize performance yourself based on your business requirements. |
Ease of use | EMR Hadoop clusters can be started in minutes to quickly respond to business requirements. | You must purchase servers and deploy Hadoop components. It may take several weeks to create a self-managed cluster. |
Elasticity | You can create clusters temporarily for specific jobs and delete them when the jobs are complete. Cluster resources can be adjusted dynamically based on cluster load or within specified time periods. JindoFS uses a compute-storage decoupled architecture, so you can scale computing and storage resources separately. | A compute-storage integrated architecture is used. Resources are relatively fixed and cannot be adjusted flexibly. |
Security | Enterprises can use the multi-tenancy capability of EMR clusters to manage resources, manage permissions on tables, columns, and rows, and audit logs. Data encryption is supported. | You must configure multi-tenancy yourself. The capability requires optimization and cannot meet enterprise requirements as is. |
Reliability | EMR clusters are verified in the environments of large-scale enterprises. EMR clusters are continuously upgraded based on open source software versions and pass professional compatibility tests. Therefore, EMR clusters provide better user experience than self-managed clusters. | You must upgrade open source components, verify the version compatibility of different components, and fix bugs. |
Service support | Professional and senior big data teams can provide after-sales support. | Service support is unavailable, and additional license fees and service fees are generated for the Hadoop distribution that you use. |