By Zengzeng Suo (from Xiaohongshu), Zehui Song (from Xiaohongshu), and Zuowei Zhang (from Alibaba Cloud)
Koordinator is an open-source project developed by Alibaba Cloud, built on its extensive experience in container scheduling. It currently supports hybrid deployment within the Kubernetes ecosystem. However, many users still run big data tasks on resource management systems outside the Kubernetes ecosystem, such as Apache Hadoop YARN [1]. Although some computing engines offer Kubernetes operators that bring their tasks into the Kubernetes ecosystem, the YARN ecosystem remains popular, as shown by products such as E-MapReduce [2] offered by major cloud vendors, including Alibaba Cloud.
Before Koordinator, the industry already had internal practices for the hybrid deployment of Kubernetes and YARN. However, most of these implementations required intrusive modifications to YARN itself, which made operation, maintenance, and iteration difficult. To allow more users to benefit from the community's open-source technology, the design of Koordinator follows these principles:
• Offline workloads are still submitted to YARN.
• YARN, based on the open-source Hadoop YARN, should not undergo intrusive modifications.
• The hybrid resources provided by Koordinator can be utilized by Kubernetes pods and YARN tasks, allowing different types of offline applications to coexist on the same node.
• QoS policies in standalone (single-node) mode are managed by koordlet and are compatible with the runtime of YARN tasks.
ResourceManager (RM) and NodeManager (NM) are core components of YARN. RM is responsible for task reception and resource scheduling on the control side, while NM is responsible for managing the lifecycle of tasks. In the hybrid deployment of YARN and Kubernetes, RM will continue to be deployed independently as the core component of the YARN cluster, while NM will be deployed as a container.
To synchronize batch resources to YARN RM, Koordinator introduces the koord-yarn-operator module. For finer-grained resource management, the resources of YARN tasks are managed separately from those of NM itself. During deployment, NM only needs to request batch hybrid resources that cover its own overhead. The resource usage of YARN tasks is managed through cgroups (LinuxContainerExecutor mode). The cgroup path is placed under the besteffort Pod QoS directory, so that YARN tasks are managed within the besteffort group just like other Kubernetes pods.
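The resource synchronization described above can be illustrated with a minimal sketch. It is not the koord-yarn-operator implementation: the names `BatchResources` and `yarn_node_capacity` are hypothetical, and the rule itself (batch resources already requested by Kubernetes pods on the node, including the NM pod's own overhead, are subtracted from the node's batch allocatable, and the remainder is reported to RM) is an assumption made for illustration. The units (milli-cores and bytes on the Kubernetes side, whole vcores and MiB on the YARN side) are also assumptions.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass(frozen=True)
class BatchResources:
    """Batch (hybrid) resources of one node. Hypothetical type for illustration."""
    cpu_milli: int      # CPU in milli-cores (assumed unit)
    memory_bytes: int   # memory in bytes (assumed unit)


def yarn_node_capacity(batch_allocatable, k8s_batch_requested):
    """Return (vcores, memory_mb) that could be reported to YARN RM for a node.

    Assumption: whatever Kubernetes pods have already requested from the
    batch pool (including the NM pod's own overhead) is not offered to YARN.
    """
    free_cpu = max(batch_allocatable.cpu_milli - k8s_batch_requested.cpu_milli, 0)
    free_mem = max(batch_allocatable.memory_bytes - k8s_batch_requested.memory_bytes, 0)
    # Round down so YARN is never offered more than what is actually free.
    return free_cpu // 1000, free_mem // MIB


capacity = yarn_node_capacity(
    BatchResources(cpu_milli=16500, memory_bytes=64 * 1024 * MIB),
    BatchResources(cpu_milli=2000, memory_bytes=4 * 1024 * MIB),
)
print(capacity)  # (14, 61440)
```

Rounding down and clamping at zero are deliberate choices in this sketch: when online workloads reclaim resources and the batch pool shrinks below what Kubernetes pods already hold, the node simply advertises no capacity to YARN instead of a negative value.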
Koordlet currently supports a range of QoS policies in standalone mode, which need to be adapted for YARN scenarios. For resource isolation parameters such as Group Identity, Memory QoS, and L3 Cache isolation, koordlet adapts to the designed cgroup hierarchy. For dynamic policies such as eviction and suppression, koordlet introduces a sidecar module called koord-yarn-copilot, which provides access to data and operations in YARN scenarios, including YARN task metadata collection, resource metrics collection, and task eviction. All QoS policies remain within koordlet, and the relevant koordlet modules call the koord-yarn-copilot interfaces as plugins. The interface design of koord-yarn-copilot also leaves room to connect other resource frameworks in the future.
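The division of labor described above (policies stay in koordlet, framework-specific data and operations sit behind an interface) might look roughly like the following sketch. Everything in it is hypothetical: `CopilotClient`, `OfflineTask`, `InMemoryYarnCopilot`, and `evict_until_released` are illustrative names, not the actual koord-yarn-copilot API, and the largest-task-first ordering is an assumption chosen only to keep the example short.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OfflineTask:
    """Metadata plus one usage metric for a task. Hypothetical type."""
    task_id: str
    framework: str
    memory_bytes: int


class CopilotClient(ABC):
    """Hypothetical interface a QoS plugin could depend on."""

    @abstractmethod
    def list_tasks(self) -> List[OfflineTask]:
        """Collect task metadata and resource metrics."""

    @abstractmethod
    def kill_task(self, task_id: str) -> None:
        """Evict a single task."""


class InMemoryYarnCopilot(CopilotClient):
    """Stand-in for a YARN-backed implementation, used here for the demo."""

    def __init__(self, tasks: List[OfflineTask]) -> None:
        self._tasks = {t.task_id: t for t in tasks}

    def list_tasks(self) -> List[OfflineTask]:
        return list(self._tasks.values())

    def kill_task(self, task_id: str) -> None:
        self._tasks.pop(task_id, None)


def evict_until_released(copilot: CopilotClient, bytes_to_release: int) -> List[str]:
    """Evict the largest tasks first until enough memory has been released."""
    evicted: List[str] = []
    released = 0
    ordered = sorted(copilot.list_tasks(), key=lambda t: t.memory_bytes, reverse=True)
    for task in ordered:
        if released >= bytes_to_release:
            break
        copilot.kill_task(task.task_id)
        evicted.append(task.task_id)
        released += task.memory_bytes
    return evicted


GIB = 1024 ** 3
copilot = InMemoryYarnCopilot([
    OfflineTask("container_a", "yarn", 2 * GIB),
    OfflineTask("container_b", "yarn", 6 * GIB),
    OfflineTask("container_c", "yarn", 1 * GIB),
])
evicted = evict_until_released(copilot, 7 * GIB)
print(evicted)  # ['container_b', 'container_a']
```

Because the policy function depends only on the abstract interface, a different resource framework could be supported by supplying another `CopilotClient` implementation without touching the policy code.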
For more information about the detailed design of the hybrid deployment of YARN and Kubernetes, please refer to the community design document [3].
Against the backdrop of cost reduction and efficiency improvement, Xiaohongshu is undergoing internal commercialization. In businesses like community research, a shortage of resources in the offline cluster has left a significant backlog of algorithmic Spark tasks that cannot be processed in a timely manner. At the same time, resource utilization in the online cluster is relatively low during off-peak business periods. In addition, a considerable proportion of Spark tasks are still scheduled by YARN.
To address this situation, we built on Xiaohongshu's existing hybrid deployment capabilities and connected the resource views of the Kubernetes scheduler and the YARN scheduler. On top of this, YARN tasks can be evicted and QoS guarantees can be enforced at the single-node level. As a result, a large number of Spark tasks can run stably on idle online resources while the existing offline task submission portal and user habits remain unchanged. This improves the resource utilization of online clusters, significantly alleviates the pressure on business resources, and effectively reduces the cost of offline resources.
In the experience of Xiaohongshu, the following key technical points are worth sharing:
• To address the bottleneck of disk performance caused by local shuffle, we employ RemoteShuffleService to reduce the I/O overhead on local disks. This approach improves I/O performance, increases the efficiency and stability of offline services, and prevents offline services from interfering with online services at the I/O level.
• Xiaohongshu runs complex hybrid deployment scenarios: besides big data Spark workloads, there are other offline businesses such as transcoding, offline inference, and training. To keep high-priority Spark tasks running stably, we have implemented fine-grained priority differentiation and policy optimization in YARN resource synchronization, standalone eviction policies, and QoS guarantee policies. These optimizations include reporting of offline resource overconsumption, standalone conflict handling, eviction of low-priority offline services such as transcoding when resources conflict or offline resources are insufficient, and differentiated QoS guarantee policies. As a result, Spark tasks run stably and efficiently, and resource utilization has improved.
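The priority-aware eviction mentioned in the last point can be sketched as follows. This is not Xiaohongshu's policy: the priority values, the `OfflineWorkload` type, and the `select_victims` function are hypothetical, and the tie-breaking rule (within one priority, the largest CPU consumer goes first) is an assumption. The sketch only shows the general idea that lower-priority offline services such as transcoding are evicted before Spark tasks.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OfflineWorkload:
    """One offline workload on a node. Hypothetical type for illustration."""
    name: str
    kind: str        # e.g. "spark", "transcoding", "training"
    priority: int    # higher means more important (values are made up)
    cpu_milli: int   # current CPU usage in milli-cores


def select_victims(workloads: List[OfflineWorkload], cpu_milli_to_free: int) -> List[str]:
    """Pick workloads to evict, lowest priority first, until enough CPU is freed."""
    victims: List[str] = []
    freed = 0
    # Sort by ascending priority; within a priority, largest CPU usage first.
    for w in sorted(workloads, key=lambda w: (w.priority, -w.cpu_milli)):
        if freed >= cpu_milli_to_free:
            break
        victims.append(w.name)
        freed += w.cpu_milli
    return victims


workloads = [
    OfflineWorkload("spark-etl", "spark", priority=100, cpu_milli=4000),
    OfflineWorkload("transcode-1", "transcoding", priority=10, cpu_milli=2000),
    OfflineWorkload("transcode-2", "transcoding", priority=10, cpu_milli=3000),
    OfflineWorkload("train-job", "training", priority=50, cpu_milli=6000),
]
victims = select_victims(workloads, cpu_milli_to_free=4000)
print(victims)  # ['transcode-2', 'transcode-1']
```

In this example the Spark job is untouched because the two transcoding workloads together free enough CPU; it would only be selected after every lower-priority workload had been chosen.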
Xiaohongshu has successfully implemented the hybrid deployment solution on a large scale, resulting in the following business outcomes:
• Coverage of tens of thousands of online cluster nodes, providing hundreds of thousands of computing resources for offline businesses in a stable manner.
• Eviction rate of offline tasks below 1%, ensuring minimal disruption after the hybrid deployment.
• Average CPU utilization of hybrid deployment clusters increased by 8% to 10%, with some clusters achieving more than 45% CPU utilization. This significant improvement enhances the efficiency of cluster resource utilization.
With the continuous expansion of incremental business scenarios, these benefits are expected to further grow.
The basic functions that support the hybrid deployment of Kubernetes and YARN have been developed, and the Koordinator team is now completing a series of preparations before the release. Stay tuned!
If you are interested in participating in the cooperation and development of the project or are interested in the hybrid deployment of Kubernetes and YARN, please leave a message in the special discussion area of the community [4], and we will contact you as soon as possible.
Message format:
Contact (github-id/e-mail): e.g. @koordinator-dev
Name of the company/school/organization you work for/attend/participate in: e.g. koordinator community
Community participation intention: e.g., hope to participate in researching and developing/studying the hybrid deployment of big data and cloud-native/implement hybrid deployment functions of Kubernetes and YARN in the production environment/others.
What you expect from "the hybrid deployment of Kubernetes and YARN":
[1] Apache Hadoop YARN
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html
[2] E-MapReduce
https://www.alibabacloud.com/help/en/emr/
[3] Design document
https://koordinator.sh/docs/next/best-practices/colocation-of-hadoop-yarn/
[4] Special discussion area
https://github.com/koordinator-sh/koordinator/discussions/1297