Artificial intelligence (AI) has deeply influenced all walks of life, and deep learning is the mainstream approach to AI implementation. The demand for computing power in deep learning is enormous and dynamic, and migrating these workloads to the cloud has become a mainstream trend.
GPUs are an important source of AI computing power. Internet companies and traditional enterprises with AI-related services rent GPU cloud servers for deep learning model training and inference.
With the continuous development of graphics card technology and advances in semiconductor processes, the computing power of a single GPU card keeps rising, and so does its cost. However, many deep learning tasks cannot fully occupy a single GPU card. Such inflexible resource scheduling results in low GPU resource utilization.
In this case, scheduling underlying GPU resources with containers becomes a good solution. Sharing one GPU card among multiple tenants (VMs) can be implemented with vGPU technology, while single-tenant, multi-task scenarios can be implemented with GPU container sharing technology. By deploying containers on GPU cards at high density, GPU resources can be split at a finer granularity to improve resource utilization.
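The fine-grained splitting idea can be illustrated with a toy scheduler that bin-packs container memory requests onto shared GPU cards. This is a minimal sketch: the class and function names are invented for illustration and are not the cGPU or Kubernetes API.

```python
# Toy model of fine-grained GPU scheduling: containers request a fraction of
# a card's memory, and the scheduler first-fit packs them onto physical GPUs
# so that several small tasks can share one card.

class GPU:
    def __init__(self, gpu_id, total_mem_gib):
        self.gpu_id = gpu_id
        self.total_mem_gib = total_mem_gib
        self.used_mem_gib = 0

    def can_fit(self, mem_gib):
        return self.used_mem_gib + mem_gib <= self.total_mem_gib

    def place(self, mem_gib):
        self.used_mem_gib += mem_gib


def schedule(gpus, requests):
    """First-fit placement of (container, mem) requests onto shared GPUs."""
    placement = {}
    for name, mem_gib in requests:
        for gpu in gpus:
            if gpu.can_fit(mem_gib):
                gpu.place(mem_gib)
                placement[name] = gpu.gpu_id
                break
        else:
            placement[name] = None  # no capacity left anywhere
    return placement


gpus = [GPU(0, 16), GPU(1, 16)]
requests = [("train-a", 8), ("infer-b", 4), ("infer-c", 4), ("infer-d", 12)]
print(schedule(gpus, requests))
# → {'train-a': 0, 'infer-b': 0, 'infer-c': 0, 'infer-d': 1}
```

With whole-card scheduling, the same four tasks would need four GPUs; fractional placement fits them on two, which is the utilization gain the article describes.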
The Alibaba Cloud heterogeneous computing team launched cGPU container sharing technology, which allows users to schedule underlying GPU resources with containers at a finer granularity. This technology improves GPU resource utilization, boosts efficiency, and reduces costs.
GPU containers are commonly used in the industry. However, when containers share a GPU, applications in different containers may compete for memory resources and affect each other, because container isolation is incomplete. For example, an application that consumes large amounts of GPU memory may leave insufficient memory for an application running in another container. In other words, only the computing preemption problem is solved, while faults are not isolated. For example, an enterprise runs GPU inference applications in two containers: one is stable, and the other is still in the development phase. If the application in one container fails, the application in the other container may also fail, because no isolation technology is in place.
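This failure mode can be sketched with a toy shared memory pool. The model is illustrative only: real GPU memory is managed by the driver, not by Python code like this, but the starvation pattern is the same.

```python
# Toy model of an unpartitioned GPU: all containers draw from one shared
# memory pool with no per-container quota, so a greedy tenant can starve
# its neighbors.

class SharedGPU:
    def __init__(self, total_mib):
        self.free_mib = total_mib

    def alloc(self, container, mib):
        if mib > self.free_mib:
            raise MemoryError(f"{container}: out of memory")
        self.free_mib -= mib


gpu = SharedGPU(total_mib=16384)
gpu.alloc("dev-container", 15000)      # development workload over-allocates
try:
    gpu.alloc("prod-container", 4000)  # the stable service now fails too
except MemoryError as e:
    print(e)  # → prod-container: out of memory
```

Nothing in the shared pool distinguishes the stable service from the misbehaving one, which is exactly why a fault in one container propagates to the other.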
Another improvement method used in the industry replaces or adjusts the CUDA runtime library. One shortcoming of this method is that users cannot integrate their self-built environments with the environments of cloud vendors: they must adapt their applications to the modified CUDA runtime library.
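The runtime-replacement approach amounts to interposing a shim between the application and the GPU library. A minimal sketch of the idea follows; the function names are invented, and real implementations interpose on the CUDA runtime's C entry points, which is precisely why user environments must be rebuilt against the vendor's modified library.

```python
# Sketch of user-space API interception: the allocation call is wrapped so
# that every request is checked against a per-container quota before it
# reaches the underlying allocator. Illustrative only; not the CUDA API.

QUOTA_MIB = 8192
used_mib = 0


def real_malloc(mib):
    """Stand-in for the underlying allocator in the original library."""
    return f"buffer:{mib}MiB"


def shim_malloc(mib):
    """The interposed entry point the application links against instead."""
    global used_mib
    if used_mib + mib > QUOTA_MIB:
        raise MemoryError("per-container quota exceeded")
    used_mib += mib
    return real_malloc(mib)


print(shim_malloc(4096))  # → buffer:4096MiB
```

Because the shim must replace the library the application links against, every user environment has to be adapted to it, which is the drawback the article points out.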
The cGPU container technology launched by Alibaba Cloud securely isolates containers, avoids mutual interference between services, prevents errors in one container from spreading to others, and enhances security and stability. It allows users to flexibly schedule underlying GPU resources with containers without modifying the CUDA runtime library.
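The effect of per-container isolation can be modeled conceptually: each container gets a hard quota enforced below the application, so a misbehaving container is stopped at its own limit instead of exhausting its neighbor's memory. This is only a conceptual model; the actual cGPU enforcement happens beneath the unmodified CUDA runtime, not in application-level Python.

```python
# Conceptual model of per-container GPU memory isolation: each container
# has a hard quota, so over-allocation in one container cannot consume
# memory that belongs to another.

class IsolatedGPU:
    def __init__(self, quotas_mib):
        self.quota = dict(quotas_mib)               # per-container limits
        self.used = {name: 0 for name in self.quota}

    def alloc(self, container, mib):
        if self.used[container] + mib > self.quota[container]:
            raise MemoryError(f"{container}: quota exceeded")
        self.used[container] += mib


gpu = IsolatedGPU({"dev": 8192, "prod": 8192})
try:
    gpu.alloc("dev", 15000)   # greedy tenant is stopped at its own quota
except MemoryError as e:
    print(e)                  # → dev: quota exceeded
gpu.alloc("prod", 4000)       # the stable neighbor is unaffected
print(gpu.used["prod"])       # → 4000
```

Contrast this with the shared-pool case: the failure is contained inside the offending container, so the stable service keeps running.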
The cGPU container technology will encourage more enterprises to use containers to schedule underlying GPU resources, improving GPU resource utilization, boosting efficiency, and reducing costs.