
Alibaba Cloud PAI Paper Selected in SIGMOD 2023

This short article discusses a recent paper accepted by SIGMOD 2023 and what it means for PAI's work on deep learning data processing.

Recently, the paper titled GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning, jointly written by Alibaba Cloud Machine Learning Platform for AI (PAI) and Zhi Yang (from Peking University), was accepted by SIGMOD 2023. The paper shows how elastically scaling deep learning data pre-processing pipelines significantly improves training performance and cluster resource utilization.

SIGMOD is a top-tier international conference in the field of databases and data management systems. Since its beginning in 1975, it has played a profound role in advancing database technology and has been highly influential in both academia and industry. SIGMOD also focuses on the intersection of data management systems with other fields, particularly machine learning and artificial intelligence in recent years. This acceptance indicates that PAI's work on deep learning data processing has reached an advanced level in the global industry and has earned recognition from international scholars.

In recent years, with the evolution of GPU accelerators and the emergence of various software optimization techniques, the computing efficiency of deep learning training has reached a new level. At the same time, deep learning training is inherently a multi-stage, multi-resource workload: most training computation runs on the GPU, while the data pre-processing pipeline (such as data augmentation and feature conversion) typically runs on the CPU. Such pre-processing is a necessary step for training high-quality models. As a result, the improvement in GPU-side training performance puts greater pressure on data pre-processing, making the latter a new performance bottleneck.
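
To make the bottleneck concrete, below is a minimal PyTorch-style sketch of a typical training input pipeline (illustrative only, not taken from the paper). Note how the CPU parallelism is fixed by num_workers when the job starts, regardless of how fast the GPU side runs:

    import torch
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    # CPU-side pre-processing: decode, augment, and convert each sample
    # before the batch reaches the GPU.
    preprocess = transforms.Compose([
        transforms.RandomResizedCrop(224),      # data augmentation
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),                  # feature conversion
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    if __name__ == "__main__":
        dataset = datasets.FakeData(size=1000, image_size=(3, 256, 256),
                                    transform=preprocess)
        # num_workers fixes the CPU parallelism at submission time; if the
        # GPU consumes batches faster than these workers produce them, the
        # pipeline above becomes the bottleneck described in the text.
        loader = DataLoader(dataset, batch_size=64, num_workers=4,
                            pin_memory=True)
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        for images, _ in loader:
            images = images.to(device, non_blocking=True)
            # ... forward/backward pass on the GPU ...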

To address this problem, the authors observe that the data pre-processing pipeline is stateless and therefore has inherent resource elasticity. Based on this observation, GoldMiner separates the data pre-processing pipeline from the model training part, identifies stateless pre-processing computation through automatic computation graph analysis, and applies efficient parallel acceleration and elastic scaling to it, alleviating data pre-processing bottlenecks and improving training performance. Through co-design with the cluster scheduler, GoldMiner further exploits the resource elasticity of pre-processing computation, significantly improving cluster scheduling efficiency. Experiments show that GoldMiner improves training performance by up to 12.1 times and GPU cluster utilization by up to 2.5 times.
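
GoldMiner performs this separation automatically via computation graph analysis; the sketch below only illustrates the core idea in plain Python, and every name in it (preprocess_worker, scale_workers, transform) is hypothetical rather than part of the GoldMiner API. Because the workers hold no state, their number can be raised or lowered at any time without affecting training correctness, which is the elasticity the scheduler exploits:

    import multiprocessing as mp

    def transform(sample):
        # Stand-in for augmentation / feature conversion (hypothetical).
        return sample * 2

    def preprocess_worker(raw_q, batch_q):
        # Stateless worker: it keeps no cross-sample state, so it can be
        # replicated, killed, or restarted at any time without losing work.
        while True:
            sample = raw_q.get()
            if sample is None:            # poison pill: shut down cleanly
                break
            batch_q.put(transform(sample))

    def scale_workers(raw_q, batch_q, num_workers):
        # Elasticity knob: spawn as many workers as spare CPU capacity
        # allows, independently of the GPU training processes.
        workers = [mp.Process(target=preprocess_worker, args=(raw_q, batch_q))
                   for _ in range(num_workers)]
        for w in workers:
            w.start()
        return workers

    if __name__ == "__main__":
        raw_q, batch_q = mp.Queue(), mp.Queue()
        workers = scale_workers(raw_q, batch_q, num_workers=4)
        for i in range(8):                # feed raw samples
            raw_q.put(i)
        print([batch_q.get() for _ in range(8)])
        for _ in workers:                 # stop all workers
            raw_q.put(None)
        for w in workers:
            w.join()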

Currently, PAI is integrating GoldMiner with PAI-DLC to provide users with data pre-processing acceleration capabilities. PAI provides lightweight, cost-effective, cloud-native machine learning services for enterprise customers and developers, covering the entire workflow from interactive modeling with PAI-DSW, visual modeling with PAI-Designer, and distributed training with PAI-DLC to online model deployment with PAI-EAS. PAI-DLC offers a cloud-native, comprehensive deep learning training platform with a flexible, stable, easy-to-use, and high-performance training environment. It supports a variety of algorithms, including large-scale distributed deep learning algorithms and custom algorithm frameworks, helping developers and enterprises lower costs and improve efficiency.

  • Paper Title: GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning
  • Authors: Hanyu Zhao, Zhi Yang, Yu Cheng, Chao Tian, Shiru Ren, Wencong Xiao, Man Yuan, Langshi Chen, Kaibo Liu, Yang Zhang, Yong Li, and Wei Lin
  • Paper Link: https://dl.acm.org/doi/pdf/10.1145/3589773