By Yun Long and Gu Zhen
On December 25, 2018, Stanford University released the latest DAWNBench deep learning inference rankings. Alibaba Cloud ranked first in terms of image recognition performance and cost, breaking the 8-month streak of the Amazon Web Services (AWS) computing platform. Alibaba Cloud was the first Chinese tech company to appear in these rankings.
Alibaba Cloud's technical team topped both the inference performance and inference cost rankings with an ecs.gn5i-c8g1.2xlarge instance, achieving a latency of 4.218 ms per image (see Figure 2) and a cost of 0.00000154 USD per image (see Figure 4). This inference performance is 2.36 times that of the runner-up, an Amazon EC2 [c5.18xlarge] instance, and the inference cost per image is 6.1% lower than the runner-up's.
Figure 1. Image recognition diagram
Though DNN performance optimization remains a hot R&D topic in academia and industry, no end-to-end evaluation standard had been proposed for deep learning training and inference tasks until the DAWNBench competition was launched. DAWNBench was the first competition to evaluate performance metrics, model accuracy, and cost together. It has attracted a great deal of attention from the industry since Stanford University launched it at the 2017 NIPS conference.
Alibaba Cloud took part in two projects at the DAWNBench competition. The first project was a classification task covering 50,000 ImageNet validation images. The top 5 accuracy of the classification model had to be at least 93%, and the metric was the average latency for classifying each image. The lower the latency, the higher the performance and ranking. The second project measured the average inference cost per image over the same 50,000 images.
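The two quantities described above, top 5 accuracy and average per-image latency, can be sketched as follows. This is a minimal illustration, not the DAWNBench harness: `toy_classify` and the synthetic samples are hypothetical stand-ins for a real model and the ImageNet validation set.

```python
import time

def top5_hit(scores, label):
    """Return True if `label` is among the five highest-scoring class indices."""
    top5 = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:5]
    return label in top5

def evaluate(classify, samples):
    """Classify one image at a time.

    Returns (top 5 accuracy, average latency in milliseconds per image).
    """
    hits = 0
    elapsed = 0.0
    for image, label in samples:
        start = time.perf_counter()
        scores = classify(image)
        elapsed += time.perf_counter() - start
        hits += top5_hit(scores, label)
    n = len(samples)
    return hits / n, 1000.0 * elapsed / n

# Hypothetical stand-in for a real model: 10 classes, and the score
# peaks at class (image % 10).
def toy_classify(image):
    return [1.0 / (1 + abs(c - image % 10)) for c in range(10)]

# Synthetic (image, label) pairs; a real run would use the 50,000 images.
samples = [(i, i % 10) for i in range(100)]
accuracy, latency_ms = evaluate(toy_classify, samples)
print(f"top 5 accuracy: {accuracy:.2%}, average latency: {latency_ms:.4f} ms")
```

Timing only the `classify` call keeps data loading out of the latency figure; whether the official benchmark draws the boundary in the same place is not stated here.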
Figure 2. Inference performance rankings of the DAWNBench competition (as of December 25, 2018)
Figure 3. Inference cost rankings of the DAWNBench competition (as of December 25, 2018)
Figure 4. Inference cost of Alibaba Cloud in the DAWNBench competition
Figures 2 and 3 show the rankings of the two projects as of December 25, 2018. Alibaba Cloud tops the rankings for both projects. To achieve the fastest performance and lowest cost, the participating team made optimizations in three areas: deep learning model selection, 8-bit quantization, and Alibaba Cloud GPU instance selection.
Before the Alibaba Cloud team participated in the competition, a ResNet50 model topped the rankings of the ImageNet inference task. That entry ran on an Amazon EC2 [c5.18xlarge] instance and achieved an inference performance of 9.96 ms and an average inference cost of 1.64E-06 USD. The model comes from Facebook's paper "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour", which describes completing ImageNet training within 1 hour. The model is called ResNet50-v2, and the original ResNet50 model is called ResNet50-v1. Though ResNet50-v2 is easier to train, its training computing workload is about 12% higher and its inference computing workload is about 6% higher. For the inference task, any reduction in the computing workload is valuable, so long as the accuracy meets the standard. For this reason, the Alibaba Cloud team selected the ResNet50-v1 model.
Figure 5. Learning rate setting during ResNet50 model training
It is difficult to achieve 93% top 5 accuracy by using the classic three-segment learning rate schedule during ResNet50-v1 model training. The Alibaba Cloud team searched the hyperparameter space to improve accuracy, but it was still difficult to push the top 5 accuracy of the ResNet50-v1 model above 93% with any consistency. To solve the problem, the team designed the learning rate schedule shown in Figure 5. The learning rate increases linearly to a peak value in the early stage of training and then decays linearly afterwards. With this schedule, the team obtained a ResNet50-v1 model with a top 5 accuracy of 93.28%.
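A schedule with this shape can be sketched in a few lines. The peak value, the warmup length, the total step count, and the choice to decay all the way to zero are assumptions for illustration; the article does not give the values the team used.

```python
def linear_warmup_decay(step, total_steps, warmup_steps, peak_lr):
    """Learning rate that rises linearly to peak_lr, then falls linearly.

    Decaying to exactly 0 at total_steps is an assumption of this sketch.
    """
    if not 0 < warmup_steps < total_steps:
        raise ValueError("warmup_steps must be in (0, total_steps)")
    if not 0 <= step <= total_steps:
        raise ValueError("step must be in [0, total_steps]")
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# Hypothetical settings, chosen only to make the shape easy to read.
TOTAL_STEPS = 100
WARMUP_STEPS = 10
PEAK_LR = 0.4

schedule = [
    linear_warmup_decay(s, TOTAL_STEPS, WARMUP_STEPS, PEAK_LR)
    for s in range(TOTAL_STEPS + 1)
]
for s in (0, 5, 10, 55, 100):
    print(f"step {s:3d}: lr = {schedule[s]:.3f}")
```

In a real training loop the function would be evaluated once per step (or per epoch) and the result passed to the optimizer.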
Low-bit quantization is one of the main ways to improve inference performance. Though ResNet inference with 1-bit or 2-bit weights has been studied, quantization at such ultra-low precision results in substantial accuracy loss. In contrast, the Alibaba Cloud team adopted Int8 quantization, which improves computing performance while maintaining the prediction accuracy of the model.
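To make the idea concrete, here is a minimal sketch of symmetric linear Int8 quantization. It assumes a single clipping threshold per tensor and the value range [-127, 127]; the sample values and the threshold are made up, and this is not the team's TensorRT code.

```python
def quantize_int8(values, threshold):
    """Map floats in [-threshold, threshold] to integers in [-127, 127].

    Values outside the threshold are clipped (saturated).
    """
    scale = threshold / 127.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from Int8 codes."""
    return [q * scale for q in quantized]

# Hypothetical activation values and clipping threshold.
values = [0.02, -0.5, 1.0, 2.54, -3.0]
codes, scale = quantize_int8(values, threshold=2.54)
restored = dequantize(codes, scale)

for v, q, r in zip(values, codes, restored):
    print(f"{v:6.2f} -> {q:5d} -> {r:7.4f}")
```

Values inside the threshold are recovered to within half a quantization step, while the out-of-range value -3.0 saturates at the threshold. Choosing that threshold well is the job of calibration.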
The Alibaba Cloud team selected the popular TensorFlow deep learning framework for optimization, so that Alibaba Cloud customers can benefit from the optimization results. Int8 quantization is based on TensorRT. The difficulty lies in quantizing the trained TensorFlow model into a TensorRT Int8 model and then loading the quantized TensorRT model into the TensorFlow computation graph for inference.
The team then performed deep optimization based on the TensorFlow benchmark code. During Int8 quantization, the Kullback-Leibler divergence between the activation distributions before and after quantization is calculated to calibrate the dynamic range of the activation values at each layer of the neural network. The Alibaba Cloud team completed calibration in three phases: (1) create an Int8 quantization model; (2) calibrate the quantization model; (3) generate the optimized Int8 model based on the calibration results. The team then optimized the benchmark's inference mode so that it imports the optimized inference engine.
The Alibaba Cloud team selected the NVIDIA Tesla P4 GPU, which supports 8-bit (Int8) computing, and the Alibaba Cloud ecs.gn5i-c8g1.2xlarge instance based on that GPU. The instance contains 8 vCPUs and one P4 GPU. The instance supports three billing methods: Subscription, Pay-As-You-Go, and Preemptible Instance. In Preemptible Instance mode, the hourly price per instance is only 7.015 RMB.
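The calibration idea can be illustrated with a simplified, pure-Python sketch of KL-based threshold selection for one layer. It is not TensorRT's implementation: the bin count, the smoothing constant `eps`, and the synthetic activations are all assumptions, and the helper names are hypothetical.

```python
import math
import random

def kl_divergence(p, q, eps=1e-4):
    """KL(P || Q) for two unnormalized histograms; eps avoids log(0)."""
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    p_sum, q_sum = sum(p), sum(q)
    return sum((a / p_sum) * math.log((a / p_sum) / (b / q_sum))
               for a, b in zip(p, q))

def quantize_histogram(counts, num_levels):
    """Merge bins into num_levels groups, then spread each group's mass
    evenly over the bins in that group that were non-empty."""
    n = len(counts)
    q = [0.0] * n
    for level in range(num_levels):
        start, end = level * n // num_levels, (level + 1) * n // num_levels
        group = counts[start:end]
        nonzero = sum(1 for c in group if c > 0)
        if nonzero:
            share = sum(group) / nonzero
            for j in range(start, end):
                if counts[j] > 0:
                    q[j] = share
    return q

def calibrate_threshold(activations, num_bins=512, num_levels=128):
    """Pick the clipping threshold whose quantized distribution is closest
    (in KL divergence) to the original activation distribution."""
    max_abs = max(abs(a) for a in activations)
    if max_abs == 0:
        raise ValueError("all activations are zero")
    bin_width = max_abs / num_bins
    hist = [0] * num_bins
    for a in activations:
        hist[min(int(abs(a) / bin_width), num_bins - 1)] += 1

    best_kl, best_bins = math.inf, num_bins
    for i in range(num_levels, num_bins + 1):
        p = [float(c) for c in hist[:i]]
        p[-1] += sum(hist[i:])  # clipped outliers saturate into the last bin
        q = quantize_histogram(hist[:i], num_levels)
        kl = kl_divergence(p, q)
        if kl < best_kl:
            best_kl, best_bins = kl, i
    return best_bins * bin_width

# Synthetic activations: a Gaussian bulk plus a few large outliers.
random.seed(0)
activations = [random.gauss(0.0, 1.0) for _ in range(20000)] + [25.0, -30.0, 40.0]
threshold = calibrate_threshold(activations)
print(f"max |activation| = 40.0, calibrated threshold = {threshold:.3f}")
```

The trade-off being searched is between clipping error (a small threshold saturates outliers) and resolution (a large threshold wastes the 256 available codes on a range that is mostly empty).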
| GPU | Latency (ms) | Top 5 accuracy |
| --- | --- | --- |
| Tesla P4 | 4.218 | 93.16% |
Table 1. Average inference performance and accuracy of Alibaba Cloud ecs.gn5i-c8g1.2xlarge instance
Table 1 lists the optimization results of the Alibaba Cloud ecs.gn5i-c8g1.2xlarge instance for the ImageNet inference task in the DAWNBench competition. As shown in Table 1, the average inference latency per image is 4.218 ms, which corresponds to 2.36 times the performance of the runner-up Amazon EC2 [c5.18xlarge] instance. In Pay-As-You-Go mode, the cost per image is 1.54E-06 USD, which is 6.1% less than that of the runner-up. In Preemptible Instance mode, the cost is down to 1.23E-06 USD, which is 26.2% less than that of the runner-up. The top 5 accuracy over the 50,000 images is 93.16%, surpassing the accuracy required by the ImageNet inference task.
The optimization results can be applied to the ResNet and Inception models widely used in computer vision tasks. They are integrated into Perseus, the acceleration framework of the Alibaba Cloud GPU computing platform, and are delivered through images to improve the experience of GPU users. Alibaba Cloud is building a full-stack heterogeneous computing service platform that optimizes virtualization, storage, GPU acceleration, and deep learning frameworks.
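The speedup and Pay-As-You-Go saving quoted above can be reproduced from the published figures with a few lines of arithmetic. The last part of the sketch additionally assumes that cost per image equals the hourly instance price multiplied by the per-image latency, with images processed one after another; that relation is an assumption of this example, not something the article states.

```python
# Figures quoted in the article.
RUNNER_UP_LATENCY_MS = 9.96
RUNNER_UP_COST_USD = 1.64e-6
ALIBABA_LATENCY_MS = 4.218
ALIBABA_COST_PAYG_USD = 1.54e-6

def speedup(baseline_ms, new_ms):
    """How many times faster the new latency is than the baseline."""
    return baseline_ms / new_ms

def saving_percent(baseline_cost, new_cost):
    """Relative cost reduction versus the baseline, in percent."""
    return 100.0 * (baseline_cost - new_cost) / baseline_cost

def cost_per_image(hourly_price_usd, latency_ms):
    """Assumed relation: price per hour spread over images served per hour."""
    images_per_hour = 3600.0 * 1000.0 / latency_ms
    return hourly_price_usd / images_per_hour

def implied_hourly_price(cost_per_image_usd, latency_ms):
    """Inverse of cost_per_image under the same assumption."""
    return cost_per_image_usd * 3600.0 * 1000.0 / latency_ms

ratio = speedup(RUNNER_UP_LATENCY_MS, ALIBABA_LATENCY_MS)
saving = saving_percent(RUNNER_UP_COST_USD, ALIBABA_COST_PAYG_USD)
hourly = implied_hourly_price(ALIBABA_COST_PAYG_USD, ALIBABA_LATENCY_MS)

print(f"speedup over runner-up: {ratio:.2f}x")
print(f"Pay-As-You-Go saving:   {saving:.1f}%")
print(f"implied hourly price:   {hourly:.3f} USD (under the stated assumption)")
```

Under that assumption, halving latency halves the cost per image at a fixed hourly price, which is why the latency optimizations and the instance choice both feed into the cost ranking.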
To learn more about Alibaba Cloud Elastic GPU Service, visit https://www.alibabacloud.com/product/gpu