×
Community Blog Hyper parameter optimization of Convolutional Neural Networks to classify MNIST digit database

Hyper parameter optimization of Convolutional Neural Networks to classify MNIST digit database

I explored the cloud AI systems available with Alibaba Cloud. The Machine Learning Platform for AI sounds promising for anyone who wants to kickstart their projects in Cloud AI.

Being an academician, my research interests include Deep Learning and Internet of Things. I have worked on a lot of deep learning frameworks in Tensorflow and Keras in Jupyter Notebooks powered by GPUs and cloud based TPUs. With this expertise, I explored the cloud AI systems available with Alibaba Cloud. The Machine Learning Platform for AI sounds promising for anyone who wants to kickstart their projects in Cloud AI.
Capture
This machine learning platform is offering a lot of options. I got comfortable with the DSW Notebook Service. The machine learning platform has visual modelling platform for those who are not comfortable with programming too. I chose to go with the DSW Notebook Service.
Capture1
I read Jeremy's article on developing deep learning model to classify images from his blog published earlier. This article helps anyone to understand the DSW environment and to start coding for Deep Learning. I created a DSW instance in Kuala Lumpur region of Malaysian data center of Alibaba Cloud. The instance I created includes a Pay As You Go configuration of GPU with tensor flow support and 2 vCPUS. The user needs to wait for a couple of minutes inorder to start and stop the DSW instance.
blog_pic1
The source code used in this blog's demonstration can be found in the GitHub link. MNIST digits dataset consists of thousands of images consisting of handwritten digits from 0 to 9. A basic one layer CNN was tried. A dense layer with sigmoid activation function was used. The accuracy for this simple architecture was found to be around 92%.
Capture
Seaborn Library in Python is used to generate the confusion matrix and it is given below.
Capture
After observing this performance, the architecture was modified to have two dense layers. The first dense layer is powered by relu activation and the second one with sigmoid activation. This configuration when tried with the same number of epochs, the testing accuracy of the handwritten digits improved to the level of 97%.
Capture
With an observation of increase in accuracy, a higher architecture of VGG 16 was tried on this dataset. This architecture has almost 16 layers as shown in the figure below.
Capture
This architecture uses Stochastic Gradient Descent with a learning rate of 0.01 and kernel initializer of he_uniform. The cross entropy loss was set to categorical and the model is run on the same number of epochs as mentioned in the architectures mentioned above. The performance of testing accuracy increased to the level of 99%. The cross entropy loss decreased significantly and paved way for the testing accuracy to increase further.blog_pic2

S.No Architecture Testing Accuracy
1 Single Dense 92%
2 Double Dense 97%
3 VGG 16 99%

Overall, while using the DSW notebook service for Deep Learning, the GPU sounded dedicated and was faster than any other system I worked before. The right pane of DSW notebook shows the real time monitoring of CPU and GPU performance and it is impressive.

0 0 0
Share on

ferdinjoe

21 posts | 136 followers

You may also like

Comments

ferdinjoe

21 posts | 136 followers

Related Products