By Nikesh Gogia, Solution Architect
This tutorial is intended for any organization that wants to host TensorFlow on Alibaba Cloud using Docker containers. Solution Architects or Business Development teams can use it to build a proof of concept (POC) for a customer requirement and then gradually turn it into production-grade hosting.
We will run our Docker containers on an Alibaba Cloud Elastic Compute Service (ECS) virtual machine running Ubuntu 16.04 (64-bit) with 4 CPU cores and 8 GB of RAM.
TensorFlow is an open source library for numerical computation, specializing in machine learning applications. In this tutorial, you will learn how to install and run TensorFlow on a single machine and train a simple image classifier that tells whether a light is on or off.
In this lab, we use transfer learning, which means we start with a model that has already been trained on another problem and then retrain it on a similar problem. Training a deep learning model from scratch can take days, but transfer learning can be done in much less time.
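To make the idea concrete, here is a toy sketch in plain Python (not TensorFlow, and not how the retrain script is implemented): a fixed feature function stands in for the frozen pretrained network, and only a small logistic-regression "final layer" on top of it is trained. The feature function, the four-pixel "images", the learning rate and the step count are all made up for illustration.

```python
import math

def frozen_features(pixels):
    """Hypothetical stand-in for a pretrained network; it is never updated.
    Maps brightness values in [0, 1] to [mean brightness, fraction of bright pixels]."""
    mean = sum(pixels) / len(pixels)
    bright = sum(1 for p in pixels if p > 0.5) / len(pixels)
    return [mean, bright]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_final_layer(examples, labels, steps=500, lr=1.0):
    """Gradient descent on a logistic-regression layer over the frozen features."""
    feats = [frozen_features(x) for x in examples]  # computed once, reused every step
    w = [0.0] * len(feats[0])
    b = 0.0
    n = len(feats)
    for _ in range(steps):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for f, y in zip(feats, labels):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y
            for i, fi in enumerate(f):
                grad_w[i] += err * fi
            grad_b += err
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, pixels):
    """Probability that the image shows a light that is on."""
    f = frozen_features(pixels)
    return sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)

# Toy data: label 1 = "light on" (bright), 0 = "light off" (dark).
on_images = [[0.9, 0.8, 0.95, 0.7], [0.85, 0.9, 0.6, 0.9]]
off_images = [[0.1, 0.2, 0.05, 0.3], [0.15, 0.1, 0.4, 0.2]]
w, b = train_final_layer(on_images + off_images, [1, 1, 0, 0])
print(round(predict(w, b, [0.9, 0.9, 0.8, 0.85]), 3))  # expected to be close to 1
print(round(predict(w, b, [0.1, 0.05, 0.2, 0.1]), 3))  # expected to be close to 0
```

Because the feature extractor never changes, its outputs can be computed once and reused at every training step, which is a large part of why retraining only the last layer is fast.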
This lab trains a model that recognizes lights switched on and off, based on a demo I released on YouTube. We take an existing pretrained model and retrain it to tell apart a small number of classes using our own example images.
What you will learn:
What you need:
The Docker installation package available in the official Ubuntu 16.04 repository may not be the latest version. To get the latest and greatest version, install Docker from the official Docker repository. This section shows you how to do just that.
First, add the GPG key for the official Docker repository to the system:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
Add the Docker repository to APT sources:
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
Next, update the package database with the Docker packages from the newly added repo:
sudo apt-get update
Make sure you are about to install from the Docker repo instead of the default Ubuntu 16.04 repo:
apt-cache policy docker-ce
You should see output similar to the following:
Output of apt-cache policy docker-ce
docker-ce:
  Installed: (none)
  Candidate: 17.03.1~ce-0~ubuntu-xenial
  Version table:
     17.03.1~ce-0~ubuntu-xenial 500
        500 https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
     17.03.0~ce-0~ubuntu-xenial 500
        500 https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
Notice that docker-ce is not installed, but the candidate for installation is from the Docker repository for Ubuntu 16.04. The docker-ce version number might be different.
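If you want to automate this check, a small helper can confirm that the candidate version comes from download.docker.com. This is a hypothetical sketch (the function name is mine) that parses text shaped like the sample output above; real apt output formatting may vary between apt versions.

```python
def candidate_from_docker_repo(policy_output):
    """Return True if the 'Candidate:' version in `apt-cache policy docker-ce`
    output is listed under a download.docker.com source in the version table."""
    lines = [line.strip() for line in policy_output.splitlines()]
    candidate = None
    for line in lines:
        if line.startswith("Candidate:"):
            candidate = line.split(":", 1)[1].strip()
            break
    if not candidate or candidate == "(none)":
        return False
    # Each version line is followed by one or more source lines
    # ("<priority> <url> ...") until the next version line.
    in_candidate = False
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "***":  # marks the installed version
            fields = fields[1:]
        if not fields:
            continue
        if fields[0] == candidate:
            in_candidate = True
        elif in_candidate and fields[0].isdigit() and len(fields) > 1:
            if "download.docker.com" in fields[1]:
                return True
        elif in_candidate:
            in_candidate = False  # reached the next version entry
    return False

SAMPLE = """docker-ce:
  Installed: (none)
  Candidate: 17.03.1~ce-0~ubuntu-xenial
  Version table:
     17.03.1~ce-0~ubuntu-xenial 500
        500 https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
     17.03.0~ce-0~ubuntu-xenial 500
        500 https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
"""
print(candidate_from_docker_repo(SAMPLE))  # True
```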
Finally, install Docker:
sudo apt-get install -y docker-ce
Docker should now be installed, the daemon started, and the process enabled to start on boot. Check that it's running:
sudo systemctl status docker
The output should be similar to the following, showing that the service is active and running:
Output
docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2016-05-01 06:53:52 CDT; 1 weeks 3 days ago
Docs: https://docs.docker.com
Main PID: 749 (docker)
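From a shell, systemctl is-active docker prints active when the service is running. If you capture the full systemctl status output instead (for example in a provisioning script), a hypothetical parser for the Active: line could look like this:

```python
def docker_is_running(status_output):
    """Return True if `systemctl status docker` output reports 'active (running)'."""
    for line in status_output.splitlines():
        line = line.strip()
        if line.startswith("Active:"):
            return line[len("Active:"):].strip().startswith("active (running)")
    return False

STATUS = """docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2016-05-01 06:53:52 CDT; 1 weeks 3 days ago
     Docs: https://docs.docker.com
 Main PID: 749 (docker)"""
print(docker_is_running(STATUS))  # True
```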
To avoid setting environment variables and installing TensorFlow by hand, I have created a Docker image of TensorFlow that you can pull directly.
On the Ubuntu ECS instance set up above, with Docker installed, run the following commands:
$ docker pull nikeshgogia/tensorflow:1.0
$ docker images
After the pull completes, you should see output similar to the following when you run docker images (you can also run docker ps -a to list containers):
root@iZt4neefbpoojkuy4fdvqzZ:~# docker images
REPOSITORY               TAG   IMAGE ID       CREATED          SIZE
nikeshgogia/tensorflow   1.0   ed0ee8133d06   26 minutes ago   1.62GB
Congratulations! You have successfully pulled the TensorFlow Docker image onto your Alibaba Cloud VM.
Once the image has been pulled, create the tf_files folder on the host. Because the docker run command below mounts ${HOME}/tf_files into the container, create it in your home directory:
mkdir -p ${HOME}/tf_files
Then start the container:
docker run -it --publish 6006:6006 -p 80:5000 \
--volume ${HOME}/tf_files:/tf_files \
--workdir /tf_files \
nikeshgogia/tensorflow:1.0 bash
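Each -p/--publish HOST:CONTAINER pair forwards a host port to a container port (6006 is TensorBoard's default port; the text does not say what listens on container port 5000), and --volume HOST_DIR:CONTAINER_DIR mounts the host folder so files written to /tf_files survive after the container exits. The hypothetical helper below assembles the same command as an argument list, which is handy if you drive Docker from Python; it uses the long --publish form for both ports, which is equivalent to -p.

```python
import os
import shlex

def docker_run_args(image, ports, volumes, workdir, command):
    """Build a `docker run -it` argument list.
    ports: list of (host_port, container_port); volumes: list of (host_dir, container_dir)."""
    args = ["docker", "run", "-it"]
    for host_port, container_port in ports:
        args += ["--publish", f"{host_port}:{container_port}"]
    for host_dir, container_dir in volumes:
        args += ["--volume", f"{host_dir}:{container_dir}"]
    args += ["--workdir", workdir, image, command]
    return args

home = os.path.expanduser("~")
args = docker_run_args(
    image="nikeshgogia/tensorflow:1.0",
    ports=[(6006, 6006), (80, 5000)],
    volumes=[(os.path.join(home, "tf_files"), "/tf_files")],
    workdir="/tf_files",
    command="bash",
)
print(shlex.join(args))  # shell-quoted command line (Python 3.8+)
# To actually start the container: subprocess.run(args, check=True)
```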
You are now at a shell prompt inside the TensorFlow container:
root@17e62932b5b5:/tf_files#
Follow these steps to train your lights model. First, run cd .. to move up out of the tf_files folder, then copy the files from the image's btf_files folder into the mounted tf_files folder:
cd ..
cp -a btf_files/. tf_files/
Now change into tf_files (cd tf_files) and list its contents:
root@17e62932b5b5:/tf_files# ls -l
total 16
drwxr-xr-x 4 root root 4096 Nov 16 03:37 lights
drwxr-xr-x 2 root root 4096 Nov 16 03:37 testimages
drwxr-xr-x 6 root root 4096 Nov 16 03:37 tf
In this folder, I have already placed some images of lights switched on and off under the lights folder. The testimages folder contains sample images that we will use to test the model after training, and the tf folder contains the TensorFlow scripts.
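The retrain script treats each subfolder of the image directory as one class. As far as I know, the TensorFlow for Poets retrain.py derives each label by lowercasing the folder name and replacing runs of non-alphanumeric characters with a space, which is how a folder such as light_on would appear as "light on" in the results. The sketch below mimics that convention; the subfolder names in the demo are assumptions, since the listing above does not show what is inside lights.

```python
import os
import re
import tempfile

IMAGE_EXTENSIONS = {".jpg", ".jpeg"}

def label_from_dir(dir_name):
    """Lowercase the folder name and collapse non-alphanumeric runs into one space."""
    return re.sub(r"[^a-z0-9]+", " ", dir_name.lower())

def list_classes(image_dir):
    """Map each label to the JPEG files found in its subfolder."""
    classes = {}
    for entry in sorted(os.listdir(image_dir)):
        sub = os.path.join(image_dir, entry)
        if not os.path.isdir(sub):
            continue
        images = [f for f in sorted(os.listdir(sub))
                  if os.path.splitext(f)[1].lower() in IMAGE_EXTENSIONS]
        classes[label_from_dir(entry)] = images
    return classes

# Demo in a throwaway directory; "light_on" and "light_off" are hypothetical names.
with tempfile.TemporaryDirectory() as root:
    layout = {"light_on": ["a.jpg", "b.JPEG"], "light_off": ["c.jpg", "notes.txt"]}
    for cls, files in layout.items():
        os.makedirs(os.path.join(root, cls))
        for f in files:
            open(os.path.join(root, cls, f), "w").close()
    result = list_classes(root)
print(result)  # {'light off': ['c.jpg'], 'light on': ['a.jpg', 'b.JPEG']}
```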
Next, create the /trained_files directory at the root of the container's file system:
mkdir /trained_files
To train your model, first set the following shell variables:
IMAGE_SIZE=224
ARCHITECTURE="mobilenet_0.50_${IMAGE_SIZE}"
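The ARCHITECTURE value encodes two choices: the MobileNet width multiplier (0.50 here) and the input image size in pixels (IMAGE_SIZE, 224 here). I believe the TensorFlow for Poets retrain.py accepts widths 1.0, 0.75, 0.50 and 0.25 and sizes 224, 192, 160 and 128; treat these lists as assumptions and check them against the script shipped in the image. A small hypothetical checker:

```python
VALID_WIDTHS = {"1.0", "0.75", "0.50", "0.25"}  # assumption, see lead-in
VALID_SIZES = {224, 192, 160, 128}              # assumption, see lead-in

def mobilenet_architecture(width, image_size):
    """Build and validate a string such as 'mobilenet_0.50_224'.
    width is a string because its exact spelling ('0.50') is part of the name."""
    if width not in VALID_WIDTHS:
        raise ValueError(f"unsupported width multiplier: {width}")
    if image_size not in VALID_SIZES:
        raise ValueError(f"unsupported image size: {image_size}")
    return f"mobilenet_{width}_{image_size}"

def parse_architecture(name):
    """Split 'mobilenet_<width>_<size>' back into (width, size)."""
    prefix, width, size = name.split("_")
    if prefix != "mobilenet":
        raise ValueError(f"not a MobileNet architecture: {name}")
    return width, int(size)

IMAGE_SIZE = 224
ARCHITECTURE = mobilenet_architecture("0.50", IMAGE_SIZE)
print(ARCHITECTURE)                      # mobilenet_0.50_224
print(parse_architecture(ARCHITECTURE))  # ('0.50', 224)
```

Smaller widths and image sizes generally make the model smaller and faster, usually at some cost in accuracy.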
With these variables set, change into the scripts folder with cd /tf_files/tf and run the retraining script:
python -m scripts.retrain \
--bottleneck_dir=/trained_files/bottlenecks \
--how_many_training_steps=500 \
--model_dir=/trained_files/models/ \
--summaries_dir=/trained_files/training_summaries/"${ARCHITECTURE}" \
--output_graph=/trained_files/retrained_graph.pb \
--output_labels=/trained_files/retrained_labels.txt \
--architecture="${ARCHITECTURE}" \
--image_dir=/tf_files/lights
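The --bottleneck_dir flag points at a cache. As I understand retrain.py, it first runs every training image once through the frozen network and saves the output of the layer just before the final classifier (the "bottleneck") to disk, so the 500 training steps only need to update the new final layer. The sketch below shows the same compute-once-then-reuse idea in plain Python; the file layout and the fake_network function are hypothetical.

```python
import os
import tempfile

def cached_bottleneck(image_path, bottleneck_dir, compute):
    """Return the bottleneck vector for image_path, calling `compute` at most once.

    Vectors are cached as comma-separated floats in <bottleneck_dir>/<image name>.txt
    (a simplified, hypothetical layout)."""
    os.makedirs(bottleneck_dir, exist_ok=True)
    cache_path = os.path.join(bottleneck_dir, os.path.basename(image_path) + ".txt")
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return [float(x) for x in f.read().split(",")]
    values = compute(image_path)
    with open(cache_path, "w") as f:
        f.write(",".join(str(v) for v in values))
    return values

calls = []

def fake_network(path):
    """Hypothetical stand-in for the expensive forward pass through the frozen model."""
    calls.append(path)
    return [0.25, 0.5, 0.125]

with tempfile.TemporaryDirectory() as tmp:
    first = cached_bottleneck("/tf_files/lights/on1.jpg", tmp, fake_network)
    second = cached_bottleneck("/tf_files/lights/on1.jpg", tmp, fake_network)
print(first == second, len(calls))  # True 1
```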
Once training finishes, test the model with a sample image:
python -m scripts.label_image \
--graph=/trained_files/retrained_graph.pb \
--image=/tf_files/testimages/l1.jpeg
You should see the following output:
light on 0.999999
light off 1.37755e-06
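Each output line is a label followed by its score, and the scores behave like probabilities: here 0.999999 + 0.00000137755 is very close to 1. If you want to act on the result in a script, a hypothetical parser could pick the top label and reject low-confidence answers; the 0.9 threshold is an arbitrary choice for illustration.

```python
def parse_label_scores(output):
    """Parse lines of the form '<label words> <score>' into (label, score) pairs."""
    results = []
    for line in output.strip().splitlines():
        label, score = line.rsplit(maxsplit=1)
        results.append((label, float(score)))
    return results

def top_label(output, threshold=0.9):
    """Return the most likely label, or None if its score is below threshold."""
    label, score = max(parse_label_scores(output), key=lambda pair: pair[1])
    return label if score >= threshold else None

sample = """light on 0.999999
light off 1.37755e-06"""
pairs = parse_label_scores(sample)
print(pairs)              # [('light on', 0.999999), ('light off', 1.37755e-06)]
print(top_label(sample))  # light on
```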
This shows that the image you provided is of a light that is on. You can test more images by placing them in the testimages folder and passing each one to --image. Make sure you run the Python scripts from inside the tf folder.
In this tutorial, we have successfully run TensorFlow on Alibaba Cloud using Docker containers for a sample machine learning application.
TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API.
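As a toy illustration of the data-flow idea (plain Python, not the TensorFlow API), each node below is an operation, and the values passed along its incoming edges play the role of tensors; here the "tensors" are simply lists of numbers.

```python
class Node:
    """An operation in the graph; `inputs` are the nodes on its incoming edges."""
    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

def constant(value):
    return Node(lambda: value)

def add(a, b):
    return Node(lambda x, y: [i + j for i, j in zip(x, y)], a, b)

def scale(a, k):
    return Node(lambda x: [k * i for i in x], a)

def evaluate(node, cache=None):
    """Evaluate a node after evaluating the nodes feeding into it (each only once)."""
    if cache is None:
        cache = {}
    if node not in cache:
        args = [evaluate(n, cache) for n in node.inputs]
        cache[node] = node.op(*args)
    return cache[node]

x = constant([1.0, 2.0, 3.0])
y = constant([10.0, 20.0, 30.0])
z = scale(add(x, y), 0.5)  # builds the graph; nothing is computed yet
print(evaluate(z))         # [5.5, 11.0, 16.5]
```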
TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.
Raja_KT March 7, 2019 at 6:33 am
Good one. Will the images be compatible for both NVIDia and AMD? I got an error while executing..."sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release-cs) stable"....lsb_release-cs not found...