Deploy a PyTorch deep learning model on a seventh-generation security-enhanced instance

Background information

Artificial intelligence (AI) models are built on large amounts of training data and high computing power, which makes them an extremely valuable form of intellectual property. PyTorch is widely recognized by AI developers for its flexible programming environment, dynamic graph mechanism, and flexible network architecture. In most cases, PyTorch models are deployed on cloud servers, such as Alibaba Cloud Elastic Compute Service (ECS) instances. Both PyTorch model owners and cloud service providers must keep the PyTorch models deployed on the public cloud available while ensuring that the models remain invisible and protected from theft.

Specific security-enhanced ECS instances provide confidential computing capabilities based on Intel® Software Guard Extensions (SGX) to create a hardware-level trusted confidential environment. In this environment, code and data remain confidential and intact and are protected against malware attacks.

You can deploy PyTorch deep learning models in the trusted confidential environment of security-enhanced ECS instances to ensure the security of data transmission and data usage and the integrity of PyTorch deep learning applications.

Architecture

Figure 1. Architecture

The Architecture figure shows the SGX-based end-to-end security architecture for PyTorch models. The model is stored as ciphertext during the deployment phase, and related operations are performed within the SGX enclave. The model parameters are decrypted only within the SGX enclave, and the keys are transmitted over the secure remote attestation channel.

This practice involves three roles: dkeyserver, dkeycache, and PyTorch with SGX. The workflow of the roles is illustrated in the Procedure figure.

dkeyserver: the key server, which is deployed on the premises of the PyTorch model user. The PyTorch model user encrypts the PyTorch model parameters by using the tools provided by PyTorch with SGX and then builds the on-premises key server dkeyserver. The encrypted model is then transmitted to and deployed on the Alibaba Cloud SGX-based security-enhanced instance. The key server manages all model keys and model IDs and receives key requests from the key distribution service of the SGX-based security-enhanced instance.
dkeycache: the key distribution service, which is deployed on the SGX-based security-enhanced instance. The key distribution service requests all model keys from the key server. After the key server completes the SGX remote attestation, it uses the secure remote attestation channel to send the keys to the SGX enclave of the key distribution service. This operation is completed automatically after the key distribution service is started.
PyTorch with SGX: the SGX-based security-enhanced instance that runs PyTorch (PyTorch instance), which is deployed on the same server as dkeycache. When the PyTorch instance uses models to make predictions or perform classification tasks for model inference, the PyTorch instance automatically sends a request for a model key to the key distribution service. The key is encrypted and sent to the SGX enclave of the PyTorch instance by using the SGX secure channel. The enclave started by PyTorch with SGX uses the key to decrypt the model parameters and perform model prediction operations. Model parameters are protected by SGX-based hardware throughout the process and are available but invisible, which ensures the security of data transmission and data usage.
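
The interaction between the three roles can be pictured with a small simulation. The following is only a conceptual sketch in plain Python, not the actual dkeyserver or dkeycache interface: the class names, method names, the `attestation_ok` flag, the 128-bit key size, and the model ID `resnet` are all hypothetical, and SGX remote attestation is reduced to a boolean.

```python
import secrets


class KeyServer:
    """Stands in for dkeyserver: owns every model key, indexed by model ID."""

    def __init__(self):
        self._keys = {}

    def register_model(self, model_id):
        # Generate a random key for the model (the 16-byte size is an assumption).
        key = secrets.token_bytes(16)
        self._keys[model_id] = key
        return key

    def release_all_keys(self, attestation_ok):
        # Keys leave the server only after remote attestation has succeeded.
        if not attestation_ok:
            raise PermissionError("remote attestation failed; keys withheld")
        return dict(self._keys)


class KeyCache:
    """Stands in for dkeycache: fetches all keys at startup, then serves them locally."""

    def __init__(self):
        self._keys = {}

    def start(self, server, attestation_ok):
        # Mirrors the automatic key request that happens when the service starts.
        self._keys = server.release_all_keys(attestation_ok)

    def get_key(self, model_id):
        # Called by the PyTorch enclave when it needs to decrypt a model.
        return self._keys[model_id]


# Hypothetical walk-through of the flow described above.
server = KeyServer()
model_key = server.register_model("resnet")   # model owner side
cache = KeyCache()
cache.start(server, attestation_ok=True)      # dkeycache startup
enclave_key = cache.get_key("resnet")         # PyTorch with SGX at inference time
```

In the real deployment each hop runs inside or between SGX enclaves, so the key is never exposed to the host operating system; the sketch only shows who asks whom, and in which order.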

Figure 2. Procedure

Prerequisites

Prepare the environment in which you want to deploy the PyTorch deep learning model.

Note

In this practice, dkeyserver, dkeycache, and PyTorch with SGX are deployed on the same security-enhanced instance for easy verification.

Create a security-enhanced instance.
For more information, see Create a trusted instance. Take note of the following parameters:
- Image: Select Alibaba Cloud Linux 2.1903 LTS 64-bit (UEFI).
- Public IP Address: Select Assign Public IPv4 Address.
Build an SGX confidential computing environment.
For more information, see Build an SGX confidential computing environment.
Install Python 3 and configure environment variables.
In this example, Python 3.6 is used. You can install another version of Python 3 based on your business requirements. For more information, visit the official Python website.

Install the packages required to run PyTorch.

PyTorch has version requirements for software such as Python and GCC. Run the following commands to install the specified software versions:

```
sudo yum update --skip-broken
sudo yum install -y teesdk git gcc-c++ scl-utils alinux-release-experimentals python36-devel libbsd-devel
sudo yum install -y devtoolset-7-gcc devtoolset-7-gdb devtoolset-7-binutils devtoolset-7-make devtoolset-7-gcc-c++
scl -l devtoolset-7
sudo ln -sf /opt/rh/devtoolset-7/root/bin/g++ /usr/bin/g++
sudo ln -sf /opt/rh/devtoolset-7/root/bin/gcc /usr/bin/gcc
sudo ln -sf /opt/rh/devtoolset-7/root/bin/c++ /usr/bin/c++
sudo ln -sf /usr/bin/python3 /usr/bin/python
```

The following command output indicates that the packages are installed.

Figure. Install the required packages
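
Because the commands above repoint `gcc`, `g++`, `c++`, and `python` through symbolic links, it can help to confirm which versions the shell now resolves. The following is a minimal sketch that uses only the Python standard library; the helper names `parse_version`, `tool_version`, and `report` are illustrative and are not part of the sample repository.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Return the first dotted version number in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def tool_version(tool):
    """Run `<tool> --version` and parse its first output line; None if the tool is missing."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run(
        [tool, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # some tools print the version to stderr
        universal_newlines=True,
    )
    lines = result.stdout.splitlines()
    return parse_version(lines[0]) if lines else None


def report(tools=("gcc", "g++", "c++", "python")):
    """Print the version each tool name currently resolves to on PATH."""
    for tool in tools:
        print(tool, tool_version(tool))
```

Calling `report()` after the installation prints one line per tool. If the symbolic links took effect, the compiler entries are expected to report a 7.x version from devtoolset-7 and `python` a 3.x version.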

Run the following commands to install the PyTorch dependency library, encryption and decryption dependency library, and CMake:

```
sudo pip3 install --upgrade pip
sudo pip3 install astunparse numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing_extensions future six requests dataclasses setuptools_rust pycryptodomex pycryptodome torchvision
sudo ln -sf /usr/local/bin/cmake /usr/bin/cmake
sudo ln -sf /usr/local/bin/cmake /bin/cmake
```
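
Before you start the build, you may want to confirm that the Python dependencies are importable. The following sketch checks a subset of the packages installed above without importing them. Note that some pip package names differ from their import names (for example, `pyyaml` is imported as `yaml`, `pycryptodomex` as `Cryptodome`, and `pycryptodome` as `Crypto`). The `REQUIRED` mapping and the `missing_modules` helper are illustrative names, not part of the sample repository.

```python
import importlib.util

# pip package name -> top-level import name (a subset of the packages installed above)
REQUIRED = {
    "astunparse": "astunparse",
    "numpy": "numpy",
    "pyyaml": "yaml",
    "cffi": "cffi",
    "typing_extensions": "typing_extensions",
    "six": "six",
    "requests": "requests",
    "pycryptodomex": "Cryptodome",
    "pycryptodome": "Crypto",
    "torchvision": "torchvision",
}


def missing_modules(required):
    """Return the sorted pip names whose import name cannot be found on sys.path."""
    return sorted(
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    )
```

Running `print(missing_modules(REQUIRED))` prints an empty list when every listed package is available; any name that appears can be reinstalled with `pip3`.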

Procedure

Log on to the ECS instance.
For more information, see Connect to a Linux instance by using a password or key.
Switch to the working directory, such as /home/test, and obtain the sample PyTorch code.
The sample code contains code of dkeyserver, dkeycache, and PyTorch with SGX.
```
cd /home/test
git clone https://github.com/intel/sgx-pytorch -b sgx pytorch
cd /home/test/pytorch
git submodule sync && git submodule update --init --recursive
```
If an explicit_bzero error occurs, apply the following patch and try again:
```
git pull origin pull/15/head
```

Compile PyTorch with SGX on the SGX-based security-enhanced instance.

Compile oneAPI Deep Neural Network Library (oneDNN).
oneDNN is an open source cross-platform performance library of basic building blocks for deep learning applications. The library is optimized for Intel Architecture processors, Intel Processor Graphics, and Xe Graphics. oneDNN is suitable for developers of deep learning applications and models who want to improve application performance on Intel CPUs and GPUs.
```
source /opt/alibaba/teesdk/intel/sgxsdk/environment
cd /home/test/pytorch/third_party/sgx/linux-sgx
git am ../0001*
cd external/dnnl
make
sudo cp sgx_dnnl/lib/libsgx_dnnl.a /opt/alibaba/teesdk/intel/sgxsdk/lib64/libsgx_dnnl2.a
sudo cp sgx_dnnl/include/* /opt/alibaba/teesdk/intel/sgxsdk/include/
```
Compile the PyTorch enclave.
The enclave of PyTorch with SGX performs model parameter decryption and model prediction operations.
```
source /opt/alibaba/teesdk/intel/sgxsdk/environment
cd /home/test/pytorch/enclave_ops/ideep-enclave
make
```

Compile PyTorch.

```
cd /home/test/pytorch
pip3 uninstall torch    # Remove any previously installed PyTorch; the self-compiled build replaces it.
source /opt/alibaba/teesdk/intel/sgxsdk/environment
python setup.py develop --cmake-only
sudo python setup.py develop && python -c "import torch"
```

Figure. Compile PyTorch
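
Because `setup.py develop` links the source tree into the Python environment instead of copying files, one way to confirm that the self-compiled PyTorch is the one being picked up is to check where the `torch` module resolves from. The following is a minimal sketch that assumes the working directory `/home/test/pytorch` used in this example; the helper names are illustrative.

```python
import importlib.util
import os


def module_origin(name):
    """Return the file a module would be loaded from, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def is_inside(path, root):
    """Return True if path lies under the directory root (after resolving symlinks)."""
    path = os.path.realpath(path)
    root = os.path.realpath(root)
    return os.path.commonpath([path, root]) == root


def uses_source_build(source_root="/home/test/pytorch"):
    """Return True if `torch` resolves to a file inside the given source tree."""
    origin = module_origin("torch")
    return origin is not None and is_inside(origin, source_root)
```

If `uses_source_build()` returns `False`, a previously installed `torch` package may still take precedence over the self-compiled build.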

Compile the secure PyTorch computing operator.

```
source /opt/alibaba/teesdk/intel/sgxsdk/environment
cd /home/test/pytorch/enclave_ops/secure_op && mkdir build && cd build
cmake -DCMAKE_PREFIX_PATH="$(python -c 'import torch.utils; print(torch.utils.cmake_prefix_path)')" ..
make
```

Figure. Compile the secure PyTorch computing operator

Compile and generate the dkeyserver executable file on the key server and the dkeycache executable file on the SGX-based security-enhanced instance.
```
cd /home/test/pytorch/enclave_ops/deployment
make
```
Start the key service on the key server.
```
cd /home/test/pytorch/enclave_ops/deployment/bin/dkeyserver
sudo ./dkeyserver
```
The key server starts and waits for key requests from the dkeycache service deployed on the SGX-based security-enhanced instance.
Start the key distribution service dkeycache on the SGX-based security-enhanced instance.
```
cd /home/test/pytorch/enclave_ops/deployment/bin/dkeycache
sudo ./dkeycache
```
After startup, dkeycache requests all model keys from dkeyserver. After dkeyserver completes the SGX remote attestation, dkeyserver sends the keys to the SGX enclave of dkeycache by using the secure remote attestation channel.
Run ResNet-based test cases on the SGX-based security-enhanced instance.
```
cd /home/test/pytorch/enclave_ops/test
sudo python whole_resnet.py
```
The ciphertext parameters of the PyTorch model are decrypted in the SGX enclave. The keys are obtained from dkeycache and transmitted to the enclave in encrypted form.