Machine Learning Platform for AI (PAI)-Blade allows you to optimize models in various ways. You only need to install the wheel packages in your local environment. Then, you can optimize models by calling a Python method. This topic describes how to optimize a PyTorch model by using PAI-Blade. In this example, NVIDIA Tesla T4 GPUs are used.
Prerequisites
PyTorch is installed. The wheel packages of PAI-Blade are installed. For more information, see Install PAI-Blade.
PyTorch models are trained. In this example, an open source ResNet50 model is used.
Optimize a PyTorch model
Import PAI-Blade and other dependency libraries.
import os
import time
import torch
import torchvision.models as models
import blade
Load a ResNet50 model from the torchvision library. PAI-Blade supports only ScriptModules. Therefore, the ResNet50 model must be converted into a ScriptModule.
model = models.resnet50().float().cuda()   # Prepare the model.
model = torch.jit.script(model).eval()     # Convert the model into a ScriptModule.
dummy = torch.rand(1, 3, 224, 224).cuda()  # Construct test data.
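Before you start the optimization, you can optionally run the ScriptModule once on the test data to confirm that scripting succeeded. This sanity check is not part of the original walkthrough; it is a minimal sketch:
out = model(dummy)   # Optional sanity check: run the ScriptModule once.
print(out.shape)     # A ResNet50 classifier returns torch.Size([1, 1000]).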
Call the blade.optimize method to optimize the ResNet50 model. For more information about the parameters, see Python method. The following sample code provides an example on how to optimize the model. If you have questions during model optimization, you can join the DingTalk group of PAI-Blade users and consult the technical support staff. For more information, see Obtain an access token.
optimized_model, opt_spec, report = blade.optimize(
    model,                 # The model to be optimized.
    'o1',                  # The optimization level. Valid values: o1 and o2.
    device_type='gpu',     # The type of the device on which the model is run. Valid values: gpu and cpu.
    test_data=[(dummy,)],  # The test data. The test data used for a PyTorch model is a list of tuples of tensors.
)
The blade.optimize method returns the following objects:
- optimized_model: the optimized model. In this example, a torch.jit.ScriptModule object is returned.
- opt_spec: the external dependencies that are required to reproduce the optimization results, including configuration information, environment variables, and resource files. You can use a Python with statement to make the external dependencies take effect, as shown in the sketch after this list.
- report: the optimization report, which can be directly displayed. For more information about the parameters in the optimization report, see Optimization report.
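The following is a minimal sketch of running the optimized model with the external dependencies in effect. It assumes that opt_spec can be used directly as a context manager, as the preceding description suggests:
with opt_spec:
    # Run inference while the configuration, environment variables,
    # and resource files recorded in opt_spec are in effect.
    output = optimized_model(dummy)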
During model optimization, the optimization progress is displayed. The following sample code provides an example:
[Progress] 5%, phase: user_test_data_validation.
[Progress] 10%, phase: test_data_deduction.
[Progress] 15%, phase: CombinedSwitch_4.
[Progress] 95%, phase: model_collecting.
Display the optimization report.
print("Report: {}".format(report))
In the optimization report, you can view the optimization items that achieve optimization effects. The following sample code provides an example:
Report: {
  // ......
  "optimizations": [
    {
      "name": "PtTrtPassFp32",
      "status": "effective",
      "speedup": "1.50",        // The acceleration ratio.
      "pre_run": "5.29 ms",     // The latency before acceleration.
      "post_run": "3.54 ms"     // The latency after acceleration.
    }
  ],
  // The end-to-end optimization results.
  "overall": {
    "baseline": "5.30 ms",      // The latency of the original model.
    "optimized": "3.59 ms",     // The latency of the optimized model.
    "speedup": "1.48"           // The acceleration ratio.
  },
  // ......
}
Compare the performance before and after model optimization.
@torch.no_grad()
def benchmark(model, inp):
    # Warm up.
    for i in range(100):
        model(inp)
    # Time 200 runs and report the average latency in milliseconds.
    start = time.time()
    for i in range(200):
        model(inp)
    elapsed_ms = (time.time() - start) * 1000
    print("Latency: {:.2f}".format(elapsed_ms / 200))

# Measure the speed of the original model.
benchmark(model, dummy)
# Measure the speed of the optimized model.
benchmark(optimized_model, dummy)
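Note that GPU kernel launches are asynchronous, so wall-clock timing alone may not account for all GPU work. For stricter measurements, you can synchronize the device around the timed loop. This variant is not part of the original walkthrough; it is a sketch that assumes a CUDA device:
@torch.no_grad()
def benchmark_sync(model, inp, warmup=100, iters=200):
    # Warm up to exclude one-time initialization costs.
    for _ in range(warmup):
        model(inp)
    torch.cuda.synchronize()  # Wait for pending GPU work before starting the timer.
    start = time.time()
    for _ in range(iters):
        model(inp)
    torch.cuda.synchronize()  # Ensure all timed kernels have finished.
    elapsed_ms = (time.time() - start) * 1000
    print("Latency: {:.2f} ms".format(elapsed_ms / iters))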
Extended information
When you call the blade.optimize method, you can specify the model to be optimized for the model parameter in multiple ways. To optimize a PyTorch model, you can specify the model in one of the following ways:
- Specify a torch.jit.ScriptModule object.
- Load a torch.jit.ScriptModule object from a model file saved by using the torch.jit.save method.
In this example, a torch.jit.ScriptModule object in the memory is specified for the blade.optimize method. The following sample code provides an example on how to load a model from a model file:
optimized_model, opt_spec, report = blade.optimize(
    'path/to/torch_model.pt',
    'o1',
    device_type='gpu'
)
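For reference, a model file such as path/to/torch_model.pt can be produced from the ScriptModule created earlier. This is a minimal sketch; the path is the illustrative one from the sample above:
# Persist the ScriptModule so that it can later be passed to
# blade.optimize by file path instead of as an in-memory object.
torch.jit.save(model, 'path/to/torch_model.pt')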
What to do next
After the model is optimized by using PAI-Blade, you can run the optimized model in Python or deploy the optimized model as a service in Elastic Algorithm Service (EAS) of PAI. PAI-Blade also provides an SDK for C++ to help you integrate the optimized model into your own application. For more information, see Use an SDK to deploy a PyTorch model for inference.
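Before you run or deploy the optimized model, you typically persist it. Because optimized_model is a torch.jit.ScriptModule, it can be saved like any other ScriptModule; the file name below is only a hypothetical example:
# Save the optimized model for later inference or deployment.
# The file name is illustrative, not prescribed by PAI-Blade.
torch.jit.save(optimized_model, 'optimized_resnet50.pt')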