Optimize a TensorFlow model

Platform for AI (PAI)-Blade lets you optimize models in multiple ways. You only need to install the wheel packages in your local environment and then call a Python method to optimize a model. This topic describes how to use PAI-Blade to optimize a TensorFlow model. The example uses an NVIDIA Tesla T4 GPU.

Prerequisites

The wheel packages of PAI-Blade and TensorFlow are installed.
A trained TensorFlow model is available. This example uses an open source TensorFlow ResNet50 model.

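This example optimizes an open source TensorFlow ResNet50 model, specifically a Mask R-CNN detection model with a ResNet50 backbone. You can optimize your own TensorFlow model in the same way.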

Import PAI-Blade and the other dependency libraries.
```
import os
import numpy as np
import tensorflow.compat.v1 as tf
import blade
```

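Write a method that downloads the model to optimize and prepares the test data.

PAI-Blade can optimize a model without test data. However, to ensure the accuracy of the optimization results, we recommend that you provide test data. The following sample code shows an example: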

```
def _wget_demo_tgz():
    # Download an open source TensorFlow model (Mask R-CNN with a ResNet50 backbone).
    url = 'http://pai-blade.oss-cn-zhangjiakou.aliyuncs.com/demo/mask_rcnn_resnet50_atrous_coco_2018_01_28.tar.gz'
    local_tgz = os.path.basename(url)
    local_dir = local_tgz.split('.')[0]
    if not os.path.exists(local_dir):
        blade.util.wget_url(url, local_tgz)
        blade.util.unpack(local_tgz)
    # Load the frozen graph into a tf.GraphDef object.
    model_path = os.path.abspath(os.path.join(local_dir, "frozen_inference_graph.pb"))
    graph_def = tf.GraphDef()
    with open(model_path, 'rb') as f:
        graph_def.ParseFromString(f.read())
    # Use random numbers as test data: one 800 x 1000 RGB image in NHWC layout.
    test_data = np.random.rand(1, 800, 1000, 3)
    return graph_def, {'image_tensor:0': test_data}

graph_def, test_data = _wget_demo_tgz()
```
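The demo relies on blade.util.wget_url and blade.util.unpack. If you prefer to fetch and extract the archive with the Python standard library alone, for example to inspect its contents before optimizing, the following is a minimal sketch built on urllib.request and tarfile. The function names are hypothetical, and the sketch assumes a gzip-compressed tarball whose top-level directory matches the file name up to its first dot, which is the same rule as local_tgz.split('.')[0] in the code above.

```python
import os
import tarfile
import urllib.request


def archive_dir_name(url):
    """Directory name derived from an archive URL: the file name up to its first dot."""
    return os.path.basename(url).split('.')[0]


def unpack_tgz(local_tgz, dest_dir='.'):
    """Extract a .tar.gz archive into dest_dir."""
    with tarfile.open(local_tgz, 'r:gz') as tar:
        tar.extractall(dest_dir)


def fetch_and_unpack(url, dest_dir='.'):
    """Download url into dest_dir and extract it, unless it is already extracted.

    Hypothetical stand-in for blade.util.wget_url + blade.util.unpack.
    Returns the path of the extracted top-level directory.
    """
    local_tgz = os.path.join(dest_dir, os.path.basename(url))
    local_dir = os.path.join(dest_dir, archive_dir_name(url))
    if not os.path.exists(local_dir):
        # Only touch the network when the extracted directory is missing.
        urllib.request.urlretrieve(url, local_tgz)
        unpack_tgz(local_tgz, dest_dir)
    return local_dir
```

Note that this naming rule truncates file names that contain extra dots: model_v1.2.tar.gz maps to model_v1, so adjust it for your own archives.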

Call the blade.optimize method to optimize the ResNet50 model. The following sample code shows an example:

```
input_nodes = ['image_tensor']
output_nodes = ['detection_boxes', 'detection_scores', 'detection_classes', 'num_detections', 'detection_masks']

optimized_model, opt_spec, report = blade.optimize(
    graph_def,                 # The model to optimize. In this example, a tf.GraphDef object is specified. You can also specify the path of a model file or directory.
    'o1',                      # The optimization level. Valid values: o1 and o2.
    device_type='gpu',         # The type of the device on which the model runs. Valid values: gpu, cpu, and edge.
    inputs=input_nodes,        # The input nodes. This parameter is optional. If you do not specify it, PAI-Blade automatically infers the input nodes.
    outputs=output_nodes,      # The output nodes.
    test_data=[test_data]      # The test data: a list of feed dictionaries.
)
```
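The inputs and outputs parameters take node names such as image_tensor, whereas the test_data dictionary (like Session.run) uses tensor names such as image_tensor:0. In TensorFlow, the i-th output of node n is named n:i. The following is a small sketch for converting between the two; the helper names are hypothetical and not part of PAI-Blade.

```python
def tensor_name(node_name, output_index=0):
    """Name of one output tensor of a graph node, e.g. 'image_tensor' -> 'image_tensor:0'."""
    return '{}:{}'.format(node_name, output_index)


def node_name(name):
    """Strip an optional ':<index>' suffix, e.g. 'image_tensor:0' -> 'image_tensor'."""
    return name.split(':')[0]
```

For example, [tensor_name(n) for n in output_nodes] gives the output tensors to fetch when you run the model in a session.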

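The blade.optimize method returns the following objects:

optimized_model: the optimized model. In this example, a tf.GraphDef object is returned.
opt_spec: the external dependencies required to reproduce the optimization results, such as configuration information, environment variables, and resource files. Use a Python with statement to make these dependencies take effect.
report: the optimization report, which can be printed directly. For more information about the parameters in the optimization report, see "Optimization report".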
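The with statement is the standard Python mechanism for applying settings temporarily and undoing them on exit, even when an error occurs. The following minimal sketch illustrates that pattern for environment variables only. It is not PAI-Blade's implementation of opt_spec, and temporary_env is a hypothetical helper.

```python
import contextlib
import os


@contextlib.contextmanager
def temporary_env(**overrides):
    """Set environment variables inside a with block and restore the old values on exit."""
    saved = {key: os.environ.get(key) for key in overrides}
    os.environ.update({key: str(value) for key, value in overrides.items()})
    try:
        yield
    finally:
        # Runs on normal exit and when the block raises an exception.
        for key, old_value in saved.items():
            if old_value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old_value
```

It is used the same way as opt_spec: wrap the code that runs the model in a with temporary_env(...) block, and the previous environment is back in place once the block ends.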

The optimization progress is displayed while the model is being optimized, as shown in the following example:

```
[Progress] 5%, phase: user_test_data_validation.
[Progress] 10%, phase: test_data_deduction.
[Progress] 15%, phase: CombinedSwitch_1.
[Progress] 24%, phase: TfStripUnusedNodes_22.
[Progress] 33%, phase: TfStripDebugOps_23.
[Progress] 42%, phase: TfFoldConstants_24.
[Progress] 51%, phase: CombinedSequence_7.
[Progress] 59%, phase: TfCudnnrnnBilstm_25.
[Progress] 68%, phase: TfFoldBatchNorms_26.
[Progress] 77%, phase: TfNonMaxSuppressionOpt_27.
[Progress] 86%, phase: CombinedSwitch_20.
[Progress] 95%, phase: model_collecting.
[Progress] 100%, Finished!
```
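If you capture PAI-Blade's console output, for example in a CI log, you can extract the reported phases with a regular expression. The pattern below is derived only from the sample output above and assumes that the line format stays the same; it is a sketch, not a documented interface.

```python
import re

# Matches "[Progress] 42%, phase: TfFoldConstants_24." and "[Progress] 100%, Finished!".
PROGRESS_RE = re.compile(r'\[Progress\]\s+(\d+)%,\s+(?:phase:\s+(\S+?)\.|(Finished)!)')


def parse_progress(lines):
    """Return (percent, phase) tuples for every progress line; other lines are ignored."""
    events = []
    for line in lines:
        match = PROGRESS_RE.search(line)
        if match:
            percent = int(match.group(1))
            phase = match.group(2) or match.group(3)
            events.append((percent, phase))
    return events
```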

Display the optimization report.

```
print("Report: {}".format(report))
```

The report lists the optimization items that take effect, as shown in the following example:

```
Report: {
  // ......
  "optimizations": [
    // ......
    {
      "name": "TfNonMaxSuppressionOpt",
      "status": "effective",
      "speedup": "1.58",        // The acceleration ratio.
      "pre_run": "522.74 ms",   // The latency before the optimization.
      "post_run": "331.45 ms"   // The latency after the optimization.
    },
    {
      "name": "TfAutoMixedPrecisionGpu",
      "status": "effective",
      "speedup": "2.43",
      "pre_run": "333.30 ms",
      "post_run": "136.97 ms"
    }
    // ......
  ],
  // The end-to-end optimization results.
  "overall": {
    "baseline": "505.91 ms",    // The latency of the original model.
    "optimized": "136.83 ms",   // The latency of the optimized model.
    "speedup": "3.70"           // The acceleration ratio.
  },
  // ......
}
```
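The speedup values in the report are the ratio of the latency before optimization to the latency after it: 505.91 ms / 136.83 ms ≈ 3.70 for the overall result, and 522.74 ms / 331.45 ms ≈ 1.58 for TfNonMaxSuppressionOpt. The following sketch recomputes these ratios and lists the effective items. It assumes you have the report as a Python dict with the same keys as the sample above; how to obtain such a dict from the report object is not covered here, and the helper names are hypothetical.

```python
def parse_ms(value):
    """Convert a latency string such as '505.91 ms' to a float in milliseconds."""
    number, unit = value.split()
    if unit != 'ms':
        raise ValueError('unexpected unit: {}'.format(unit))
    return float(number)


def speedup(pre_run, post_run):
    """Acceleration ratio: latency before optimization divided by latency after."""
    return parse_ms(pre_run) / parse_ms(post_run)


def effective_optimizations(report):
    """Return (name, speedup) pairs for optimization items whose status is 'effective'."""
    return [(item['name'], float(item['speedup']))
            for item in report.get('optimizations', [])
            if item.get('status') == 'effective']
```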

Compare the performance of the model before and after optimization.

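```
import time

# Fetch the model outputs. Fetching 'image_tensor:0' would only return the fed input
# without running the model.
output_tensors = [name + ':0' for name in output_nodes]

def benchmark(model):
    # Import the GraphDef into a fresh graph and run it in a session bound to that graph.
    graph = tf.Graph()
    with graph.as_default():
        tf.import_graph_def(model, name="")
    with tf.Session(graph=graph) as sess:
        # Warm up so that one-time initialization costs are not measured.
        for i in range(0, 1000):
            sess.run(output_tensors, test_data)
        # Benchmark.
        num_runs = 1000
        start = time.time()
        for i in range(0, num_runs):
            sess.run(output_tensors, test_data)
        elapsed = time.time() - start
        rt_ms = elapsed / num_runs * 1000.0
        # Show the result.
        print("Latency of model: {:.2f} ms.".format(rt_ms))

# Test the latency of the original model.
benchmark(graph_def)

# Test the latency of the optimized model. The with statement applies opt_spec.
with opt_spec:
    benchmark(optimized_model)
```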

The test results are returned as shown in the following example. They are consistent with the information in the optimization report.


```
Latency of model: 530.26 ms.
Latency of model: 148.40 ms.
```
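From the measured values, the end-to-end speedup is 530.26 ms / 148.40 ms ≈ 3.57, close to the 3.70 in the report. If you want to reuse the timing logic for other runtimes, the following framework-agnostic sketch separates it from TensorFlow: it takes any zero-argument callable, excludes warmup calls, and reports both the mean and the median latency. measure_latency_ms is a hypothetical helper, not part of PAI-Blade.

```python
import statistics
import time


def measure_latency_ms(run_once, warmup=10, num_runs=100):
    """Call run_once repeatedly and return (mean_ms, median_ms) over num_runs calls.

    Warmup calls are not timed, so one-time costs such as graph
    initialization or kernel autotuning do not distort the result.
    """
    if num_runs < 1:
        raise ValueError('num_runs must be at least 1')
    for _ in range(warmup):
        run_once()
    samples = []
    for _ in range(num_runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.median(samples)
```

Inside a session, you could call it as measure_latency_ms(lambda: sess.run([n + ':0' for n in output_nodes], test_data)). time.perf_counter is monotonic and has a higher resolution than time.time, and the median is less sensitive than the mean to occasional slow iterations.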

Additional information

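When you call the blade.optimize method, you can specify the model to optimize in the model parameter in several ways. A TensorFlow model can be specified in one of the following ways:

Specify a tf.GraphDef object.
Load a frozen model from a .pb or .pbtxt file.
Specify the path of a SavedModel directory.

This example uses the first way: the in-memory tf.GraphDef object is passed to the blade.optimize method. The following sample code shows the other two ways.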

Load a frozen model from a .pb or .pbtxt file:

```
optimized_model, opt_spec, report = blade.optimize(
    './path/to/frozen_pb.pb',  # You can also load a .pbtxt file.
    'o1',
    device_type='gpu',
)
```

Specify the path of a SavedModel directory:

```
optimized_model, opt_spec, report = blade.optimize(
    './path/to/saved_model_directory/',
    'o1',
    device_type='gpu',
)
```
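Before you pass a path to blade.optimize, it can help to check which of the two path-based forms it is. The following sketch is a hypothetical helper, not provided by PAI-Blade. It assumes the standard SavedModel layout, in which the directory contains a saved_model.pb or saved_model.pbtxt file.

```python
import os


def check_model_path(path):
    """Return 'frozen_graph' or 'saved_model' for a model path, or raise ValueError."""
    if os.path.isfile(path) and path.endswith(('.pb', '.pbtxt')):
        return 'frozen_graph'
    if os.path.isdir(path):
        # A SavedModel directory stores its graph in saved_model.pb or saved_model.pbtxt.
        for name in ('saved_model.pb', 'saved_model.pbtxt'):
            if os.path.isfile(os.path.join(path, name)):
                return 'saved_model'
        raise ValueError('{} has no saved_model.pb or saved_model.pbtxt'.format(path))
    raise ValueError('{} is neither a .pb/.pbtxt file nor a directory'.format(path))
```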

Next steps

After you optimize a model with PAI-Blade, you can run the optimized model in Python or deploy it as a service in PAI Elastic Algorithm Service (EAS). PAI-Blade also provides an SDK for C++ to help you integrate the optimized model into your own applications.