Video generation solution - Platform For AI

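EasyAnimate is a DiT-based video generation framework developed in-house by Alibaba Cloud PAI. It provides a complete solution for generating high-definition long videos, covering video data preprocessing, VAE training, DiT training, model inference, and model evaluation, and it supports both text-to-video and image-to-video inference. This topic describes how to integrate EasyAnimate on PAI and complete model inference, fine-tuning, and deployment with a few clicks.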

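Background information

This topic describes the following two ways to generate videos:

Method 1: Use DSW
DSW is a one-stop AI development platform built for algorithm developers. It integrates several cloud development environments, including JupyterLab, WebIDE, and Terminal. Its Gallery provides a rich set of cases and solutions that help you get familiar with the development workflow quickly. You can open the tutorial in DSW Gallery and run the Notebook with one click to complete inference and training tasks for the EasyAnimate-based video generation model, or use it as a starting point for custom development such as model inference and fine-tuning.
Method 2: Use QuickStart
QuickStart integrates high-quality pre-trained models from many open source AI communities and supports the entire process from training to deployment to inference on open source models without writing code. You can use QuickStart to deploy the EasyAnimate model and generate videos with one click.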

Billing

Using DSW incurs DSW and EAS fees. Using QuickStart incurs DLC and EAS fees.

Prerequisites

Create a PAI workspace. For more information, see Activate PAI and create a default workspace.
(Optional) Activate OSS.

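Method 1: Use DSW

Step 1: Create a DSW instance

Go to the DSW page.
1. Log on to the PAI console.
2. On the Overview page, select the target region.
3. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the target workspace to open it.
4. In the left-side navigation pane of the workspace page, choose Model Development and Training > Interactive Modeling (DSW) to go to the DSW page.
Click Create Instance.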

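On the instance creation wizard page, configure the following key parameters and keep the default values for the others.

Parameter	Description
Instance Name	The sample value used in this tutorial is AIGC_test_01.
Resource Type	Select Public Resources.
Instance Type	On the GPU tab, select ecs.gn7i-c8g1.2xlarge, or another A10 or GU100 instance type.
Image	Select the official image easyanimate:1.1.5-pytorch2.2.0-gpu-py310-cu118-ubuntu22.04.
Dataset Mounting (optional)	Click Custom Dataset and select an existing OSS or NAS dataset. For information about how to create a dataset, see Create and manage datasets.

Click OK.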

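Step 2: Download the EasyAnimate model

In the Actions column of the target DSW instance, click Open to enter the development environment of the instance.
On the Launcher page of the Notebook tab, click DSW Gallery under Tool in the Quick Start section to open the DSW Gallery page.
On the DSW Gallery page, search for the EasyAnimate-based AI video generation example and click Open in DSW. The resources and tutorial file required for this tutorial are downloaded to the DSW instance automatically, and the tutorial file opens when the download completes.
Download and install the EasyAnimate code and model.
In the tutorial file easyanimate.ipynb, run the commands in the environment installation section, which define functions, download the code, and download the model. Run each step only after the previous step has completed successfully.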

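Step 3: Run model inference

Run the command in the Model Inference > UI Launch section to start inference.
Click the generated link to open the WebUI.
In the WebUI, select the path of the pre-trained model and configure the other parameters as needed.
(Optional) To use image-to-video generation, upload an image in the Image to Video section. The image is used to generate the video.
Click Generate and wait for a while. You can then view or download the generated video on the right side.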

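Step 4: Fine-tune LoRA

EasyAnimate provides several model training modes, including DiT training (LoRA fine-tuning and full fine-tuning of the base model) and VAE training. The Gallery tutorial includes a built-in LoRA fine-tuning section. For more information, see EasyAnimate.

Prepare data

Run the command in the Model Training > Data Preparation section to download sample data for model training. You can also prepare your own data files in the format described below.

The data files must be organized as follows.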

project/
├── datasets/
│   ├── internal_datasets/
│       ├── videos/
│       │   ├── 00000001.mp4
│       │   ├── 00000002.mp4
│       │   └── .....
│       └── json_of_internal_datasets.json

The format and fields of the JSON file are as follows.

[
    {
      "file_path": "videos/00000001.mp4",
      "text": "A group of young men in suits and sunglasses are walking down a city street.",
      "type": "video"
    },
    {
      "file_path": "videos/00000002.mp4",
      "text": "A notepad with a drawing of a woman on it.",
      "type": "video"
    }
    .....
]

Parameter	Description
file_path	Location of the video or image file (relative path).
text	Text description of the data.
type	`video` for a video, `image` for an image.
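If you prepare your own data, the metadata file can be generated by a short script rather than written by hand. The following is a minimal sketch that uses only the three fields from the table above. The function names `build_metadata` and `write_metadata`, the list of file extensions, and the caption dictionary are illustrative assumptions and are not part of EasyAnimate.

```python
import json
from pathlib import Path

# Assumed extension lists for illustration; adjust them to your own data.
VIDEO_EXTS = {".mp4"}
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def build_metadata(dataset_root, captions):
    """Return metadata entries for every captioned file under dataset_root.

    captions maps a path relative to dataset_root (for example
    "videos/00000001.mp4") to its text description.
    """
    root = Path(dataset_root)
    entries = []
    for rel_path in sorted(captions):
        suffix = Path(rel_path).suffix.lower()
        if suffix in VIDEO_EXTS:
            media_type = "video"
        elif suffix in IMAGE_EXTS:
            media_type = "image"
        else:
            raise ValueError(f"Unsupported file type: {rel_path}")
        if not (root / rel_path).is_file():
            raise FileNotFoundError(str(root / rel_path))
        entries.append(
            {"file_path": rel_path, "text": captions[rel_path], "type": media_type}
        )
    return entries


def write_metadata(dataset_root, captions, json_name="json_of_internal_datasets.json"):
    """Write the metadata JSON next to the videos/ folder and return its path."""
    out_path = Path(dataset_root) / json_name
    entries = build_metadata(dataset_root, captions)
    out_path.write_text(
        json.dumps(entries, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return out_path
```

Calling `write_metadata("project/datasets/internal_datasets", {"videos/00000001.mp4": "A notepad with a drawing of a woman on it."})` would produce a file in the layout shown earlier, provided the video file exists.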

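Start training

(Optional) If you fine-tune with your own data files, set DATASET_NAME and DATASET_META_NAME in the training script to the directory that contains the training data and the path of the training metadata file, respectively.
```
export DATASET_NAME="" # Directory that contains the training data
export DATASET_META_NAME="datasets/Minimalism/metadata_add_width_height.json" # Path of the training metadata file
```
Run the command in the Start Training > LoRA Training section.
After training completes, run the command in the LoRA Model Inference section to move the trained model to the EasyAnimate/models/Personalized_Model folder.
Click the generated link to open the WebUI, and select the trained LoRA model to generate a video.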
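Before starting a training run on your own data, it can save time to check that every entry in the metadata file points to an existing file. The sketch below assumes that each file_path is resolved relative to the directory in DATASET_NAME, which is an assumption about the training script rather than something stated here. The function name `check_metadata` is illustrative.

```python
import json
import os
from pathlib import Path

REQUIRED_KEYS = ("file_path", "text", "type")


def check_metadata(data_dir, meta_path):
    """Return a list of human-readable problems found in the metadata file."""
    problems = []
    entries = json.loads(Path(meta_path).read_text(encoding="utf-8"))
    for index, entry in enumerate(entries):
        missing_keys = [key for key in REQUIRED_KEYS if key not in entry]
        if missing_keys:
            problems.append(f"entry {index}: missing keys {missing_keys}")
            continue
        if entry["type"] not in ("video", "image"):
            problems.append(f"entry {index}: unexpected type {entry['type']!r}")
        # Assumption: file_path is relative to the training data directory.
        if not (Path(data_dir) / entry["file_path"]).is_file():
            problems.append(f"entry {index}: file not found: {entry['file_path']}")
    return problems


if __name__ == "__main__":
    # Reads the same variables that the training script exports.
    data_dir = os.environ.get("DATASET_NAME", "")
    meta_path = os.environ.get("DATASET_META_NAME", "")
    if meta_path:
        for problem in check_metadata(data_dir, meta_path):
            print(problem)
```

An empty result means that every entry has the three expected fields and a file on disk. It does not verify that the videos can be decoded.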

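Method 2: Use QuickStart

QuickStart is a PAI product component that integrates high-quality pre-trained models from many open source AI communities and supports the entire process from training to deployment to inference on open source models without writing code. You can deploy a model and use it directly, or fine-tune the model for your needs and then deploy it.

Scenario 1: Deploy the model directly

Go to the QuickStart page.
1. Log on to the PAI console.
2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the target workspace to open it.
3. In the left-side navigation pane, click QuickStart to go to the QuickStart page.
On the QuickStart page, search for the EasyAnimate high-definition long video generation model, click Deploy, and configure the parameters.
EasyAnimate currently supports inference only in bf16, so select an A10 or higher GPU.
Click Deploy. In the billing reminder dialog box, click OK. You are redirected to the service details page.
When the status changes to Running, the model is deployed.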

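After the model is deployed, you can call the service to generate videos in two ways: through the WebUI or through the API.

WebUI

On the service details page, click View Web App.
In the WebUI, select the path of the pre-trained model and configure the other parameters as needed.
Click Generate and wait for a while. You can then view or download the generated video on the right side.

API

In the Resource Details section of the service details page, click View Call Information to obtain the information required to call the service.

Update the Transformer model through the API. You can run the request from a DSW instance or from a local Python environment.

If you have already selected a model in the WebUI, you do not need to send this request again. If the request times out, check the EAS logs to confirm that the model has finished loading. When loading is complete, the log shows Update diffusion transformer done.

The following is a sample Python request.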

import json
import requests


def post_diffusion_transformer(diffusion_transformer_path, url='http://127.0.0.1:7860', token=None):
    datas = json.dumps({
        "diffusion_transformer_path": diffusion_transformer_path
    })
    head = {
        'Authorization': token
    }
    r = requests.post(f'{url}/easyanimate/update_diffusion_transformer', data=datas, headers=head, timeout=15000)
    data = r.content.decode('utf-8')
    return data

def post_update_edition(edition, url='http://0.0.0.0:7860',token=None):
    head = {
        'Authorization': token
    }

    datas = json.dumps({
        "edition": edition
    })
    r = requests.post(f'{url}/easyanimate/update_edition', data=datas, headers=head)
    data = r.content.decode('utf-8')
    return data
  
if __name__ == '__main__':
    url = '<eas-service-url>'
    token = '<eas-service-token>'

    # -------------------------- #
    #  Step 1: update edition
    # -------------------------- #
    edition = "v3"
    outputs = post_update_edition(edition,url = url,token=token)
    print('Output update edition: ', outputs)

    # -------------------------- #
    #  Step 2: update diffusion transformer
    # -------------------------- #
    # Default paths (choose one of the two)
    diffusion_transformer_path = "/mnt/models/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-512x512"
    # diffusion_transformer_path = "/mnt/models/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-768x768"
    outputs = post_diffusion_transformer(diffusion_transformer_path, url = url, token=token)
    print('Output update diffusion transformer: ', outputs)

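Where:

<eas-service-url>: Replace with the service endpoint obtained from the call information.
<eas-service-token>: Replace with the service token obtained from the call information.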

Call the service to generate a video or an image.

Service input parameters

Parameter	Description	Type	Default
prompt_textbox	Positive prompt entered by the user.	string	Required. No default value.
negative_prompt_textbox	Negative prompt entered by the user.	string	"The video is not of a high quality, it has a low resolution, and the audio quality is not clear. Strange motion trajectory, a poor composition and deformed video, low resolution, duplicate and ugly, strange body structure, long and strange neck, bad teeth, bad eyes, bad limbs, bad hands, rotating camera, blurry camera, shaking camera. Deformation, low-resolution, blurry, ugly, distortion."
sample_step_slider	Number of sampling steps.	int	30
cfg_scale_slider	Guidance scale.	int	6
sampler_dropdown	Sampler type. Valid values: Euler, EulerA, DPM++, PNDM, DDIM.	string	Euler
width_slider	Width of the generated video.	int	672
height_slider	Height of the generated video.	int	384
length_slider	Number of frames in the generated video.	int	144
is_image	Whether the output is an image.	bool	FALSE
lora_alpha_slider	Weight of the LoRA model parameters.	float	0.55
seed_textbox	Random seed.	int	43
lora_model_path	Path of an additional LoRA model. If specified, the LoRA model is applied to the current request and removed after the request completes.	string	none
base_model_path	Path of the transformer model to update.	string	none
motion_module_path	Path of the motion_module model to update.	string	none
generation_method	Generation type. Valid values: Video Generation, Image Generation.	string	none
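When requests are built in several places, keeping the defaults from the table above in one dictionary helps catch mistyped parameter names before they reach the service. The following is a minimal sketch. The function name `build_infer_payload` is illustrative, the key set mirrors the fields that this topic's sample request sends, and defaulting generation_method to Video Generation is an assumption, because the table lists its default as none.

```python
GENERATION_METHODS = ("Video Generation", "Image Generation")

# Defaults copied from the parameter table; generation_method is an assumed default.
DEFAULTS = {
    "base_model_path": "none",
    "motion_module_path": "none",
    "lora_model_path": "none",
    "lora_alpha_slider": 0.55,
    "negative_prompt_textbox": (
        "The video is not of a high quality, it has a low resolution, and the audio "
        "quality is not clear. Strange motion trajectory, a poor composition and "
        "deformed video, low resolution, duplicate and ugly, strange body structure, "
        "long and strange neck, bad teeth, bad eyes, bad limbs, bad hands, rotating "
        "camera, blurry camera, shaking camera. Deformation, low-resolution, blurry, "
        "ugly, distortion."
    ),
    "sampler_dropdown": "Euler",
    "sample_step_slider": 30,
    "width_slider": 672,
    "height_slider": 384,
    "generation_method": "Video Generation",
    "length_slider": 144,
    "cfg_scale_slider": 6,
    "seed_textbox": 43,
}


def build_infer_payload(prompt, **overrides):
    """Return a request payload dict: table defaults plus caller overrides."""
    if not prompt:
        raise ValueError("prompt_textbox is required and has no default")
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise KeyError(f"Unknown parameters: {unknown}")
    payload = dict(DEFAULTS)
    payload.update(overrides)
    payload["prompt_textbox"] = prompt
    if payload["generation_method"] not in GENERATION_METHODS:
        raise ValueError(f"generation_method must be one of {GENERATION_METHODS}")
    return payload
```

The returned dictionary can be serialized with `json.dumps` and sent as the request body. For example, `build_infer_payload("A cat on a sofa", length_slider=72)` overrides only the frame count.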

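Sample Python request

The service returns the result as a Base64-encoded string in the base64_encoding field.

You can view the generated results in the /mnt/workspace/demos/easyanimate/ directory.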

import base64
import json
import time
from datetime import datetime

import requests

def post_infer(generation_method, length_slider, url='http://127.0.0.1:7860',token=None):
    head = {
        'Authorization': token
    }

    datas = json.dumps({
        "base_model_path": "none",
        "motion_module_path": "none",
        "lora_model_path": "none",
        "lora_alpha_slider": 0.55,
        "prompt_textbox": "This video shows Mount saint helens, washington - the stunning scenery of a rocky mountains during golden hours - wide shot. A soaring drone footage captures the majestic beauty of a coastal cliff, its red and yellow stratified rock faces rich in color and against the vibrant turquoise of the sea.",
        "negative_prompt_textbox": "Strange motion trajectory, a poor composition and deformed video, worst quality, normal quality, low quality, low resolution, duplicate and ugly, strange body structure, long and strange neck, bad teeth, bad eyes, bad limbs, bad hands, rotating camera, blurry camera, shaking camera",
        "sampler_dropdown": "Euler",
        "sample_step_slider": 30,
        "width_slider": 672,
        "height_slider": 384,
        # Use the caller's choice instead of a hard-coded value.
        "generation_method": generation_method,
        "length_slider": length_slider,
        "cfg_scale_slider": 6,
        "seed_textbox": 43,
    })
    r = requests.post(f'{url}/easyanimate/infer_forward', data=datas, headers=head, timeout=1500)
    data = r.content.decode('utf-8')
    return data


if __name__ == '__main__':
    # initiate time
    now_date    = datetime.now()
    time_start  = time.time()  
    
    url = '<eas-service-url>'
    token = '<eas-service-token>'

    # -------------------------- #
    #  Step 3: infer
    # -------------------------- #
    # "Video Generation" and "Image Generation"
    generation_method = "Video Generation"
    length_slider = 72
    outputs = post_infer(generation_method, length_slider, url = url, token=token)
    
    # Get decoded data
    outputs = json.loads(outputs)
    base64_encoding = outputs["base64_encoding"]
    decoded_data = base64.b64decode(base64_encoding)

    is_image = generation_method == "Image Generation"
    if is_image or length_slider == 1:
        file_path = "1.png"
    else:
        file_path = "1.mp4"
    with open(file_path, "wb") as file:
        file.write(decoded_data)

    # Stop timing. The difference is the total execution time in seconds.
    time_end = time.time()
    time_sum = time_end - time_start
    print('# --------------------------------------------------------- #')
    print(f'#   Total expenditure: {time_sum}s')
    print('# --------------------------------------------------------- #')

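Where:

<eas-service-url>: Replace with the service endpoint obtained from the call information.
<eas-service-token>: Replace with the service token obtained from the call information.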
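The sample above assumes that the response always contains base64_encoding. If the service returns an error body instead, `outputs["base64_encoding"]` fails with a bare KeyError. The following sketch separates decoding into a helper that reports such cases more clearly. The function name `save_generation` and its parameters are illustrative, and the choice between .png and .mp4 follows the same rule as the sample.

```python
import base64
import json
from pathlib import Path


def save_generation(response_text, is_image, out_dir=".", stem="1"):
    """Decode the service response and write it to <stem>.png or <stem>.mp4."""
    try:
        result = json.loads(response_text)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"Response is not JSON: {response_text[:200]!r}") from exc
    if not isinstance(result, dict) or "base64_encoding" not in result:
        raise RuntimeError(
            f"Response has no base64_encoding field: {response_text[:200]!r}"
        )
    data = base64.b64decode(result["base64_encoding"])
    # Same rule as the sample: images are saved as PNG, videos as MP4.
    out_path = Path(out_dir) / (stem + (".png" if is_image else ".mp4"))
    out_path.write_bytes(data)
    return out_path
```

With the sample's variables, a call could look like `save_generation(outputs, is_image or length_slider == 1)`, where `outputs` is the raw text returned by `post_infer`.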

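Scenario 2: Fine-tune the model and then deploy it

Go to the QuickStart page.
1. Log on to the PAI console.
2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the target workspace to open it.
3. In the left-side navigation pane, click QuickStart to go to the QuickStart page.
On the QuickStart page, search for the EasyAnimate high-definition long video generation model and click the card to open the model details page.
Click Fine-tune in the upper-right corner and configure the training output, hyperparameters, and other settings as needed. For details about the hyperparameters, see Appendix: Fine-tuning hyperparameters.
EasyAnimate currently supports inference only in bf16, so select an A10 or higher GPU. LoRA training on images requires at least 20 GB of GPU memory. Fine-tuning with a larger batch_size or num_train_epochs, or with video data, requires more GPU memory.
Click Train. In the billing reminder dialog box, click OK. You are redirected to the service details page.
When the task status changes to Succeeded, the model is trained.
Click Deploy in the upper-right corner.
When the status changes to Running, the model is deployed.
On the service details page, click View Web App.
In the WebUI, select the trained LoRA model to generate a video.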

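Appendix: Fine-tuning hyperparameters

Parameter	Type	Description
learning_rate	float	Learning rate.
adam_weight_decay	float	Weight decay of the Adam optimizer.
adam_epsilon	float	Epsilon value of the Adam optimizer.
num_train_epochs	int	Total number of training epochs.
checkpointing_steps	int	Number of steps between model checkpoints.
train_batch_size	int	Batch size used for training.
vae_mini_batch	int	Slice size used for the VAE during training.
image_sample_size	int	Resolution of training images.
video_sample_size	int	Resolution of training videos.
video_sample_stride	int	Sampling stride for training videos.
video_sample_n_frames	int	Number of frames sampled from each training video.
rank	int	LoRA rank.
network_alpha	int	LoRA network_alpha.
gradient_accumulation_steps	int	Number of gradient accumulation steps.
dataloader_num_workers	int	Number of subprocesses used for data loading.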
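Several of these hyperparameters interact, and a little arithmetic helps when choosing values. The sketch below uses conventions that are common in training scripts and are assumptions here, not statements about EasyAnimate's implementation: the effective batch size is the product of the per-device batch size, the accumulation steps, and the number of GPUs, and a clip sampled with a stride covers (n_frames - 1) * stride + 1 source frames. The function names and the num_gpus parameter are illustrative.

```python
import math


def effective_batch_size(train_batch_size, gradient_accumulation_steps, num_gpus=1):
    """Samples that contribute to one optimizer update."""
    return train_batch_size * gradient_accumulation_steps * num_gpus


def optimizer_steps_per_epoch(num_samples, train_batch_size,
                              gradient_accumulation_steps, num_gpus=1):
    """Optimizer updates needed to see every sample once (last batch may be partial)."""
    per_update = effective_batch_size(
        train_batch_size, gradient_accumulation_steps, num_gpus
    )
    return math.ceil(num_samples / per_update)


def source_frame_span(video_sample_n_frames, video_sample_stride):
    """Source frames covered when every stride-th frame is taken."""
    return (video_sample_n_frames - 1) * video_sample_stride + 1


if __name__ == "__main__":
    # Example values chosen for illustration only.
    print(effective_batch_size(2, 4))                # 8
    print(optimizer_steps_per_epoch(100, 2, 4))      # 13
    print(source_frame_span(16, 2))                  # 31
```

Under these assumptions, a training video needs at least `source_frame_span(video_sample_n_frames, video_sample_stride)` frames to supply a full clip, and checkpointing_steps can be compared against `optimizer_steps_per_epoch` to estimate how many checkpoints a run produces.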
