Platform for AI: Generate a video

Last Updated: Dec 19, 2024

EasyAnimate is a self-developed diffusion transformer (DiT) video generation framework for Platform for AI (PAI). EasyAnimate provides a complete solution for generating high-definition long videos. The solution includes video data preprocessing, variational autoencoder (VAE) training, DiT training, model inference, and model evaluation. This topic describes how to integrate EasyAnimate with PAI and implement model inference, fine-tuning, and deployment with a few clicks.

Background information

You can use one of the following methods to generate videos:

  • Method 1: Use Data Science Workshop (DSW)

    DSW is an end-to-end artificial intelligence (AI) development platform tailored for algorithm developers. DSW integrates multiple cloud development environments, such as JupyterLab, WebIDE, and Terminal. DSW Gallery provides a variety of cases and solutions that help you quickly get familiar with the development process. You can follow the tutorials in DSW Gallery to run the notebook file, perform model training and inference with EasyAnimate-based video generation models, and perform secondary development, such as fine-tuning the models.

  • Method 2: Use QuickStart

    QuickStart integrates high-quality pre-trained models from open source AI communities and supports zero-code training, deployment, and inference based on these models. You can use QuickStart to deploy EasyAnimate models and generate videos with a few clicks, which provides a faster, more efficient, and more convenient AI application experience.

Billing

If you use DSW, you are charged for DSW and Elastic Algorithm Service (EAS) resources. If you use QuickStart, you are charged for Deep Learning Containers (DLC) and EAS resources.

For more information, see Billing of DSW, Billing of EAS, and Billing of QuickStart.

Prerequisites

A workspace is created. For more information, see Create a workspace.

Method 1: Use DSW

Step 1: Create a DSW instance

  1. Go to the Data Science Workshop (DSW) page.

    1. Log on to the PAI console.

    2. In the top navigation bar of the Overview page, select a region.

    3. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    4. In the left-side navigation pane of the Workspace Details page, choose Model Training > Data Science Workshop (DSW).

  2. On the page that appears, click Create Instance.

  3. On the Configure Instance page, configure the key parameters. The following list describes these parameters. For other parameters, use the default values.

    • Instance Name: Specify a name for the instance. In this example, AIGC_test_01 is used.

    • Instance Type: On the GPU tab, select the ecs.gn7i-c8g1.2xlarge instance type or another instance type that uses an A10 or GU100 GPU.

    • Image: Select the Alibaba Cloud image easyanimate:1.1.5-pytorch2.2.0-gpu-py310-cu118-ubuntu22.04.

    • Dataset (optional): Click Add. In the Dataset panel, click Create Dataset to create an OSS or File Storage NAS (NAS) dataset.

  4. Click Yes.

Step 2: Download an EasyAnimate model

  1. Click Open in the Actions column of the DSW instance that you want to manage.

  2. On the Launcher page of the Notebook tab, click DSW Gallery in the Tool section of Quick Start.

  3. On the DSW Gallery page, search for AI video generation demo based on EasyAnimate and click Open in DSW. The required resources and tutorial file are automatically downloaded to the DSW instance. After the download is complete, the tutorial file automatically opens.

  4. Download the EasyAnimate code and models and install them.

    In the easyanimate.ipynb tutorial file, click the run icon to run the commands in the Set up the environment section, including Define functions, Download code, and Download models. Run the steps in sequence; proceed to the next step only after the commands in the current step are successfully run.

Step 3: Perform model inference

  1. Click the run icon to run the commands in the Perform model inference > Code Infer > UI Infer section to perform model inference.

  2. Click the generated link to go to the web UI.

  3. On the web UI, select the path of the pre-trained model. Configure other parameters based on your business requirements.

    Optional. If you want to generate a video from images, upload the images in the Image to Video section.

  4. Click Generate. Wait for a period of time. Then, you can view or download the generated video on the right side.

Step 4: Perform LoRA fine-tuning

EasyAnimate provides various model training methods, including DiT model training and VAE model training. DiT model training includes Low-Rank Adaptation (LoRA) fine-tuning and full fine-tuning of a base model. For more information about LoRA fine-tuning built in DSW Gallery, see EasyAnimate.
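
LoRA fine-tuning keeps the base model weights frozen and learns a pair of small low-rank matrices for each adapted layer, which greatly reduces the number of trainable parameters. The following is a minimal conceptual sketch of a LoRA linear layer in PyTorch. It is illustrative only and is not EasyAnimate's implementation; the layer size, rank, and alpha values are placeholder assumptions.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # A frozen linear layer plus a trainable low-rank update: y = Wx + (alpha / rank) * B(Ax).
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for param in self.base.parameters():
            param.requires_grad = False                                  # keep the base weights frozen
        self.lora_down = nn.Linear(base.in_features, rank, bias=False)   # low-rank down-projection (A)
        self.lora_up = nn.Linear(rank, base.out_features, bias=False)    # low-rank up-projection (B)
        nn.init.zeros_(self.lora_up.weight)                              # start from a zero update
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_up(self.lora_down(x))

# Train only the LoRA parameters of a wrapped layer.
layer = LoRALinear(nn.Linear(1024, 1024), rank=16, alpha=16.0)
trainable = [p for p in layer.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)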

Prepare data

Click the run icon to run the commands in the Perform model training > Prepare data section to download sample data for model training. You can also prepare a data file in the following format:

project/
├── datasets/
│   ├── internal_datasets/
│       ├── videos/
│       │   ├── 00000001.mp4
│       │   ├── 00000002.mp4
│       │   └── .....
│       └── json_of_internal_datasets.json

The following sample shows the format of the JSON file, and the table below describes its parameters. A minimal script that assembles such a file is shown after the table.

[
    {
      "file_path": "videos/00000001.mp4",
      "text": "A group of young men in suits and sunglasses are walking down a city street.",
      "type": "video"
    },
    {
      "file_path": "videos/00000002.mp4",
      "text": "A notepad with a drawing of a woman on it.",
      "type": "video"
    }
    .....
]

Parameter    Description
file_path    The path in which the video or image data is stored. This is a relative path.
text         The text description of the data.
type         The file type. Valid values: video and image.
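
If you assemble your own dataset, a small script can generate the metadata file in the format shown above. The following is a minimal sketch. The directory layout follows the example above; the caption texts are placeholder assumptions that you would replace with your own descriptions.

import json
from pathlib import Path

# Assumed layout from the example above; adjust the paths to your own dataset location.
dataset_root = Path("project/datasets/internal_datasets")
video_dir = dataset_root / "videos"

# Placeholder captions keyed by file name; replace them with your own text descriptions.
captions = {
    "00000001.mp4": "A group of young men in suits and sunglasses are walking down a city street.",
    "00000002.mp4": "A notepad with a drawing of a woman on it.",
}

entries = []
for video_path in sorted(video_dir.glob("*.mp4")):
    entries.append({
        "file_path": f"videos/{video_path.name}",   # path relative to the dataset root
        "text": captions.get(video_path.name, ""),  # text description of the clip
        "type": "video",                            # use "image" for image data
    })

with open(dataset_root / "json_of_internal_datasets.json", "w") as f:
    json.dump(entries, f, indent=2, ensure_ascii=False)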

Start model training

  1. Optional. If you use a data file that you prepared for fine-tuning, set the DATASET_NAME and DATASET_META_NAME parameters in the corresponding training script to the directory of the training data and the path of the training file.

    export DATASET_NAME=""  # The directory of the training data.
    export DATASET_META_NAME="datasets/Minimalism/metadata_add_width_height.json"  # The path of the training file.
  2. Click the run icon to run the commands in the Start model training > LoRA Finetune section.

  3. After the training is complete, click the run icon to run the commands in the Inference with LoRA model section to move the trained model to the EasyAnimate/models/Personalized_Model folder.

  4. Click the generated link to go to the web UI and select the trained LoRA model to generate a video.

Method 2: Use QuickStart

QuickStart is a component of PAI that integrates high-quality pre-trained models from open source AI communities and supports zero-code training, deployment, and inference based on these models. You can directly deploy and use the models, or fine-tune them based on your business requirements before deployment.

Scenario 1: Directly deploy a model

  1. Go to the QuickStart page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the left-side navigation pane of the Workspace Details page, choose QuickStart > Model Gallery.

  2. On the Model Gallery page, search for EasyAnimate Long Video Generation Model, click Deploy, and then configure the parameters.

    EasyAnimate supports only bf16 for inference. You must select an instance type that uses an A10 or later GPU.

  3. Click Deploy. In the Billing Notification message, click OK. The Service Details tab of the model details page appears.

    When the value of the Status parameter changes to Running, the model is deployed.

  4. After the model is deployed, use the web UI or the related API operations to call the model service to generate a video.

    Web UI

    1. On the Service details tab, click View Web App.

    2. On the web UI, select the path of the pre-trained model. Configure other parameters based on your business requirements.

    3. Click Generate. Wait for a period of time. Then, you can view or download the generated video on the right side.

    API

    1. In the Resource Information section of the Service details tab, click View Call Information to obtain the information required to call the model service.

    2. Update the transformer model by calling the API operations on a DSW instance or in your on-premises Python environment.

      If you have already selected a model on the web UI, you do not need to send this request. If the request times out, view the EAS logs to check whether the model is loaded. The Update diffusion transformer done message in the logs indicates that the model is loaded. A simple retry sketch is provided after the sample request below.

      The following code shows a sample Python request:

      import json
      import requests
      
      
      def post_diffusion_transformer(diffusion_transformer_path, url='http://127.0.0.1:7860', token=None):
          datas = json.dumps({
              "diffusion_transformer_path": diffusion_transformer_path
          })
          head = {
              'Authorization': token
          }
          r = requests.post(f'{url}/easyanimate/update_diffusion_transformer', data=datas, headers=head, timeout=15000)
          data = r.content.decode('utf-8')
          return data
      
      def post_update_edition(edition, url='http://0.0.0.0:7860',token=None):
          head = {
              'Authorization': token
          }
      
          datas = json.dumps({
              "edition": edition
          })
          r = requests.post(f'{url}/easyanimate/update_edition', data=datas, headers=head)
          data = r.content.decode('utf-8')
          return data
        
      if __name__ == '__main__':
          url = '<eas-service-url>'
          token = '<eas-service-token>'
      
          # -------------------------- #
          #  Step 1: update edition
          # -------------------------- #
          edition = "v3"
          outputs = post_update_edition(edition,url = url,token=token)
          print('Output update edition: ', outputs)
      
          # -------------------------- #
          #  Step 2: update diffusion transformer
          # -------------------------- #
          # The default path. You can select one of the following paths.
          diffusion_transformer_path = "/mnt/models/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-512x512"
          # diffusion_transformer_path = "/mnt/models/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-768x768"
          outputs = post_diffusion_transformer(diffusion_transformer_path, url=url, token=token)
          print('Output update diffusion transformer: ', outputs)

      Take note of the following parameters:

      • <eas-service-url>: Replace eas-service-url with the endpoint that you obtained in Step 1.

      • <eas-service-token>: Replace eas-service-token with the service token that you obtained in Step 1.
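
      The first call after deployment may time out while the model is still loading. The following is a minimal sketch that retries the update request by reusing the post_diffusion_transformer function defined in the sample above. The retry count and wait time are assumptions, not documented values.

      import time
      import requests

      def update_transformer_with_retry(path, url, token, attempts=3, wait_seconds=60):
          # Retry the update call a few times while the service finishes loading the model.
          for attempt in range(1, attempts + 1):
              try:
                  return post_diffusion_transformer(path, url=url, token=token)
              except requests.exceptions.RequestException as err:
                  print(f'Attempt {attempt} failed: {err}. Check the EAS logs for "Update diffusion transformer done".')
                  if attempt < attempts:
                      time.sleep(wait_seconds)
          raise RuntimeError('The model service did not respond. Verify in the EAS logs that the model is loaded.')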

    3. Call the model service to generate a video or an image.

    • The following parameters are used to call the model service:

      • prompt_textbox: The positive prompt. Type: string. No default value.

      • negative_prompt_textbox: The negative prompt. Type: string. Default value: "The video is not of a high quality, it has a low resolution, and the audio quality is not clear. Strange motion trajectory, a poor composition and deformed video, low resolution, duplicate and ugly, strange body structure, long and strange neck, bad teeth, bad eyes, bad limbs, bad hands, rotating camera, blurry camera, shaking camera. Deformation, low-resolution, blurry, ugly, distortion."

      • sample_step_slider: The number of sampling steps. Type: int. Default value: 30.

      • cfg_scale_slider: The guidance coefficient. Type: int. Default value: 6.

      • sampler_dropdown: The sampler type. Valid values: Euler, Euler A, DPM++, PNDM, and DDIM. Type: string. Default value: Euler.

      • width_slider: The width of the generated video. Type: int. Default value: 672.

      • height_slider: The height of the generated video. Type: int. Default value: 384.

      • length_slider: The number of frames in the generated video. Type: int. Default value: 144.

      • is_image: Specifies whether an image is generated. Type: bool. Default value: FALSE.

      • lora_alpha_slider: The weight of the LoRA model parameters. Type: float. Default value: 0.55.

      • seed_textbox: The random seed. Type: int. Default value: 43.

      • lora_model_path: The additional path of the LoRA model. If an additional path is specified, the LoRA model is included in the request and removed after the request is complete. Type: string. Default value: none.

      • base_model_path: The path to which you want to update the transformer model. Type: string. Default value: none.

      • motion_module_path: The path to which you want to update the motion_module model. Type: string. Default value: none.

      • generation_method: The generation type. Valid values: Video Generation and Image Generation. Type: string. Default value: none.

    • Sample Python request

      If the response contains base64_encoding, the returned result is Base64-encoded.

      You can view the generation results in the /mnt/workspace/demos/easyanimate/ directory.

      import base64
      import json
      import sys
      import time
      from datetime import datetime
      from io import BytesIO
      
      import cv2
      import requests
      
      def post_infer(generation_method, length_slider, url='http://127.0.0.1:7860',token=None):
          head = {
              'Authorization': token
          }
      
          datas = json.dumps({
              "base_model_path": "none",
              "motion_module_path": "none",
              "lora_model_path": "none", 
              "lora_alpha_slider": 0.55, 
              "prompt_textbox": "This video shows Mount saint helens, washington - the stunning scenery of a rocky mountains during golden hours - wide shot. A soaring drone footage captures the majestic beauty of a coastal cliff, its red and yellow stratified rock faces rich in color and against the vibrant turquoise of the sea.", 
              "negative_prompt_textbox": "Strange motion trajectory, a poor composition and deformed video, worst quality, normal quality, low quality, low resolution, duplicate and ugly, strange body structure, long and strange neck, bad teeth, bad eyes, bad limbs, bad hands, rotating camera, blurry camera, shaking camera", 
              "sampler_dropdown": "Euler", 
              "sample_step_slider": 30, 
              "width_slider": 672, 
              "height_slider": 384, 
              "generation_method": "Video Generation",
              "length_slider": length_slider,
              "cfg_scale_slider": 6,
              "seed_textbox": 43,
          })
          r = requests.post(f'{url}/easyanimate/infer_forward', data=datas, headers=head, timeout=1500)
          data = r.content.decode('utf-8')
          return data
      
      
      if __name__ == '__main__':
          # initiate time
          now_date    = datetime.now()
          time_start  = time.time()  
          
          url = '<eas-service-url>'
          token = '<eas-service-token>'
      
          # -------------------------- #
          #  Step 3: infer
          # -------------------------- #
          # "Video Generation" and "Image Generation"
          generation_method = "Video Generation"
          length_slider = 72
          outputs = post_infer(generation_method, length_slider, url = url, token=token)
          
          # Get decoded data
          outputs = json.loads(outputs)
          base64_encoding = outputs["base64_encoding"]
          decoded_data = base64.b64decode(base64_encoding)
      
          is_image = True if generation_method == "Image Generation" else False
          if is_image or length_slider == 1:
              file_path = "1.png"
          else:
              file_path = "1.mp4"
          with open(file_path, "wb") as file:
              file.write(decoded_data)
              
          # End of record time
          # The time difference is the total execution time of the program, in seconds.
          time_end = time.time()
          time_sum = time_end - time_start
          print('# --------------------------------------------------------- #')
          print(f'#   Total time: {time_sum}s')
          print('# --------------------------------------------------------- #')

      Take note of the following parameters:

      • <eas-service-url>: Replace eas-service-url with the endpoint that you obtained in Step 1.

      • <eas-service-token>: Replace eas-service-token with the service token that you obtained in Step 1.

Scenario 2: Deploy a model after you fine-tune the model

  1. Go to the QuickStart page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the left-side navigation pane of the Workspace Details page, choose QuickStart > Model Gallery.

  2. On the Model Gallery page, search for EasyAnimate Long Video Generation Model and click the model.

  3. In the upper-right corner of the model details page, click Fine-tune, configure the output data settings, and set the hyperparameters based on your business requirements. For more information about the hyperparameters, see Appendix: Hyperparameters for fine-tuned models.

    EasyAnimate supports only bf16 for inference. You must select an instance type that uses an A10 or later GPU. If you use images for LoRA model training, at least 20 GB of GPU memory is required. The GPU memory consumed by training depends on the values of the batch_size and num_train_epochs parameters. If you use video data for fine-tuning, you must set the batch_size and num_train_epochs parameters to larger values, and the training requires more GPU memory.

  4. Click Fine-tune. In the Billing Notification message, click OK. The Task details tab of the model training details page appears.

    If the value of the Job Status parameter changes to Success, the model is trained.

  5. In the upper-right corner, click Deploy.

    When the status of the service changes to Running, the model is deployed.

  6. On the Service details tab, click View Web App.

  7. On the web UI, select the trained LoRA model to generate a video.

Appendix: Hyperparameters for fine-tuned models

• learning_rate (float): The learning rate.

• adam_weight_decay (float): The weight decay of the Adam optimizer.

• adam_epsilon (float): The epsilon value of the Adam optimizer.

• num_train_epochs (int): The total number of training epochs.

• checkpointing_steps (int): The interval, in steps, at which the model is saved.

• train_batch_size (int): The batch size of the training samples.

• vae_mini_batch (int): The mini-batch size for the VAE during training.

• image_sample_size (int): The resolution of the images used to train the model.

• video_sample_size (int): The resolution of the videos used to train the model.

• video_sample_stride (int): The sampling interval of the videos used to train the model.

• video_sample_n_frames (int): The number of frames sampled from each video used to train the model.

• rank (int): The rank of the LoRA update, which controls the model complexity.

• network_alpha (int): The network alpha value used to scale the LoRA update.

• gradient_accumulation_steps (int): The number of gradient accumulation steps for model training.

• dataloader_num_workers (int): The number of worker processes for data loading.
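
For reference, the following is a minimal sketch of how these hyperparameters might be collected before a fine-tuning run. The values shown are illustrative assumptions, not documented defaults; choose them based on your data and the GPU memory note in Scenario 2.

# Illustrative hyperparameter values only; these are assumptions, not documented defaults.
finetune_hyperparameters = {
    "learning_rate": 1e-4,              # learning rate for LoRA fine-tuning
    "adam_weight_decay": 3e-2,          # weight decay of the Adam optimizer
    "adam_epsilon": 1e-8,               # epsilon of the Adam optimizer
    "num_train_epochs": 10,             # total number of training epochs
    "checkpointing_steps": 100,         # save the model every 100 steps
    "train_batch_size": 1,              # batch size per training step
    "vae_mini_batch": 1,                # mini-batch size for the VAE
    "image_sample_size": 512,           # training image resolution
    "video_sample_size": 512,           # training video resolution
    "video_sample_stride": 4,           # sampling interval between video frames
    "video_sample_n_frames": 72,        # number of frames sampled per video
    "rank": 64,                         # LoRA rank
    "network_alpha": 64,                # LoRA network alpha
    "gradient_accumulation_steps": 1,   # gradient accumulation steps
    "dataloader_num_workers": 8,        # data-loading worker processes
}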