EasyAnimate is a diffusion transformer (DiT) video generation framework developed in-house by Platform for AI (PAI). EasyAnimate provides a complete solution for generating high-definition long videos, including video data preprocessing, variational autoencoder (VAE) training, DiT training, model inference, and model evaluation. This topic describes how to use EasyAnimate in PAI to perform model inference, fine-tuning, and deployment with a few clicks.
Background information
You can use one of the following methods to generate videos:
Method 1: Use Data Science Workshop (DSW)
DSW is an end-to-end Artificial Intelligence (AI) development platform tailored for algorithm developers. DSW integrates multiple cloud development environments, such as JupyterLab, WebIDE, and Terminal. DSW Gallery provides various cases and solutions to help you get familiar with the development process in an efficient manner. You can follow the tutorials in DSW Gallery to directly run the Notebook case or file and perform model training and inference by using EasyAnimate-based video generation models. You can also perform secondary development operations based on the models, such as fine-tuning the models.
Method 2: Use QuickStart
QuickStart integrates high-quality pre-trained models from many open source AI communities and supports zero-code implementation of the entire process of training, deployment, and inference based on these open source models. You can use QuickStart to deploy EasyAnimate models and generate videos with a few clicks. This provides you with a faster, more efficient, and more convenient AI application experience.
Billing
If you use DSW, you are charged for DSW and Elastic Algorithm Service (EAS) resources. If you use QuickStart, you are charged for Deep Learning Containers (DLC) and EAS resources.
For more information, see Billing of DSW, Billing of EAS, and Billing of QuickStart.
Prerequisites
A workspace is created in PAI. For more information, see Activate PAI and create a default workspace.
Object Storage Service (OSS) is activated or File Storage NAS (NAS) is activated.
Method 1: Use DSW
Step 1: Create a DSW instance
Go to the Data Science Workshop (DSW) page.
Log on to the PAI console.
In the top navigation bar of the Overview page, select a region.
In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.
In the left-side navigation pane of the Workspace Details page, choose Model Training > Data Science Workshop (DSW).
On the page that appears, click Create Instance.
On the Configure Instance page, configure the key parameters. The following table describes these parameters. For other parameters, use the default values.
Parameter
Description
Instance Name
Specify a name for the instance. In this example, AIGC_test_01 is used.
Instance Type
On the GPU tab, select the ecs.gn7i-c8g1.2xlarge instance type or another instance type that uses an A10 or GU100 GPU.
Image
Select the Alibaba Cloud image easyanimate:1.1.5-pytorch2.2.0-gpu-py310-cu118-ubuntu22.04.
Dataset (optional)
Click Add. In the Dataset panel, click Create Dataset to create an OSS or File Storage NAS (NAS) dataset.
Click Yes.
Step 2: Download an EasyAnimate model
Click Open in the Actions column of the DSW instance that you want to manage.
On the Launcher page of the Notebook tab, click DSW Gallery in the Tool section of Quick Start.
On the DSW Gallery page, search for AI video generation demo based on EasyAnimate and click Open in DSW. The required resources and tutorial file are automatically downloaded to the DSW instance. After the download is complete, the tutorial file automatically opens.
Download the EasyAnimate-related code and model and install the model.
In the easyanimate.ipynb tutorial file, click the run icon to run the environment installation commands in the Set up the environment section, including the Define functions, Download code, and Download models steps. Run the steps in sequence, and proceed to the next step only after the commands in the current step are successfully run.
Step 3: Perform model inference
Click the icon to run the commands in the Perform model inference > Code Infer > UI Infer section to perform model inference.
Click the generated link to go to the web UI.
On the web UI, select the path of the pre-trained model. Configure other parameters based on your business requirements.
Optional. If you want to generate a video from images, upload the images in the Image to Video section.
Click Generate. Wait for a period of time. Then, you can view or download the generated video on the right side.
Step 4: Perform LoRA fine-tuning
EasyAnimate provides various model training methods, including DiT model training and VAE model training. DiT model training includes Low-Rank Adaptation (LoRA) fine-tuning and full fine-tuning of a base model. For more information about LoRA fine-tuning built in DSW Gallery, see EasyAnimate.
Prepare data
Click the icon to run the commands in the Perform model training > Prepare data section. Sample data is downloaded for model training. You can also prepare a data file by using the following format:
project/
├── datasets/
│   ├── internal_datasets/
│       ├── videos/
│       │   ├── 00000001.mp4
│       │   ├── 00000002.mp4
│       │   └── .....
│       └── json_of_internal_datasets.json
The following example shows the format of the JSON annotation file. The table that follows the example describes its parameters.
[
  {
    "file_path": "videos/00000001.mp4",
    "text": "A group of young men in suits and sunglasses are walking down a city street.",
    "type": "video"
  },
  {
    "file_path": "videos/00000002.mp4",
    "text": "A notepad with a drawing of a woman on it.",
    "type": "video"
  },
  .....
]
Parameter | Description |
file_path | The path in which the video or image data is stored. This is a path relative to the dataset directory. |
text | The text description of the data. |
type | The file type. Valid values: video and image. |
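If you build the annotation file for your own data, the following is a minimal sketch that scans the videos/ directory from the layout above and writes json_of_internal_datasets.json in the format described in the preceding table. The caption_for function is a hypothetical placeholder for your own captioning logic or manual annotations.
import json
from pathlib import Path

def caption_for(video_path: Path) -> str:
    # Hypothetical placeholder: replace with your own captioning logic or manual annotations.
    return "A short text description of " + video_path.name

# The directory layout follows the sample structure above.
dataset_root = Path("project/datasets/internal_datasets")

entries = []
for video_path in sorted((dataset_root / "videos").glob("*.mp4")):
    entries.append({
        "file_path": f"videos/{video_path.name}",  # A path relative to the dataset directory.
        "text": caption_for(video_path),
        "type": "video",
    })

with open(dataset_root / "json_of_internal_datasets.json", "w") as f:
    json.dump(entries, f, indent=2, ensure_ascii=False)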
Start model training
Optional. If you use a data file that you prepared for fine-tuning, set the DATASET_NAME and DATASET_META_NAME parameters in the corresponding training script to the directory of the training data and the path of the training file.
export DATASET_NAME=""  # The directory of the training data.
export DATASET_META_NAME=datasets/Minimalism/metadata_add_width_height.json  # The path of the training file.
Click the icon to run the commands in the Start model training > LoRA Finetune section.
After the training is complete, click the icon to run the commands in the Inference with LoRA model section to move the trained model to the EasyAnimate/models/Personalized_Model folder.
Click the generated link to go to the web UI and select the trained LoRA model to generate a video.
Method 2: Use QuickStart
QuickStart is a product component of PAI. QuickStart integrates high-quality pre-trained models from open source AI communities and supports zero-code implementation throughout the overall process of training, deployment, and inference based on open source models. You can directly deploy and use the models, or fine-tune the models based on your business requirements for model deployment.
Scenario 1: Directly deploy a model
Go to the QuickStart page.
Log on to the PAI console.
In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.
In the left-side navigation pane of the Workspace Details page, choose QuickStart > Model Gallery.
On the Model Gallery page, search for EasyAnimate Long Video Generation Model, click Deploy, and then configure the parameters.
EasyAnimate supports only bf16 for inference. You must select an instance type that uses an A10 or later GPU.
Click Deploy. In the Billing Notification message, click OK. The Service Details tab of the model details page appears.
When the value of the Status parameter changes to Running, the model is deployed.
After the model is deployed, use the web UI or the related API operations to call the model service to generate a video.
Web UI
On the Service details tab, click View Web App.
On the web UI, select the path of the pre-trained model. Configure other parameters based on your business requirements.
Click Generate. Wait for a period of time. Then, you can view or download the generated video on the right side.
API
In the Resource Information section of the Service details tab, click View Call Information to obtain the information required to call the model service.
Update the transformer model by calling the API operations on a DSW instance or in your on-premises Python environment.
If you select a model on the web UI, you do not need to send a request to call the model service. If the request times out, view the EAS logs to check whether the model is loaded. If the Update diffusion transformer done message is displayed in the logs, the model is loaded. The following code shows a sample Python request:
import json

import requests


def post_diffusion_transformer(diffusion_transformer_path, url='http://127.0.0.1:7860', token=None):
    datas = json.dumps({
        "diffusion_transformer_path": diffusion_transformer_path
    })
    head = {
        'Authorization': token
    }
    r = requests.post(f'{url}/easyanimate/update_diffusion_transformer', data=datas, headers=head, timeout=15000)
    data = r.content.decode('utf-8')
    return data


def post_update_edition(edition, url='http://0.0.0.0:7860', token=None):
    head = {
        'Authorization': token
    }
    datas = json.dumps({
        "edition": edition
    })
    r = requests.post(f'{url}/easyanimate/update_edition', data=datas, headers=head)
    data = r.content.decode('utf-8')
    return data


if __name__ == '__main__':
    url = '<eas-service-url>'
    token = '<eas-service-token>'

    # -------------------------- #
    #   Step 1: update edition
    # -------------------------- #
    edition = "v3"
    outputs = post_update_edition(edition, url=url, token=token)
    print('Output update edition: ', outputs)

    # ------------------------------------------ #
    #   Step 2: update diffusion transformer
    # ------------------------------------------ #
    # The default path. You can select one of the following paths.
    diffusion_transformer_path = "/mnt/models/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-512x512"
    # diffusion_transformer_path = "/mnt/models/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-768x768"
    outputs = post_diffusion_transformer(diffusion_transformer_path, url=url, token=token)
    print('Output update diffusion transformer: ', outputs)
Take note of the following parameters: replace <eas-service-url> with the endpoint of the model service and replace <eas-service-token> with the token of the model service. You can obtain both values from the call information that you viewed in the previous step.
Call the model service to generate a video or an image.
The following table describes the parameters used for calling the model service.
Parameter | Description | Type | Default value |
prompt_textbox | The positive prompt. | string | No default value |
negative_prompt_textbox | The negative prompt. | string | "The video is not of a high quality, it has a low resolution, and the audio quality is not clear. Strange motion trajectory, a poor composition and deformed video, low resolution, duplicate and ugly, strange body structure, long and strange neck, bad teeth, bad eyes, bad limbs, bad hands, rotating camera, blurry camera, shaking camera. Deformation, low-resolution, blurry, ugly, distortion." |
sample_step_slider | The number of sampling steps. | int | 30 |
cfg_scale_slider | The guidance coefficient. | int | 6 |
sampler_dropdown | The sampler type. Valid values: Euler, EulerA, DPM++, PNDM, and DDIM. | string | Euler |
width_slider | The width of the generated video. | int | 672 |
height_slider | The height of the generated video. | int | 384 |
length_slider | The number of frames of the generated video. | int | 144 |
is_image | Specifies whether an image is generated. | bool | FALSE |
lora_alpha_slider | The weight of the LoRA model parameters. | float | 0.55 |
seed_textbox | The random seed. | int | 43 |
lora_model_path | The additional path of the LoRA model. If an additional path exists, the LoRA model is included in the request and removed after the request is complete. | string | none |
base_model_path | The path that you want to update for the transformer model. | string | none |
motion_module_path | The path that you want to update for the motion_module model. | string | none |
generation_method | The generation type. Valid values: Video Generation and Image Generation. | string | none |
Sample Python request
If the model service returns base64_encoding, the generated video or image is Base64-encoded.
You can view the generation results in the /mnt/workspace/demos/easyanimate/ directory.
import base64
import json
import time
from datetime import datetime

import requests


def post_infer(generation_method, length_slider, url='http://127.0.0.1:7860', token=None):
    head = {
        'Authorization': token
    }
    datas = json.dumps({
        "base_model_path": "none",
        "motion_module_path": "none",
        "lora_model_path": "none",
        "lora_alpha_slider": 0.55,
        "prompt_textbox": "This video shows Mount saint helens, washington - the stunning scenery of a rocky mountains during golden hours - wide shot. A soaring drone footage captures the majestic beauty of a coastal cliff, its red and yellow stratified rock faces rich in color and against the vibrant turquoise of the sea.",
        "negative_prompt_textbox": "Strange motion trajectory, a poor composition and deformed video, worst quality, normal quality, low quality, low resolution, duplicate and ugly, strange body structure, long and strange neck, bad teeth, bad eyes, bad limbs, bad hands, rotating camera, blurry camera, shaking camera",
        "sampler_dropdown": "Euler",
        "sample_step_slider": 30,
        "width_slider": 672,
        "height_slider": 384,
        "generation_method": generation_method,
        "length_slider": length_slider,
        "cfg_scale_slider": 6,
        "seed_textbox": 43,
    })
    r = requests.post(f'{url}/easyanimate/infer_forward', data=datas, headers=head, timeout=1500)
    data = r.content.decode('utf-8')
    return data


if __name__ == '__main__':
    # Record the start time.
    now_date = datetime.now()
    time_start = time.time()

    url = '<eas-service-url>'
    token = '<eas-service-token>'

    # -------------------------- #
    #   Step 3: infer
    # -------------------------- #
    # Valid values: "Video Generation" and "Image Generation".
    generation_method = "Video Generation"
    length_slider = 72
    outputs = post_infer(generation_method, length_slider, url=url, token=token)

    # Decode the Base64-encoded result and save it to a file.
    outputs = json.loads(outputs)
    base64_encoding = outputs["base64_encoding"]
    decoded_data = base64.b64decode(base64_encoding)

    is_image = True if generation_method == "Image Generation" else False
    if is_image or length_slider == 1:
        file_path = "1.png"
    else:
        file_path = "1.mp4"
    with open(file_path, "wb") as file:
        file.write(decoded_data)

    # Record the end time. The time difference is the execution time of the program, in seconds.
    time_end = time.time()
    time_sum = time_end - time_start
    print('# --------------------------------------------------------- #')
    print(f'# Total expenditure: {time_sum}s')
    print('# --------------------------------------------------------- #')
Take note of the following parameters: replace <eas-service-url> with the endpoint of the model service and replace <eas-service-token> with the token of the model service.
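For reference, the following is a minimal, self-contained sketch of an image-generation request based on the preceding sample. It sends the same fields to the /easyanimate/infer_forward endpoint but sets generation_method to Image Generation and length_slider to 1. The prompt, negative prompt, and the result.png output file name are only illustrative choices; the other values reuse the defaults from the parameter table.
# A minimal image-generation sketch based on the preceding video sample.
import base64
import json

import requests

url = '<eas-service-url>'      # The endpoint of the model service.
token = '<eas-service-token>'  # The token of the model service.

# The field names mirror the video sample; the prompts are illustrative.
payload = json.dumps({
    "base_model_path": "none",
    "motion_module_path": "none",
    "lora_model_path": "none",
    "lora_alpha_slider": 0.55,
    "prompt_textbox": "A close-up photo of a red fox standing in a snowy forest at sunrise.",
    "negative_prompt_textbox": "low quality, low resolution, blurry, deformed, ugly",
    "sampler_dropdown": "Euler",
    "sample_step_slider": 30,
    "width_slider": 672,
    "height_slider": 384,
    "generation_method": "Image Generation",  # Generate an image instead of a video.
    "length_slider": 1,
    "cfg_scale_slider": 6,
    "seed_textbox": 43,
})
r = requests.post(f'{url}/easyanimate/infer_forward',
                  data=payload, headers={'Authorization': token}, timeout=1500)

# The response contains a Base64-encoded result, as described above.
outputs = json.loads(r.content.decode('utf-8'))
decoded_data = base64.b64decode(outputs["base64_encoding"])

# "result.png" is an illustrative output file name.
with open("result.png", "wb") as file:
    file.write(decoded_data)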
Scenario 2: Deploy a model after you fine-tune the model
Go to the QuickStart page.
Log on to the PAI console.
In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.
In the left-side navigation pane of the Workspace Details page, choose QuickStart > Model Gallery.
On the Model Gallery page, search for EasyAnimate Long Video Generation Model and click the model.
In the upper-right corner of the model details page, click Fine-tune to perform the output data configurations and configure hyperparameters based on your business requirements. For more information about hyperparameters, see Appendix: Hyperparameters for fine-tuned models.
EasyAnimate supports only bf16 precision, so you must select an instance type that uses an A10 or later GPU. If you use images for LoRA model training, at least 20 GB of GPU memory is required. The GPU memory consumed by training depends on the values of the batch_size and num_train_epochs parameters. If you fine-tune the model on video data, these parameters typically require larger values and the training consumes more GPU memory.
Click Fine-tune. In the Billing Notification message, click OK. The Task details tab of the model training details page appears.
When the value of the Job Status parameter changes to Success, the model training is complete.
In the upper-right corner, click Deploy.
When the service status changes to Running, the model is deployed.
On the Service details tab, click View Web App.
On the web UI, select the trained LoRA model to generate a video.
Appendix: Hyperparameters for fine-tuned models
Parameter | Type | Description |
learning_rate | float | The learning rate. |
adam_weight_decay | float | The weight decay of the Adam optimizer. |
adam_epsilon | float | The epsilon of the Adam optimizer. |
num_train_epochs | int | The total number of training epochs. |
checkpointing_steps | int | The interval, in steps, at which model checkpoints are saved. |
train_batch_size | int | The batch size of the training samples. |
vae_mini_batch | int | The mini-batch size used by the VAE during training. |
image_sample_size | int | The resolution of the image that you want to use to train the model. |
video_sample_size | int | The resolution of the video that you want to use to train the model. |
video_sample_stride | int | The sampling interval of the video that you want to use to train the model. |
video_sample_n_frames | int | The number of sampling frames for the video that you want to use to train the model. |
rank | int | The LoRA rank, which determines the model complexity. |
network_alpha | int | The LoRA network alpha value. |
gradient_accumulation_steps | int | The number of gradient accumulation steps for model training. |
dataloader_num_workers | int | The number of child workers for data loading. |