Platform for AI: Generate a video

Last Updated: Dec 19, 2024

EasyAnimate is a self-developed diffusion transformer (DiT) video generation framework for Platform for AI (PAI). EasyAnimate provides a complete solution for generating high-definition long videos. The solution includes video data preprocessing, variational autoencoder (VAE) training, DiT training, model inference, and model evaluation. This topic describes how to integrate EasyAnimate with PAI and implement model inference, fine-tuning, and deployment with a few clicks.

Background information

You can use one of the following methods to generate videos:

  • Method 1: Use Data Science Workshop (DSW)

    DSW is an end-to-end artificial intelligence (AI) development platform tailored for algorithm developers. DSW integrates multiple cloud development environments, such as JupyterLab, WebIDE, and Terminal. DSW Gallery provides a variety of cases and solutions that help you quickly get familiar with the development process. You can follow the tutorials in DSW Gallery to run the notebook file, perform model training and inference with EasyAnimate-based video generation models, and perform secondary development, such as fine-tuning the models.

  • Method 2: Use QuickStart

    QuickStart integrates high-quality pre-trained models from open source AI communities and supports zero-code training, deployment, and inference based on these models. You can use QuickStart to deploy EasyAnimate models and generate videos with a few clicks, which provides a faster, more efficient, and more convenient AI application experience.

Billing

If you use DSW, you are charged for DSW and Elastic Algorithm Service (EAS) resources. If you use QuickStart, you are charged for Deep Learning Containers (DLC) and EAS resources.

For more information, see Billing of DSW, Billing of EAS, and Billing of QuickStart.

Prerequisites

A workspace is created. For more information, see Create a workspace.

Method 1: Use DSW

Step 1: Create a DSW instance

  1. Go to the Data Science Workshop (DSW) page.

    1. Log on to the PAI console.

    2. In the top navigation bar of the Overview page, select a region.

    3. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    4. In the left-side navigation pane of the Workspace Details page, choose Model Training > Data Science Workshop (DSW).

  2. On the page that appears, click Create Instance.

  3. On the Configure Instance page, configure the key parameters. The following list describes these parameters. For other parameters, use the default values.

    • Instance Name: Specify a name for the instance. In this example, AIGC_test_01 is used.

    • Instance Type: On the GPU tab, select the ecs.gn7i-c8g1.2xlarge instance type or another instance type that uses an A10 or GU100 GPU.

    • Image: Select the Alibaba Cloud image easyanimate:1.1.5-pytorch2.2.0-gpu-py310-cu118-ubuntu22.04.

    • Dataset (optional): Click Add. In the Dataset panel, click Create Dataset to create an OSS or File Storage NAS (NAS) dataset.

  4. Click Yes.

Step 2: Download an EasyAnimate model

  1. Click Open in the Actions column of the DSW instance that you want to manage.

  2. On the Launcher page of the Notebook tab, click DSW Gallery in the Tool section of Quick Start.

  3. On the DSW Gallery page, search for AI video generation demo based on EasyAnimate and click Open in DSW. The required resources and tutorial file are automatically downloaded to the DSW instance. After the download is complete, the tutorial file automatically opens.

  4. Download the EasyAnimate code and models and install them.

    In the easyanimate.ipynb tutorial file, click the run icon to run the commands in the Set up the environment section, including Define functions, Download code, and Download models. Run the steps in sequence; proceed to the next step only after the commands in the current step are successfully run.

Step 3: Perform model inference

  1. Click the run icon to run the commands in the Perform model inference > Code Infer > UI Infer section to perform model inference.

  2. Click the generated link to go to the web UI.

  3. On the web UI, select the path of the pre-trained model. Configure other parameters based on your business requirements.

    Optional. If you want to generate a video from images, upload the images in the Image to Video section.

  4. Click Generate. Wait for a period of time. Then, you can view or download the generated video on the right side.

Step 4: Perform LoRA fine-tuning

EasyAnimate provides various model training methods, including DiT model training and VAE model training. DiT model training includes Low-Rank Adaptation (LoRA) fine-tuning and full fine-tuning of a base model. For more information about LoRA fine-tuning built in DSW Gallery, see EasyAnimate.
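
LoRA fine-tuning keeps the base model weights frozen and learns a pair of small low-rank matrices for each adapted layer, which greatly reduces the number of trainable parameters. The following is a minimal conceptual sketch of a LoRA linear layer in PyTorch. It is illustrative only and is not EasyAnimate's implementation; the layer size, rank, and alpha values are placeholder assumptions.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # A frozen linear layer plus a trainable low-rank update: y = Wx + (alpha / rank) * B(Ax).
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for param in self.base.parameters():
            param.requires_grad = False                                  # keep the base weights frozen
        self.lora_down = nn.Linear(base.in_features, rank, bias=False)   # low-rank down-projection (A)
        self.lora_up = nn.Linear(rank, base.out_features, bias=False)    # low-rank up-projection (B)
        nn.init.zeros_(self.lora_up.weight)                              # start from a zero update
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_up(self.lora_down(x))

# Train only the LoRA parameters of a wrapped layer.
layer = LoRALinear(nn.Linear(1024, 1024), rank=16, alpha=16.0)
trainable = [p for p in layer.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)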

Prepare data

Click the run icon to run the commands in the Perform model training > Prepare data section to download sample data for model training. You can also prepare a data file in the following format:

project/
├── datasets/
│   ├── internal_datasets/
│       ├── videos/
│       │   ├── 00000001.mp4
│       │   ├── 00000002.mp4
│       │   └── .....
│       └── json_of_internal_datasets.json

The following sample shows the format of the JSON file, and the table below describes its parameters. A minimal script that assembles such a file is shown after the table.

[
    {
      "file_path": "videos/00000001.mp4",
      "text": "A group of young men in suits and sunglasses are walking down a city street.",
      "type": "video"
    },
    {
      "file_path": "videos/00000002.mp4",
      "text": "A notepad with a drawing of a woman on it.",
      "type": "video"
    }
    .....
]

Parameter    Description
file_path    The path in which the video or image data is stored. This is a relative path.
text         The text description of the data.
type         The file type. Valid values: video and image.
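
If you assemble your own dataset, a small script can generate the metadata file in the format shown above. The following is a minimal sketch. The directory layout follows the example above; the caption texts are placeholder assumptions that you would replace with your own descriptions.

import json
from pathlib import Path

# Assumed layout from the example above; adjust the paths to your own dataset location.
dataset_root = Path("project/datasets/internal_datasets")
video_dir = dataset_root / "videos"

# Placeholder captions keyed by file name; replace them with your own text descriptions.
captions = {
    "00000001.mp4": "A group of young men in suits and sunglasses are walking down a city street.",
    "00000002.mp4": "A notepad with a drawing of a woman on it.",
}

entries = []
for video_path in sorted(video_dir.glob("*.mp4")):
    entries.append({
        "file_path": f"videos/{video_path.name}",   # path relative to the dataset root
        "text": captions.get(video_path.name, ""),  # text description of the clip
        "type": "video",                            # use "image" for image data
    })

with open(dataset_root / "json_of_internal_datasets.json", "w") as f:
    json.dump(entries, f, indent=2, ensure_ascii=False)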

Start model training

  1. Optional. If you use a data file that you prepared for fine-tuning, set the DATASET_NAME and DATASET_META_NAME parameters in the corresponding training script to the directory of the training data and the path of the training file.

    export DATASET_NAME=""  # The directory of the training data.
    export DATASET_META_NAME="datasets/Minimalism/metadata_add_width_height.json"  # The path of the training file.
  2. Click the run icon to run the commands in the Start model training > LoRA Finetune section.

  3. After the training is complete, click the run icon to run the commands in the Inference with LoRA model section to move the trained model to the EasyAnimate/models/Personalized_Model folder.

  4. Click the generated link to go to the web UI and select the trained LoRA model to generate a video.

Method 2: Use QuickStart

QuickStart is a component of PAI that integrates high-quality pre-trained models from open source AI communities and supports zero-code training, deployment, and inference based on these models. You can directly deploy and use the models, or fine-tune them based on your business requirements before deployment.

Scenario 1: Directly deploy a model

  1. Go to the QuickStart page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the left-side navigation pane of the Workspace Details page, choose QuickStart > Model Gallery.

  2. On the Model Gallery page, search for EasyAnimate Long Video Generation Model, click Deploy, and then configure the parameters.

    EasyAnimate supports only bf16 for inference. You must select an instance type that uses an A10 or later GPU.

  3. Click Deploy. In the Billing Notification message, click OK. The Service Details tab of the model details page appears.

    When the value of the Status parameter changes to Running, the model is deployed.

  4. After the model is deployed, use the web UI or the related API operations to call the model service to generate a video.

    Web UI

    1. On the Service details tab, click View Web App.

    2. On the web UI, select the path of the pre-trained model. Configure other parameters based on your business requirements.

    3. Click Generate. Wait for a period of time. Then, you can view or download the generated video on the right side.

    API

    1. In the Resource Information section of the Service details tab, click View Call Information to obtain the information required to call the model service.

    2. Update the transformer model by calling the API operations on a DSW instance or in your on-premises Python environment.

      If you have already selected a model on the web UI, you do not need to send this request. If the request times out, view the EAS logs to check whether the model is loaded. The Update diffusion transformer done message in the logs indicates that the model is loaded. A simple retry sketch is provided after the sample request below.

      The following code shows a sample Python request:

      import json
      import requests
      
      
      def post_diffusion_transformer(diffusion_transformer_path, url='http://127.0.0.1:7860', token=None):
          datas = json.dumps({
              "diffusion_transformer_path": diffusion_transformer_path
          })
          head = {
              'Authorization': token
          }
          r = requests.post(f'{url}/easyanimate/update_diffusion_transformer', data=datas, headers=head, timeout=15000)
          data = r.content.decode('utf-8')
          return data
      
      def post_update_edition(edition, url='http://0.0.0.0:7860',token=None):
          head = {
              'Authorization': token
          }
      
          datas = json.dumps({
              "edition": edition
          })
          r = requests.post(f'{url}/easyanimate/update_edition', data=datas, headers=head)
          data = r.content.decode('utf-8')
          return data
        
      if __name__ == '__main__':
          url = '<eas-service-url>'
          token = '<eas-service-token>'
      
          # -------------------------- #
          #  Step 1: update edition
          # -------------------------- #
          edition = "v3"
          outputs = post_update_edition(edition,url = url,token=token)
          print('Output update edition: ', outputs)
      
          # -------------------------- #
          #  Step 2: update diffusion transformer
          # -------------------------- #
          # The default path. You can select one of the following paths.
          diffusion_transformer_path = "/mnt/models/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-512x512"
          # diffusion_transformer_path = "/mnt/models/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-768x768"
          outputs = post_diffusion_transformer(diffusion_transformer_path, url=url, token=token)
          print('Output update diffusion transformer: ', outputs)

      Take note of the following parameters:

      • <eas-service-url>: Replace eas-service-url with the endpoint that you obtained in Step 1.

      • <eas-service-token>: Replace eas-service-token with the service token that you obtained in Step 1.
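
      The first call after deployment may time out while the model is still loading. The following is a minimal sketch that retries the update request by reusing the post_diffusion_transformer function defined in the sample above. The retry count and wait time are assumptions, not documented values.

      import time
      import requests

      def update_transformer_with_retry(path, url, token, attempts=3, wait_seconds=60):
          # Retry the update call a few times while the service finishes loading the model.
          for attempt in range(1, attempts + 1):
              try:
                  return post_diffusion_transformer(path, url=url, token=token)
              except requests.exceptions.RequestException as err:
                  print(f'Attempt {attempt} failed: {err}. Check the EAS logs for "Update diffusion transformer done".')
                  if attempt < attempts:
                      time.sleep(wait_seconds)
          raise RuntimeError('The model service did not respond. Verify in the EAS logs that the model is loaded.')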

    3. Call the model service to generate a video or an image.

    • The following parameters are used to call the model service:

      • prompt_textbox: The positive prompt. Type: string. No default value.

      • negative_prompt_textbox: The negative prompt. Type: string. Default value: "The video is not of a high quality, it has a low resolution, and the audio quality is not clear. Strange motion trajectory, a poor composition and deformed video, low resolution, duplicate and ugly, strange body structure, long and strange neck, bad teeth, bad eyes, bad limbs, bad hands, rotating camera, blurry camera, shaking camera. Deformation, low-resolution, blurry, ugly, distortion."

      • sample_step_slider: The number of sampling steps. Type: int. Default value: 30.

      • cfg_scale_slider: The guidance coefficient. Type: int. Default value: 6.

      • sampler_dropdown: The sampler type. Valid values: Euler, Euler A, DPM++, PNDM, and DDIM. Type: string. Default value: Euler.

      • width_slider: The width of the generated video. Type: int. Default value: 672.

      • height_slider: The height of the generated video. Type: int. Default value: 384.

      • length_slider: The number of frames in the generated video. Type: int. Default value: 144.

      • is_image: Specifies whether an image is generated. Type: bool. Default value: FALSE.

      • lora_alpha_slider: The weight of the LoRA model parameters. Type: float. Default value: 0.55.

      • seed_textbox: The random seed. Type: int. Default value: 43.

      • lora_model_path: The additional path of the LoRA model. If an additional path is specified, the LoRA model is included in the request and removed after the request is complete. Type: string. Default value: none.

      • base_model_path: The path to which you want to update the transformer model. Type: string. Default value: none.

      • motion_module_path: The path to which you want to update the motion_module model. Type: string. Default value: none.

      • generation_method: The generation type. Valid values: Video Generation and Image Generation. Type: string. Default value: none.

    • Sample Python request

      If the response contains base64_encoding, the returned result is Base64-encoded.

      You can view the generation results in the /mnt/workspace/demos/easyanimate/ directory.

      import base64
      import json
      import sys
      import time
      from datetime import datetime
      from io import BytesIO
      
      import cv2
      import requests
      
      def post_infer(generation_method, length_slider, url='http://127.0.0.1:7860',token=None):
          head = {
              'Authorization': token
          }
      
          datas = json.dumps({
              "base_model_path": "none",
              "motion_module_path": "none",
              "lora_model_path": "none", 
              "lora_alpha_slider": 0.55, 
              "prompt_textbox": "This video shows Mount saint helens, washington - the stunning scenery of a rocky mountains during golden hours - wide shot. A soaring drone footage captures the majestic beauty of a coastal cliff, its red and yellow stratified rock faces rich in color and against the vibrant turquoise of the sea.", 
              "negative_prompt_textbox": "Strange motion trajectory, a poor composition and deformed video, worst quality, normal quality, low quality, low resolution, duplicate and ugly, strange body structure, long and strange neck, bad teeth, bad eyes, bad limbs, bad hands, rotating camera, blurry camera, shaking camera", 
              "sampler_dropdown": "Euler", 
              "sample_step_slider": 30, 
              "width_slider": 672, 
              "height_slider": 384, 
              "generation_method": "Video Generation",
              "length_slider": length_slider,
              "cfg_scale_slider": 6,
              "seed_textbox": 43,
          })
          r = requests.post(f'{url}/easyanimate/infer_forward', data=datas, headers=head, timeout=1500)
          data = r.content.decode('utf-8')
          return data
      
      
      if __name__ == '__main__':
          # initiate time
          now_date    = datetime.now()
          time_start  = time.time()  
          
          url = '<eas-service-url>'
          token = '<eas-service-token>'
      
          # -------------------------- #
          #  Step 3: infer
          # -------------------------- #
          # "Video Generation" and "Image Generation"
          generation_method = "Video Generation"
          length_slider = 72
          outputs = post_infer(generation_method, length_slider, url = url, token=token)
          
          # Get decoded data
          outputs = json.loads(outputs)
          base64_encoding = outputs["base64_encoding"]
          decoded_data = base64.b64decode(base64_encoding)
      
          is_image = True if generation_method == "Image Generation" else False
          if is_image or length_slider == 1:
              file_path = "1.png"
          else:
              file_path = "1.mp4"
          with open(file_path, "wb") as file:
              file.write(decoded_data)
              
          # End of record time
          # The time difference is the total execution time of the program, in seconds.
          time_end = time.time()
          time_sum = time_end - time_start
          print('# --------------------------------------------------------- #')
          print(f'#   Total time: {time_sum}s')
          print('# --------------------------------------------------------- #')

      Take note of the following parameters:

      • <eas-service-url>: Replace eas-service-url with the endpoint that you obtained in Step 1.

      • <eas-service-token>: Replace eas-service-token with the service token that you obtained in Step 1.

Scenario 2: Deploy a model after you fine-tune the model

  1. Go to the QuickStart page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the left-side navigation pane of the Workspace Details page, choose QuickStart > Model Gallery.

  2. On the Model Gallery page, search for EasyAnimate Long Video Generation Model and click the model.

  3. In the upper-right corner of the model details page, click Fine-tune, configure the output data settings, and set the hyperparameters based on your business requirements. For more information about the hyperparameters, see Appendix: Hyperparameters for fine-tuned models.

    EasyAnimate supports only bf16 for inference. You must select an instance type that uses an A10 or later GPU. If you use images for LoRA model training, at least 20 GB of GPU memory is required. The GPU memory consumed by training depends on the values of the batch_size and num_train_epochs parameters. If you use video data for fine-tuning, you must set the batch_size and num_train_epochs parameters to larger values, and the training requires more GPU memory.

  4. Click Fine-tune. In the Billing Notification message, click OK. The Task details tab of the model training details page appears.

    If the value of the Job Status parameter changes to Success, the model is trained.

  5. In the upper-right corner, click Deploy.

    When the status of the service changes to Running, the model is deployed.

  6. On the Service details tab, click View Web App.

  7. On the web UI, select the trained LoRA model to generate a video.

Appendix: Hyperparameters for fine-tuned models

• learning_rate (float): The learning rate.

• adam_weight_decay (float): The weight decay of the Adam optimizer.

• adam_epsilon (float): The epsilon value of the Adam optimizer.

• num_train_epochs (int): The total number of training epochs.

• checkpointing_steps (int): The interval, in steps, at which the model is saved.

• train_batch_size (int): The batch size of the training samples.

• vae_mini_batch (int): The mini-batch size for the VAE during training.

• image_sample_size (int): The resolution of the images used to train the model.

• video_sample_size (int): The resolution of the videos used to train the model.

• video_sample_stride (int): The sampling interval of the videos used to train the model.

• video_sample_n_frames (int): The number of frames sampled from each video used to train the model.

• rank (int): The rank of the LoRA update, which controls the model complexity.

• network_alpha (int): The network alpha value used to scale the LoRA update.

• gradient_accumulation_steps (int): The number of gradient accumulation steps for model training.

• dataloader_num_workers (int): The number of worker processes for data loading.
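
For reference, the following is a minimal sketch of how these hyperparameters might be collected before a fine-tuning run. The values shown are illustrative assumptions, not documented defaults; choose them based on your data and the GPU memory note in Scenario 2.

# Illustrative hyperparameter values only; these are assumptions, not documented defaults.
finetune_hyperparameters = {
    "learning_rate": 1e-4,              # learning rate for LoRA fine-tuning
    "adam_weight_decay": 3e-2,          # weight decay of the Adam optimizer
    "adam_epsilon": 1e-8,               # epsilon of the Adam optimizer
    "num_train_epochs": 10,             # total number of training epochs
    "checkpointing_steps": 100,         # save the model every 100 steps
    "train_batch_size": 1,              # batch size per training step
    "vae_mini_batch": 1,                # mini-batch size for the VAE
    "image_sample_size": 512,           # training image resolution
    "video_sample_size": 512,           # training video resolution
    "video_sample_stride": 4,           # sampling interval between video frames
    "video_sample_n_frames": 72,        # number of frames sampled per video
    "rank": 64,                         # LoRA rank
    "network_alpha": 64,                # LoRA network alpha
    "gradient_accumulation_steps": 1,   # gradient accumulation steps
    "dataloader_num_workers": 8,        # data-loading worker processes
}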