
Platform for AI: GPU sharing

Last Updated: Jul 26, 2024

If you use dedicated resource groups to deploy services in Elastic Algorithm Service (EAS) of Platform for AI (PAI), you can enable GPU sharing to increase resource utilization. If you enable GPU sharing when you deploy a service, the system deploys virtualized GPU resources for the service. EAS then allocates resources to each instance based on the computing power ratio and GPU memory that you specify. This topic describes how to configure GPU sharing.

Prerequisites

A dedicated resource group is created and resources are purchased. For more information, see Work with dedicated resource groups.

Limits

  • The GPU sharing feature is available only to users on the whitelist. To use the GPU sharing feature, submit a ticket.

  • The GPU sharing feature is available only for services deployed by using dedicated resource groups.

Configure GPU sharing when you create a service

Create a service in the console

  1. Go to the Elastic Algorithm Service (EAS) page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the left-side navigation pane, choose Model Deployment > Elastic Algorithm Service (EAS). The Elastic Algorithm Service (EAS) page appears.

  2. Click Deploy Service. In the Custom Model Deployment section, click Custom Deployment.

  3. In the Resource Deployment Information section, configure the following key parameters. For more information about other parameters, see Deploy a model service in the PAI console.

    • Resource Group Type: Select an existing dedicated resource group.

    • GPU Sharing: Select GPU Sharing.

    • Deployment: Configure the following parameters:

      • Single-GPU Memory (GB): the GPU memory required by each instance. The value must be an integer. Unit: GB. PAI allows the memory of one GPU to be allocated to multiple instances.

        Important

        The GPU memory of multiple instances is not strictly isolated. To prevent out-of-memory (OOM) errors, make sure that the GPU memory used by each instance does not exceed the requested amount.

      • Computing Power per GPU (%): the percentage of a single GPU's computing power required by each instance. The value must be an integer from 1 to 100. For example, if you enter 10, the system allocates 10% of the computing power of a single GPU to each instance. This allows computing power to be scheduled flexibly and multiple instances to share a single GPU.

  4. After you configure the parameters, click Deploy.
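As a rough illustration of how the two Deployment values interact, the following Python sketch estimates how many instances could share one GPU. It assumes that the scheduler simply packs instances until either GPU memory or computing power runs out, which this topic does not confirm. The function name and the 24 GB GPU size are hypothetical.

```python
def max_instances_per_gpu(gpu_total_memory_gb: int,
                          memory_per_instance_gb: int,
                          compute_percent_per_instance: int) -> int:
    """Estimate how many instances fit on one GPU (illustrative only)."""
    if memory_per_instance_gb < 1:
        raise ValueError("Single-GPU Memory (GB) must be a positive integer")
    if not 1 <= compute_percent_per_instance <= 100:
        raise ValueError("Computing Power per GPU (%) must be from 1 to 100")
    # Assumption: memory and computing power are two independent limits,
    # and the tighter of the two decides how many instances fit.
    by_memory = gpu_total_memory_gb // memory_per_instance_gb
    by_compute = 100 // compute_percent_per_instance
    return min(by_memory, by_compute)


if __name__ == "__main__":
    # Hypothetical 24 GB GPU; each instance requests 4 GB and 10% computing power.
    print(max_instances_per_gpu(24, 4, 10))  # memory is the limit: 24 // 4 = 6
```

In this example, memory is the limiting factor. Because GPU memory is not strictly isolated, treat such an estimate as an upper bound and leave headroom.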

Create a service by using an on-premises client

  1. Download the EASCMD client and complete identity authentication. In this example, the client for 64-bit Windows is used.

  2. Create a service configuration file named service.json in the directory in which the client is located. Sample content of the configuration file:

    {
        "containers": [
            {
                "image": "eas-registry-vpc.cn-beijing.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0.4",
                "port": 8000,
                "script": "python webui/webui_server.py --port=8000 --model-path=Qwen/Qwen1.5-7B-Chat"
            }
        ],
        "metadata": {
            "cpu": 8,
            "enable_webservice": true,
            "gpu_core_percentage": 5,
            "gpu_memory": 20,
            "instance": 1,
            "memory": 20000,
            "name": "testchatglm",
            "resource": "eas-r-fky7kxiq4l2zzt****",
            "resource_burstable": false
        },
        "name": "test"
    }

    Take note of the following parameters. For information about other parameters, see All Parameters of model services.

    • gpu_memory: the amount of GPU memory required by each instance. The value must be an integer. Unit: GB.

      PAI allows the memory of one GPU to be allocated to multiple instances. To schedule by GPU memory, set the gpu field to 0. If you set the gpu field to 1, the instance occupies the entire GPU and the gpu_memory field is ignored.

      Important

      The GPU memory of multiple instances is not strictly isolated. To prevent out-of-memory (OOM) errors, make sure that the GPU memory used by each instance does not exceed the requested amount.

    • gpu_core_percentage: the percentage of a single GPU's computing power required by each instance. The value must be an integer from 1 to 100. For example, if you set this parameter to 10, each instance is allocated 10% of the computing power of a single GPU.

      This allows computing power to be scheduled flexibly and multiple instances to share a single GPU. If you configure this parameter, you must also configure the gpu_memory parameter. Otherwise, this parameter does not take effect.

    • resource: the ID of the existing dedicated resource group. For more information about how to view the ID of a dedicated resource group, see Manage dedicated resource groups.

  3. In the directory in which the JSON file is located, run the following command to create the service. For more information, see Run commands to use the EASCMD client.

    eascmdwin64.exe create <service.json>

    Replace <service.json> with the name of the JSON file that you created.
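If you prefer to generate service.json from a script, the following Python sketch builds the sample configuration and checks the documented constraints on gpu_memory and gpu_core_percentage before writing the file. The function name build_service_config and the placeholder resource group ID are hypothetical, and the lower bound of 1 GB for gpu_memory is an assumption. The other field values are copied from the sample above.

```python
import json


def build_service_config(resource_group_id, gpu_memory_gb, gpu_core_percentage):
    """Build a service configuration dict with GPU sharing fields (sketch)."""
    for field, value in (("gpu_memory", gpu_memory_gb),
                         ("gpu_core_percentage", gpu_core_percentage)):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{field} must be an integer")
    if gpu_memory_gb < 1:
        # Assumption: a request must be at least 1 GB.
        raise ValueError("gpu_memory must be at least 1 (GB)")
    if not 1 <= gpu_core_percentage <= 100:
        raise ValueError("gpu_core_percentage must be from 1 to 100")
    return {
        "containers": [
            {
                "image": "eas-registry-vpc.cn-beijing.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0.4",
                "port": 8000,
                "script": "python webui/webui_server.py --port=8000 --model-path=Qwen/Qwen1.5-7B-Chat",
            }
        ],
        "metadata": {
            "cpu": 8,
            "enable_webservice": True,
            "gpu_core_percentage": gpu_core_percentage,
            "gpu_memory": gpu_memory_gb,
            "instance": 1,
            "memory": 20000,
            "name": "testchatglm",
            "resource": resource_group_id,
            "resource_burstable": False,
        },
        "name": "test",
    }


if __name__ == "__main__":
    # Replace the placeholder with the ID of your dedicated resource group.
    config = build_service_config("<your-resource-group-id>", 20, 5)
    with open("service.json", "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)
```

Validating the values locally catches an out-of-range percentage or a non-integer memory value before the file is passed to the client.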

Configure GPU sharing when you update a service

If you did not enable the GPU sharing feature when you deployed a service by using a dedicated resource group, you can enable it by updating the service configuration.

Update the service in the console

  1. On the Elastic Algorithm Service (EAS) page, find the service that you want to update and click Update Service in the Actions column.

  2. In the Resource Configuration section of the update service page, configure the Resource Group Type, GPU Sharing, and Deployment parameters. For more information, see the "Create a service in the console" section of this topic.

  3. After you configure the parameters, click Deploy.

Update the service by using the on-premises client

  1. Download the EASCMD client and complete identity authentication. In this example, the client for 64-bit Windows is used.

  2. Create a file named instances.json in the directory in which the client is located. Sample content of the file:

    {
        "metadata": {
            "gpu_memory": 2,
            "gpu_core_percentage": 5
        }
    }

    For more information about the parameters in the preceding code, see the "Create a service by using an on-premises client" section of this topic.

  3. Open the terminal tool. In the directory in which the JSON file is located, run the following command to enable GPU sharing for the EAS service:

    eascmdwin64.exe modify <service_name> -s <instances.json>

    Replace <service_name> with the name of the EAS service and <instances.json> with the name of the JSON file that you created.
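The update steps can also be scripted. The following Python sketch writes the update file and assembles the modify command shown above as an argument list. It assumes that the file is a complete JSON object with a single metadata key. The function names and the service name my_service are hypothetical. The sketch only prints the command and does not run it.

```python
import json


def build_gpu_sharing_patch(gpu_memory_gb: int, gpu_core_percentage: int) -> str:
    """Return the JSON text of an update file that enables GPU sharing (sketch)."""
    if not 1 <= gpu_core_percentage <= 100:
        raise ValueError("gpu_core_percentage must be from 1 to 100")
    patch = {
        "metadata": {
            "gpu_memory": gpu_memory_gb,
            "gpu_core_percentage": gpu_core_percentage,
        }
    }
    return json.dumps(patch, indent=4)


def modify_command(service_name: str, patch_file: str = "instances.json") -> list:
    """Return the EASCMD modify command from this topic as an argument list."""
    return ["eascmdwin64.exe", "modify", service_name, "-s", patch_file]


if __name__ == "__main__":
    with open("instances.json", "w", encoding="utf-8") as f:
        f.write(build_gpu_sharing_patch(2, 5))
    # Print the command instead of running it, so that you can review it first.
    print(" ".join(modify_command("my_service")))
```

To run the command from the script, you can pass the list to subprocess.run from the Python standard library in the directory in which the client is located.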