Platform for AI: Use a preemptible job

Last Updated: Sep 24, 2024

If you are short of computing power, you can use the preemptible job feature of Platform for AI (PAI), which allocates computing resources through a bidding system. Preemptible resources usually cost less than public pay-as-you-go resources, which gives you cost-effective access to AI computing power and reduces the overall expense of jobs. This topic describes how to use preemptible resources when you create a job in Deep Learning Containers (DLC) with Lingjun AI Computing Service resources.

Limits

  • Only users in a whitelist can use preemptible jobs. Contact your account manager before using the feature.

  • The preemptible job feature is only available in the China (Ulanqab) and Singapore regions.

  • The preemptible job feature only supports Lingjun AI Computing Service resources.

  • Preemptible jobs are subject to the following constraints:

    • Cannot be converted into subscription instances.

    • Instance and bandwidth specifications cannot be modified.

    • Does not support ICP Filing services.

    • No discounts for major customers.

Usage notes

  • The price of preemptible resources fluctuates with current supply and demand, and can be up to 90% lower than the price of public pay-as-you-go instances.

  • Because preemptible resources can be preempted by all users of Alibaba Cloud, the availability of preemptible resources is not guaranteed. If you need to run a DLC job with preemptible resources, pay attention to the following points:

    • Resource request: After you submit a DLC job with preemptible resources, the system begins to preempt resources. If the resource inventory is insufficient, the job enters a pending state until resources are available.

    • Resource revocation: Preemptible resources may be revoked based on market price, inventory, and the maximum bid price and duration of the instance. Even while your DLC job is running, if the maximum bid price falls below the average market price or inventory is insufficient, resources may be revoked without notice, which causes the job to fail. To improve job stability, you can:

Billing

Price description:

To use preemptible resources, you need to set a maximum bid price (preemptibleWithPriceLimit). The market price for preemptible resources fluctuates with supply and demand, and multiple jobs using the same resources may incur identical costs for a given period. The following table describes resource specifications and price ranges for preemptible resources:

Important

The market price for preemptible resources fluctuates with supply and demand in real time. The maximum bid price ranges from 10% to 90% of the market price, in 10% increments. The actual market price and maximum bid price are displayed in the console.

Resource specification    Market price range (USD/hour)    Maximum bid price range (USD/hour)    Region
ml.gu7ef.8xlarge-gu100    5.700~57.000                     5.700~51.300                          China (Ulanqab)
ml.gu7xf.8xlarge-gu108    5.040~50.400                     5.040~45.360                          China (Ulanqab)
ml.gu8xf.8xlarge-gu108    12.240~122.400                   12.240~110.160                        China (Ulanqab)
ml.gu8ef.8xlarge-gu100    23.220~232.200                   23.220~208.980                        Singapore
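
The relationship between the two price columns can be checked with a little arithmetic. The following Python sketch assumes that the selectable maximum bid prices are 10% to 90%, in 10% steps, of the upper bound of the market price range. That assumption matches the values listed above but is not stated explicitly in this topic, and the function name bid_tiers is hypothetical. Always use the prices shown in the console as the source of truth.

```python
from decimal import Decimal


def bid_tiers(reference_price: str) -> list[Decimal]:
    """Return the nine candidate bid prices: 10%, 20%, ..., 90% of the reference price.

    Assumption: the reference price is the upper bound of the market price
    range for the specification (for example, 57.000 for ml.gu7ef.8xlarge-gu100).
    """
    ref = Decimal(reference_price)
    return [
        (ref * Decimal(pct) / Decimal(100)).quantize(Decimal("0.001"))
        for pct in range(10, 100, 10)
    ]


tiers = bid_tiers("57.000")
print(tiers[0], tiers[-1])  # 5.700 51.300
```

Decimal is used instead of float so that the computed tiers compare exactly against prices written with three decimal places.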

View billing details:

On the day after the job runs, you can go to the Expense and Costs page to review the costs incurred by preemptible resources. As with pay-as-you-go resources in DLC, the billing details of preemptible resource orders are displayed on the page. For more information, see View billing details.

Scenarios

  • Applicable scenarios:

    We recommend that you use preemptible resources to reduce costs in the following scenarios:

    • Jobs with short runtimes.

    • Jobs during debugging.

    • Jobs that allow interruptions.

    • Jobs that support resumption from interruptions, such as jobs using the EasyCkpt framework for PyTorch model training, which supports frequent checkpoint saving and resuming training after interruptions. For more information, see Use EasyCkpt to save and resume foundation model trainings.

  • Inapplicable scenarios:

    Do not use preemptible resources for services that require high stability.
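
To illustrate what "resumption from interruptions" means in practice, the following Python sketch saves progress periodically and resumes from the last saved checkpoint after a simulated revocation. It is a generic, framework-free illustration and does not use the EasyCkpt API. The function names, the JSON checkpoint format, and the stop_at parameter (which simulates a revocation) are all assumptions made for this example.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    # Write to a temporary file first, then rename it over the target, so a
    # revocation in the middle of a write does not corrupt the last checkpoint.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_checkpoint(path):
    # Start from scratch when no checkpoint exists yet.
    if not os.path.exists(path):
        return {"step": 0, "total": 0}
    with open(path) as f:
        return json.load(f)


def train(path, num_steps, save_every=10, stop_at=None):
    state = load_checkpoint(path)
    for step in range(state["step"], num_steps):
        if stop_at is not None and step == stop_at:
            return state  # simulated revocation: unsaved progress is lost
        state["total"] += step  # stand-in for one real training step
        state["step"] = step + 1
        if state["step"] % save_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "ckpt.json")
    train(ckpt, 100, stop_at=35)   # interrupted; last checkpoint is at step 30
    final = train(ckpt, 100)       # resumes from step 30 and finishes
    print(final["step"], final["total"])  # 100 4950
```

The shorter the save interval, the less work is repeated after a revocation, at the cost of more frequent writes.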

Procedure

To use preemptible resources for DLC jobs with Lingjun AI Computing Service, perform the following steps:

  1. Go to the Create Job page. For more information, see Step 1: Go to the Create Job page.

  2. Configure the following key parameters. For more information, see Submit training jobs.

    • Resource Information

      • Resource Type: Select Lingjun AI Computing Service.

      • Source: Select Preemptible Resources.

      • Job Resource: In the Resource Type column, click the icon to select an instance type, and then set the Maximum Bid Price. The maximum bid price ranges from 10% to 90% of the market price, in 10% increments. You can get the preemptible resources if your bid meets or exceeds the market price and inventory is available.

    • VPC

      • VPC(ID), Security Group, and vSwitch: From the drop-down lists, select your virtual private cloud (VPC), vSwitch, and security group.

    • Fault Tolerance and Diagnosis

      • Automatic Fault Tolerance: We recommend that you enable Automatic Fault Tolerance, which allows preemptible jobs to re-enter the bidding queue after resource revocation. The job can resume when the average market price falls below your maximum bid price. For more information, see AIMaster: Elastic fault tolerance engine.

  3. After you configure the parameters, click Confirm to submit the job.

    DLC then starts to request preemptible resources. If no resources are available, the job enters a pending state.
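
The behavior described in this topic (pending until resources are obtained, possible revocation while running, and re-queuing when Automatic Fault Tolerance is enabled) can be summarized as a small state model. The following Python sketch is a simplified illustration only and is not the actual DLC scheduler: the names JobState and next_state are hypothetical, and it treats revocation as certain whenever the bid falls below the market price or inventory runs out, whereas this topic only says that resources may be revoked in that case.

```python
from enum import Enum


class JobState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    FAILED = "failed"


def next_state(state, max_bid, market_price, inventory_available,
               auto_fault_tolerance):
    """Return the next state of a preemptible job in this simplified model."""
    # Resources can be obtained or kept only when the bid covers the market
    # price and inventory is available.
    can_hold = max_bid >= market_price and inventory_available
    if state is JobState.PENDING:
        return JobState.RUNNING if can_hold else JobState.PENDING
    if state is JobState.RUNNING:
        if can_hold:
            return JobState.RUNNING
        # After revocation, the job re-enters the bidding queue only when
        # Automatic Fault Tolerance is enabled; otherwise it fails.
        return JobState.PENDING if auto_fault_tolerance else JobState.FAILED
    return state  # FAILED is terminal in this model


state = next_state(JobState.PENDING, 30.0, 25.0, True, True)   # obtains resources
state = next_state(state, 30.0, 35.0, True, True)              # price rises above bid
print(state.value)  # pending
```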