Billing for model services

Updated at: 2025-04-03 08:21

Billing overview

Activating Alibaba Cloud Model Studio is free of charge. However, you are charged for model inference when you call large language models (LLMs), such as for text generation.

Billable items

Model inference (calling)

  • Method: Pay-as-you-go

  • Formula: Fee = Consumption × Unit price

  • Free quota: View remaining quota

  • Unit price: View prices
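The pay-as-you-go formula above can be sketched in code. This is an illustrative calculation only: the function name and the token counts are hypothetical, and the per-1,000-token prices are taken from the flagship-model examples below (check the Models page for current prices).

```python
def inference_fee(input_tokens, output_tokens,
                  input_price_per_1k, output_price_per_1k):
    """Fee = Consumption × Unit price, with prices quoted per 1,000 tokens."""
    return (input_tokens / 1000) * input_price_per_1k \
         + (output_tokens / 1000) * output_price_per_1k

# Example: 2,000 input tokens and 500 output tokens at qwen-plus list prices
fee = inference_fee(2000, 500, 0.0004, 0.0012)
print(f"${fee:.4f}")  # $0.0014
```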

Model inference (calling)

Overview & free quota

For a complete list of prices and free quotas, see Models. For detailed performance information, see Throttling.

You can view the number of calls and token consumption for a specific model in Model Studio console - Statistics.

Free quota

For information about how to obtain free quota and check remaining free quota, see Free quota for new users.

Flagship models

For prices and free quotas of other models, see Models.

Qwen-Max: Best inference performance

  • Maximum context: 32,768 tokens

  • Minimum input price: $0.0016 per 1,000 tokens

  • Minimum output price: $0.0064 per 1,000 tokens

Qwen-Plus: Balanced performance, speed, and cost

  • Maximum context: 131,072 tokens

  • Minimum input price: $0.0004 per 1,000 tokens

  • Minimum output price: $0.0012 per 1,000 tokens

Qwen-Turbo: Fast speed and low cost

  • Maximum context: 1,008,192 tokens

  • Minimum input price: $0.00005 per 1,000 tokens

  • Minimum output price: $0.0002 per 1,000 tokens

Batch discount

The text generation models qwen-max, qwen-plus, and qwen-turbo support batch calling. The cost of batch calling is 50% of the price of real-time calling. However, batch requests are not eligible for other offers, such as free quota or the context cache discount.

You can use the API to submit batch tasks as files for asynchronous execution. The service processes large-scale data offline during off-peak hours and delivers results upon task completion or when the maximum wait time is reached.
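The batch discount described above can be sketched as follows. The function name and token counts are hypothetical; the only rule taken from this document is that batch calls cost 50% of the real-time price (and receive no other discounts).

```python
def batch_fee(input_tokens, output_tokens,
              input_price_per_1k, output_price_per_1k):
    """Batch calling is billed at 50% of the real-time price."""
    realtime = (input_tokens / 1000) * input_price_per_1k \
             + (output_tokens / 1000) * output_price_per_1k
    return realtime * 0.5  # free quota and context cache do not apply to batch

# Example: same request as real-time, at half the cost
fee = batch_fee(2000, 500, 0.0004, 0.0012)
print(f"${fee:.4f}")  # $0.0007
```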

Context cache

Enabling context cache does not require additional payment. If the system determines that your request hits the cache, the hit tokens will be charged as cached_token. The tokens that are not hit will be charged as input_token. The unit price of cached_token is 40% of the unit price of input_token.

output_token is charged at the original price.


The cached_tokens property of the return result indicates the number of tokens that hit the cache.

If you use the OpenAI-compatible Batch mode, the context cache discount is not available.

For more information, see Context Cache.
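The context cache billing rule above can be sketched as a calculation: cached input tokens are billed at 40% of the input price, uncached input tokens at the full input price, and output tokens at the original price. The function name, token counts, and prices below are hypothetical examples.

```python
def fee_with_cache(input_tokens, cached_tokens, output_tokens,
                   input_price_per_1k, output_price_per_1k):
    """cached_token is billed at 40% of input_token; output_token at full price."""
    uncached = input_tokens - cached_tokens
    return (cached_tokens / 1000) * input_price_per_1k * 0.4 \
         + (uncached / 1000) * input_price_per_1k \
         + (output_tokens / 1000) * output_price_per_1k

# 10,000 input tokens, of which 8,000 hit the cache, plus 1,000 output tokens
fee = fee_with_cache(10_000, 8_000, 1_000, 0.0004, 0.0012)
print(f"${fee:.5f}")  # $0.00328
```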

FAQ


How to calculate token count?

Tokens are the basic units used by models to represent natural language text, which can be understood as "characters" or "words".

  • For Chinese text, 1 token usually corresponds to 1 Chinese character or word. For example, "你好,我是通义千问" will be converted to ['你好', ',', '我是', '通', '义', '千', '问'].

  • For English text, 1 token usually corresponds to 3-4 letters or 1 word. For example, "Nice to meet you." will be converted to ['Nice', ' to', ' meet', ' you', '.'].

Different models may have different tokenization methods. You can use the SDK to view the tokenization data of the Qwen model locally.

Python

# Before running: pip install dashscope
from dashscope import get_tokenizer

# Get the tokenizer object; currently only the Qwen series models are supported
tokenizer = get_tokenizer('qwen-turbo')

input_str = 'Qwen has powerful capabilities.'

# Split the string into tokens and convert to token IDs
tokens = tokenizer.encode(input_str)
print(f"Token IDs after tokenization: {tokens}.")
print(f"There are {len(tokens)} tokens after tokenization")

# Convert token IDs back to strings and print them
for i in range(len(tokens)):
    print(f"Token ID {tokens[i]} corresponds to the string: {tokenizer.decode(tokens[i])}")

Java

// Copyright (c) Alibaba, Inc. and its affiliates.
// dashscope SDK version >= 2.13.0
import java.util.List;
import com.alibaba.dashscope.exception.NoSpecialTokenExists;
import com.alibaba.dashscope.exception.UnSupportedSpecialTokenMode;
import com.alibaba.dashscope.tokenizers.Tokenizer;
import com.alibaba.dashscope.tokenizers.TokenizerFactory;

public class Main {
  public static void testEncodeOrdinary(){
    Tokenizer tokenizer = TokenizerFactory.qwen();
    String prompt ="If you were to walk 100,000 miles now, how long would it take to arrive? ";
    // encode string with no special tokens
    List<Integer> ids = tokenizer.encodeOrdinary(prompt);
    System.out.println(ids);
    String decodedString = tokenizer.decode(ids);
    assert decodedString.equals(prompt);
  }

  public static void testEncode() throws NoSpecialTokenExists, UnSupportedSpecialTokenMode{
    Tokenizer tokenizer = TokenizerFactory.qwen();
    String prompt = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nSanFrancisco is a<|im_end|>\n<|im_start|>assistant\n";
    // encode string with special tokens <|im_start|> and <|im_end|>
    List<Integer> ids = tokenizer.encode(prompt, "all");
    String decodedString = tokenizer.decode(ids);
    System.out.println(ids);
    assert decodedString.equals(prompt);

  }

  public static void main(String[] args) {
      try {
        testEncodeOrdinary();
        testEncode();
      } catch (NoSpecialTokenExists | UnSupportedSpecialTokenMode e) {
        e.printStackTrace();
      }
  }
}

You can use this local tokenizer to estimate the token count of your text, but the result may not exactly match the server-side count. If you are interested in the details of the Qwen tokenizer, see Tokenization.

How to view calling statistics?

You can check the call count and token consumption for a specific model on the Statistics page of the console.

How is multi-round conversation billed?

In multi-round conversations, the inputs and outputs of all previous rounds are resent as part of each new request, so they are billed again as input tokens.
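As an illustration of this rule, the sketch below accumulates billed input tokens across rounds; the per-round token counts are hypothetical.

```python
# Each round resends the full history, so earlier inputs and outputs
# are billed again as input tokens.
rounds = [
    # (user input tokens, assistant output tokens) -- hypothetical values
    (50, 200),
    (30, 150),
    (40, 100),
]

history = 0
total_input = 0
total_output = 0
for user_in, assistant_out in rounds:
    billed_input = history + user_in        # history is billed again as input
    total_input += billed_input
    total_output += assistant_out
    history = billed_input + assistant_out  # history now includes this round

print(total_input, total_output)  # 800 450
```

Note that only 120 tokens were typed by the user, yet 800 input tokens are billed, because the growing history is counted on every round.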

I created an LLM application and never used it. Am I billed for the application?

No, you are not. Creating an application alone does not incur charges. You are only billed for model inference if you test or call the application.

How to pay?

If you encounter a balance shortage or overdue payment while using Model Studio, visit Expenses and Costs to pay.

How to set a monthly consumption alert?

You can set a quota alert in the Expenses and Costs center.


How to stop the pay-as-you-go billing?

You cannot stop pay-as-you-go billing. But as long as you stop using the features of Model Studio, you will not incur fees.

To prevent unexpected API invocation fees, you can delete all your API keys.

Additionally, you can set a monthly consumption alert so that you are notified immediately of unexpected charges.

View the costs of Model Studio

  1. Go to the Cost Analysis page.

  2. Select Pretax Amount for Cost Type.

  3. Select Month for Time Unit.

  4. Select Alibaba Cloud Model Studio for Product Name.


View the costs of model inference

  1. Go to the Cost Analysis page.

  2. Select Pretax Amount for Cost Type.

  3. Choose a time range.

  4. Select Model Studio Foundation Model Inference for Product Detail.


View the inference costs of a specific model

  1. Go to the Billing Details tab of the Bill Details page.

  2. Select a Billing Cycle.

  3. Select Model Studio Foundation Model Inference for Product Details.

  4. Click Search. Take qwen-max as an example:

    In the Instance ID column, you can find the input_tokens and output_tokens instances of qwen-max. Add the amounts of the two instances to get the cost of calling the qwen-max model.


How to allocate costs based on payment details?

Bills generated after September 7, 2024, can be allocated based on: workspace ID, model name, input/output type, and calling channel.

  1. Go to the Billing Details tab of the Bill Details page.

  2. Select a Billing Cycle.

  3. Select Model Studio Foundation Model Inference for Product Details.

  4. Click Search.

  5. Click Export Billing Overview (CSV) to download the search results.

  6. Allocate the costs based on Instance ID.

    The Instance ID, such as text_token;llm-xxx;qwen-max;output_token;app, represents billing type;workspace ID;model name;input/output type;calling channel.

Calling channels include app, bmp, and assistant-api. app refers to model calls through applications, bmp to calls made on the Home or Playground pages of the console, and assistant-api to calls through the Assistant API.
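For cost allocation, the semicolon-separated Instance ID format described above can be split into named fields. This is a hypothetical helper; the field order is the documented one (billing type; workspace ID; model name; input/output type; calling channel), and "llm-xxx" is the placeholder workspace ID from the example.

```python
FIELDS = ["billing_type", "workspace_id", "model_name", "io_type", "channel"]

def parse_instance_id(instance_id):
    """Split an Instance ID such as
    'text_token;llm-xxx;qwen-max;output_token;app' into named fields."""
    parts = instance_id.split(";")
    return dict(zip(FIELDS, parts))

record = parse_instance_id("text_token;llm-xxx;qwen-max;output_token;app")
print(record["model_name"], record["channel"])  # qwen-max app
```

Grouping exported billing rows by `workspace_id` or `model_name` from this dictionary then gives per-workspace or per-model cost totals.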


API errors: Service activation or account balance

1. Service not activated

Use your Alibaba Cloud account to log on to Expenses and Costs. Activate Model Studio and claim the free quota.


2. Insufficient account balance

  • Check balance: Log on to Expenses and Costs. Check whether your account has sufficient balance.

  • Recharge: Click Recharge. Enter the desired amount and complete the payment.

3. Set a consumption alert to prevent repeated errors
