Model inference billing description - Alibaba Cloud Model Studio

Alibaba Cloud Model Studio charges fees for model calls used in tasks such as text generation.

Billable items

Item	Description	Method	Formula
Model inference	Fees for calling models. Scenarios include direct model calling and testing or calling applications.	Pay-as-you-go	Model inference fees = model usage × unit price. Within the free quota, no fees are incurred. For more information, see Free quota for new users.

Unit price

Note: Pricing may change at any time. The amount shown on your bill is final.

Model inference

The following table lists only the unit prices of flagship models. For the unit prices and free quotas of all models, go to Models.

Flagship models	Qwen-Max	Qwen-Plus	Qwen-Turbo
Model name for API calls	qwen-max	qwen-plus	qwen-turbo
Maximum context (tokens)	8,000	131,072	8,000
Input unit price (per 1,000 tokens)	$0.0100	$0.0030	$0.0004
Output unit price (per 1,000 tokens)	$0.0300	$0.0090	$0.0012
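
As a worked example of the formula model inference fees = model usage × unit price, the following sketch computes the fee for a single call from the flagship prices listed above. The function name and the token counts are illustrative assumptions, the free quota is ignored, and the bill remains authoritative if prices change.

```python
from decimal import Decimal

# Unit prices in USD per 1,000 tokens, copied from the table above.
# Prices may change; the bill is the source of truth.
PRICES_PER_1K = {
    "qwen-max": {"input": Decimal("0.0100"), "output": Decimal("0.0300")},
    "qwen-plus": {"input": Decimal("0.0030"), "output": Decimal("0.0090")},
    "qwen-turbo": {"input": Decimal("0.0004"), "output": Decimal("0.0012")},
}


def inference_fee(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated fee in USD: usage x unit price, ignoring any free quota."""
    price = PRICES_PER_1K[model]
    return (
        Decimal(input_tokens) / 1000 * price["input"]
        + Decimal(output_tokens) / 1000 * price["output"]
    )


# Hypothetical call: 2,000 input tokens and 500 output tokens on qwen-plus.
print(inference_fee("qwen-plus", 2000, 500))
```

Decimal is used instead of float so that small per-token prices do not pick up binary rounding errors when summed.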

FAQ

Model inference

How do I allocate costs based on payment details, for example by workspace, API key, or model name?

Bills generated after September 7, 2024 can be allocated based on the instance ID displayed on the Payment Details page. The instance ID includes the API key ID, workspace ID, model name, input/output type, and calling channel, and you can allocate costs by any of these fields.

To view API key IDs, go to the API Key Management page.

If an instance ID does not contain an API key ID, the fee was generated by calls made in the console.
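
The exact layout of the instance ID string is not specified here, so the following sketch assumes the billing rows have already been exported and split into fields. The row structure and the field names (`workspace_id`, `api_key_id`, `model`, `amount`) are hypothetical and serve only to illustrate allocating costs by one field.

```python
from collections import defaultdict
from decimal import Decimal


def allocate_costs(rows, field):
    """Sum the 'amount' of each row, grouped by the given field.

    Rows without an API key ID (api_key_id is None) are grouped under
    'console', since such fees come from calls made in the console.
    """
    totals = defaultdict(Decimal)
    for row in rows:
        key = row.get(field)
        if field == "api_key_id" and key is None:
            key = "console"
        totals[key] += Decimal(row["amount"])
    return dict(totals)


# Hypothetical rows; real values come from the Payment Details page.
rows = [
    {"workspace_id": "ws-a", "api_key_id": "1001", "model": "qwen-max", "amount": "0.12"},
    {"workspace_id": "ws-a", "api_key_id": None, "model": "qwen-plus", "amount": "0.03"},
    {"workspace_id": "ws-b", "api_key_id": "1002", "model": "qwen-max", "amount": "0.05"},
]

print(allocate_costs(rows, "model"))
print(allocate_costs(rows, "api_key_id"))
```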

How do tokens relate to strings?

Tokens are the basic units used by models to represent natural language text, which can be understood as "characters" or "words".

For English text, one token usually corresponds to three to four letters or one word. For example, "Nice to meet you." is converted into ['Nice', ' to', ' meet', ' you', '.'].
For Chinese text, one token usually corresponds to one character or word. For example, "你好，我是通义千问" is converted into ['你好', '，', '我是', '通', '义', '千', '问'].

Different models may have different tokenization methods. You can use the SDK to view the tokenization data of the Qwen model.

View the tokenization data of Qwen

Python

# Before running: pip install dashscope tiktoken
from dashscope import get_tokenizer  # dashscope version >= 1.14.0

# Get the tokenizer object, currently only supports the Qwen series models
tokenizer = get_tokenizer('qwen-turbo')

input_str = 'Nice to meet you.'

# Split the string into tokens and convert to token ids
tokens = tokenizer.encode(input_str)
print(f"Token ids after tokenization: {tokens}.")
print(f"There are {len(tokens)} tokens after tokenization")

# Convert each token id back to its string and print it
for token_id in tokens:
    print(f"Token id {token_id} corresponds to the string: {tokenizer.decode(token_id)}")

Java

// Copyright (c) Alibaba, Inc. and its affiliates.
// dashscope SDK version >= 2.13.0
import java.util.List;
import com.alibaba.dashscope.exception.NoSpecialTokenExists;
import com.alibaba.dashscope.exception.UnSupportedSpecialTokenMode;
import com.alibaba.dashscope.tokenizers.Tokenizer;
import com.alibaba.dashscope.tokenizers.TokenizerFactory;

public class Main {
  public static void testEncodeOrdinary(){
    Tokenizer tokenizer = TokenizerFactory.qwen();
    String prompt ="Nice to meet you.";
    // encode string with no special tokens
    List<Integer> ids = tokenizer.encodeOrdinary(prompt);
    System.out.println(ids);
    String decodedString = tokenizer.decode(ids);
    // Compare contents with equals(); == would compare object references
    assert decodedString.equals(prompt);
  }

  public static void testEncode() throws NoSpecialTokenExists, UnSupportedSpecialTokenMode{
    Tokenizer tokenizer = TokenizerFactory.qwen();
    String prompt = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nSanFrancisco is a<|im_end|>\n<|im_start|>assistant\n";
    // encode string with special tokens <|im_start|> and <|im_end|>
    List<Integer> ids = tokenizer.encode(prompt, "all");
    String decodedString = tokenizer.decode(ids);
    System.out.println(ids);
    // Compare contents with equals(); == would compare object references
    assert decodedString.equals(prompt);

  }

  public static void main(String[] args) {
      try {
        testEncodeOrdinary();
        testEncode();
      } catch (NoSpecialTokenExists | UnSupportedSpecialTokenMode e) {
        e.printStackTrace();
      }
  }
}

You can use this local tokenizer to estimate the token count of your text. The result is for reference only and is not guaranteed to match the count computed on the server. For details about the Qwen tokenizer, see Tokenizer reference.
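
To connect a local token count to billing, the sketch below estimates the input fee for a piece of text. The `encode` argument is any callable that returns a list of token IDs, such as the `encode` method of the tokenizer in the Python sample above. The helper name and the whitespace-based stand-in tokenizer are assumptions for illustration, and the stand-in does not reflect real Qwen tokenization.

```python
from decimal import Decimal
from typing import Callable, List


def estimate_input_cost(
    text: str,
    encode: Callable[[str], List[int]],
    price_per_1k: Decimal,
) -> Decimal:
    """Estimate the input fee in USD for text, given a tokenizer's encode function."""
    token_count = len(encode(text))
    return Decimal(token_count) / 1000 * price_per_1k


def fake_encode(text: str) -> List[int]:
    """Stand-in tokenizer for the demo: one fake token ID per whitespace-separated word."""
    return list(range(len(text.split())))


# With the dashscope SDK installed, pass get_tokenizer('qwen-turbo').encode instead.
cost = estimate_input_cost("Nice to meet you.", fake_encode, Decimal("0.0004"))
print(f"Estimated input cost: ${cost}")
```

Passing the tokenizer in as a function keeps the estimate independent of any one SDK version, at the cost of the caller having to supply the matching unit price.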

How are multi-round conversations billed?

In multi-round conversations, the conversation history included in each request is also billed as input.
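
As an illustration, the sketch below assumes that every request resends the full conversation history, so the input billed for each round is the new user message plus all earlier user messages and model replies. The function name and the token counts are hypothetical.

```python
from typing import List, Tuple


def billed_tokens_per_round(turns: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (billed_input_tokens, billed_output_tokens) for each round.

    Each turn is (user_message_tokens, assistant_reply_tokens). The history
    carried into a round is the sum of all earlier messages and replies.
    """
    billed = []
    history = 0
    for user_tokens, reply_tokens in turns:
        billed.append((history + user_tokens, reply_tokens))
        history += user_tokens + reply_tokens
    return billed


# Hypothetical three-round conversation.
turns = [(20, 100), (15, 80), (10, 60)]
for round_no, (inp, out) in enumerate(billed_tokens_per_round(turns), start=1):
    print(f"Round {round_no}: {inp} input tokens, {out} output tokens")
```

Under this assumption the billed input grows with every round even when each new message is short, which is why trimming or summarizing history can reduce cost in long conversations.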

I created an LLM application but never used it. Am I billed for the application?

No. Creating an application alone does not incur charges. You are only billed for model inference if you test or call the application.

View bill

How do I view last month's cost for Alibaba Cloud Model Studio?

Go to the Cost Analysis page.
Select Pretax Amount for Cost Type.
Select Month for Time Unit and select the last month.
Select Alibaba Cloud Model Studio for Product Name.

How do I view the total cost of model calling?

Go to the Cost Analysis page.
Select Pretax Amount for Cost Type.
Choose a time range.
Select Model Studio Foundation Model Inference for Product Detail.

How do I view the cost of calling the qwen-max model?

Go to the Billing Details tab of the Bill Details page.
Select a Billing Cycle.
Select Model Studio Foundation Model Inference for Product Details.
Click Search.
In the Instance ID column, you can find the input_tokens and output_tokens instances of qwen-max. Add the amounts of the two instances to get the cost of calling the qwen-max model.