Billing

Last Updated: Jul 15, 2024

You are charged for calls to the Qwen API operations, and you can call Qwen models only within the throttling thresholds. This topic describes the billing unit, unit prices, free quotas, and throttling thresholds of Qwen models.

Qwen

Billing unit

| Model service | Billing unit |
| --- | --- |
| Qwen | Token |

Note

A token is the basic unit that a model uses to represent natural-language text. It can be thought of as roughly a character or a short fragment of a word: in most cases, one token corresponds to one character of Chinese text or to three to four letters of English text.

Qwen counts the input and output tokens consumed by each model call and bills you based on the total. In a multi-round conversation, the conversation history that is sent with each request also counts toward the input tokens. The response returned by the model reports the number of tokens that the call consumed.
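The effect of conversation history on input tokens can be illustrated with a small sketch. It assumes that every request resends the full history and that no extra tokens are added for message formatting. The function name and the token counts are hypothetical and are not part of any SDK.

```python
def billed_input_tokens(turns):
    """Return the input tokens counted for each round of a conversation.

    `turns` is a list of (user_tokens, assistant_tokens) pairs, one per round.
    Assumption: each request resends the whole history, so a round is counted
    for all earlier user and assistant messages plus the new user message.
    """
    billed = []
    history = 0  # tokens accumulated from earlier rounds
    for user_tokens, assistant_tokens in turns:
        billed.append(history + user_tokens)
        history += user_tokens + assistant_tokens
    return billed


# Hypothetical token counts for a three-round conversation.
turns = [(20, 50), (15, 40), (10, 30)]
per_round = billed_input_tokens(turns)
print(per_round)       # [20, 85, 135]
print(sum(per_round))  # 240
```

Under these assumptions the input tokens grow with every round, which is why long conversations cost more per call than short ones.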

Convert strings into tokens and convert tokens back into strings

Different Qwen models may split text into tokens in different ways. You can use the DashScope SDK to tokenize text locally on your computer and check how many tokens a Qwen model's tokenizer produces for it.

# Before you run the sample code, run the pip install dashscope tiktoken command.
from dashscope import get_tokenizer  # Make sure that DashScope SDK for Python V1.14.0 or later is used.

# Obtain the tokenizer object. Only Qwen models are supported.
tokenizer = get_tokenizer('qwen-turbo')

input_str = 'Qwen provides powerful capabilities. '

# Split the string into tokens and convert the tokens into token IDs.
tokens = tokenizer.encode(input_str)
print(f"IDs of tokens split from the string: {tokens}")
print(f"Total tokens: {len(tokens)}")

# Convert the token IDs into a string and display the string.
for token_id in tokens:
    print(f"String converted from token ID {token_id}: {tokenizer.decode(token_id)}")

// Copyright (c) Alibaba, Inc. and its affiliates.
// Make sure that DashScope SDK for Java V2.13.0 or later is used.
import java.util.List;
import com.alibaba.dashscope.exception.NoSpecialTokenExists;
import com.alibaba.dashscope.exception.UnSupportedSpecialTokenMode;
import com.alibaba.dashscope.tokenizers.Tokenizer;
import com.alibaba.dashscope.tokenizers.TokenizerFactory;

public class Main {
  public static void testEncodeOrdinary(){
    Tokenizer tokenizer = TokenizerFactory.qwen();
    String prompt = "How long does it take for you to walk a thousand miles?  ";
    // Encode the string with no special tokens.
    List<Integer> ids = tokenizer.encodeOrdinary(prompt);
    System.out.println(ids);
    String decodedString = tokenizer.decode(ids);
    // Compare string contents with equals(); == compares object references.
    assert decodedString.equals(prompt);
  }

  public static void testEncode() throws NoSpecialTokenExists, UnSupportedSpecialTokenMode{
    Tokenizer tokenizer = TokenizerFactory.qwen();
    String prompt = "<|im_start|>system\nYour are a helpful assistant.<|im_end|>\n<|im_start|>user\nSanFrancisco is a<|im_end|>\n<|im_start|>assistant\n";
    // Encode the string with special tokens <|im_start|> and <|im_end|>.
    List<Integer> ids = tokenizer.encode(prompt, "all");
    // 24 tokens [151644, 8948, 198, 7771, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 23729, 80328, 9464, 374, 264, 151645, 198, 151644, 77091, 198]
    String decodedString = tokenizer.decode(ids);
    System.out.println(ids);
    assert decodedString.equals(prompt);
  }

  public static void main(String[] args) {
    try {
      testEncodeOrdinary();
      testEncode();
    } catch (NoSpecialTokenExists | UnSupportedSpecialTokenMode e) {
      e.printStackTrace();
    }
  }
}

You can use the tokenizer that runs on your computer to estimate the number of tokens that are converted from text. However, the number of tokens converted by the tokenizer is for reference only and may be inconsistent with that consumed by a Qwen model. For more information about the tokenizer of Qwen, visit GitHub.
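When the SDK tokenizer is not available, the rule of thumb from the note above (one token per Chinese character, three to four letters per token in English) gives an even cruder estimate. The following is a minimal sketch under that assumption. The function name, the 3.5 letters-per-token default, and the CJK range check are illustrative choices, not part of any Qwen SDK.

```python
import math


def is_cjk(ch):
    """Return True for characters in the basic CJK Unified Ideographs block."""
    return '\u4e00' <= ch <= '\u9fff'


def rough_token_estimate(text, letters_per_token=3.5):
    """Crude token estimate based on the rule of thumb only.

    Assumption: each CJK ideograph is one token, and every
    `letters_per_token` other non-space characters make up one token.
    """
    cjk = sum(1 for ch in text if is_cjk(ch))
    other = sum(1 for ch in text if not ch.isspace() and not is_cjk(ch))
    return cjk + math.ceil(other / letters_per_token)


print(rough_token_estimate('Qwen provides powerful capabilities. '))
```

The result is only an order-of-magnitude guide. Prefer the tokenizer, and treat the token counts in the model response as the authoritative figures.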

Unit prices

| Model service | Model name | Unit price of input tokens | Unit price of output tokens |
| --- | --- | --- | --- |
| Qwen | qwen-turbo | $0.0004/1,000 tokens | $0.0012/1,000 tokens |
| Qwen | qwen-plus | $0.0030/1,000 tokens | $0.0090/1,000 tokens |
| Qwen | qwen-max | $0.0100/1,000 tokens | $0.0300/1,000 tokens |
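As a worked example of how the unit prices combine, the following sketch estimates the charge for a single call from its input and output token counts. The prices are copied from the table above. The function name is hypothetical, and the sketch ignores free quotas and any rounding that actual bills may apply.

```python
from decimal import Decimal

# Unit prices in USD per 1,000 tokens: (input, output), from the table above.
PRICES = {
    "qwen-turbo": (Decimal("0.0004"), Decimal("0.0012")),
    "qwen-plus": (Decimal("0.0030"), Decimal("0.0090")),
    "qwen-max": (Decimal("0.0100"), Decimal("0.0300")),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the charge in USD for one call (free quotas not considered)."""
    input_price, output_price = PRICES[model]
    return (input_price * input_tokens + output_price * output_tokens) / 1000


# A qwen-plus call with 1,200 input tokens and 300 output tokens.
print(estimate_cost("qwen-plus", 1200, 300))
```

For this call the input side costs $0.0036 and the output side $0.0027, for an estimated total of $0.0063. `Decimal` is used instead of floating-point numbers so that small per-token prices add up exactly.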

Throttling thresholds

To ensure fair access to models, Qwen sets throttling thresholds for regular users. Throttling is model-specific and is associated with the Alibaba Cloud account from which a model is called. Throttling is applied based on the total number of calls that are initiated to the model by using all API keys within the Alibaba Cloud account. If a throttling threshold is reached, your API request for a model fails due to throttling. You must wait for a period of time until your usage falls back within the throttling threshold before you can call the model again.
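Because a throttled request succeeds again only after usage falls back under the threshold, clients commonly retry with an increasing delay. The following is a minimal sketch of that pattern. `ThrottledError` is a hypothetical exception that stands in for however your client detects a throttled response; this topic does not specify the error that the real SDK reports, so adapt the detection to it.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical exception standing in for a throttled API response."""


def call_with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Run `call()` and retry with exponential backoff while it is throttled."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except ThrottledError:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            # Double the wait on each retry; jitter spreads out concurrent clients.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


# Usage with a stand-in function that is throttled twice before it succeeds.
state = {"calls": 0}

def flaky_call():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ThrottledError()
    return "ok"

print(call_with_backoff(flaky_call, base_delay=0.01))  # ok
```

The `sleep` parameter exists only so that the waiting behavior can be replaced in tests.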

Model service

Model name

Throttling threshold

qwen-max

The throttling policy may change when time-limited free quotas are available. Throttling is triggered if either of the following limits is exceeded:

  • Call frequency ≤ 20 QPM (queries per minute): No more than 20 API calls are initiated per minute.

  • Token consumption ≤ 34,000 TPM: No more than 100,000 tokens are consumed per minute.

qwen-turbo

Throttling is triggered if either of the following limits is exceeded:

  • Call frequency ≤ 60 QPM: No more than 60 API calls are initiated per minute.

  • Token consumption ≤ 62,500 tokens per minute (TPM): No more than 500,000 tokens are consumed per minute.

qwen-plus

Throttling is triggered if either of the following limits is exceeded:

  • Call frequency ≤ 60 QPM: No more than 60 API calls are initiated per minute.

  • Token consumption ≤ 61,000 TPM: No more than 200,000 tokens are consumed per minute.
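To stay under limits like the QPM and TPM thresholds listed above, a client can track its own usage before it sends a request. The following is a minimal sketch of a client-side guard. It assumes a sliding 60-second window, which this topic does not confirm as the service's actual accounting method. The class name and the limit values in the usage example are illustrative only.

```python
import time
from collections import deque


class MinuteWindowLimiter:
    """Client-side guard for a call limit and a token limit per 60 seconds."""

    def __init__(self, max_calls, max_tokens, clock=time.monotonic):
        self.max_calls = max_calls
        self.max_tokens = max_tokens
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) of calls in the last minute

    def _evict(self, now):
        # Drop calls that are 60 seconds old or older.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()

    def try_acquire(self, tokens):
        """Record the call and return True if it fits both limits, else False."""
        now = self.clock()
        self._evict(now)
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.max_calls:
            return False
        if used_tokens + tokens > self.max_tokens:
            return False
        self.events.append((now, tokens))
        return True


# Illustrative limits; substitute the thresholds that apply to your model.
limiter = MinuteWindowLimiter(max_calls=3, max_tokens=1000)
for estimated_tokens in (400, 400, 400):
    print(limiter.try_acquire(estimated_tokens))  # True, True, False
```

In the usage example the third call is rejected because it would bring the total to 1,200 tokens within the same minute. The token count passed to `try_acquire` is an estimate made before the call, so a guard like this reduces throttling errors but cannot rule them out.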