Alibaba Cloud Model Studio: Billing

Last Updated: Sep 02, 2024

Qwen

Billing unit

| Model service | Billing unit |
|---|---|
| Qwen-110B | token |
| Qwen-72B | token |
| Qwen-32B | token |
| Qwen-14B | token |
| Qwen-7B | token |

Note

A token is the basic unit that models use to represent natural-language text. You can think of a token as a character or a word. In most cases, one token corresponds to one character of Chinese text or to three to four letters of English text.

Qwen counts the input and output tokens consumed by each model call and bills you based on those counts. In a multi-round conversation, the conversation history sent with each call also counts toward the input tokens of that call. You can obtain the number of tokens consumed by a model call from the response returned by the model.
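To illustrate how conversation history adds to input tokens, the following minimal sketch assumes that the full history is resent on every call and that per-message token counts are already known. The function name and the sample numbers are hypothetical, and any overhead added by prompts or templates is ignored. For billing purposes, rely on the token counts returned in the model response.

```python
def input_tokens_per_round(rounds):
    """Return the input-token count for each round of a conversation.

    rounds: list of (user_tokens, assistant_tokens) tuples, one per round.
    Assumption: every call resends all earlier user and assistant messages,
    so they are counted again as input tokens.
    """
    billed = []
    history_tokens = 0
    for user_tokens, assistant_tokens in rounds:
        # Input for this call = everything said so far + the new user message.
        billed.append(history_tokens + user_tokens)
        # Both the question and the answer become history for the next call.
        history_tokens += user_tokens + assistant_tokens
    return billed


# Hypothetical three-round conversation.
rounds = [(50, 120), (30, 200), (40, 80)]
per_round = input_tokens_per_round(rounds)
print(per_round)       # [50, 200, 440]
print(sum(per_round))  # 690
```

Under these assumptions the input tokens grow with every round even though each new user message is short, which is why long conversations can cost more than the individual messages suggest.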

Unit price

| Model | Name | Unit price of input tokens | Unit price of output tokens | Billing method | Release date |
|---|---|---|---|---|---|
| Qwen-110B | qwen1.5-110b-chat | $0.004/1000 tokens | $0.012/1000 tokens | Pay-as-you-go | Released |
| Qwen-72B | qwen1.5-72b-chat | $0.003/1000 tokens | $0.009/1000 tokens | Pay-as-you-go | Released |
| Qwen-32B | qwen1.5-32b-chat | Free for a limited period | Free for a limited period | Pay-as-you-go | Released |
| Qwen-14B | qwen1.5-14b-chat | Free for a limited period | Free for a limited period | Pay-as-you-go | Released |
| Qwen-7B | qwen1.5-7b-chat | Free for a limited period | Free for a limited period | Pay-as-you-go | Released |
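As a worked example of the pricing above, the following sketch estimates the cost of a single call from its input and output token counts. The prices are copied from the unit price table. The function name and the sample token counts are hypothetical, and the sketch does not model how the actual bill is rounded or how the limited-period free models are charged.

```python
from decimal import Decimal

# USD per 1,000 tokens as (input price, output price), from the unit price table.
PRICES_PER_1K = {
    "qwen1.5-110b-chat": (Decimal("0.004"), Decimal("0.012")),
    "qwen1.5-72b-chat": (Decimal("0.003"), Decimal("0.009")),
}


def call_cost(model, input_tokens, output_tokens):
    """Estimate the cost in USD of one call (hypothetical helper)."""
    input_price, output_price = PRICES_PER_1K[model]
    return (input_price * input_tokens + output_price * output_tokens) / 1000


# Hypothetical call: 1,200 input tokens and 300 output tokens on Qwen-72B.
# 1200 * 0.003 / 1000 + 300 * 0.009 / 1000 = 0.0036 + 0.0027
print(call_cost("qwen1.5-72b-chat", 1200, 300))
```

Decimal is used instead of float so that small per-token amounts add up without binary rounding noise.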

Throttling thresholds

To ensure fair access to models, Qwen sets throttling thresholds for regular users. If a throttling threshold is reached, your API requests to the model fail due to throttling. You can call the model again only after your usage falls back within the throttling threshold.
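One common way to handle a throttled request on the client side is to wait and retry with exponential backoff. The sketch below is an illustration under stated assumptions, not part of the service: ThrottledError is a hypothetical exception that stands in for whatever throttling failure your client surfaces, and call is any function that performs the request.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a request that failed due to throttling."""


def call_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Run call(), retrying with exponential backoff when it is throttled."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # Give up after the last attempt.
            # Double the wait on each retry, capped at max_delay seconds.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

Because the thresholds are expressed per minute, capping the base wait at 60 seconds is a reasonable default in this sketch. Adjust the values to your workload.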

Important

Throttling is model-specific and is associated with the Alibaba Cloud account from which a model is called. Throttling is applied based on the total number of calls that are initiated to the model by using all API keys within the Alibaba Cloud account.

If you want to increase the throttling thresholds for a model, click the link in the following table to submit a request.

Throttling is triggered for a model if either of its thresholds in the following table is reached.

| Model | Name | Number of requests per minute | Token consumption per minute |
|---|---|---|---|
| Qwen-110B | qwen1.5-110b-chat | ≤ 10 | ≤ 20,000 |
| Qwen-72B | qwen1.5-72b-chat | ≤ 60 | ≤ 100,000 |
| Qwen-32B | qwen1.5-32b-chat | ≤ 10 | ≤ 20,000 |
| Qwen-14B | qwen1.5-14b-chat | ≤ 60 | ≤ 100,000 |
| Qwen-7B | qwen1.5-7b-chat | ≤ 120 | ≤ 200,000 |

For qwen1.5-110b-chat, the throttling policy may change when time-limited free quotas are available.
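To stay within the per-minute limits listed above, a client can track its own usage before sending a request. The following sketch assumes a sliding 60-second window. The service's actual windowing method is not described here, and the class and method names are hypothetical. Because throttling applies to all API keys in an Alibaba Cloud account, a limiter like this is only accurate if every caller in the account shares it.

```python
import time
from collections import deque


class MinuteWindowLimiter:
    """Client-side check against per-minute request and token limits."""

    def __init__(self, max_requests, max_tokens, clock=time.monotonic):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) for each admitted call

    def try_acquire(self, tokens):
        """Return True and record the call if it fits both limits."""
        now = self.clock()
        # Drop calls that are 60 seconds old or older.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.max_requests:
            return False
        if used_tokens + tokens > self.max_tokens:
            return False
        self.events.append((now, tokens))
        return True


# Limits for qwen1.5-110b-chat: 10 requests and 20,000 tokens per minute.
limiter = MinuteWindowLimiter(max_requests=10, max_tokens=20_000)
if limiter.try_acquire(tokens=1_500):  # 1,500 is an estimated token count
    pass  # Send the request here.
```

The token count passed to try_acquire is an estimate made before the call. The exact consumption is known only from the response, so leave some headroom below the limit.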