Alibaba Cloud Model Studio charges fees for model calls made for tasks such as text generation.
Billable items
Billable item | Description | Billing method | Formula |
Model inference | Fees for calling models, whether you call a model directly or test or call an application. | Pay-as-you-go | Model inference fee = model usage × unit price. No fees are incurred within the free quota. For more information, see Free quota for new users. |
Unit price
Note: Pricing may change at any time. The amount shown on your bill is the final price.
Model inference
The following table lists only the unit prices of flagship models. For the unit prices and free quotas of all models, go to Models.
Flagship models | Qwen-Max | Qwen-Plus | Qwen-Turbo |
Model name for API calls | qwen-max | qwen-plus | qwen-turbo |
Input unit price (per 1,000 tokens) | $0.0100 | $0.0030 | $0.0004 |
Output unit price (per 1,000 tokens) | $0.0300 | $0.0090 | $0.0012 |
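As a rough illustration of the formula model usage × unit price, the following Python sketch estimates the fee for a single call from the flagship prices listed above. The names PRICES and inference_fee are hypothetical and not part of any SDK. The sketch ignores the free quota, and the hard-coded prices may be out of date, so treat the result as an estimate and rely on your bill for the actual amount.

```python
from decimal import Decimal

# Unit prices in USD per 1,000 tokens, copied from the flagship model table.
# These values are a snapshot and may change.
PRICES = {
    "qwen-max": {"input": Decimal("0.0100"), "output": Decimal("0.0300")},
    "qwen-plus": {"input": Decimal("0.0030"), "output": Decimal("0.0090")},
    "qwen-turbo": {"input": Decimal("0.0004"), "output": Decimal("0.0012")},
}


def inference_fee(model, input_tokens, output_tokens):
    """Estimate the fee in USD as usage x unit price, summed over input and output.

    Hypothetical helper: it does not account for the free quota.
    """
    price = PRICES[model]
    total_per_thousand = (
        Decimal(input_tokens) * price["input"]
        + Decimal(output_tokens) * price["output"]
    )
    # Prices are quoted per 1,000 tokens, so scale the total down.
    return total_per_thousand / Decimal(1000)


# Example: a qwen-plus call with 2,000 input tokens and 500 output tokens.
print(inference_fee("qwen-plus", 2000, 500))
```

For this example, the input part is 2 × $0.0030 = $0.0060 and the output part is 0.5 × $0.0090 = $0.0045, for an estimated total of $0.0105. Decimal is used instead of float so that small per-token amounts do not pick up rounding noise.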
FAQ
Model inference
How do I allocate costs based on payment details, for example, by workspace, API key, or model name?
Bills generated after September 7, 2024, can be allocated based on the instance ID displayed on the Payment Details page. The instance ID includes the API key ID, workspace ID, model name, input/output type, and calling channel. You can allocate costs by any of these fields.
You can go to the API Key Management page to view the ID of API keys.
If an instance ID does not contain an API key ID, the fee was generated by calls made in the console.
How do I convert between tokens and strings?
Tokens are the basic units that models use to represent natural language text. A token can be thought of as a "character" or a "word".
For English text, one token usually corresponds to three to four letters or one word. For example, "Nice to meet you." is converted into ['Nice', ' to', ' meet', ' you', '.'].
For Chinese text, one token usually corresponds to one character or word. For example, "你好,我是通义千问" is converted into ['你好', ',', '我是', '通', '义', '千', '问'].
Different models may tokenize text differently. You can use the SDK to view how a Qwen model tokenizes your text.
The SDK's local tokenizer lets you estimate the number of tokens in your text. The result is not guaranteed to match the count computed on the server and is for reference only. For details about the Qwen tokenizer, see Tokenizer reference.
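A minimal sketch of local token counting is shown below. It assumes that the SDK mentioned above is the DashScope Python SDK and that it exposes a get_tokenizer function whose result has an encode method; verify both against the SDK version you have installed. The helper names count_tokens and load_qwen_tokenizer are hypothetical.

```python
def count_tokens(text, tokenizer):
    """Count tokens using any object that exposes encode(text) -> list of token IDs."""
    return len(tokenizer.encode(text))


def load_qwen_tokenizer(model="qwen-turbo"):
    """Load a local Qwen tokenizer.

    Assumption: the DashScope Python SDK is installed and provides
    get_tokenizer. The import is done lazily so that count_tokens can be
    used with another tokenizer when the SDK is not available.
    """
    from dashscope import get_tokenizer

    return get_tokenizer(model)
```

With the SDK installed, count_tokens("Nice to meet you.", load_qwen_tokenizer()) returns a local estimate of the token count. As noted above, the estimate may differ from the count that the server uses for billing.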
How are multi-round conversations billed?
In a multi-round conversation, the conversation history sent with each request is also billed as input, so input usage grows with every round.
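The effect of billing history as input can be sketched as follows. The example assumes that the full history (all earlier user messages and model replies) is resent in every round, and it ignores any extra tokens from system prompts or message formatting. The function name billed_input_tokens and the token counts are made up for illustration.

```python
def billed_input_tokens(turns):
    """Return the input tokens billed in each round.

    turns: list of (user_tokens, assistant_tokens) pairs, one per round.
    Assumption: every request resends the entire conversation history.
    """
    billed = []
    history = 0  # tokens accumulated from all earlier rounds
    for user_tokens, assistant_tokens in turns:
        # Input for this round = prior history + the new user message.
        billed.append(history + user_tokens)
        # The new message and the reply become history for the next round.
        history += user_tokens + assistant_tokens
    return billed


# Three rounds with illustrative token counts.
turns = [(20, 100), (15, 80), (30, 120)]
per_round = billed_input_tokens(turns)
print(per_round)       # [20, 135, 245]
print(sum(per_round))  # 400
```

Although the three user messages total only 65 tokens, 400 input tokens are billed under these assumptions because each round carries the earlier rounds with it. Trimming or summarizing old history is one way to limit this growth.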
I created an LLM application but never used it. Am I billed for the application?
No. Creating an application does not incur charges. You are billed for model inference only when you test or call the application.
View bill
How do I view last month's cost for Alibaba Cloud Model Studio?
Go to the Cost Analysis page.
Select Pretax Amount for Cost Type.
Select Month for Time Unit and select the last month.
Select Alibaba Cloud Model Studio for Product Name.
How do I view the total cost of model calls?
Go to the Cost Analysis page.
Select Pretax Amount for Cost Type.
Choose a time range.
Select Model Studio Foundation Model Inference for Product Detail.
How do I view the cost of calling the qwen-max model?
Go to the Billing Details tab of the Bill Details page.
Select a Billing Cycle.
Select Model Studio Foundation Model Inference for Product Details.
Click Search.
In the Instance ID column, find the input_tokens and output_tokens instances of qwen-max. Add the amounts of these two instances to get the total cost of calling the qwen-max model.