Billing overview

You are not charged for activating Alibaba Cloud Model Studio. However, you are charged for model inference when you use large language models (LLMs) for text generation.
Billable items

Model inference (calling)
Overview & free quota
For a complete list of prices and free quotas, see Models. For rate limit details, see Throttling. You can view the number of calls and token consumption for a specific model on the Statistics page of the Model Studio console.
Flagship models
For prices and free quotas of other models, see Models.

| Flagship models | Qwen-Max | Qwen-Plus | Qwen-Turbo |
| --- | --- | --- | --- |
| Description | Best inference performance | Balanced performance, speed, and cost | Fast speed and low cost |
| Maximum context (tokens) | 32,768 | 131,072 | 1,008,192 |
| Minimum input price (per 1,000 tokens) | $0.0016 | $0.0004 | $0.00005 |
| Minimum output price (per 1,000 tokens) | $0.0064 | $0.0012 | $0.0002 |
Batch discount
The text generation models qwen-max, qwen-plus, and qwen-turbo support batch calling. Batch calling costs 50% of the real-time calling price. However, batch requests are not eligible for other discounts, such as the free quota or context cache. You can use the API to submit batch tasks as files for asynchronous execution. The service processes large-scale data offline during off-peak hours and delivers results when the task completes or the maximum wait time is reached.
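The 50% rule can be sketched numerically. Below is a minimal Python illustration using the minimum qwen-turbo prices from the table above; the token counts and function names are hypothetical, and actual rates depend on the model and tier:

```python
# Compare real-time vs. batch cost for one request.
# Prices are USD per 1,000 tokens (minimum qwen-turbo rates listed above).
INPUT_PRICE = 0.00005
OUTPUT_PRICE = 0.0002
BATCH_DISCOUNT = 0.5  # batch calls cost 50% of the real-time price

def realtime_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000) * INPUT_PRICE + (output_tokens / 1000) * OUTPUT_PRICE

def batch_cost(input_tokens: int, output_tokens: int) -> float:
    # Batch requests get the 50% discount but no other discounts
    # (no free quota, no context cache).
    return realtime_cost(input_tokens, output_tokens) * BATCH_DISCOUNT

print(round(realtime_cost(1_000_000, 200_000), 6))  # real-time cost in USD
print(round(batch_cost(1_000_000, 200_000), 6))     # half of the real-time cost
```

Because the discount applies to the whole request, large offline workloads that can tolerate asynchronous delivery pay exactly half the real-time rate.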
Context cache
Enabling context cache does not require additional payment. If the system determines that your request hits the cache, the hit tokens are billed as cached_token, and the tokens that miss are billed as input_token. The unit price of cached_token is 40% of the unit price of input_token. output_token is billed at the original price.

The cached_tokens property of the response indicates the number of tokens that hit the cache. If you use the OpenAI-compatible Batch mode, the context cache discount is not available. For more information, see Context Cache.
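The pricing rule above can be sketched as a small cost function. This is an illustrative sketch, assuming cached_tokens is reported as a subset of the request's total input tokens; prices are per 1,000 tokens and the function name is my own:

```python
# Bill a request with context cache enabled:
# cache-hit tokens at 40% of the input price, misses at the full input price,
# output tokens at the original price.
def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    uncached = input_tokens - cached_tokens
    cost = (uncached / 1000) * input_price              # billed as input_token
    cost += (cached_tokens / 1000) * input_price * 0.4  # billed as cached_token
    cost += (output_tokens / 1000) * output_price       # no discount on output
    return cost

# Example with the minimum qwen-plus prices from the table above:
cost = request_cost(10_000, 4_000, 1_000, input_price=0.0004, output_price=0.0012)
```

With these numbers, 40% of the prompt hitting the cache cuts the input portion of the bill by 24%.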
FAQ
How to calculate token count?
Tokens are the basic units used by models to represent natural language text, which can be understood as "characters" or "words". For Chinese text, 1 token usually corresponds to 1 Chinese character or word. For example, "你好,我是通义千问" will be converted to ['你好', ',', '我是', '通', '义', '千', '问']. For English text, 1 token usually corresponds to 3-4 letters or 1 word. For example, "Nice to meet you." will be converted to ['Nice', ' to', ' meet', ' you', '.'].
Different models may use different tokenization methods. You can use the SDK to view the tokenization of a Qwen model locally. You can use this local tokenizer to estimate the token count of your text, but the result may not exactly match the server-side count. If you are interested in the details of the Qwen tokenizer, see Tokenization.
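For quick budgeting without the SDK, the rules of thumb above can be turned into a crude, hypothetical estimator (one token per CJK character, roughly one per English word or punctuation mark). It will not match the real tokenizer exactly:

```python
import re

# Crude token estimate: ~1 token per CJK character,
# ~1 token per English word or punctuation mark.
# For budgeting only; the server-side tokenizer is authoritative.
def estimate_tokens(text: str) -> int:
    cjk = len(re.findall(r"[\u4e00-\u9fff]", text))
    other = len(re.findall(r"[A-Za-z0-9]+|[^\sA-Za-z0-9\u4e00-\u9fff]", text))
    return cjk + other

print(estimate_tokens("Nice to meet you."))  # 5, matching the example above
```

For mostly Chinese or mixed text the estimate tends to run slightly high, because the real tokenizer merges common multi-character words such as "你好" into a single token.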
How to view calling statistics?
You can check the call count and token consumption for a specific model on the Statistics page of the console.
How is multi-round conversation billed?
In multi-round conversations, the input and output from all previous rounds are billed again as new input tokens in each subsequent request.
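A hypothetical three-round example shows how the billed input grows (the token counts are made up):

```python
# Each round re-sends the full history, so prior prompts and replies
# are billed again as new input tokens.
rounds = [(50, 200), (30, 150), (40, 180)]  # (prompt tokens, reply tokens)
history = 0
for i, (prompt, reply) in enumerate(rounds, start=1):
    billed_input = history + prompt   # history is re-billed as input
    history = billed_input + reply    # the reply joins the history
    print(f"round {i}: billed input tokens = {billed_input}")
```

With these numbers the three rounds bill 50, 280, and 470 input tokens respectively, even though the user typed only 120 tokens in total.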
I created an LLM application and never used it. Am I billed for the application?
No, you are not. Creating an application alone does not incur charges. You are only billed for model inference if you test or call the application.
How to pay?
If you encounter a balance shortage or overdue payment while using Model Studio, visit Expenses and Costs to pay.
How to set a monthly consumption alert?
You can set a quota alert in the Expenses and Costs center.
How to stop pay-as-you-go billing?
You cannot stop pay-as-you-go billing. However, you will not incur fees as long as you stop using Model Studio features. To prevent unexpected API invocation fees, you can delete all your API keys.
Additionally, you can set a monthly consumption alert so that you are notified immediately of unexpected charges.
View the costs of Model Studio
1. Go to the Cost Analysis page.
2. Select Pretax Amount for Cost Type.
3. Select Month for Time Unit.
4. Select Alibaba Cloud Model Studio for Product Name.

View the costs of model inference
1. Go to the Cost Analysis page.
2. Select Pretax Amount for Cost Type.
3. Choose a time range.
4. Select Model Studio Foundation Model Inference for Product Detail.

View the inference costs of a specific model
1. Go to the Billing Details tab of the Bill Details page.
2. Select a Billing Cycle.
3. Select Model Studio Foundation Model Inference for Product Details.
4. Click Search.
Take qwen-max as an example: in the Instance ID column, find the input_tokens and output_tokens instances of qwen-max. Add the amounts of the two instances to get the cost of calling the qwen-max model.

How to allocate costs based on payment details?
Bills generated after September 7, 2024, can be allocated by workspace ID, model name, input/output type, and calling channel.
1. Go to the Billing Details tab of the Bill Details page.
2. Select a Billing Cycle.
3. Select Model Studio Foundation Model Inference for Product Details.
4. Click Search.
5. Click Export Billing Overview (CSV) to download the search results.
6. Allocate the costs based on Instance ID. An Instance ID such as text_token;llm-xxx;qwen-max;output_token;app represents billing type;workspace ID;model name;input/output type;calling channel.
Calling channels include app , bmp , and assistant-api . app refers to model calls through applications, bmp to calls made on the Home or Playground pages of the console, and assistant-api to calls through the Assistant API. 
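When processing the exported CSV, the Instance ID can be split on semicolons into the fields described above. A minimal sketch (the dictionary keys are my own labels for those fields):

```python
# Parse an Instance ID of the form:
# billing type;workspace ID;model name;input/output type;calling channel
def parse_instance_id(instance_id: str) -> dict:
    fields = ("billing_type", "workspace_id", "model_name", "token_type", "channel")
    return dict(zip(fields, instance_id.split(";")))

row = parse_instance_id("text_token;llm-xxx;qwen-max;output_token;app")
print(row["model_name"], row["channel"])  # qwen-max app
```

Grouping exported rows by workspace_id or model_name then gives per-team or per-model cost allocation.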
API errors: Service activation or account balance
1. Service not activated: Use your Alibaba Cloud account to log on to Expenses and Costs, then activate Model Studio and claim the free quota.
2. Insufficient account balance: Top up your account in Expenses and Costs.
3. Set a consumption alert to prevent repeated errors.