Alibaba Cloud Model Studio provides an OpenAI-compatible Batch API that lets you submit batch files for asynchronous execution. This service processes large-scale data offline during off-peak hours and returns the results after a task is complete or the maximum waiting time is reached, at 50% of the cost of real-time calls.
For console operation, see Batch processing.
Workflow
Prerequisites
Use the OpenAI SDK (Python, Node.js) or the HTTP API.
Get an API key: Get an API key and export it as an environment variable.
Install the SDK (optional): If you use an SDK, install the OpenAI SDK.
Service endpoints
Mainland China:
https://dashscope.aliyuncs.com/compatible-mode/v1
International:
https://dashscope-intl.aliyuncs.com/compatible-mode/v1
Scope
Mainland China
In the Mainland China deployment mode, endpoints and data storage are located in the Beijing region, and model inference compute resources are limited to Mainland China.
Supported models:
Text generation models: Stable and some `latest` versions of Qwen-Max, Qwen-Plus, Qwen-Flash, and Qwen-Long. The QwQ series (qwq-plus) and third-party models such as deepseek-r1 and deepseek-v3 are also supported.
Multimodal models: Stable and some `latest` versions of Qwen-VL-Max, Qwen-VL-Plus, and Qwen-VL-Flash. Qwen-OCR is also supported.
Text embedding models: text-embedding-v4.
International
In the international deployment mode, endpoints and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Mainland China.
Supported models: qwen-max, qwen-plus, qwen-flash, and qwen-turbo.
Getting started
Before you process formal tasks, use the test model batch-test-model for end-to-end testing. The test model skips the inference process and returns a fixed successful response to verify that the API call chain and data format are correct.
Limitations of batch-test-model:
The test file must meet the requirements described in Input file requirements. The file size cannot exceed 1 MB, and the number of lines cannot exceed 100.
Concurrency limit: Up to 2 parallel tasks.
Cost: The test model does not incur any model inference fees.
Step 1: Prepare the input file
Prepare a file named test_model.jsonl with the following content:
{"custom_id":"1","method":"POST","url":"/v1/chat/ds-test","body":{"model":"batch-test-model","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Hello! How can I help you?"}]}}
{"custom_id":"2","method":"POST","url":"/v1/chat/ds-test","body":{"model":"batch-test-model","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is 2+2?"}]}}
Step 2: Run the code
Choose one of the following code samples based on your programming language. Save it in the same directory as your input file and run it. The code completes the entire workflow, including file upload, task creation, status polling, and result download.
To adjust the file path or other parameters, modify the code as needed.
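The workflow described above (file upload, task creation, status polling, result download) can be sketched with the OpenAI Python SDK roughly as follows. This is a minimal sketch, not the official sample: the helper names `make_client` and `run_batch`, the environment variable name `DASHSCOPE_API_KEY`, the 60-second polling interval, and the list of terminal statuses (taken from the OpenAI Batch API) are assumptions. The default endpoint `/v1/chat/ds-test` matches the `url` field of the test input file.

```python
import os
import time
from pathlib import Path


def make_client(base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"):
    """Build an OpenAI SDK client that points at a Model Studio endpoint."""
    from openai import OpenAI  # imported lazily so run_batch can be tested without the SDK

    # DASHSCOPE_API_KEY is an assumed environment variable name.
    return OpenAI(api_key=os.environ["DASHSCOPE_API_KEY"], base_url=base_url)


def run_batch(client, input_path, endpoint="/v1/chat/ds-test",
              completion_window="24h", poll_seconds=60, sleep=time.sleep):
    """Upload the input file, create a batch task, poll it, and download results."""
    # 1. Upload the JSONL input file.
    uploaded = client.files.create(file=Path(input_path), purpose="batch")

    # 2. Create the batch task. The endpoint must match the url field in the file.
    batch = client.batches.create(
        input_file_id=uploaded.id,
        endpoint=endpoint,
        completion_window=completion_window,
    )

    # 3. Poll until the task reaches a terminal status (assumed status names).
    terminal = {"completed", "failed", "expired", "cancelled"}
    while batch.status not in terminal:
        sleep(poll_seconds)
        batch = client.batches.retrieve(batch.id)

    # 4. Download whichever result files the task produced.
    if batch.output_file_id:
        client.files.content(batch.output_file_id).write_to_file("result.jsonl")
    if batch.error_file_id:
        client.files.content(batch.error_file_id).write_to_file("error.jsonl")
    return batch
```

With the SDK installed and the API key exported, `run_batch(make_client(), "test_model.jsonl")` would run the test file end to end. For the Mainland China deployment, pass the Mainland China endpoint to `make_client` instead.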
Step 3: Verify test results
After the task succeeds, the result file result.jsonl will contain the fixed response {"content":"This is a test result."}:
{"id":"a2b1ae25-21f4-4d9a-8634-99a29926486c","custom_id":"1","response":{"status_code":200,"request_id":"a2b1ae25-21f4-4d9a-8634-99a29926486c","body":{"created":1743562621,"usage":{"completion_tokens":6,"prompt_tokens":20,"total_tokens":26},"model":"batch-test-model","id":"chatcmpl-bca7295b-67c3-4b1f-8239-d78323bb669f","choices":[{"finish_reason":"stop","index":0,"message":{"content":"This is a test result."}}],"object":"chat.completion"}},"error":null}
{"id":"39b74f09-a902-434f-b9ea-2aaaeebc59e0","custom_id":"2","response":{"status_code":200,"request_id":"39b74f09-a902-434f-b9ea-2aaaeebc59e0","body":{"created":1743562621,"usage":{"completion_tokens":6,"prompt_tokens":20,"total_tokens":26},"model":"batch-test-model","id":"chatcmpl-1e32a8ba-2b69-4dc4-be42-e2897eac9e84","choices":[{"finish_reason":"stop","index":0,"message":{"content":"This is a test result."}}],"object":"chat.completion"}},"error":null}
Run a formal task
Input file requirements
Format: JSONL with UTF-8 encoding. Each line must be a separate JSON object.
Size limits: Up to 50,000 requests per file and no larger than 500 MB.
Line limit: Each line (one JSON object) must not exceed 6 MB and must fit within the model's context window.
Consistency: All requests in the same file must use the same model and, if applicable, the same thinking mode.
Unique identifier: Each request must include a `custom_id` field that is unique within the file. This is used for matching results.
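The requirements above can be checked locally before uploading. The following is a minimal sketch of such a check; `validate_batch_lines` is a hypothetical helper, it treats 1 MB as 1024 × 1024 bytes (an assumption), and it does not check the context window or the thinking mode.

```python
import json


def validate_batch_lines(lines, max_requests=50_000,
                         max_line_bytes=6 * 1024 * 1024,
                         max_file_bytes=500 * 1024 * 1024):
    """Return a list of problems found in the JSONL lines (empty means OK)."""
    problems = []
    seen_ids = set()
    models = set()
    total_bytes = 0
    if len(lines) > max_requests:
        problems.append(f"too many requests: {len(lines)} > {max_requests}")
    for number, line in enumerate(lines, start=1):
        size = len(line.encode("utf-8"))
        total_bytes += size + 1  # count the trailing newline
        if size > max_line_bytes:
            problems.append(f"line {number}: larger than {max_line_bytes} bytes")
        try:
            request = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {number}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(request, dict):
            problems.append(f"line {number}: not a JSON object")
            continue
        custom_id = request.get("custom_id")
        if custom_id is None:
            problems.append(f"line {number}: missing custom_id")
        elif custom_id in seen_ids:
            problems.append(f"line {number}: duplicate custom_id {custom_id!r}")
        else:
            seen_ids.add(custom_id)
        body = request.get("body")
        models.add(body.get("model") if isinstance(body, dict) else None)
    if total_bytes > max_file_bytes:
        problems.append(f"file larger than {max_file_bytes} bytes")
    if len(models) > 1:
        problems.append(f"multiple models in one file: {sorted(map(str, models))}")
    return problems
```

An empty return value means that none of the checked rules is violated; it does not guarantee that the service accepts the file.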
Example file content:
{"custom_id":"1","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-plus","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Hello!"}]}}
{"custom_id":"2","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-plus","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is 2+2?"}]}}
JSONL batch generation tool
Use this tool to quickly generate JSONL files.
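If the interactive tool is not available, a file in the same shape as the example above can be generated with a few lines of Python. This is a minimal sketch under assumptions: `build_chat_requests` and `write_jsonl` are hypothetical helpers, and the sequential `custom_id` values and the default system prompt simply mirror the example file.

```python
import json


def build_chat_requests(prompts, model="qwen-plus",
                        system_prompt="You are a helpful assistant."):
    """Return one JSONL line per prompt, with sequential custom_id values."""
    lines = []
    for index, prompt in enumerate(prompts, start=1):
        request = {
            "custom_id": str(index),  # must be unique within the file
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,  # the same model for every request in the file
                "messages": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": prompt},
                ],
            },
        }
        lines.append(json.dumps(request, ensure_ascii=False))
    return lines


def write_jsonl(lines, path):
    """Write the lines to a UTF-8 encoded JSONL file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
```

For example, `write_jsonl(build_chat_requests(["Hello!", "What is 2+2?"]), "input.jsonl")` would write two requests for qwen-plus.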
1. Modify the input file
Directly modify `test_model.jsonl`. Set the model parameter to the actual model to use and configure the url field:
Type | url |
Text generation/multimodal | /v1/chat/completions |
Text embedding | /v1/embeddings |
Alternatively, use the "JSONL batch generation tool" above to generate a new file. Make sure that the `model` and `url` fields are correct.
2. Modify the code in Getting started
Change the input file path to your file name.
Modify the endpoint parameter value to match the url field in your input file.
3. Run the code and wait for results
After the task succeeds, the successful request results are saved in the local result.jsonl file. If some requests fail, the error details are saved in the error.jsonl file.
Successful results (`output_file_id`): Each line corresponds to a successful original request and contains the `custom_id` and `response`.
{"id":"3a5c39d5-3981-4e4c-97f2-e0e821893f03","custom_id":"req-001","response":{"status_code":200,"request_id":"3a5c39d5-3981-4e4c-97f2-e0e821893f03","body":{"created":1768306034,"usage":{"completion_tokens":654,"prompt_tokens":14,"total_tokens":668},"model":"qwen-plus","id":"chatcmpl-3a5c39d5-3981-4e4c-97f2-e0e821893f03","choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"Hello! Hangzhou West Lake is a famous scenic spot in China, located in the western part of Hangzhou City, Zhejiang Province, hence the name \"West Lake\". It is one of China's top ten scenic spots and a World Cultural Heritage site (listed by UNESCO in 2011). It is renowned worldwide for its beautiful natural scenery and profound cultural heritage.\n\n### I. Natural Landscape\nWest Lake is surrounded by mountains on three sides and borders the city on one side, covering an area of approximately 6.39 square kilometers, shaped like a ruyi scepter with rippling blue waters. The lake is naturally or artificially divided into multiple water areas by Solitary Hill, Bai Causeway, Su Causeway, and Yanggong Causeway, forming a layout of \"one mountain, two pagodas, three islands, and three causeways\".\n\nMain attractions include the following:\n- **Spring Dawn at Su Causeway**: During the Northern Song Dynasty, the great literary figure Su Dongpo, while serving as the prefect of Hangzhou, led the dredging of West Lake and used the excavated silt to build a causeway, later named \"Su Causeway\". In spring, peach blossoms and willows create a picturesque scene.\n- **Lingering Snow on Broken Bridge**: Located at the eastern end of Bai Causeway, this is where the reunion scene from the Legend of the White Snake took place. After snowfall in winter, it is particularly famous for its silver-white appearance.\n- **Leifeng Pagoda at Sunset**: Leifeng Pagoda glows golden under the setting sun and was once one of the \"Ten Scenes of West Lake\".\n- **Three Pools Mirroring the Moon**: On Xiaoyingzhou Island in the lake, there are three stone pagodas. During the Mid-Autumn Festival, lanterns can be lit inside the pagodas, creating a harmonious interplay of moonlight, lamplight, and lake reflections.\n- **Autumn Moon over Calm Lake**: Located at the western end of Bai Causeway, it is an excellent spot for viewing the moon over the lake.\n- **Viewing Fish at Flower Harbor**: Known for viewing flowers and fish, with peonies and koi complementing each other beautifully in the garden.\n\n### II. Cultural History\nWest Lake not only boasts beautiful scenery but also carries rich historical and cultural significance:\n- Since the Tang and Song dynasties, numerous literati such as Bai Juyi, Su Dongpo, Lin Bu, and Yang Wanli have left poems here.\n- Bai Juyi oversaw the construction of \"Bai Causeway\" and dredged West Lake, benefiting the local people.\n- Around West Lake are many historical sites, including Yuewang Temple (commemorating national hero Yue Fei), Lingyin Temple (a millennium-old Buddhist temple), Liuhe Pagoda, and Longjing Village (the origin of Longjing tea, one of China's top ten famous teas).\n\n### III. Cultural Symbolism\nWest Lake is regarded as a representative of \"paradise on earth\" and a model of traditional Chinese landscape aesthetics. It embodies the philosophical concept of \"harmony between heaven and humanity\" by integrating natural beauty with cultural depth. Many poems, paintings, and operas feature West Lake, making it an important symbol of Chinese culture.\n\n### IV. Travel Recommendations\n- Best visiting seasons: Spring (March-May) for peach blossoms and willows, Autumn (September-November) for clear skies and cool weather.\n- Recommended ways: Walking, cycling (along the lakeside greenway), or boating on the lake.\n- Local cuisine: West Lake vinegar fish, Longjing shrimp, Dongpo pork, pian'erchuan noodles.\n\nIn summary, Hangzhou West Lake is not just a natural wonder but also a living cultural museum worth exploring in detail. If you ever visit Hangzhou, don't miss this earthly paradise that is \"equally charming in light or heavy makeup\"."}}],"object":"chat.completion"}},"error":null}
{"id":"628312ba-172c-457d-ba7f-3e5462cc6899","custom_id":"req-002","response":{"status_code":200,"request_id":"628312ba-172c-457d-ba7f-3e5462cc6899","body":{"created":1768306035,"usage":{"completion_tokens":25,"prompt_tokens":18,"total_tokens":43},"model":"qwen-plus","id":"chatcmpl-628312ba-172c-457d-ba7f-3e5462cc6899","choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"The spring breeze brushes green willows,\nNight rain nourishes red flowers.\nBird songs fill the forest,\nMountains and rivers share the same beauty."}}],"object":"chat.completion"}},"error":null}
Failure details (`error_file_id`): Contains information about failed request lines and the corresponding error reasons. For troubleshooting, see Error codes.
Detailed procedure
The Batch API process consists of four steps: uploading a file, creating a task, querying the task status, and downloading the results.
1. Upload file
2. Create a batch task
3. Query and manage batch tasks
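As a rough illustration of querying and managing tasks through the OpenAI Python SDK, the sketch below lists recent tasks and cancels one that is still running. `summarize_batches` and `cancel_if_running` are hypothetical helpers, and the status names `validating` and `in_progress` are assumed to follow the OpenAI Batch API.

```python
def summarize_batches(client, limit=10):
    """Return (id, status) pairs for the most recent batch tasks."""
    page = client.batches.list(limit=limit)
    return [(batch.id, batch.status) for batch in page.data]


def cancel_if_running(client, batch_id):
    """Cancel the task only if it has not finished yet; return the latest task object."""
    batch = client.batches.retrieve(batch_id)
    # Assumed status names for tasks that are still cancellable.
    if batch.status in {"validating", "in_progress"}:
        return client.batches.cancel(batch_id)
    return batch
```

Here `client` is an `OpenAI` client created with the service endpoint as `base_url`. Note that listing and cancelling are rate limited separately from querying a single task.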
4. Download the batch result file
Advanced features
Set completion notifications
For long-running tasks, polling consumes unnecessary resources. Use an asynchronous notification mechanism where the system actively notifies you when the task completes.
Callback: Specify a publicly accessible URL when creating the task.
EventBridge message queue: Deeply integrated with the Alibaba Cloud ecosystem and requires no public IP.
Method 1: Callback
Method 2: EventBridge message queue
Going live
File management
Periodically call the File delete API to delete unnecessary files and avoid reaching the file storage limit (10,000 files or 100 GB).
Store large files in OSS instead.
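Periodic cleanup could look roughly like the sketch below, which deletes uploaded files older than a cutoff. It assumes that the file list and delete operations are reachable through the OpenAI SDK's `files.list` and `files.delete` calls on this endpoint and that each file object carries a `created_at` Unix timestamp; `delete_old_files` and the 30-day default are hypothetical.

```python
import time


def delete_old_files(client, max_age_days=30, now=None):
    """Delete uploaded files older than max_age_days and return their IDs."""
    current = now if now is not None else time.time()
    cutoff = current - max_age_days * 86400  # 86400 seconds per day
    # Collect the IDs first so that deletion does not interfere with pagination.
    stale_ids = [item.id for item in client.files.list() if item.created_at < cutoff]
    for file_id in stale_ids:
        client.files.delete(file_id)
    return stale_ids
```

Make sure that result files you still need have been downloaded before running a cleanup like this, because it does not distinguish input files from output files.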
Task monitoring
Prioritize using Callback or EventBridge asynchronous notifications.
If polling is required, set the interval to more than 1 minute and use an exponential backoff strategy.
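The polling guidance above can be sketched as a delay generator plus a wait loop. This is only an illustration: `backoff_delays` and `wait_for` are hypothetical helpers, and the 90-second start, the doubling factor, the 15-minute cap, and the terminal status names are assumptions rather than documented values.

```python
import time


def backoff_delays(initial=90.0, factor=2.0, maximum=900.0):
    """Yield polling delays in seconds that grow exponentially up to a cap."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay = min(delay * factor, maximum)


def wait_for(check_status, sleep=time.sleep,
             terminal=frozenset({"completed", "failed", "expired", "cancelled"})):
    """Call check_status() until it returns a terminal status, backing off between calls."""
    for delay in backoff_delays():
        status = check_status()
        if status in terminal:
            return status
        sleep(delay)
```

With an OpenAI SDK client, `check_status` could be `lambda: client.batches.retrieve(batch_id).status`. The first delay already exceeds one minute, and later delays grow so that long-running tasks are queried less often.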
Error handling
Implement a complete exception handling mechanism that covers network errors, API errors, and other potential issues.
Download and analyze the error details in `error_file_id`.
For common error codes, see Error messages for solutions.
Cost optimization
Combine small tasks into a single batch.
Set `completion_window` appropriately to provide greater scheduling flexibility.
Utility tools
CSV to JSONL
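A conversion from CSV to the batch input format might look like the following minimal sketch. It assumes a CSV with the hypothetical column names `custom_id` and `prompt` and produces chat requests with a single user message; `csv_to_jsonl` is not part of any SDK.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text, model="qwen-plus", url="/v1/chat/completions"):
    """Convert CSV text with custom_id and prompt columns to JSONL batch input."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        request = {
            "custom_id": row["custom_id"],
            "method": "POST",
            "url": url,
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": row["prompt"]}],
            },
        }
        lines.append(json.dumps(request, ensure_ascii=False))
    return "\n".join(lines) + "\n"
```

Read the CSV file with `encoding="utf-8"` and write the returned text with the same encoding, so that the result satisfies the UTF-8 requirement for input files.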
JSONL results to CSV
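For the reverse direction, the sketch below flattens a chat-completion result file into CSV. The field paths follow the result lines shown earlier on this page; the chosen output columns and the helper name `results_to_csv` are assumptions, and embedding results would need different fields.

```python
import csv
import io
import json


def results_to_csv(jsonl_text):
    """Flatten chat-completion result lines into CSV text."""
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    writer.writerow(["custom_id", "status_code", "content", "total_tokens"])
    for line in jsonl_text.split("\n"):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        response = record.get("response") or {}
        body = response.get("body") or {}
        choices = body.get("choices") or []
        content = choices[0]["message"]["content"] if choices else ""
        usage = body.get("usage") or {}
        writer.writerow([
            record.get("custom_id"),
            response.get("status_code"),
            content,
            usage.get("total_tokens"),
        ])
    return output.getvalue()
```

The `csv` module quotes fields that contain commas, quotes, or newlines, so multi-line model answers stay in a single CSV cell.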
Rate limits
Interface | Rate limit (per Alibaba Cloud account) |
Create task | 1,000 calls/minute, up to 1,000 concurrent tasks |
Query task | 1,000 calls/minute |
Query task list | 100 calls/minute |
Cancel task | 1,000 calls/minute |
Billing
Unit price: For successful requests, the unit price for both input and output tokens is 50% of the real-time price for the corresponding model. See the Model list.
Scope:
Only successfully executed requests in a task are billed.
File parsing failures, task execution failures, or row-level error requests do not incur fees.
For canceled tasks, any requests that were successfully completed before the cancellation are billed.
Batch inference is a separate billable item and does not support subscription (savings plans), new user free quotas, or features such as context cache.
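The billing rule can be turned into a rough cost estimate from a result file: sum the tokens of successful requests and apply 50% of the real-time unit prices. In this sketch, `estimate_batch_cost` is a hypothetical helper, prices are assumed to be quoted per million tokens, and the real-time prices must come from the Model list.

```python
import json


def estimate_batch_cost(jsonl_text, realtime_input_price, realtime_output_price,
                        discount=0.5):
    """Return (prompt_tokens, completion_tokens, cost) for successful requests.

    Prices are assumed to be per million tokens at the real-time rate.
    """
    prompt_tokens = 0
    completion_tokens = 0
    for line in jsonl_text.split("\n"):
        if not line.strip():
            continue
        record = json.loads(line)
        response = record.get("response") or {}
        # Only successfully executed requests are billed.
        if record.get("error") is not None or response.get("status_code") != 200:
            continue
        usage = (response.get("body") or {}).get("usage") or {}
        prompt_tokens += usage.get("prompt_tokens", 0)
        completion_tokens += usage.get("completion_tokens", 0)
    full_price = (prompt_tokens * realtime_input_price
                  + completion_tokens * realtime_output_price) / 1_000_000
    return prompt_tokens, completion_tokens, discount * full_price
```

For the two test results shown in Getting started (20 prompt tokens and 6 completion tokens each), a formal model priced at a hypothetical 1.0 per million input tokens and 2.0 per million output tokens would give 0.5 × (40 × 1.0 + 12 × 2.0) / 1,000,000 = 0.000032. The test model itself incurs no inference fees.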
Error codes
If a call fails and returns an error message, see Error messages for a solution.
FAQ
How is batch calling billed? Do I need to purchase separately?
You are billed for the tokens used by successful requests in a task. No separate package purchase is required.
Are submitted batch tasks executed in order?
No, they are not. The backend uses a dynamic scheduling mechanism to schedule task execution based on the overall compute resource load. Therefore, execution is not guaranteed to be in the order of submission. If resources are limited, task startup and execution may be delayed.
How long does it take to complete a submitted batch task?
The execution time depends on system resource allocation and the scale of your task. If a task does not complete within the specified completion_window, it expires. Unprocessed requests in an expired task are not executed and do not incur any fees.
Scenario recommendations: For scenarios that require real-time model inference, we recommend using real-time calls. For large-scale data processing scenarios that are not time-sensitive, we recommend using batch calls.