Voice design creates custom voices from text descriptions. It supports multilingual and multidimensional voice feature definitions. Voice design and speech synthesis are two sequential steps. This document covers voice design parameters and API details. For speech synthesis, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.
User guide: For model introductions and selection recommendations, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.
Language support
Voice design supports multilingual voice creation and speech synthesis for the following languages: Chinese (zh), English (en), German (de), Italian (it), Portuguese (pt), Spanish (es), Japanese (ja), Korean (ko), French (fr), Russian (ru).
How to write high-quality voice descriptions?
Requirements and limitations
When writing a voice description (voice_prompt), follow these technical constraints:
Length limit: The voice_prompt content must not exceed 2,048 characters.
Supported languages: The description text supports Chinese and English only.
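These constraints can be checked client-side before sending a request. A minimal sketch (the helper name is ours; the 2,048-character limit is the one documented above):

```python
def validate_voice_prompt(voice_prompt: str) -> str:
    """Check a voice description against the documented constraints and return it."""
    if not voice_prompt.strip():
        raise ValueError("voice_prompt must be a non-empty description")
    if len(voice_prompt) > 2048:
        raise ValueError(
            f"voice_prompt is {len(voice_prompt)} characters; the limit is 2,048"
        )
    return voice_prompt

validate_voice_prompt("A calm, middle-aged male voice with a deep, magnetic tone.")
```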
Core principles
A high-quality voice description (voice_prompt) is key to creating your ideal voice. Think of it as the "blueprint" for voice design—it guides the model to generate voices with specific features.
Follow these core principles when describing a voice:
Be specific, not vague: Use words that describe concrete voice qualities, such as "deep," "crisp," or "fast-paced." Avoid subjective, low-information terms like "nice" or "normal."
Be multidimensional, not single-dimensional: Strong descriptions combine multiple dimensions (e.g., gender, age, emotion). A single-dimension description like "female voice" is too broad to produce a distinctive voice.
Be objective, not subjective: Focus on physical and perceptual voice features—not personal preferences. For example, use "high-pitched and energetic" instead of "my favorite voice."
Be original, not imitative: Describe voice qualities—not requests to mimic specific people (e.g., celebrities or actors). Such requests carry copyright risk and are not supported by the model.
Be concise, not redundant: Ensure every word adds meaning. Avoid repeating synonyms or meaningless intensifiers (e.g., "a very, very great voice").
Reference dimensions for descriptions
| Dimension | Examples |
| --- | --- |
| Gender | Male, female, neutral |
| Age | Child (5–12), teenager (13–18), young adult (19–35), middle-aged (36–55), elderly (55+) |
| Pitch | High, medium, low, high-pitched, low-pitched |
| Pace | Fast, medium, slow, fast-paced, slow-paced |
| Emotion | Cheerful, calm, gentle, serious, lively, composed, soothing |
| Characteristics | Magnetic, crisp, hoarse, mellow, sweet, rich, powerful |
| Purpose | News broadcast, ad voice-over, audiobook, animation character, voice assistant, documentary narration |
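If you generate descriptions programmatically, the table rows can serve as slots in a template. A hedged sketch (the helper and its parameter names are ours, not API fields):

```python
def build_voice_prompt(gender, age, pace, emotion, traits, purpose):
    """Join several reference dimensions into one multidimensional description."""
    return (f"A {emotion}, {age} {gender} voice with a {pace} pace "
            f"and a {traits} tone, suitable for {purpose}.")

prompt = build_voice_prompt("male", "middle-aged", "slow", "calm",
                            "deep, magnetic", "news or documentary narration")
print(prompt)
```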
Example comparison
✅ Good cases
"A young, lively female voice with a fast pace and noticeable upward intonation, suitable for fashion product introductions."
Analysis: Combines age, personality, pace, and intonation, and specifies the use case, creating a vivid, three-dimensional voice.
"A calm, middle-aged male voice with a slow pace and deep, magnetic tone, suitable for news or documentary narration."
Analysis: Clearly defines gender, age group, pace, vocal traits, and application area.
"A cute child's voice, around 8 years old, with a slightly childish tone, suitable for animation character voice-overs."
Analysis: Precisely identifies the age and vocal trait ("childish"), with a well-defined purpose.
"A gentle, intellectual woman in her early 30s with a calm voice, ideal for audiobook narration."
Analysis: Words like "intellectual" and "calm" clearly communicate emotional tone and stylistic intent.
❌ Bad cases and improvement suggestions
| Bad case | Main issue | Improvement suggestion |
| --- | --- | --- |
| A nice voice | Too vague and subjective; lacks actionable features. | Add specific dimensions, e.g., "A young female voice with a clear vocal line and gentle tone." |
| A voice like a certain celebrity | Carries copyright risk; the model cannot directly mimic celebrities. | Extract and describe the voice traits instead, e.g., "A mature, magnetic male voice with a calm pace." |
| A very, very, very nice female voice | Redundant; repetition does not help define voice quality. | Remove the repeated words and add effective descriptors, e.g., "A female voice aged 20–24, with a light tone, lively pitch, and sweet quality." |
| 123456 | Invalid input; cannot be parsed into voice features. | Provide a meaningful text description, following the recommended examples above. |
Getting started: From voice design to speech synthesis
1. Workflow
Voice design and speech synthesis are two sequential steps. Follow a create-then-use workflow:
Prepare the voice description and preview text for voice design.
Voice description (voice_prompt): Defines the target voice’s features (for how to write one, see "How to write high-quality voice descriptions?").
Preview text (preview_text): Text for the preview audio generated by the target voice (e.g., "Hello everyone, welcome to listen.").
Call the Create voice API to create a custom voice and get its name and preview audio.
You must set target_model to the speech synthesis model that drives this voice.
Listen to the preview audio to check whether it meets expectations. If satisfied, proceed to the next step. Otherwise, redesign.
If you already have a created voice (check via the List voices API), skip this step and go straight to the next.
Use the voice for speech synthesis.
Call the speech synthesis API and pass in the voice name obtained in the previous step. The speech synthesis model used here must match the target_model from the previous step.
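The create-then-use coupling can be seen in the request body itself: the create request names the driving model in target_model, and that same model must be reused for synthesis in step 3. A minimal illustration (the helper name is ours; the request fields follow this document):

```python
def build_create_request(target_model, voice_prompt, preview_text):
    """Build the Create voice request body. target_model names the TTS model
    that must later be used for synthesis with the returned voice."""
    return {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": target_model,
            "voice_prompt": voice_prompt,
            "preview_text": preview_text,
        },
    }

req = build_create_request(
    "qwen3-tts-vd-realtime-2026-01-15",
    "A calm, middle-aged male voice with a deep, magnetic tone.",
    "Hello everyone, welcome to listen.",
)
# In step 3, the synthesis call must use req["input"]["target_model"] as its model.
synthesis_model = req["input"]["target_model"]
```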
2. Model configuration and preparations
Select appropriate models and complete preparations.
Model configuration
Specify the following two models for voice design:
Voice design model: qwen-voice-design
Voice-driven speech synthesis models (two types):
Qwen3-TTS-VD-Realtime (see Real-time speech synthesis - Qwen):
qwen3-tts-vd-realtime-2026-01-15
qwen3-tts-vd-realtime-2025-12-16
Qwen3-TTS-VD (see Speech synthesis - Qwen):
qwen3-tts-vd-2026-01-26
Preparations
Get an API key: See Get an API key. For security, we recommend storing the API key in an environment variable.
Install the SDK: Make sure you have installed the latest DashScope SDK.
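For example, on Linux or macOS, the environment variable can be set in the shell before running the samples (the key value here is a placeholder):

```shell
# Store the Model Studio API key in an environment variable (placeholder value).
export DASHSCOPE_API_KEY="sk-xxx"
# Confirm the variable is visible to child processes.
echo "$DASHSCOPE_API_KEY"
```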
3. Sample code
Bidirectional streaming synthesis
Applies to Qwen3-TTS-VD-Realtime series models. See Real-time speech synthesis - Qwen.
Create a custom voice and preview it. If satisfied, proceed. Otherwise, recreate.
Python
```python
import requests
import base64
import os

def create_voice_and_play():
    # API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If the environment variable is not set, replace the following line with your Model Studio API key: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    if not api_key:
        print("Error: DASHSCOPE_API_KEY environment variable not found. Please set the API key first.")
        return None, None, None

    # Prepare request data
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-realtime-2026-01-15",
            "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "en"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }

    # The following is the URL for the Singapore region. If you use a model in the Beijing region,
    # replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    try:
        # Send the request
        response = requests.post(
            url,
            headers=headers,
            json=data,
            timeout=60  # Add a timeout setting
        )
        if response.status_code == 200:
            result = response.json()
            # Get the voice name
            voice_name = result["output"]["voice"]
            print(f"Voice name: {voice_name}")
            # Get the preview audio data
            base64_audio = result["output"]["preview_audio"]["data"]
            # Decode the Base64 audio data
            audio_bytes = base64.b64decode(base64_audio)
            # Save the audio file locally
            filename = f"{voice_name}_preview.wav"
            # Write the audio data to a local file
            with open(filename, 'wb') as f:
                f.write(audio_bytes)
            print(f"Audio saved to local file: {filename}")
            print(f"File path: {os.path.abspath(filename)}")
            return voice_name, audio_bytes, filename
        else:
            print(f"Request failed with status code: {response.status_code}")
            print(f"Response content: {response.text}")
            return None, None, None
    except requests.exceptions.RequestException as e:
        print(f"A network request error occurred: {e}")
        return None, None, None
    except KeyError as e:
        print(f"Response data format error, missing required field: {e}")
        print(f"Response content: {response.text if 'response' in locals() else 'No response'}")
        return None, None, None
    except Exception as e:
        print(f"An unknown error occurred: {e}")
        return None, None, None

if __name__ == "__main__":
    print("Starting to create voice...")
    voice_name, audio_data, saved_filename = create_voice_and_play()
    if voice_name:
        print(f"\nSuccessfully created voice '{voice_name}'")
        print(f"Audio file saved as: '{saved_filename}'")
        print(f"File size: {os.path.getsize(saved_filename)} bytes")
    else:
        print("\nVoice creation failed")
```
Java
Add the Gson dependency to your project:
Maven
Add the following to your pom.xml:

```xml
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>
```

Gradle
Add the following to your build.gradle:

```groovy
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
```

```java
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class Main {
    public static void main(String[] args) {
        Main example = new Main();
        example.createVoice();
    }

    public void createVoice() {
        // API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If the environment variable is not set, replace the following line with your Model Studio API key: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // Create the JSON request body string
        String jsonBody = "{\n" +
                "  \"model\": \"qwen-voice-design\",\n" +
                "  \"input\": {\n" +
                "    \"action\": \"create\",\n" +
                "    \"target_model\": \"qwen3-tts-vd-realtime-2026-01-15\",\n" +
                "    \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.\",\n" +
                "    \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
                "    \"preferred_name\": \"announcer\",\n" +
                "    \"language\": \"en\"\n" +
                "  },\n" +
                "  \"parameters\": {\n" +
                "    \"sample_rate\": 24000,\n" +
                "    \"response_format\": \"wav\"\n" +
                "  }\n" +
                "}";

        HttpURLConnection connection = null;
        try {
            // The following is the URL for the Singapore region. If you use a model in the Beijing region,
            // replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();

            // Set the request method and headers
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);

            // Send the request body
            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }

            // Get the response
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Read the response content
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }

                // Parse the JSON response
                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                JsonObject previewAudioObj = outputObj.getAsJsonObject("preview_audio");

                // Get the voice name
                String voiceName = outputObj.get("voice").getAsString();
                System.out.println("Voice name: " + voiceName);

                // Get the Base64-encoded audio data
                String base64Audio = previewAudioObj.get("data").getAsString();
                // Decode the Base64 audio data
                byte[] audioBytes = Base64.getDecoder().decode(base64Audio);

                // Save the audio to a local file
                String filename = voiceName + "_preview.wav";
                saveAudioToFile(audioBytes, filename);
                System.out.println("Audio saved to local file: " + filename);
            } else {
                // Read the error response
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }
                System.out.println("Request failed with status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }
        } catch (Exception e) {
            System.err.println("An error occurred during the request: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    private void saveAudioToFile(byte[] audioBytes, String filename) {
        try {
            File file = new File(filename);
            try (FileOutputStream fos = new FileOutputStream(file)) {
                fos.write(audioBytes);
            }
            System.out.println("Audio saved to: " + file.getAbsolutePath());
        } catch (IOException e) {
            System.err.println("An error occurred while saving the audio file: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
```

Use the custom voice created in the previous step for speech synthesis.
This example follows the "server commit mode" sample code for system voices in the DashScope SDK. Replace the voice parameter with the custom voice generated by voice design.
Key principle: The model used for voice design (target_model) must match the model used for subsequent speech synthesis (model). Otherwise, synthesis fails.
Python
```python
# coding=utf-8
# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python-pyaudio python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio

import pyaudio
import os
import base64
import threading
import time
import dashscope  # DashScope Python SDK version must be 1.23.9 or later
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat

# ======= Constant configuration =======
TEXT_TO_SYNTHESIZE = [
    'Right? I really like this kind of supermarket,',
    'especially during the New Year.',
    'Going to the supermarket',
    'just makes me feel',
    'super, super happy!',
    'I want to buy so many things!'
]

def init_dashscope_api_key():
    """
    Initialize the API key for the DashScope SDK.
    """
    # API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If the environment variable is not set, replace the following line with your Model Studio API key: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

# ======= Callback class =======
class MyCallback(QwenTtsRealtimeCallback):
    """
    Custom TTS streaming callback.
    """
    def __init__(self):
        self.complete_event = threading.Event()
        self._player = pyaudio.PyAudio()
        self._stream = self._player.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=24000,
            output=True
        )

    def on_open(self) -> None:
        print('[TTS] Connection established')

    def on_close(self, close_status_code, close_msg) -> None:
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()
        print(f'[TTS] Connection closed, code={close_status_code}, msg={close_msg}')

    def on_event(self, response: dict) -> None:
        try:
            event_type = response.get('type', '')
            if event_type == 'session.created':
                print(f'[TTS] Session started: {response["session"]["id"]}')
            elif event_type == 'response.audio.delta':
                audio_data = base64.b64decode(response['delta'])
                self._stream.write(audio_data)
            elif event_type == 'response.done':
                print(f'[TTS] Response complete, Response ID: {qwen_tts_realtime.get_last_response_id()}')
            elif event_type == 'session.finished':
                print('[TTS] Session finished')
                self.complete_event.set()
        except Exception as e:
            print(f'[Error] Exception processing callback event: {e}')

    def wait_for_finished(self):
        self.complete_event.wait()

# ======= Main execution logic =======
if __name__ == '__main__':
    init_dashscope_api_key()
    print('[System] Initializing Qwen TTS Realtime ...')
    callback = MyCallback()
    qwen_tts_realtime = QwenTtsRealtime(
        # Use the same model for voice design and speech synthesis
        model="qwen3-tts-vd-realtime-2026-01-15",
        callback=callback,
        # The following is the URL for the Singapore region. If you use a model in the Beijing region,
        # replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
        url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
    )
    qwen_tts_realtime.connect()
    qwen_tts_realtime.update_session(
        voice="myvoice",  # Replace the voice parameter with the custom voice generated by voice design
        response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
        mode='server_commit'
    )
    for text_chunk in TEXT_TO_SYNTHESIZE:
        print(f'[Sending text]: {text_chunk}')
        qwen_tts_realtime.append_text(text_chunk)
        time.sleep(0.1)
    qwen_tts_realtime.finish()
    callback.wait_for_finished()
    print(f'[Metric] session_id={qwen_tts_realtime.get_session_id()}, '
          f'first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s')
```
Java
```java
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;

import javax.sound.sampled.*;
import java.io.*;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    // ===== Constant definitions =====
    private static String[] textToSynthesize = {
        "Right? I really like this kind of supermarket,",
        "especially during the New Year.",
        "Going to the supermarket",
        "just makes me feel",
        "super, super happy!",
        "I want to buy so many things!"
    };

    // Real-time audio player class
    public static class RealtimePcmPlayer {
        private int sampleRate;
        private SourceDataLine line;
        private AudioFormat audioFormat;
        private Thread decoderThread;
        private Thread playerThread;
        private AtomicBoolean stopped = new AtomicBoolean(false);
        private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
        private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();

        // Constructor initializes audio format and audio line
        public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
            this.sampleRate = sampleRate;
            this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
            line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(audioFormat);
            line.start();
            decoderThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        String b64Audio = b64AudioBuffer.poll();
                        if (b64Audio != null) {
                            byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
                            RawAudioBuffer.add(rawAudio);
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            playerThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        byte[] rawAudio = RawAudioBuffer.poll();
                        if (rawAudio != null) {
                            try {
                                playChunk(rawAudio);
                            } catch (IOException e) {
                                throw new RuntimeException(e);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            decoderThread.start();
            playerThread.start();
        }

        // Plays an audio chunk and blocks until playback is complete
        private void playChunk(byte[] chunk) throws IOException, InterruptedException {
            if (chunk == null || chunk.length == 0) return;
            int bytesWritten = 0;
            while (bytesWritten < chunk.length) {
                bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
            }
            int audioLength = chunk.length / (this.sampleRate * 2 / 1000);
            // Wait for the audio in the buffer to finish playing
            Thread.sleep(audioLength - 10);
        }

        public void write(String b64Audio) {
            b64AudioBuffer.add(b64Audio);
        }

        public void cancel() {
            b64AudioBuffer.clear();
            RawAudioBuffer.clear();
        }

        public void waitForComplete() throws InterruptedException {
            while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                Thread.sleep(100);
            }
            line.drain();
        }

        public void shutdown() throws InterruptedException {
            stopped.set(true);
            decoderThread.join();
            playerThread.join();
            if (line != null && line.isRunning()) {
                line.drain();
                line.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                // Use the same model for voice design and speech synthesis
                .model("qwen3-tts-vd-realtime-2026-01-15")
                // The following is the URL for the Singapore region. If you use a model in the Beijing region,
                // replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                // API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // If the environment variable is not set, replace the following line with your Model Studio API key: .apikey("sk-xxx")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                .build();
        AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
        final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);

        // Create a real-time audio player instance
        RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);

        QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
            @Override
            public void onOpen() {
                // Handling for when the connection is established
            }

            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch (type) {
                    case "session.created":
                        // Handling for when the session is created
                        break;
                    case "response.audio.delta":
                        String recvAudioB64 = message.get("delta").getAsString();
                        // Play audio in real time
                        audioPlayer.write(recvAudioB64);
                        break;
                    case "response.done":
                        // Handling for when the response is complete
                        break;
                    case "session.finished":
                        // Handling for when the session is finished
                        completeLatch.get().countDown();
                        break;
                    default:
                        break;
                }
            }

            @Override
            public void onClose(int code, String reason) {
                // Handling for when the connection is closed
            }
        });
        qwenTtsRef.set(qwenTtsRealtime);
        try {
            qwenTtsRealtime.connect();
        } catch (NoApiKeyException e) {
            throw new RuntimeException(e);
        }
        QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                .voice("myvoice")  // Replace the voice parameter with the custom voice generated by voice design
                .responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                .mode("server_commit")
                .build();
        qwenTtsRealtime.updateSession(config);
        for (String text : textToSynthesize) {
            qwenTtsRealtime.appendText(text);
            Thread.sleep(100);
        }
        qwenTtsRealtime.finish();
        completeLatch.get().await();
        // Wait for audio playback to complete and shut down the player
        audioPlayer.waitForComplete();
        audioPlayer.shutdown();
        System.exit(0);
    }
}
```
Non-streaming and unidirectional streaming synthesis
Applies to Qwen3-TTS-VD series models. See Speech synthesis - Qwen.
Create a custom voice and preview it. If satisfied, proceed. Otherwise, recreate.
Python
```python
import requests
import base64
import os

def create_voice_and_play():
    # API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If the environment variable is not set, replace the following line with your Model Studio API key: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    if not api_key:
        print("Error: DASHSCOPE_API_KEY environment variable not found. Please set the API key first.")
        return None, None, None

    # Prepare request data
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-2026-01-26",
            "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "en"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }

    # The following is the URL for the Singapore region. If you use a model in the Beijing region,
    # replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    try:
        # Send the request
        response = requests.post(
            url,
            headers=headers,
            json=data,
            timeout=60  # Add a timeout setting
        )
        if response.status_code == 200:
            result = response.json()
            # Get the voice name
            voice_name = result["output"]["voice"]
            print(f"Voice name: {voice_name}")
            # Get the preview audio data
            base64_audio = result["output"]["preview_audio"]["data"]
            # Decode the Base64 audio data
            audio_bytes = base64.b64decode(base64_audio)
            # Save the audio file locally
            filename = f"{voice_name}_preview.wav"
            # Write the audio data to a local file
            with open(filename, 'wb') as f:
                f.write(audio_bytes)
            print(f"Audio saved to local file: {filename}")
            print(f"File path: {os.path.abspath(filename)}")
            return voice_name, audio_bytes, filename
        else:
            print(f"Request failed with status code: {response.status_code}")
            print(f"Response content: {response.text}")
            return None, None, None
    except requests.exceptions.RequestException as e:
        print(f"A network request error occurred: {e}")
        return None, None, None
    except KeyError as e:
        print(f"Response data format error, missing required field: {e}")
        print(f"Response content: {response.text if 'response' in locals() else 'No response'}")
        return None, None, None
    except Exception as e:
        print(f"An unknown error occurred: {e}")
        return None, None, None

if __name__ == "__main__":
    print("Starting to create voice...")
    voice_name, audio_data, saved_filename = create_voice_and_play()
    if voice_name:
        print(f"\nSuccessfully created voice '{voice_name}'")
        print(f"Audio file saved as: '{saved_filename}'")
        print(f"File size: {os.path.getsize(saved_filename)} bytes")
    else:
        print("\nVoice creation failed")
```
Java
Add the Gson dependency to your project:
Maven
Add the following to your pom.xml:

```xml
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>
```

Gradle
Add the following to your build.gradle:

```groovy
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
```

Important: To use a custom voice generated by voice design for speech synthesis, configure the voice as follows:

```java
MultiModalConversationParam param = MultiModalConversationParam.builder()
        .parameter("voice", "your_voice")  // Replace the voice parameter with the custom voice generated by voice design
        .build();
```

```java
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class Main {
    public static void main(String[] args) {
        Main example = new Main();
        example.createVoice();
    }

    public void createVoice() {
        // API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If the environment variable is not set, replace the following line with your Model Studio API key: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // Create the JSON request body string
        String jsonBody = "{\n" +
                "  \"model\": \"qwen-voice-design\",\n" +
                "  \"input\": {\n" +
                "    \"action\": \"create\",\n" +
                "    \"target_model\": \"qwen3-tts-vd-2026-01-26\",\n" +
                "    \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.\",\n" +
                "    \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
                "    \"preferred_name\": \"announcer\",\n" +
                "    \"language\": \"en\"\n" +
                "  },\n" +
                "  \"parameters\": {\n" +
                "    \"sample_rate\": 24000,\n" +
                "    \"response_format\": \"wav\"\n" +
                "  }\n" +
                "}";

        HttpURLConnection connection = null;
        try {
            // The following is the URL for the Singapore region. If you use a model in the Beijing region,
            // replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();

            // Set the request method and headers
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);

            // Send the request body
            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }

            // Get the response
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Read the response content
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }

                // Parse the JSON response
                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                JsonObject previewAudioObj = outputObj.getAsJsonObject("preview_audio");

                // Get the voice name
                String voiceName = outputObj.get("voice").getAsString();
                System.out.println("Voice name: " + voiceName);

                // Get the Base64-encoded audio data
                String base64Audio = previewAudioObj.get("data").getAsString();
                // Decode the Base64 audio data
                byte[] audioBytes = Base64.getDecoder().decode(base64Audio);

                // Save the audio to a local file
                String filename = voiceName + "_preview.wav";
                saveAudioToFile(audioBytes, filename);
                System.out.println("Audio saved to local file: " + filename);
            } else {
                // Read the error response
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }
                System.out.println("Request failed with status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }
        } catch (Exception e) {
            System.err.println("An error occurred during the request: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    private void saveAudioToFile(byte[] audioBytes, String filename) {
        try {
            File file = new File(filename);
            try (FileOutputStream fos = new FileOutputStream(file)) {
                fos.write(audioBytes);
            }
            System.out.println("Audio saved to: " + file.getAbsolutePath());
        } catch (IOException e) {
            System.err.println("An error occurred while saving the audio file: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
```

Use the custom voice created in the previous step for non-streaming speech synthesis.
This example follows the "non-streaming output" sample code for system voices in the DashScope SDK. Replace the voice parameter with the custom voice generated by voice design. For unidirectional streaming synthesis, see Speech synthesis - Qwen.
Key principle: The model used for voice design (target_model) must match the model used for subsequent speech synthesis (model). Otherwise, synthesis fails.
Python
import os
import dashscope

if __name__ == '__main__':
    # The following is the URL for the Singapore region. If you use a model in the Beijing region,
    # replace the URL with: https://dashscope.aliyuncs.com/api/v1
    dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

    text = "What's the weather like today?"
    # How to use SpeechSynthesizer: dashscope.audio.qwen_tts.SpeechSynthesizer.call(...)
    response = dashscope.MultiModalConversation.call(
        model="qwen3-tts-vd-2026-01-26",
        # API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        # If the environment variable is not set, replace the following line with your Model Studio API key: api_key="sk-xxx"
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        text=text,
        voice="myvoice",  # Replace the voice parameter with the custom voice generated by voice design
        stream=False
    )
    print(response)
Java
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.URL;

public class Main {
    private static final String MODEL = "qwen3-tts-vd-2026-01-26";

    public static void call() throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // If the environment variable is not set, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model(MODEL)
                .text("Today is a wonderful day to build something people love!")
                .parameter("voice", "myvoice") // Replace the voice parameter with the custom voice generated by voice design
                .build();
        MultiModalConversationResult result = conv.call(param);
        String audioUrl = result.getOutput().getAudio().getUrl();
        System.out.print(audioUrl);
        // Download the audio file locally
        try (InputStream in = new URL(audioUrl).openStream();
             FileOutputStream out = new FileOutputStream("downloaded_audio.wav")) {
            byte[] buffer = new byte[1024];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
            }
            System.out.println("\nAudio file downloaded locally: downloaded_audio.wav");
        } catch (Exception e) {
            System.out.println("\nError downloading audio file: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        try {
            // The following is the URL for the Singapore region. If you use a model in the Beijing region,
            // replace the URL with: https://dashscope.aliyuncs.com/api/v1
            Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
            call();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}
API reference
Use the same account for all API operations.
Create voice
Submit a voice description and preview text to create a custom voice.
URL
Chinese mainland:
POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
International:
POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
Request headers
Parameter
Type
Required
Description
Authorization
string
Authentication token, formatted as Bearer <your_api_key>. Replace <your_api_key> with your actual API key.
Content-Type
string
Media type of data transmitted in the request body. Fixed value: application/json.
Request body
The request body includes all parameters. Optional fields can be omitted based on your needs.
Important: Distinguish the following parameters:
model: Voice design model. Fixed value: qwen-voice-design.
target_model: Speech synthesis model driving this voice. Must match the speech synthesis model used in subsequent calls, or synthesis fails.
{
  "model": "qwen-voice-design",
  "input": {
    "action": "create",
    "target_model": "qwen3-tts-vd-realtime-2026-01-15",
    "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
    "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
    "preferred_name": "announcer",
    "language": "zh"
  },
  "parameters": {
    "sample_rate": 24000,
    "response_format": "wav"
  }
}
Request parameters
Parameter
Type
Default
Required
Description
model
string
-
Voice design model. Fixed value: qwen-voice-design.
action
string
-
Action type. Fixed value: create.
target_model
string
-
Speech synthesis model driving this voice. Supported models (two types):
Qwen3-TTS-VD-Realtime (see Real-time speech synthesis - Qwen):
qwen3-tts-vd-realtime-2026-01-15
qwen3-tts-vd-realtime-2025-12-16
Qwen3-TTS-VD (see Speech synthesis - Qwen):
qwen3-tts-vd-2026-01-26
Must match the speech synthesis model used in subsequent calls, or synthesis fails.
voice_prompt
string
-
Voice description. Maximum length: 2,048 characters.
Supports Chinese and English only.
For guidance on writing voice descriptions, see "How to write high-quality voice descriptions?".
preview_text
string
-
Text for the preview audio. Maximum length: 1,024 characters.
Supported languages: Chinese (zh), English (en), German (de), Italian (it), Portuguese (pt), Spanish (es), Japanese (ja), Korean (ko), French (fr), Russian (ru).
preferred_name
string
-
Name to identify the voice (alphanumeric characters and underscores only, up to 16 characters). Choose a name related to the role or scenario.
This keyword appears in the generated voice name. Example: keyword "announcer" → voice name "qwen-tts-vd-announcer-voice-20251201102800-a1b2".
language
string
zh
Language code specifying the language preference for the generated voice. This affects language-specific features and pronunciation tendencies. Select the appropriate language code for your use case.
If specified, this language must match the preview_text language.
Valid values: zh (Chinese), en (English), de (German), it (Italian), pt (Portuguese), es (Spanish), ja (Japanese), ko (Korean), fr (French), ru (Russian).
sample_rate
int
24000
Sample rate (Hz) for the preview audio generated by voice design.
Valid values:
8000
16000
24000
48000
response_format
string
wav
Audio format for the preview audio generated by voice design.
Valid values:
pcm
wav
mp3
opus
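The keyword passed as preferred_name reappears inside the generated voice name. If you later need to recover it (for example, to group voices by role), the layout can be parsed back out. A minimal sketch — note the name pattern below is inferred from the single documented example ("qwen-tts-vd-announcer-voice-20251201102800-a1b2") and is an illustration, not a guaranteed naming contract:

```python
import re
from typing import Optional

# Pattern inferred from the documented example name
# "qwen-tts-vd-announcer-voice-20251201102800-a1b2":
# prefix, keyword, literal "voice", 14-digit timestamp, random suffix.
VOICE_NAME_RE = re.compile(
    r"^qwen-tts-vd-(?P<keyword>\w+)-voice-(?P<timestamp>\d{14})-(?P<suffix>\w+)$"
)

def parse_voice_name(voice_name: str) -> Optional[dict]:
    """Return keyword/timestamp/suffix parts, or None if the name
    does not follow the inferred pattern."""
    m = VOICE_NAME_RE.match(voice_name)
    return m.groupdict() if m else None

parts = parse_voice_name("qwen-tts-vd-announcer-voice-20251201102800-a1b2")
# parts["keyword"] == "announcer"
```

Because the format is undocumented beyond the example, treat a non-matching name as valid rather than erroring out.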
Response parameters
Key parameters:
Parameter
Type
Description
voice
string
Voice name. Use directly as the voice parameter in speech synthesis APIs.
data
string
Preview audio data generated by voice design, returned as a Base64-encoded string.
sample_rate
int
Sample rate (Hz) for the preview audio generated by voice design. Matches the sample rate used when creating the voice. Default is 24000 Hz if unspecified.
response_format
string
Audio format for the preview audio generated by voice design. Matches the format used when creating the voice. Default is wav if unspecified.
target_model
string
Speech synthesis model driving this voice. Supported models (two types):
Qwen3-TTS-VD-Realtime (see Real-time speech synthesis - Qwen):
qwen3-tts-vd-realtime-2026-01-15
qwen3-tts-vd-realtime-2025-12-16
Qwen3-TTS-VD (see Speech synthesis - Qwen):
qwen3-tts-vd-2026-01-26
Must match the speech synthesis model used in subsequent calls, or synthesis fails.
request_id
string
Request ID.
count
integer
The number of billable "Create voice" operations performed by this request, used for metering. For voice creation, count is always 1.
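A minimal sketch of consuming these response fields once the JSON has been parsed into a Python dict. The paths output.voice and output.preview_audio.data follow the table above; anything beyond field access here (the helper name, returning a tuple) is illustrative:

```python
import base64

def extract_preview(response: dict):
    """Pull the voice name and the decoded preview audio bytes out of a
    parsed create-voice response (field names per the response table)."""
    output = response["output"]
    voice_name = output["voice"]
    # preview_audio.data is a Base64-encoded string; decode it to raw bytes
    audio_bytes = base64.b64decode(output["preview_audio"]["data"])
    return voice_name, audio_bytes
```

The returned bytes are in whatever response_format was requested at creation time (wav by default), so they can be written to a file with the matching extension.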
Sample code
Important: Distinguish the following parameters:
model: Voice design model. Fixed value: qwen-voice-design.
target_model: Speech synthesis model driving this voice. Must match the speech synthesis model used in subsequent calls, or synthesis fails.
cURL
If you have not configured the API key in an environment variable, replace $DASHSCOPE_API_KEY in the example with your actual API key.
# ======= Important note =======
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
# API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-voice-design",
    "input": {
      "action": "create",
      "target_model": "qwen3-tts-vd-realtime-2026-01-15",
      "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
      "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
      "preferred_name": "announcer",
      "language": "zh"
    },
    "parameters": {
      "sample_rate": 24000,
      "response_format": "wav"
    }
  }'
Python
import requests
import base64
import os

def create_voice_and_play():
    # API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If the environment variable is not set, replace the following line with your Model Studio API key: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    if not api_key:
        print("Error: DASHSCOPE_API_KEY environment variable not found. Please set the API key first.")
        return None, None, None

    # Prepare request data
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-realtime-2026-01-15",
            "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "en"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }

    # The following is the URL for the Singapore region. If you use a model in the Beijing region,
    # replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

    try:
        # Send the request
        response = requests.post(url, headers=headers, json=data, timeout=60)
        if response.status_code == 200:
            result = response.json()
            # Get the voice name
            voice_name = result["output"]["voice"]
            print(f"Voice name: {voice_name}")
            # Get the preview audio data and decode it from Base64
            base64_audio = result["output"]["preview_audio"]["data"]
            audio_bytes = base64.b64decode(base64_audio)
            # Save the audio file locally
            filename = f"{voice_name}_preview.wav"
            with open(filename, "wb") as f:
                f.write(audio_bytes)
            print(f"Audio saved to local file: {filename}")
            print(f"File path: {os.path.abspath(filename)}")
            return voice_name, audio_bytes, filename
        else:
            print(f"Request failed with status code: {response.status_code}")
            print(f"Response content: {response.text}")
            return None, None, None
    except requests.exceptions.RequestException as e:
        print(f"A network request error occurred: {e}")
        return None, None, None
    except KeyError as e:
        print(f"Response data format error, missing required field: {e}")
        print(f"Response content: {response.text if 'response' in locals() else 'No response'}")
        return None, None, None
    except Exception as e:
        print(f"An unknown error occurred: {e}")
        return None, None, None

if __name__ == "__main__":
    print("Starting to create voice...")
    voice_name, audio_data, saved_filename = create_voice_and_play()
    if voice_name:
        print(f"\nSuccessfully created voice '{voice_name}'")
        print(f"Audio file saved as: '{saved_filename}'")
        print(f"File size: {os.path.getsize(saved_filename)} bytes")
    else:
        print("\nVoice creation failed")
Java
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class Main {
    public static void main(String[] args) {
        Main example = new Main();
        example.createVoice();
    }

    public void createVoice() {
        // API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If the environment variable is not set, replace the following line with your Model Studio API key: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // Create the JSON request body string
        String jsonBody = "{\n"
                + "  \"model\": \"qwen-voice-design\",\n"
                + "  \"input\": {\n"
                + "    \"action\": \"create\",\n"
                + "    \"target_model\": \"qwen3-tts-vd-realtime-2026-01-15\",\n"
                + "    \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.\",\n"
                + "    \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n"
                + "    \"preferred_name\": \"announcer\",\n"
                + "    \"language\": \"en\"\n"
                + "  },\n"
                + "  \"parameters\": {\n"
                + "    \"sample_rate\": 24000,\n"
                + "    \"response_format\": \"wav\"\n"
                + "  }\n"
                + "}";

        HttpURLConnection connection = null;
        try {
            // The following is the URL for the Singapore region. If you use a model in the Beijing region,
            // replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();
            // Set the request method and headers
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);

            // Send the request body
            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }

            // Get the response
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Read the response content
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }
                // Parse the JSON response
                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                JsonObject previewAudioObj = outputObj.getAsJsonObject("preview_audio");
                // Get the voice name
                String voiceName = outputObj.get("voice").getAsString();
                System.out.println("Voice name: " + voiceName);
                // Get and decode the Base64-encoded audio data
                String base64Audio = previewAudioObj.get("data").getAsString();
                byte[] audioBytes = Base64.getDecoder().decode(base64Audio);
                // Save the audio to a local file
                String filename = voiceName + "_preview.wav";
                saveAudioToFile(audioBytes, filename);
                System.out.println("Audio saved to local file: " + filename);
            } else {
                // Read the error response
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }
                System.out.println("Request failed with status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }
        } catch (Exception e) {
            System.err.println("An error occurred during the request: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    private void saveAudioToFile(byte[] audioBytes, String filename) {
        try {
            File file = new File(filename);
            try (FileOutputStream fos = new FileOutputStream(file)) {
                fos.write(audioBytes);
            }
            System.out.println("Audio saved to: " + file.getAbsolutePath());
        } catch (IOException e) {
            System.err.println("An error occurred while saving the audio file: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
List voices
Returns a paginated list of all voices created under your account.
URL
Chinese mainland:
POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
International:
POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
Request headers
Parameter
Type
Required
Description
Authorization
string
Authentication token, formatted as Bearer <your_api_key>. Replace <your_api_key> with your actual API key.
Content-Type
string
Media type of data transmitted in the request body. Fixed value: application/json.
Request body
The request body contains all parameters. Omit optional fields as needed.
Important: model is the voice design model. Fixed to qwen-voice-design. Do not change this value.
{
  "model": "qwen-voice-design",
  "input": {
    "action": "list",
    "page_size": 10,
    "page_index": 0
  }
}
Request parameters
Parameter
Type
Default
Required
Description
model
string
--
Voice design model. Fixed value: qwen-voice-design.
action
string
--
Action type. Fixed value: list.
page_index
integer
0
Page number. Range: 0–200.
page_size
integer
10
Entries per page. Must be greater than 0.
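Because list results are paginated, collecting every voice means stepping page_index until total_count entries have been gathered. A sketch with the HTTP request factored out into a caller-supplied fetch_page function (a hypothetical helper, not part of the API), so the paging logic stays separate from the network call; the field names voice_list, total_count, page_index, and page_size follow this operation's request and response parameters:

```python
def list_all_voices(fetch_page, page_size=10):
    """Collect every voice across pages.

    fetch_page(page_index, page_size) must return the "output" object of
    one list response: {"voice_list": [...], "total_count": N, ...}.
    """
    voices, page_index = [], 0
    while True:
        output = fetch_page(page_index, page_size)
        voices.extend(output.get("voice_list", []))
        # Stop once every record has been seen, or on an empty page
        if len(voices) >= output["total_count"] or not output.get("voice_list"):
            break
        page_index += 1
    return voices
```

In practice fetch_page would wrap the POST request shown in the sample code for this operation and return response.json()["output"].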
Response parameters
Key parameters:
Parameter
Type
Description
voice
string
Voice name. Use directly as the voice parameter in speech synthesis APIs.
target_model
string
Speech synthesis model driving this voice. Supported models (two types):
Qwen3-TTS-VD-Realtime (see Real-time speech synthesis - Qwen):
qwen3-tts-vd-realtime-2026-01-15
qwen3-tts-vd-realtime-2025-12-16
Qwen3-TTS-VD (see Speech synthesis - Qwen):
qwen3-tts-vd-2026-01-26
Must match the speech synthesis model used in subsequent calls, or synthesis fails.
language
string
Language code.
Valid values: zh (Chinese), en (English), de (German), it (Italian), pt (Portuguese), es (Spanish), ja (Japanese), ko (Korean), fr (French), ru (Russian).
voice_prompt
string
Voice description.
preview_text
string
Preview text.
gmt_create
string
The voice's creation time.
gmt_modified
string
The time when the voice was last modified.
page_index
integer
Page number.
page_size
integer
Entries per page.
total_count
integer
The total number of records that the query returns.
request_id
string
Request ID.
Sample code
Important: model is the voice design model. Fixed to qwen-voice-design. Do not change this value.
cURL
If you have not set the API key as an environment variable, replace $DASHSCOPE_API_KEY in the example with your actual API key.
# ======= Important notice =======
# This URL is for the Singapore region. If you use the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
# API keys differ between the Singapore and China (Beijing) regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# === Remove this comment before running ===
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-voice-design",
    "input": {
      "action": "list",
      "page_size": 10,
      "page_index": 0
    }
  }'
Python
import os
import requests

# API keys differ between the Singapore and China (Beijing) regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you have not set an environment variable, replace the next line with: api_key = "sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")

# This URL is for the Singapore region. If you use the China (Beijing) region,
# replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

payload = {
    "model": "qwen-voice-design",  # Do not change this value
    "input": {
        "action": "list",
        "page_size": 10,
        "page_index": 0
    }
}
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)
print("HTTP status code:", response.status_code)
if response.status_code == 200:
    data = response.json()
    voice_list = data["output"]["voice_list"]
    print("List of voices:")
    for item in voice_list:
        print(f"- Voice: {item['voice']}  Created: {item['gmt_create']}  Model: {item['target_model']}")
else:
    print("Request failed:", response.text)
Java
import com.google.gson.Gson;
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {
    public static void main(String[] args) {
        // API keys differ between the Singapore and China (Beijing) regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If you have not set an environment variable, replace the next line with: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // This URL is for the Singapore region. If you use the China (Beijing) region,
        // replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";

        // JSON request body (older Java versions do not support """ multi-line strings)
        String jsonPayload = "{"
                + "\"model\": \"qwen-voice-design\"," // Do not change this value
                + "\"input\": {"
                + "\"action\": \"list\","
                + "\"page_size\": 10,"
                + "\"page_index\": 0"
                + "}"
                + "}";

        try {
            HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
            con.setRequestMethod("POST");
            con.setRequestProperty("Authorization", "Bearer " + apiKey);
            con.setRequestProperty("Content-Type", "application/json");
            con.setDoOutput(true);
            try (OutputStream os = con.getOutputStream()) {
                os.write(jsonPayload.getBytes("UTF-8"));
            }

            int status = con.getResponseCode();
            BufferedReader br = new BufferedReader(new InputStreamReader(
                    status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(), "UTF-8"));
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line);
            }
            br.close();

            System.out.println("HTTP status code: " + status);
            System.out.println("Response JSON: " + response.toString());

            if (status == 200) {
                Gson gson = new Gson();
                JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
                JsonArray voiceList = jsonObj.getAsJsonObject("output").getAsJsonArray("voice_list");
                System.out.println("\nList of voices:");
                for (int i = 0; i < voiceList.size(); i++) {
                    JsonObject voiceItem = voiceList.get(i).getAsJsonObject();
                    String voice = voiceItem.get("voice").getAsString();
                    String gmtCreate = voiceItem.get("gmt_create").getAsString();
                    String targetModel = voiceItem.get("target_model").getAsString();
                    System.out.printf("- Voice: %s  Created: %s  Model: %s\n", voice, gmtCreate, targetModel);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Query a specific voice
Get detailed information about a specific voice by its name.
URL
Chinese mainland:
POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
International:
POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
Request headers
Parameter
Type
Required
Description
Authorization
string
Authentication token, formatted as Bearer <your_api_key>. Replace <your_api_key> with your actual API key.
Content-Type
string
Media type of data transmitted in the request body. Fixed value: application/json.
Request body
The request body contains all request parameters. You can omit optional fields as needed.
Important: model is the voice design model. Fixed to qwen-voice-design. Do not change this value.
{
  "model": "qwen-voice-design",
  "input": {
    "action": "query",
    "voice": "voiceName"
  }
}
Request parameters
Parameter
Type
Default
Required
Description
model
string
-
Voice design model. Fixed value: qwen-voice-design.
action
string
-
Action type. Fixed value: query.
voice
string
-
The name of the voice to query.
Response parameters
Key parameters:
Parameter
Type
Description
voice
string
Voice name. Use directly as the voice parameter in speech synthesis APIs.
target_model
string
Speech synthesis model driving this voice. Supported models (two types):
Qwen3-TTS-VD-Realtime (see Real-time speech synthesis - Qwen):
qwen3-tts-vd-realtime-2026-01-15
qwen3-tts-vd-realtime-2025-12-16
Qwen3-TTS-VD (see Speech synthesis - Qwen):
qwen3-tts-vd-2026-01-26
Must match the speech synthesis model used in subsequent calls, or synthesis fails.
language
string
Language code.
Valid values: zh (Chinese), en (English), de (German), it (Italian), pt (Portuguese), es (Spanish), ja (Japanese), ko (Korean), fr (French), ru (Russian).
voice_prompt
string
The voice description.
preview_text
string
The preview text.
gmt_create
string
The time when the voice was created.
gmt_modified
string
The time when the voice was last modified.
request_id
string
The request ID.
Code examples
Important: model is the voice design model. Fixed to qwen-voice-design. Do not change this value.
cURL
If you have not set the API key as an environment variable, replace $DASHSCOPE_API_KEY in the example with your actual API key.
# ======= Important =======
# The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
# The API keys for the Singapore and China (Beijing) regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before running the command. ===
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-voice-design",
    "input": {
      "action": "query",
      "voice": "voiceName"
    }
  }'
Python
import os
import requests

def query_voice(voice_name):
    """Queries information about a specific voice.

    :param voice_name: The name of the voice.
    :return: A dictionary that contains the voice information, or None if the voice is not found.
    """
    # The API keys for the Singapore and China (Beijing) regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")

    # Prepare the request data.
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "query",
            "voice": voice_name
        }
    }

    # The following URL is for the Singapore region. If you use a model in the China (Beijing) region,
    # replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

    # Send the request.
    response = requests.post(url, headers=headers, json=data)
    if response.status_code == 200:
        result = response.json()
        # Check for error messages.
        if "code" in result and result["code"] == "VoiceNotFound":
            print(f"Voice not found: {voice_name}")
            print(f"Error message: {result.get('message', 'Voice not found')}")
            return None
        # Get the voice information.
        voice_info = result["output"]
        print("Successfully queried voice information:")
        print(f"  Voice name: {voice_info.get('voice')}")
        print(f"  Creation time: {voice_info.get('gmt_create')}")
        print(f"  Modification time: {voice_info.get('gmt_modified')}")
        print(f"  Language: {voice_info.get('language')}")
        print(f"  Preview text: {voice_info.get('preview_text')}")
        print(f"  Model: {voice_info.get('target_model')}")
        print(f"  Voice description: {voice_info.get('voice_prompt')}")
        return voice_info
    else:
        print(f"Request failed, status code: {response.status_code}")
        print(f"Response content: {response.text}")
        return None

def main():
    # Example: Query a voice.
    voice_name = "myvoice"  # Replace with the actual name of the voice you want to query.
    print(f"Querying voice: {voice_name}")
    voice_info = query_voice(voice_name)
    if voice_info:
        print("\nVoice queried successfully!")
    else:
        print("\nFailed to query the voice or the voice does not exist.")

if __name__ == "__main__":
    main()
Java
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {
    public static void main(String[] args) {
        Main example = new Main();
        // Example: Query a voice.
        String voiceName = "myvoice"; // Replace with the actual name of the voice you want to query.
        System.out.println("Querying voice: " + voiceName);
        example.queryVoice(voiceName);
    }

    public void queryVoice(String voiceName) {
        // The API keys for the Singapore and China (Beijing) regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        // If you have not configured the environment variable, replace the following line with your Model Studio API key: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // Create the JSON request body string.
        String jsonBody = "{\n" +
                "    \"model\": \"qwen-voice-design\",\n" +
                "    \"input\": {\n" +
                "        \"action\": \"query\",\n" +
                "        \"voice\": \"" + voiceName + "\"\n" +
                "    }\n" +
                "}";

        HttpURLConnection connection = null;
        try {
            // The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();

            // Set the request method and headers.
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);

            // Send the request body.
            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }

            // Get the response.
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Read the response content.
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }

                // Parse the JSON response.
                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();

                // Check for error messages.
                if (jsonResponse.has("code") && "VoiceNotFound".equals(jsonResponse.get("code").getAsString())) {
                    String errorMessage = jsonResponse.has("message")
                            ? jsonResponse.get("message").getAsString()
                            : "Voice not found";
                    System.out.println("Voice not found: " + voiceName);
                    System.out.println("Error message: " + errorMessage);
                    return;
                }

                // Get the voice information.
                JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                System.out.println("Successfully queried voice information:");
                System.out.println("  Voice name: " + outputObj.get("voice").getAsString());
                System.out.println("  Creation time: " + outputObj.get("gmt_create").getAsString());
                System.out.println("  Modification time: " + outputObj.get("gmt_modified").getAsString());
                System.out.println("  Language: " + outputObj.get("language").getAsString());
                System.out.println("  Preview text: " + outputObj.get("preview_text").getAsString());
                System.out.println("  Model: " + outputObj.get("target_model").getAsString());
                System.out.println("  Voice description: " + outputObj.get("voice_prompt").getAsString());
            } else {
                // Read the error response.
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }
                System.out.println("Request failed, status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }
        } catch (Exception e) {
            System.err.println("An error occurred during the request: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }
}
Delete a voice
Deletes a specified voice and releases the corresponding quota.
URL
Chinese mainland:

POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

International:

POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization

Request headers
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | string | Yes | Authentication token, formatted as Bearer <your_api_key>. Replace <your_api_key> with your actual API key. |
| Content-Type | string | Yes | Media type of the data transmitted in the request body. Fixed value: application/json. |

Request body
The request body includes all parameters. Optional fields can be omitted.
Important: model is the voice design model. It is fixed to qwen-voice-design. Do not change this value.

{
    "model": "qwen-voice-design",
    "input": {
        "action": "delete",
        "voice": "yourVoice"
    }
}

Request parameters
| Parameter | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| model | string | - | Yes | Voice design model. Fixed value: qwen-voice-design. |
| action | string | - | Yes | Action type. Fixed value: delete. |
| voice | string | - | Yes | The voice to delete. |
Response parameters
Key parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| request_id | string | The request ID. |
| voice | string | The deleted voice. |
Sample code
Important: model is the voice design model. It is fixed to qwen-voice-design. Do not change this value.

cURL
If you have not set the API key as an environment variable, you must replace $DASHSCOPE_API_KEY in the example with your actual API key.

# ======= Important =======
# The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
# The API keys for the Singapore and China (Beijing) regions are different. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# === Delete this comment before you run the command ===
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "delete",
        "voice": "yourVoice"
    }
  }'

Python
import os

import requests


def delete_voice(voice_name):
    """
    Deletes a specified voice.

    :param voice_name: The name of the voice.
    :return: True if the voice is deleted, or if it does not exist but the request succeeds.
             False if the operation fails.
    """
    # The API keys for the Singapore and China (Beijing) regions are different. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")

    # Prepare the request data.
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "delete",
            "voice": voice_name
        }
    }

    # The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

    # Send the request.
    response = requests.post(url, headers=headers, json=data)

    if response.status_code == 200:
        result = response.json()

        # Check for an error message.
        if "code" in result and "VoiceNotFound" in result["code"]:
            print(f"Voice does not exist: {voice_name}")
            print(f"Error message: {result.get('message', 'Voice not found')}")
            # The operation is considered successful because the target is already gone.
            return True

        # Check whether the deletion was successful.
        if "usage" in result:
            print(f"Voice deleted successfully: {voice_name}")
            print(f"Request ID: {result.get('request_id', 'N/A')}")
            return True
        else:
            print(f"The deletion operation returned an unexpected format: {result}")
            return False
    else:
        print(f"Failed to delete the voice. Status code: {response.status_code}")
        print(f"Response content: {response.text}")
        return False


def main():
    # Example: Delete a voice.
    voice_name = "myvoice"  # Replace with the actual name of the voice that you want to delete.
    print(f"Deleting voice: {voice_name}")
    success = delete_voice(voice_name)
    if success:
        print(f"\nDeletion of voice '{voice_name}' is complete!")
    else:
        print(f"\nFailed to delete voice '{voice_name}'!")


if __name__ == "__main__":
    main()

Java
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {
    public static void main(String[] args) {
        Main example = new Main();
        // Example: Delete a voice.
        String voiceName = "myvoice"; // Replace with the actual name of the voice that you want to delete.
        System.out.println("Deleting voice: " + voiceName);
        example.deleteVoice(voiceName);
    }

    public void deleteVoice(String voiceName) {
        // The API keys for the Singapore and China (Beijing) regions are different. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If you have not configured an environment variable, replace the following line with your Model Studio API key: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // Create the JSON request body string.
        String jsonBody = "{\n" +
                "    \"model\": \"qwen-voice-design\",\n" +
                "    \"input\": {\n" +
                "        \"action\": \"delete\",\n" +
                "        \"voice\": \"" + voiceName + "\"\n" +
                "    }\n" +
                "}";

        HttpURLConnection connection = null;
        try {
            // The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();

            // Set the request method and headers.
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);

            // Send the request body.
            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }

            // Get the response.
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Read the response content.
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }

                // Parse the JSON response.
                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();

                // Check for an error message.
                if (jsonResponse.has("code") && jsonResponse.get("code").getAsString().contains("VoiceNotFound")) {
                    String errorMessage = jsonResponse.has("message")
                            ? jsonResponse.get("message").getAsString()
                            : "Voice not found";
                    System.out.println("Voice does not exist: " + voiceName);
                    System.out.println("Error message: " + errorMessage);
                    // The operation is considered successful because the target is already gone.
                } else if (jsonResponse.has("usage")) {
                    // The deletion was successful.
                    System.out.println("Voice deleted successfully: " + voiceName);
                    String requestId = jsonResponse.has("request_id")
                            ? jsonResponse.get("request_id").getAsString()
                            : "N/A";
                    System.out.println("Request ID: " + requestId);
                } else {
                    System.out.println("The deletion operation returned an unexpected format: " + response.toString());
                }
            } else {
                // Read the error response.
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }
                System.out.println("Failed to delete the voice. Status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }
        } catch (Exception e) {
            System.err.println("An error occurred during the request: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }
}
Speech synthesis
To synthesize audio with a custom voice generated by voice design, see Getting started: From voice design to speech synthesis.
The speech synthesis models for voice design, such as qwen3-tts-vd-realtime-2026-01-15, are dedicated models. They support only voices generated by voice design and do not support system voices such as Chelsie, Serena, Ethan, or Cherry.
Voice quota and automatic cleanup rules
Quota limit: 1,000 voices per account.
You can check the count via the total_count field in the List voices response.

Automatic cleanup: Voices unused for over one year are automatically deleted.
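The quota rule above lends itself to a small helper. The following is a minimal sketch with hypothetical names: it assumes you have already parsed the JSON returned by the List voices action (whose output contains the total_count field) and subtracts that count from the 1,000-voice per-account quota.

```python
# Hypothetical helper: compute how many voice slots remain under the
# 1,000-voice-per-account quota, given a parsed List voices response.
QUOTA_LIMIT = 1000

def remaining_voice_quota(list_response: dict, quota_limit: int = QUOTA_LIMIT) -> int:
    # total_count is the number of voices currently stored for the account.
    total_count = list_response["output"]["total_count"]
    return quota_limit - total_count

# Example with a stubbed response in the documented shape:
sample = {"request_id": "xxx", "output": {"total_count": 12}}
print(remaining_voice_quota(sample))  # 988
```

If the remaining count reaches zero, delete unused voices (see Delete a voice) before creating new ones.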
Billing
Voice design and speech synthesis are billed separately.
Voice design: Creating a voice is billed at USD 0.2 per voice. Creation failures are not billed.
Note: Free quota details (available only in the China (Beijing) and Singapore regions):
10 free voice creations within 90 days after activating Alibaba Cloud Model Studio.
Failed creations do not consume free quota.
Deleting a voice does not restore free quota.
After the free quota is used up or the 90-day validity period expires, voice creation is billed at USD 0.2 per voice.
Speech synthesis using custom voices: Billed per character. For pricing details, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.
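As a sanity check on the free-quota rules above, here is a hypothetical helper (the function name and flag are illustrative, not part of the API) that estimates voice design cost: within the 90-day free window the first 10 successful creations are free; outside it, every successful creation is billed at USD 0.2, and failed creations are never billed.

```python
FREE_CREATIONS = 10          # free creations granted within the 90-day window
PRICE_PER_VOICE_USD = 0.2    # price per successful creation once quota is exhausted

def voice_design_cost(successful_creations: int, within_free_window: bool) -> float:
    """Estimated voice design cost in USD. Failed creations are not billed."""
    if within_free_window:
        billable = max(0, successful_creations - FREE_CREATIONS)
    else:
        # Free quota used up or expired: every successful creation is billed.
        billable = successful_creations
    return round(billable * PRICE_PER_VOICE_USD, 2)

print(voice_design_cost(25, within_free_window=True))   # 15 billable voices -> 3.0
print(voice_design_cost(5, within_free_window=False))   # 1.0
```

Note that deleting a voice does not restore free quota, so deletions do not change this estimate.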
Error messages
If you encounter errors, see Error messages for troubleshooting.