
Alibaba Cloud Model Studio: Qwen voice design API reference

Last Updated: Mar 31, 2026

Voice design creates custom voices from text descriptions. It supports multilingual and multidimensional voice feature definitions. Voice design and speech synthesis are two sequential steps. This document covers voice design parameters and API details. For speech synthesis, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.

User guide: For model introductions and selection recommendations, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.

Language support

Voice design supports multilingual voice creation and speech synthesis for the following languages: Chinese (zh), English (en), German (de), Italian (it), Portuguese (pt), Spanish (es), Japanese (ja), Korean (ko), French (fr), Russian (ru).
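The codes above are the values accepted by the `language` input field. As a minimal client-side sketch (the constant and helper names are illustrative, not part of the API):

```python
# Language codes supported by voice design, per the list above.
SUPPORTED_LANGUAGES = {
    "zh": "Chinese", "en": "English", "de": "German", "it": "Italian",
    "pt": "Portuguese", "es": "Spanish", "ja": "Japanese",
    "ko": "Korean", "fr": "French", "ru": "Russian",
}

def is_supported_language(code: str) -> bool:
    """Return True if the code can be passed as the `language` input field."""
    return code in SUPPORTED_LANGUAGES
```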

How to write high-quality voice descriptions?

Requirements and limitations

When writing a voice description (voice_prompt), follow these technical constraints:

  • Length limit: The voice_prompt content must not exceed 2,048 characters.

  • Supported languages: Description text supports Chinese and English only.
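Both constraints can be checked client-side before calling the API. A minimal sketch, assuming only the documented 2,048-character limit (the function name is illustrative):

```python
MAX_VOICE_PROMPT_CHARS = 2048  # documented upper limit for voice_prompt

def validate_voice_prompt(prompt: str) -> str:
    """Raise ValueError if the description violates the documented limits."""
    if not prompt or not prompt.strip():
        raise ValueError("voice_prompt must not be empty")
    if len(prompt) > MAX_VOICE_PROMPT_CHARS:
        raise ValueError(
            f"voice_prompt exceeds {MAX_VOICE_PROMPT_CHARS} characters "
            f"(got {len(prompt)})"
        )
    return prompt
```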

Core principles

A high-quality voice description (voice_prompt) is key to creating your ideal voice. Think of it as the "blueprint" for voice design—it guides the model to generate voices with specific features.

Follow these core principles when describing a voice:

  1. Be specific, not vague: Use words that describe concrete voice qualities, such as "deep," "crisp," or "fast-paced." Avoid subjective, low-information terms like "nice" or "normal."

  2. Be multidimensional, not single-dimensional: Strong descriptions combine multiple dimensions (e.g., gender, age, emotion). A single-dimension description like "female voice" is too broad to produce a distinctive voice.

  3. Be objective, not subjective: Focus on physical and perceptual voice features—not personal preferences. For example, use "high-pitched and energetic" instead of "my favorite voice."

  4. Be original, not imitative: Describe voice qualities—not requests to mimic specific people (e.g., celebrities or actors). Such requests carry copyright risk and are not supported by the model.

  5. Be concise, not redundant: Ensure every word adds meaning. Avoid repeating synonyms or meaningless intensifiers (e.g., "a very, very great voice").

Reference dimensions for descriptions

  • Gender: Male, female, neutral

  • Age: Child (5–12), teenager (13–18), young adult (19–35), middle-aged (36–55), elderly (55+)

  • Pitch: High, medium, low, high-pitched, low-pitched

  • Pace: Fast, medium, slow, fast-paced, slow-paced

  • Emotion: Cheerful, calm, gentle, serious, lively, composed, soothing

  • Characteristics: Magnetic, crisp, hoarse, mellow, sweet, rich, powerful

  • Purpose: News broadcast, ad voice-over, audiobook, animation character, voice assistant, documentary narration
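These dimensions can be composed into a description programmatically. The following helper is purely illustrative (not part of the API) and assumes one value per dimension:

```python
def build_voice_prompt(gender: str, age: str, pace: str,
                       emotion: str, characteristic: str, purpose: str) -> str:
    """Combine several voice dimensions into a single description string."""
    return (f"A {emotion}, {age} {gender} voice with a {pace} pace "
            f"and a {characteristic} tone, suitable for {purpose}.")
```

For example, `build_voice_prompt("male", "middle-aged", "slow", "calm", "magnetic", "news broadcasting")` yields a multidimensional description in line with the principles above.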

Example comparison

✅ Good cases

  • "A young, lively female voice with a fast pace and noticeable upward intonation, suitable for fashion product introductions."

    Analysis: Combines age, personality, pace, and intonation, and specifies the use case, creating a vivid, three-dimensional voice.

  • "A calm, middle-aged male voice with a slow pace and deep, magnetic tone, suitable for news or documentary narration."

    Analysis: Clearly defines gender, age group, pace, vocal traits, and application area.

  • "A cute child's voice, around 8 years old, with a slightly childish tone, suitable for animation character voice-overs."

    Analysis: Precisely identifies age and vocal trait ("childish"), with a well-defined purpose.

  • "A gentle, intellectual woman in her early 30s with a calm voice, ideal for audiobook narration."

    Analysis: Words like "intellectual" and "calm" clearly communicate emotional tone and stylistic intent.

❌ Bad cases and improvement suggestions

  • "A nice voice"

    Main issue: Too vague and subjective; lacks actionable features.

    Improvement: Add specific dimensions, e.g., "A young female voice with a clear vocal line and gentle tone."

  • "A voice like a certain celebrity"

    Main issue: Involves copyright risk; the model cannot directly mimic celebrities.

    Improvement: Extract and describe the voice traits instead, e.g., "A mature, magnetic male voice with a calm pace."

  • "A very, very, very nice female voice"

    Main issue: Redundant; repetition does not help define voice quality.

    Improvement: Remove repeated words and add effective descriptors, e.g., "A female voice aged 20–24, with a light tone, lively pitch, and sweet quality."

  • "123456"

    Main issue: Invalid input; cannot be parsed into voice features.

    Improvement: Provide a meaningful text description, following the recommended examples above.

Getting started: From voice design to speech synthesis


1. Workflow

Voice design and speech synthesis are two sequential steps. Follow a create-then-use workflow:

  1. Prepare the voice description and preview text for voice design.

    • Voice description (voice_prompt): Defines the target voice’s features (for how to write one, see "How to write high-quality voice descriptions?").

    • Preview text (preview_text): Text for the preview audio generated by the target voice (e.g., "Hello everyone, welcome to listen.").

  2. Call the Create voice API to create a custom voice and get its name and preview audio.

    You must set target_model to the speech synthesis model that drives this voice.

    Listen to the preview audio to check if it meets expectations. If satisfied, proceed to the next step. Otherwise, redesign.

    If you already have a created voice (check via the List voices API), skip this step and go straight to the next.

  3. Use the voice for speech synthesis.

    Call the speech synthesis API and pass in the voice name obtained in the previous step. The speech synthesis model used here must match the target_model from the previous step.
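As a sketch of step 2's output handling, the create-voice response fields used throughout this document (output.voice and the Base64-encoded output.preview_audio.data) can be extracted with a small helper (the function name is illustrative):

```python
import base64

def parse_create_voice_response(result: dict) -> tuple:
    """Extract the voice name and decoded preview audio bytes from a
    create-voice response, per the fields shown in this document."""
    voice_name = result["output"]["voice"]
    audio_bytes = base64.b64decode(result["output"]["preview_audio"]["data"])
    return voice_name, audio_bytes
```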

2. Model configuration and preparations

Select appropriate models and complete preparations.

Model configuration

Voice design involves two models: qwen-voice-design, which creates the custom voice, and the target speech synthesis model (target_model), which must match the model used for subsequent speech synthesis.

Preparations

  1. Get an API key: Get an API key. For security, we recommend storing the API key in an environment variable.

  2. Install the SDK: Make sure you have installed the latest DashScope SDK.
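For reference, a typical setup on macOS or Linux looks like the following ("sk-xxx" is a placeholder for your own key; the Python SDK is shown, as in the samples below):

```shell
# Export the API key for the current shell session
export DASHSCOPE_API_KEY="sk-xxx"

# Install or upgrade the DashScope Python SDK
pip install -U dashscope
```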

3. Sample code

Bidirectional streaming synthesis

Applies to Qwen3-TTS-VD-Realtime series models. See Real-time speech synthesis - Qwen.

  1. Create a custom voice and preview it. If satisfied, proceed. Otherwise, recreate.

    Python

    import requests
    import base64
    import os
    
    def create_voice_and_play():
        # API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        # If the environment variable is not set, replace the following line with your Model Studio API key: api_key = "sk-xxx"
        api_key = os.getenv("DASHSCOPE_API_KEY")
        
        if not api_key:
            print("Error: DASHSCOPE_API_KEY environment variable not found. Please set the API key first.")
            return None, None, None
        
        # Prepare request data
        headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }
        
        data = {
            "model": "qwen-voice-design",
            "input": {
                "action": "create",
                "target_model": "qwen3-tts-vd-realtime-2026-01-15",
                "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
                "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
                "preferred_name": "announcer",
                "language": "en"
            },
            "parameters": {
                "sample_rate": 24000,
                "response_format": "wav"
            }
        }
        
        # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
        
        try:
            # Send the request
            response = requests.post(
                url,
                headers=headers,
                json=data,
                timeout=60  # Add a timeout setting
            )
            
            if response.status_code == 200:
                result = response.json()
                
                # Get the voice name
                voice_name = result["output"]["voice"]
                print(f"Voice name: {voice_name}")
                
                # Get the preview audio data
                base64_audio = result["output"]["preview_audio"]["data"]
                
                # Decode the Base64 audio data
                audio_bytes = base64.b64decode(base64_audio)
                
                # Save the audio file locally
                filename = f"{voice_name}_preview.wav"
                
                # Write the audio data to a local file
                with open(filename, 'wb') as f:
                    f.write(audio_bytes)
                
                print(f"Audio saved to local file: {filename}")
                print(f"File path: {os.path.abspath(filename)}")
                
                return voice_name, audio_bytes, filename
            else:
                print(f"Request failed with status code: {response.status_code}")
                print(f"Response content: {response.text}")
                return None, None, None
                
        except requests.exceptions.RequestException as e:
            print(f"A network request error occurred: {e}")
            return None, None, None
        except KeyError as e:
            print(f"Response data format error, missing required field: {e}")
            print(f"Response content: {response.text if 'response' in locals() else 'No response'}")
            return None, None, None
        except Exception as e:
            print(f"An unknown error occurred: {e}")
            return None, None, None
    
    if __name__ == "__main__":
        print("Starting to create voice...")
        voice_name, audio_data, saved_filename = create_voice_and_play()
        
        if voice_name:
            print(f"\nSuccessfully created voice '{voice_name}'")
            print(f"Audio file saved as: '{saved_filename}'")
            print(f"File size: {os.path.getsize(saved_filename)} bytes")
        else:
            print("\nVoice creation failed")

    Java

    Add the Gson dependency to your project:

    Maven

    Add the following to your pom.xml:

    <!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
    <dependency>
        <groupId>com.google.code.gson</groupId>
        <artifactId>gson</artifactId>
        <version>2.13.1</version>
    </dependency>

    Gradle

    Add the following to your build.gradle:

    // https://mvnrepository.com/artifact/com.google.code.gson/gson
    implementation("com.google.code.gson:gson:2.13.1")

    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;
    import java.io.*;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Base64;
    
    public class Main {
        public static void main(String[] args) {
            Main example = new Main();
            example.createVoice();
        }
    
        public void createVoice() {
            // API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If the environment variable is not set, replace the following line with your Model Studio API key: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
    
            // Create the JSON request body string
            String jsonBody = "{\n" +
                    "    \"model\": \"qwen-voice-design\",\n" +
                    "    \"input\": {\n" +
                    "        \"action\": \"create\",\n" +
                    "        \"target_model\": \"qwen3-tts-vd-realtime-2026-01-15\",\n" +
                    "        \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.\",\n" +
                    "        \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
                    "        \"preferred_name\": \"announcer\",\n" +
                    "        \"language\": \"en\"\n" +
                    "    },\n" +
                    "    \"parameters\": {\n" +
                    "        \"sample_rate\": 24000,\n" +
                    "        \"response_format\": \"wav\"\n" +
                    "    }\n" +
                    "}";
    
            HttpURLConnection connection = null;
            try {
                // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
                URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
                connection = (HttpURLConnection) url.openConnection();
    
                // Set the request method and headers
                connection.setRequestMethod("POST");
                connection.setRequestProperty("Authorization", "Bearer " + apiKey);
                connection.setRequestProperty("Content-Type", "application/json");
                connection.setDoOutput(true);
                connection.setDoInput(true);
    
                // Send the request body
                try (OutputStream os = connection.getOutputStream()) {
                    byte[] input = jsonBody.getBytes("UTF-8");
                    os.write(input, 0, input.length);
                    os.flush();
                }
    
                // Get the response
                int responseCode = connection.getResponseCode();
                if (responseCode == HttpURLConnection.HTTP_OK) {
                    // Read the response content
                    StringBuilder response = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            response.append(responseLine.trim());
                        }
                    }
    
                    // Parse the JSON response
                    JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                    JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                    JsonObject previewAudioObj = outputObj.getAsJsonObject("preview_audio");
    
                    // Get the voice name
                    String voiceName = outputObj.get("voice").getAsString();
                    System.out.println("Voice name: " + voiceName);
    
                    // Get the Base64-encoded audio data
                    String base64Audio = previewAudioObj.get("data").getAsString();
    
                    // Decode the Base64 audio data
                    byte[] audioBytes = Base64.getDecoder().decode(base64Audio);
    
                    // Save the audio to a local file
                    String filename = voiceName + "_preview.wav";
                    saveAudioToFile(audioBytes, filename);
    
                    System.out.println("Audio saved to local file: " + filename);
    
                } else {
                    // Read the error response
                    StringBuilder errorResponse = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            errorResponse.append(responseLine.trim());
                        }
                    }
    
                    System.out.println("Request failed with status code: " + responseCode);
                    System.out.println("Error response: " + errorResponse.toString());
                }
    
            } catch (Exception e) {
                System.err.println("An error occurred during the request: " + e.getMessage());
                e.printStackTrace();
            } finally {
                if (connection != null) {
                    connection.disconnect();
                }
            }
        }
    
        private void saveAudioToFile(byte[] audioBytes, String filename) {
            try {
                File file = new File(filename);
                try (FileOutputStream fos = new FileOutputStream(file)) {
                    fos.write(audioBytes);
                }
                System.out.println("Audio saved to: " + file.getAbsolutePath());
            } catch (IOException e) {
                System.err.println("An error occurred while saving the audio file: " + e.getMessage());
                e.printStackTrace();
            }
        }
    }
  2. Use the custom voice created in the previous step for speech synthesis.

    This example follows the "server commit mode" sample code for system voices in the DashScope SDK. Replace the voice parameter with the custom voice generated by voice design.

    Key principle: The model used for voice design (target_model) must match the model used for subsequent speech synthesis (model). Otherwise, synthesis fails.

    Python

    # coding=utf-8
    # Installation instructions for pyaudio:
    # APPLE Mac OS X
    #   brew install portaudio
    #   pip install pyaudio
    # Debian/Ubuntu
    #   sudo apt-get install python-pyaudio python3-pyaudio
    #   or
    #   pip install pyaudio
    # CentOS
    #   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
    # Microsoft Windows
    #   python -m pip install pyaudio
    
    import pyaudio
    import os
    import base64
    import threading
    import time
    import dashscope  # DashScope Python SDK version must be 1.23.9 or later
    from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat
    
    # ======= Constant configuration =======
    TEXT_TO_SYNTHESIZE = [
        'Right? I really like this kind of supermarket,',
        'especially during the New Year.',
        'Going to the supermarket',
        'just makes me feel',
        'super, super happy!',
        'I want to buy so many things!'
    ]
    
    def init_dashscope_api_key():
        """
        Initialize the API key for the DashScope SDK.
        """
        # API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        # If the environment variable is not set, replace the following line with your Model Studio API key: dashscope.api_key = "sk-xxx"
        dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
    
    # ======= Callback class =======
    class MyCallback(QwenTtsRealtimeCallback):
        """
        Custom TTS streaming callback.
        """
        def __init__(self):
            self.complete_event = threading.Event()
            self._player = pyaudio.PyAudio()
            self._stream = self._player.open(
                format=pyaudio.paInt16, channels=1, rate=24000, output=True
            )
    
        def on_open(self) -> None:
            print('[TTS] Connection established')
    
        def on_close(self, close_status_code, close_msg) -> None:
            self._stream.stop_stream()
            self._stream.close()
            self._player.terminate()
            print(f'[TTS] Connection closed, code={close_status_code}, msg={close_msg}')
    
        def on_event(self, response: dict) -> None:
            try:
                event_type = response.get('type', '')
                if event_type == 'session.created':
                    print(f'[TTS] Session started: {response["session"]["id"]}')
                elif event_type == 'response.audio.delta':
                    audio_data = base64.b64decode(response['delta'])
                    self._stream.write(audio_data)
                elif event_type == 'response.done':
                    print(f'[TTS] Response complete, Response ID: {qwen_tts_realtime.get_last_response_id()}')
                elif event_type == 'session.finished':
                    print('[TTS] Session finished')
                    self.complete_event.set()
            except Exception as e:
                print(f'[Error] Exception processing callback event: {e}')
    
        def wait_for_finished(self):
            self.complete_event.wait()
    
    # ======= Main execution logic =======
    if __name__ == '__main__':
        init_dashscope_api_key()
        print('[System] Initializing Qwen TTS Realtime ...')
    
        callback = MyCallback()
        qwen_tts_realtime = QwenTtsRealtime(
            # Use the same model for voice design and speech synthesis
            model="qwen3-tts-vd-realtime-2026-01-15",
            callback=callback,
            # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
            url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
        )
        qwen_tts_realtime.connect()
        
        qwen_tts_realtime.update_session(
            voice="myvoice", # Replace the voice parameter with the custom voice generated by voice design
            response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
            mode='server_commit'
        )
    
        for text_chunk in TEXT_TO_SYNTHESIZE:
            print(f'[Sending text]: {text_chunk}')
            qwen_tts_realtime.append_text(text_chunk)
            time.sleep(0.1)
    
        qwen_tts_realtime.finish()
        callback.wait_for_finished()
    
        print(f'[Metric] session_id={qwen_tts_realtime.get_session_id()}, '
              f'first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s')

    Java

    import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
    import com.alibaba.dashscope.exception.NoApiKeyException;
    import com.google.gson.JsonObject;
    
    import javax.sound.sampled.*;
    import java.io.*;
    import java.util.Base64;
    import java.util.Queue;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.atomic.AtomicReference;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.atomic.AtomicBoolean;
    
    public class Main {
        // ===== Constant definitions =====
        private static String[] textToSynthesize = {
                "Right? I really like this kind of supermarket,",
                "especially during the New Year.",
                "Going to the supermarket",
                "just makes me feel",
                "super, super happy!",
                "I want to buy so many things!"
        };
    
        // Real-time audio player class
        public static class RealtimePcmPlayer {
            private int sampleRate;
            private SourceDataLine line;
            private AudioFormat audioFormat;
            private Thread decoderThread;
            private Thread playerThread;
            private AtomicBoolean stopped = new AtomicBoolean(false);
            private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
            private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();
    
            // Constructor initializes audio format and audio line
            public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
                this.sampleRate = sampleRate;
                this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
                DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
                line = (SourceDataLine) AudioSystem.getLine(info);
                line.open(audioFormat);
                line.start();
                decoderThread = new Thread(new Runnable() {
                    @Override
                    public void run() {
                        while (!stopped.get()) {
                            String b64Audio = b64AudioBuffer.poll();
                            if (b64Audio != null) {
                                byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
                                RawAudioBuffer.add(rawAudio);
                            } else {
                                try {
                                    Thread.sleep(100);
                                } catch (InterruptedException e) {
                                    throw new RuntimeException(e);
                                }
                            }
                        }
                    }
                });
                playerThread = new Thread(new Runnable() {
                    @Override
                    public void run() {
                        while (!stopped.get()) {
                            byte[] rawAudio = RawAudioBuffer.poll();
                            if (rawAudio != null) {
                                try {
                                    playChunk(rawAudio);
                                } catch (IOException e) {
                                    throw new RuntimeException(e);
                                } catch (InterruptedException e) {
                                    throw new RuntimeException(e);
                                }
                            } else {
                                try {
                                    Thread.sleep(100);
                                } catch (InterruptedException e) {
                                    throw new RuntimeException(e);
                                }
                            }
                        }
                    }
                });
                decoderThread.start();
                playerThread.start();
            }
    
            // Plays an audio chunk and blocks until playback is complete
            private void playChunk(byte[] chunk) throws IOException, InterruptedException {
                if (chunk == null || chunk.length == 0) return;
    
                int bytesWritten = 0;
                while (bytesWritten < chunk.length) {
                    bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
                }
                int audioLength = chunk.length / (this.sampleRate * 2 / 1000);
                // Wait for the buffered audio to finish playing; guard against a negative sleep for very short chunks
                Thread.sleep(Math.max(0, audioLength - 10));
            }
    
            public void write(String b64Audio) {
                b64AudioBuffer.add(b64Audio);
            }
    
            public void cancel() {
                b64AudioBuffer.clear();
                RawAudioBuffer.clear();
            }
    
            public void waitForComplete() throws InterruptedException {
                while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                    Thread.sleep(100);
                }
                line.drain();
            }
    
            public void shutdown() throws InterruptedException {
                stopped.set(true);
                decoderThread.join();
                playerThread.join();
                if (line != null && line.isRunning()) {
                    line.drain();
                    line.close();
                }
            }
        }
    
        public static void main(String[] args) throws Exception {
            QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                    // Use the same model for voice design and speech synthesis
                    .model("qwen3-tts-vd-realtime-2026-01-15")
                    // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                    .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                    // API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                    // If the environment variable is not set, replace the following line with your Model Studio API key: .apikey("sk-xxx")
                    .apikey(System.getenv("DASHSCOPE_API_KEY"))
                    .build();
            AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
            final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
    
            // Create a real-time audio player instance
            RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
    
            QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
                @Override
                public void onOpen() {
                    // Handling for when the connection is established
                }
                @Override
                public void onEvent(JsonObject message) {
                    String type = message.get("type").getAsString();
                    switch(type) {
                        case "session.created":
                            // Handling for when the session is created
                            break;
                        case "response.audio.delta":
                            String recvAudioB64 = message.get("delta").getAsString();
                            // Play audio in real time
                            audioPlayer.write(recvAudioB64);
                            break;
                        case "response.done":
                            // Handling for when the response is complete
                            break;
                        case "session.finished":
                            // Handling for when the session is finished
                            completeLatch.get().countDown();
                            break;
                        default:
                            break;
                    }
                }
                @Override
                public void onClose(int code, String reason) {
                    // Handling for when the connection is closed
                }
            });
            qwenTtsRef.set(qwenTtsRealtime);
            try {
                qwenTtsRealtime.connect();
            } catch (NoApiKeyException e) {
                throw new RuntimeException(e);
            }
            QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                    .voice("myvoice") // Replace the voice parameter with the custom voice generated by voice design
                    .responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                    .mode("server_commit")
                    .build();
            qwenTtsRealtime.updateSession(config);
            for (String text:textToSynthesize) {
                qwenTtsRealtime.appendText(text);
                Thread.sleep(100);
            }
            qwenTtsRealtime.finish();
            completeLatch.get().await();
    
            // Wait for audio playback to complete and shut down the player
            audioPlayer.waitForComplete();
            audioPlayer.shutdown();
            System.exit(0);
        }
    }

Non-streaming and unidirectional streaming synthesis

Applies to Qwen3-TTS-VD series models. See Speech synthesis - Qwen.

  1. Create a custom voice and preview it. If satisfied, proceed. Otherwise, recreate.

    Python

    import requests
    import base64
    import os
    
    def create_voice_and_play():
        # API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        # If the environment variable is not set, replace the following line with your Model Studio API key: api_key = "sk-xxx"
        api_key = os.getenv("DASHSCOPE_API_KEY")
        
        if not api_key:
            print("Error: DASHSCOPE_API_KEY environment variable not found. Please set the API key first.")
            return None, None, None
        
        # Prepare request data
        headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }
        
        data = {
            "model": "qwen-voice-design",
            "input": {
                "action": "create",
                "target_model": "qwen3-tts-vd-2026-01-26",
                "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
                "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
                "preferred_name": "announcer",
                "language": "en"
            },
            "parameters": {
                "sample_rate": 24000,
                "response_format": "wav"
            }
        }
        
        # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
        
        try:
            # Send the request
            response = requests.post(
                url,
                headers=headers,
                json=data,
                timeout=60  # Add a timeout setting
            )
            
            if response.status_code == 200:
                result = response.json()
                
                # Get the voice name
                voice_name = result["output"]["voice"]
                print(f"Voice name: {voice_name}")
                
                # Get the preview audio data
                base64_audio = result["output"]["preview_audio"]["data"]
                
                # Decode the Base64 audio data
                audio_bytes = base64.b64decode(base64_audio)
                
                # Save the audio file locally
                filename = f"{voice_name}_preview.wav"
                
                # Write the audio data to a local file
                with open(filename, 'wb') as f:
                    f.write(audio_bytes)
                
                print(f"Audio saved to local file: {filename}")
                print(f"File path: {os.path.abspath(filename)}")
                
                return voice_name, audio_bytes, filename
            else:
                print(f"Request failed with status code: {response.status_code}")
                print(f"Response content: {response.text}")
                return None, None, None
                
        except requests.exceptions.RequestException as e:
            print(f"A network request error occurred: {e}")
            return None, None, None
        except KeyError as e:
            print(f"Response data format error, missing required field: {e}")
            print(f"Response content: {response.text if 'response' in locals() else 'No response'}")
            return None, None, None
        except Exception as e:
            print(f"An unknown error occurred: {e}")
            return None, None, None
    
    if __name__ == "__main__":
        print("Starting to create voice...")
        voice_name, audio_data, saved_filename = create_voice_and_play()
        
        if voice_name:
            print(f"\nSuccessfully created voice '{voice_name}'")
            print(f"Audio file saved as: '{saved_filename}'")
            print(f"File size: {os.path.getsize(saved_filename)} bytes")
        else:
            print("\nVoice creation failed")

    Java

    Add the Gson dependency to your project:

    Maven

    Add the following to your pom.xml:

    <!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
    <dependency>
        <groupId>com.google.code.gson</groupId>
        <artifactId>gson</artifactId>
        <version>2.13.1</version>
    </dependency>

    Gradle

    Add the following to your build.gradle:

    // https://mvnrepository.com/artifact/com.google.code.gson/gson
    implementation("com.google.code.gson:gson:2.13.1")
    Important

    To use a custom voice generated by voice design for speech synthesis, configure the voice as follows:

    MultiModalConversationParam param = MultiModalConversationParam.builder()
                    .parameter("voice", "your_voice") // Replace the voice parameter with the custom voice generated by voice design
                    .build();
    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;
    import java.io.*;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Base64;
    
    public class Main {
        public static void main(String[] args) {
            Main example = new Main();
            example.createVoice();
        }
    
        public void createVoice() {
            // API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If the environment variable is not set, replace the following line with your Model Studio API key: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
    
            // Create the JSON request body string
            String jsonBody = "{\n" +
                    "    \"model\": \"qwen-voice-design\",\n" +
                    "    \"input\": {\n" +
                    "        \"action\": \"create\",\n" +
                    "        \"target_model\": \"qwen3-tts-vd-2026-01-26\",\n" +
                    "        \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.\",\n" +
                    "        \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
                    "        \"preferred_name\": \"announcer\",\n" +
                    "        \"language\": \"en\"\n" +
                    "    },\n" +
                    "    \"parameters\": {\n" +
                    "        \"sample_rate\": 24000,\n" +
                    "        \"response_format\": \"wav\"\n" +
                    "    }\n" +
                    "}";
    
            HttpURLConnection connection = null;
            try {
                // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
                URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
                connection = (HttpURLConnection) url.openConnection();
    
                // Set the request method and headers
                connection.setRequestMethod("POST");
                connection.setRequestProperty("Authorization", "Bearer " + apiKey);
                connection.setRequestProperty("Content-Type", "application/json");
                connection.setDoOutput(true);
                connection.setDoInput(true);
    
                // Send the request body
                try (OutputStream os = connection.getOutputStream()) {
                    byte[] input = jsonBody.getBytes("UTF-8");
                    os.write(input, 0, input.length);
                    os.flush();
                }
    
                // Get the response
                int responseCode = connection.getResponseCode();
                if (responseCode == HttpURLConnection.HTTP_OK) {
                    // Read the response content
                    StringBuilder response = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            response.append(responseLine.trim());
                        }
                    }
    
                    // Parse the JSON response
                    JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                    JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                    JsonObject previewAudioObj = outputObj.getAsJsonObject("preview_audio");
    
                    // Get the voice name
                    String voiceName = outputObj.get("voice").getAsString();
                    System.out.println("Voice name: " + voiceName);
    
                    // Get the Base64-encoded audio data
                    String base64Audio = previewAudioObj.get("data").getAsString();
    
                    // Decode the Base64 audio data
                    byte[] audioBytes = Base64.getDecoder().decode(base64Audio);
    
                    // Save the audio to a local file
                    String filename = voiceName + "_preview.wav";
                    saveAudioToFile(audioBytes, filename);
    
                    System.out.println("Audio saved to local file: " + filename);
    
                } else {
                    // Read the error response
                    StringBuilder errorResponse = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            errorResponse.append(responseLine.trim());
                        }
                    }
    
                    System.out.println("Request failed with status code: " + responseCode);
                    System.out.println("Error response: " + errorResponse.toString());
                }
    
            } catch (Exception e) {
                System.err.println("An error occurred during the request: " + e.getMessage());
                e.printStackTrace();
            } finally {
                if (connection != null) {
                    connection.disconnect();
                }
            }
        }
    
        private void saveAudioToFile(byte[] audioBytes, String filename) {
            try {
                File file = new File(filename);
                try (FileOutputStream fos = new FileOutputStream(file)) {
                    fos.write(audioBytes);
                }
                System.out.println("Audio saved to: " + file.getAbsolutePath());
            } catch (IOException e) {
                System.err.println("An error occurred while saving the audio file: " + e.getMessage());
                e.printStackTrace();
            }
        }
    }
  2. Use the custom voice created in the previous step for non-streaming speech synthesis.

    This example follows the "non-streaming output" sample code for system voices in the DashScope SDK. Replace the voice parameter with the custom voice generated by voice design. For unidirectional streaming synthesis, see Speech synthesis - Qwen.

    Key principle: The model used for voice design (target_model) must match the model used for subsequent speech synthesis (model). Otherwise, synthesis fails.
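    This match can be verified programmatically before starting synthesis. The following is a minimal sketch; `check_target_model` is a hypothetical client-side helper, not part of the DashScope SDK:

    ```python
    def check_target_model(create_response: dict, synthesis_model: str) -> None:
        """Raise if the voice was designed for a different synthesis model."""
        target = create_response["output"]["target_model"]
        if target != synthesis_model:
            raise ValueError(
                f"voice was designed for '{target}' but synthesis uses "
                f"'{synthesis_model}'; the two must match"
            )

    # Example with the response shape returned by the create-voice API:
    create_response = {"output": {"target_model": "qwen3-tts-vd-2026-01-26",
                                  "voice": "myvoice"}}
    check_target_model(create_response, "qwen3-tts-vd-2026-01-26")  # passes silently
    ```

    Running this check once after voice creation catches a model mismatch early, instead of at synthesis time.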

    Python

    import os
    import dashscope
    
    
    if __name__ == '__main__':
        # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
        dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
    
        text = "What's the weather like today?"
        # How to use SpeechSynthesizer: dashscope.audio.qwen_tts.SpeechSynthesizer.call(...)
        response = dashscope.MultiModalConversation.call(
            model="qwen3-tts-vd-2026-01-26",
            # API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            # If the environment variable is not set, replace the following line with your Model Studio API key: api_key = "sk-xxx"
            api_key=os.getenv("DASHSCOPE_API_KEY"),
            text=text,
            voice="myvoice", # Replace the voice parameter with the custom voice generated by voice design
            stream=False
        )
        print(response)

    Java

    import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
    import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
    import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
    import com.alibaba.dashscope.exception.ApiException;
    import com.alibaba.dashscope.exception.NoApiKeyException;
    import com.alibaba.dashscope.exception.UploadFileException;
    
    import com.alibaba.dashscope.utils.Constants;
    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.net.URL;
    
    public class Main {
        private static final String MODEL = "qwen3-tts-vd-2026-01-26";
        public static void call() throws ApiException, NoApiKeyException, UploadFileException {
            MultiModalConversation conv = new MultiModalConversation();
            MultiModalConversationParam param = MultiModalConversationParam.builder()
                    // API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                    // If the environment variable is not set, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                    .model(MODEL)
                    .text("Today is a wonderful day to build something people love!")
                    .parameter("voice", "myvoice") // Replace the voice parameter with the custom voice generated by voice design
                    .build();
            MultiModalConversationResult result = conv.call(param);
            String audioUrl = result.getOutput().getAudio().getUrl();
            System.out.print(audioUrl);
    
            // Download the audio file locally
            try (InputStream in = new URL(audioUrl).openStream();
                 FileOutputStream out = new FileOutputStream("downloaded_audio.wav")) {
                byte[] buffer = new byte[1024];
                int bytesRead;
                while ((bytesRead = in.read(buffer)) != -1) {
                    out.write(buffer, 0, bytesRead);
                }
                System.out.println("\nAudio file downloaded locally: downloaded_audio.wav");
            } catch (Exception e) {
                System.out.println("\nError downloading audio file: " + e.getMessage());
            }
        }
        public static void main(String[] args) {
            try {
                // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
                Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
                call();
            } catch (ApiException | NoApiKeyException | UploadFileException e) {
                System.out.println(e.getMessage());
            }
            System.exit(0);
        }
    }

API reference

Use the same account for all API operations.

Create voice

Submit a voice description and preview text to create a custom voice.

  • URL

    Chinese mainland:

    POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

    International:

    POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
  • Request headers

    | Parameter | Type | Required | Description |
    | --- | --- | --- | --- |
    | Authorization | string | Yes | Authentication token, formatted as Bearer <your_api_key>. Replace <your_api_key> with your actual API key. |
    | Content-Type | string | Yes | Media type of data transmitted in the request body. Fixed value: application/json. |

  • Request body

    The request body includes all parameters. Optional fields can be omitted based on your needs.

    Important

    Distinguish the following parameters:

    • model: Voice design model. Fixed value: qwen-voice-design.

    • target_model: Speech synthesis model driving this voice. Must match the speech synthesis model used in subsequent calls, or synthesis fails.

    {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-realtime-2026-01-15",
            "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "zh"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }
  • Request parameters

    | Parameter | Type | Default | Required | Description |
    | --- | --- | --- | --- | --- |
    | model | string | - | Yes | Voice design model. Fixed value: qwen-voice-design. |
    | action | string | - | Yes | Action type. Fixed value: create. |
    | target_model | string | - | Yes | Speech synthesis model driving this voice: qwen3-tts-vd-2026-01-26 (non-streaming and unidirectional streaming) or qwen3-tts-vd-realtime-2026-01-15 (real-time), as used in the sample code. Must match the speech synthesis model used in subsequent calls, or synthesis fails. |
    | voice_prompt | string | - | Yes | Voice description. Maximum length: 2,048 characters. Supports Chinese and English only. For guidance on writing voice descriptions, see "How to write high-quality voice descriptions?". |
    | preview_text | string | - | Yes | Text for the preview audio. Maximum length: 1,024 characters. Supported languages: Chinese (zh), English (en), German (de), Italian (it), Portuguese (pt), Spanish (es), Japanese (ja), Korean (ko), French (fr), Russian (ru). |
    | preferred_name | string | - | Yes | Name to identify the voice (alphanumeric characters and underscores only, up to 16 characters). Choose a name related to the role or scenario. This keyword appears in the generated voice name. Example: keyword "announcer" → voice name "qwen-tts-vd-announcer-voice-20251201102800-a1b2". |
    | language | string | zh | No | Language code specifying the language preference for the generated voice. This affects language-specific features and pronunciation tendencies. If specified, this language must match the preview_text language. Valid values: zh (Chinese), en (English), de (German), it (Italian), pt (Portuguese), es (Spanish), ja (Japanese), ko (Korean), fr (French), ru (Russian). |
    | sample_rate | int | 24000 | No | Sample rate (Hz) for the preview audio generated by voice design. Valid values: 8000, 16000, 24000, 48000. |
    | response_format | string | wav | No | Audio format for the preview audio generated by voice design. Valid values: pcm, wav, mp3, opus. |
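    The documented limits can be checked client-side before sending the request. The sketch below is a convenience helper of our own (`validate_create_input` is not an SDK function); it mirrors the constraints listed above:

    ```python
    import re

    SUPPORTED_LANGUAGES = {"zh", "en", "de", "it", "pt", "es", "ja", "ko", "fr", "ru"}

    def validate_create_input(inp: dict) -> list:
        """Return a list of violations of the documented input limits (empty if OK)."""
        errors = []
        if not 0 < len(inp.get("voice_prompt", "")) <= 2048:
            errors.append("voice_prompt must be 1-2,048 characters")
        if not 0 < len(inp.get("preview_text", "")) <= 1024:
            errors.append("preview_text must be 1-1,024 characters")
        if not re.fullmatch(r"[A-Za-z0-9_]{1,16}", inp.get("preferred_name", "")):
            errors.append("preferred_name must be 1-16 alphanumeric/underscore characters")
        if inp.get("language", "zh") not in SUPPORTED_LANGUAGES:
            errors.append("unsupported language code")
        return errors

    ok = {"voice_prompt": "A deep, composed male announcer.",
          "preview_text": "Welcome to the evening news.",
          "preferred_name": "announcer", "language": "en"}
    print(validate_create_input(ok))  # []
    ```

    Failing fast on these limits locally avoids a round trip that the server would reject anyway.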

  • Response parameters

    Response example:

    {
        "output": {
            "preview_audio": {
                "data": "{base64_encoded_audio}",
                "sample_rate": 24000,
                "response_format": "wav"
            },
            "target_model": "qwen3-tts-vd-realtime-2026-01-15",
            "voice": "yourVoice"
        },
        "usage": {
            "count": 1
        },
        "request_id": "yourRequestId"
    }

    Key parameters:

    | Parameter | Type | Description |
    | --- | --- | --- |
    | voice | string | Voice name. Use directly as the voice parameter in speech synthesis APIs. |
    | data | string | Preview audio data generated by voice design, returned as a Base64-encoded string. |
    | sample_rate | int | Sample rate (Hz) of the preview audio. Matches the sample rate requested when creating the voice; defaults to 24000 Hz if unspecified. |
    | response_format | string | Audio format of the preview audio. Matches the format requested when creating the voice; defaults to wav if unspecified. |
    | target_model | string | Speech synthesis model driving this voice. Must match the speech synthesis model used in subsequent calls, or synthesis fails. |
    | request_id | string | Request ID. |
    | count | integer | Number of billable "Create voice" operations performed by this request. For voice creation, count is always 1. |
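    Putting the fields above together, the following is a minimal sketch of parsing the response. The Base64 payload here is a stand-in, not real audio data:

    ```python
    import base64
    import json

    # Stand-in response mirroring the example above; "data" holds fake payload bytes.
    raw = json.dumps({
        "output": {
            "preview_audio": {
                "data": base64.b64encode(b"RIFF-fake-wav-bytes").decode("ascii"),
                "sample_rate": 24000,
                "response_format": "wav",
            },
            "target_model": "qwen3-tts-vd-realtime-2026-01-15",
            "voice": "yourVoice",
        },
        "usage": {"count": 1},
        "request_id": "yourRequestId",
    })

    result = json.loads(raw)
    voice = result["output"]["voice"]  # reuse as the `voice` parameter in synthesis calls
    audio = base64.b64decode(result["output"]["preview_audio"]["data"])
    ext = result["output"]["preview_audio"]["response_format"]
    # To save the preview: open(f"{voice}_preview.{ext}", "wb").write(audio)
    ```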

  • Sample code

    Important

    Distinguish the following parameters:

    • model: Voice design model. Fixed value: qwen-voice-design.

    • target_model: Speech synthesis model driving this voice. Must match the speech synthesis model used in subsequent calls, or synthesis fails.

    cURL

    If you have not configured the API key in an environment variable, replace $DASHSCOPE_API_KEY in the example with your actual API key.

    # ======= Important note =======
    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    # API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # === Delete this comment before execution ===
    
    curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-realtime-2026-01-15",
            "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "zh"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }'

    Python

    import requests
    import base64
    import os
    
    def create_voice_and_play():
        # API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        # If the environment variable is not set, replace the following line with your Model Studio API key: api_key = "sk-xxx"
        api_key = os.getenv("DASHSCOPE_API_KEY")
        
        if not api_key:
            print("Error: DASHSCOPE_API_KEY environment variable not found. Please set the API key first.")
            return None, None, None
        
        # Prepare request data
        headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }
        
        data = {
            "model": "qwen-voice-design",
            "input": {
                "action": "create",
                "target_model": "qwen3-tts-vd-realtime-2026-01-15",
                "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
                "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
                "preferred_name": "announcer",
                "language": "en"
            },
            "parameters": {
                "sample_rate": 24000,
                "response_format": "wav"
            }
        }
        
        # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
        
        try:
            # Send the request
            response = requests.post(
                url,
                headers=headers,
                json=data,
                timeout=60  # Add a timeout setting
            )
            
            if response.status_code == 200:
                result = response.json()
                
                # Get the voice name
                voice_name = result["output"]["voice"]
                print(f"Voice name: {voice_name}")
                
                # Get the preview audio data
                base64_audio = result["output"]["preview_audio"]["data"]
                
                # Decode the Base64 audio data
                audio_bytes = base64.b64decode(base64_audio)
                
                # Save the audio file locally
                filename = f"{voice_name}_preview.wav"
                
                # Write the audio data to a local file
                with open(filename, 'wb') as f:
                    f.write(audio_bytes)
                
                print(f"Audio saved to local file: {filename}")
                print(f"File path: {os.path.abspath(filename)}")
                
                return voice_name, audio_bytes, filename
            else:
                print(f"Request failed with status code: {response.status_code}")
                print(f"Response content: {response.text}")
                return None, None, None
                
        except requests.exceptions.RequestException as e:
            print(f"A network request error occurred: {e}")
            return None, None, None
        except KeyError as e:
            print(f"Response data format error, missing required field: {e}")
            print(f"Response content: {response.text if 'response' in locals() else 'No response'}")
            return None, None, None
        except Exception as e:
            print(f"An unknown error occurred: {e}")
            return None, None, None
    
    if __name__ == "__main__":
        print("Starting to create voice...")
        voice_name, audio_data, saved_filename = create_voice_and_play()
        
        if voice_name:
            print(f"\nSuccessfully created voice '{voice_name}'")
            print(f"Audio file saved as: '{saved_filename}'")
            print(f"File size: {os.path.getsize(saved_filename)} bytes")
        else:
            print("\nVoice creation failed")

    Java

    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;
    import java.io.*;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Base64;
    
    public class Main {
        public static void main(String[] args) {
            Main example = new Main();
            example.createVoice();
        }
    
        public void createVoice() {
            // API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If the environment variable is not set, replace the following line with your Model Studio API key: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
    
            // Create the JSON request body string
            String jsonBody = "{\n" +
                    "    \"model\": \"qwen-voice-design\",\n" +
                    "    \"input\": {\n" +
                    "        \"action\": \"create\",\n" +
                    "        \"target_model\": \"qwen3-tts-vd-realtime-2026-01-15\",\n" +
                    "        \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.\",\n" +
                    "        \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
                    "        \"preferred_name\": \"announcer\",\n" +
                    "        \"language\": \"en\"\n" +
                    "    },\n" +
                    "    \"parameters\": {\n" +
                    "        \"sample_rate\": 24000,\n" +
                    "        \"response_format\": \"wav\"\n" +
                    "    }\n" +
                    "}";
    
            HttpURLConnection connection = null;
            try {
                // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
                URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
                connection = (HttpURLConnection) url.openConnection();
    
                // Set the request method and headers
                connection.setRequestMethod("POST");
                connection.setRequestProperty("Authorization", "Bearer " + apiKey);
                connection.setRequestProperty("Content-Type", "application/json");
                connection.setDoOutput(true);
                connection.setDoInput(true);
    
                // Send the request body
                try (OutputStream os = connection.getOutputStream()) {
                    byte[] input = jsonBody.getBytes("UTF-8");
                    os.write(input, 0, input.length);
                    os.flush();
                }
    
                // Get the response
                int responseCode = connection.getResponseCode();
                if (responseCode == HttpURLConnection.HTTP_OK) {
                    // Read the response content
                    StringBuilder response = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            response.append(responseLine.trim());
                        }
                    }
    
                    // Parse the JSON response
                    JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                    JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                    JsonObject previewAudioObj = outputObj.getAsJsonObject("preview_audio");
    
                    // Get the voice name
                    String voiceName = outputObj.get("voice").getAsString();
                    System.out.println("Voice name: " + voiceName);
    
                    // Get the Base64-encoded audio data
                    String base64Audio = previewAudioObj.get("data").getAsString();
    
                    // Decode the Base64 audio data
                    byte[] audioBytes = Base64.getDecoder().decode(base64Audio);
    
                    // Save the audio to a local file
                    String filename = voiceName + "_preview.wav";
                    saveAudioToFile(audioBytes, filename);
    
                    System.out.println("Audio saved to local file: " + filename);
    
                } else {
                    // Read the error response
                    StringBuilder errorResponse = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            errorResponse.append(responseLine.trim());
                        }
                    }
    
                    System.out.println("Request failed with status code: " + responseCode);
                    System.out.println("Error response: " + errorResponse.toString());
                }
    
            } catch (Exception e) {
                System.err.println("An error occurred during the request: " + e.getMessage());
                e.printStackTrace();
            } finally {
                if (connection != null) {
                    connection.disconnect();
                }
            }
        }
    
        private void saveAudioToFile(byte[] audioBytes, String filename) {
            try {
                File file = new File(filename);
                try (FileOutputStream fos = new FileOutputStream(file)) {
                    fos.write(audioBytes);
                }
                System.out.println("Audio saved to: " + file.getAbsolutePath());
            } catch (IOException e) {
                System.err.println("An error occurred while saving the audio file: " + e.getMessage());
                e.printStackTrace();
            }
        }
    }

List voices

Returns a paginated list of all voices created under your account.

  • URL

    Chinese mainland:

    POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

    International:

    POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
  • Request headers

    Authorization (string, required): Authentication token, formatted as Bearer <your_api_key>. Replace <your_api_key> with your actual API key.

    Content-Type (string, required): Media type of the data transmitted in the request body. Fixed value: application/json.

  • Request body

    The request body contains all parameters. Omit optional fields as needed.

    Important

    model: Voice design model. Fixed to qwen-voice-design. Do not change this value.

    {
        "model": "qwen-voice-design",
        "input": {
            "action": "list",
            "page_size": 10,
            "page_index": 0
        }
    }
  • Request parameters

    model (string, required): Voice design model. Fixed value: qwen-voice-design.

    action (string, required): Action type. Fixed value: list.

    page_index (integer, optional, default 0): Page number. Range: 0–200.

    page_size (integer, optional, default 10): Number of entries per page. Must be greater than 0.

  • Response parameters

    Response example:

    {
        "output": {
            "page_index": 0,
            "page_size": 2,
            "total_count": 26,
            "voice_list": [
                {
                    "gmt_create": "2025-12-10 17:04:54",
                    "gmt_modified": "2025-12-10 17:04:54",
                    "language": "zh",
                    "preview_text": "Dear listeners, hello everyone. Welcome to today's program.",
                    "target_model": "qwen3-tts-vd-realtime-2026-01-15",
                    "voice": "yourVoice1",
                    "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, suitable for news broadcasting or documentary commentary. Deep and magnetic, steady speaking speed."
                },
                {
                    "gmt_create": "2025-12-10 15:31:35",
                    "gmt_modified": "2025-12-10 15:31:35",
                    "language": "zh",
                    "preview_text": "Dear listeners, hello everyone.",
                    "target_model": "qwen3-tts-vd-realtime-2026-01-15",
                    "voice": "yourVoice2",
                    "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, suitable for news broadcasting or documentary commentary."
                }
            ]
        },
        "usage": {},
        "request_id": "yourRequestId"
    }

    Key parameters:

    voice (string): Voice name. Use it directly as the voice parameter in speech synthesis APIs.

    target_model (string): Speech synthesis model that drives this voice (one of two supported model types). It must match the speech synthesis model used in subsequent calls; otherwise, synthesis fails.

    language (string): Language code. Valid values: zh (Chinese), en (English), de (German), it (Italian), pt (Portuguese), es (Spanish), ja (Japanese), ko (Korean), fr (French), ru (Russian).

    voice_prompt (string): Voice description.

    preview_text (string): Preview text.

    gmt_create (string): Time when the voice was created.

    gmt_modified (string): Time when the voice was last modified.

    page_index (integer): Page number.

    page_size (integer): Number of entries per page.

    total_count (integer): Total number of records returned by the query.

    request_id (string): Request ID.

  • Sample code

    Important

    model: Voice design model. Fixed to qwen-voice-design. Do not change this value.

    cURL

    If you have not set the API key as an environment variable, you must replace $DASHSCOPE_API_KEY in the example with your actual API key.

    # ======= Important notice =======
    # This URL is for the Singapore region. If you use the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    # API keys differ between the Singapore and China (Beijing) regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # === Remove this comment before running ===
    
    curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-voice-design",
        "input": {
            "action": "list",
            "page_size": 10,
            "page_index": 0
        }
    }'

    Python

    import os
    import requests
    
    # API keys differ between the Singapore and China (Beijing) regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you have not set an environment variable, replace the next line with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    # This URL is for the Singapore region. If you use the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    
    payload = {
        "model": "qwen-voice-design", # Do not change this value
        "input": {
            "action": "list",
            "page_size": 10,
            "page_index": 0
        }
    }
    
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    
    response = requests.post(url, json=payload, headers=headers)
    
    print("HTTP status code:", response.status_code)
    
    if response.status_code == 200:
        data = response.json()
        voice_list = data["output"]["voice_list"]
    
        print("List of voices:")
        for item in voice_list:
            print(f"- Voice: {item['voice']}  Created: {item['gmt_create']}  Model: {item['target_model']}")
    else:
        print("Request failed:", response.text)

    Java

    import com.google.gson.Gson;
    import com.google.gson.JsonArray;
    import com.google.gson.JsonObject;
    
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    
    public class Main {
        public static void main(String[] args) {
            // API keys differ between the Singapore and China (Beijing) regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If you have not set an environment variable, replace the next line with: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
            // This URL is for the Singapore region. If you use the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
    
            // JSON request body (string concatenation is used because text blocks (""") require Java 15 or later)
            String jsonPayload =
                    "{"
                            + "\"model\": \"qwen-voice-design\"," // Do not change this value
                            + "\"input\": {"
                            +     "\"action\": \"list\","
                            +     "\"page_size\": 10,"
                            +     "\"page_index\": 0"
                            + "}"
                            + "}";
    
            try {
                HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
                con.setRequestMethod("POST");
                con.setRequestProperty("Authorization", "Bearer " + apiKey);
                con.setRequestProperty("Content-Type", "application/json");
                con.setDoOutput(true);
    
                try (OutputStream os = con.getOutputStream()) {
                    os.write(jsonPayload.getBytes("UTF-8"));
                }
    
                int status = con.getResponseCode();
                BufferedReader br = new BufferedReader(new InputStreamReader(
                        status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(), "UTF-8"));
    
                StringBuilder response = new StringBuilder();
                String line;
                while ((line = br.readLine()) != null) {
                    response.append(line);
                }
                br.close();
    
                System.out.println("HTTP status code: " + status);
                System.out.println("Response JSON: " + response.toString());
    
                if (status == 200) {
                    Gson gson = new Gson();
                    JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
                    JsonArray voiceList = jsonObj.getAsJsonObject("output").getAsJsonArray("voice_list");
    
                    System.out.println("\nList of voices:");
                    for (int i = 0; i < voiceList.size(); i++) {
                        JsonObject voiceItem = voiceList.get(i).getAsJsonObject();
                        String voice = voiceItem.get("voice").getAsString();
                        String gmtCreate = voiceItem.get("gmt_create").getAsString();
                        String targetModel = voiceItem.get("target_model").getAsString();
    
                        System.out.printf("- Voice: %s  Created: %s  Model: %s\n",
                                voice, gmtCreate, targetModel);
                    }
                }
    
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
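The samples above fetch a single page. Because each response carries total_count, you can walk page_index from 0 upward until every record is covered. The following Python sketch keeps the paging arithmetic separate from the HTTP call: fetch_page is a stand-in for the POST request shown in the Python sample above (it must return the response's "output" object), so the helper names here are illustrative, not part of the API.

```python
import math


def page_count(total_count: int, page_size: int) -> int:
    """Number of pages needed to cover total_count records."""
    return math.ceil(total_count / page_size) if total_count > 0 else 0


def list_all_voices(fetch_page, page_size: int = 10) -> list:
    """Collect every voice by requesting page_index 0, 1, ... in turn.

    fetch_page(page_index, page_size) must return the "output" object of a
    list response: a dict with "total_count" and "voice_list" keys.
    """
    first = fetch_page(0, page_size)
    voices = list(first["voice_list"])
    for page_index in range(1, page_count(first["total_count"], page_size)):
        voices.extend(fetch_page(page_index, page_size)["voice_list"])
    return voices
```

In practice, fetch_page would POST {"model": "qwen-voice-design", "input": {"action": "list", "page_size": page_size, "page_index": page_index}} to the customization endpoint, exactly as in the Python sample above, and return response.json()["output"].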

Query a specific voice

Queries detailed information about a specific voice by its name.

  • URL

    Chinese mainland:

    POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

    International:

    POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
  • Request headers

    Authorization (string, required): Authentication token, formatted as Bearer <your_api_key>. Replace <your_api_key> with your actual API key.

    Content-Type (string, required): Media type of the data transmitted in the request body. Fixed value: application/json.

  • Request body

    The request body contains all request parameters. You can omit optional fields as needed.

    Important

    model: Voice design model. Fixed to qwen-voice-design. Do not change this value.

    {
        "model": "qwen-voice-design",
        "input": {
            "action": "query",
            "voice": "voiceName"
        }
    }
  • Request parameters

    model (string, required): Voice design model. Fixed value: qwen-voice-design.

    action (string, required): Action type. Fixed value: query.

    voice (string, required): The name of the voice to query.

  • Response parameters

    Response examples:

    Data found

    {
        "output": {
            "gmt_create": "2025-12-10 14:54:09",
            "gmt_modified": "2025-12-10 17:47:48",
            "language": "zh",
            "preview_text": "Hello, dear listeners.",
            "target_model": "qwen3-tts-vd-realtime-2026-01-15",
            "voice": "yourVoice",
            "voice_prompt": "A calm, middle-aged male announcer with a deep, rich, and magnetic voice. His speaking rate is steady and his articulation is clear. Suitable for news broadcasts or documentary narration."
        },
        "usage": {},
        "request_id": "yourRequestId"
    }

    No data found

    If the queried voice does not exist, the API returns an HTTP 400 status code and the response body contains the VoiceNotFound error code.

    {
        "request_id":"yourRequestId",
        "code":"VoiceNotFound",
        "message":"Voice not found: qwen-tts-vd-announcer-voice-xxxx"
    }

    Key parameters:

    voice (string): Voice name. Use it directly as the voice parameter in speech synthesis APIs.

    target_model (string): Speech synthesis model that drives this voice (one of two supported model types). It must match the speech synthesis model used in subsequent calls; otherwise, synthesis fails.

    language (string): Language code. Valid values: zh (Chinese), en (English), de (German), it (Italian), pt (Portuguese), es (Spanish), ja (Japanese), ko (Korean), fr (French), ru (Russian).

    voice_prompt (string): The voice description.

    preview_text (string): The preview text.

    gmt_create (string): The time when the voice was created.

    gmt_modified (string): The time when the voice was last modified.

    request_id (string): The request ID.

  • Sample code

    Important

    model: Voice design model. Fixed to qwen-voice-design. Do not change this value.

    cURL

    If you have not set the API key as an environment variable, you must replace $DASHSCOPE_API_KEY in the example with your actual API key.

    # ======= Important =======
    # The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    # The API keys for the Singapore and China (Beijing) regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # === Delete this comment before running the command. ===
    
    curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-voice-design",
        "input": {
            "action": "query",
            "voice": "voiceName"
        }
    }'

    Python

    import requests
    import os
    
    def query_voice(voice_name):
        """
        Queries information about a specific voice.
        :param voice_name: The name of the voice.
        :return: A dictionary that contains the voice information, or None if the voice is not found.
        """
        # The API keys for the Singapore and China (Beijing) regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        # If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx"
        api_key = os.getenv("DASHSCOPE_API_KEY")
        
        # Prepare the request data.
        headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }
        
        data = {
            "model": "qwen-voice-design",
            "input": {
                "action": "query",
                "voice": voice_name
            }
        }
        
        # The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
        # Send the request.
        response = requests.post(
            url,
            headers=headers,
            json=data
        )
        
        if response.status_code == 200:
            result = response.json()
            
            # Check for error messages.
            if "code" in result and result["code"] == "VoiceNotFound":
                print(f"Voice not found: {voice_name}")
                print(f"Error message: {result.get('message', 'Voice not found')}")
                return None
            
            # Get the voice information.
            voice_info = result["output"]
            print(f"Successfully queried voice information:")
            print(f"  Voice name: {voice_info.get('voice')}")
            print(f"  Creation time: {voice_info.get('gmt_create')}")
            print(f"  Modification time: {voice_info.get('gmt_modified')}")
            print(f"  Language: {voice_info.get('language')}")
            print(f"  Preview text: {voice_info.get('preview_text')}")
            print(f"  Model: {voice_info.get('target_model')}")
            print(f"  Voice description: {voice_info.get('voice_prompt')}")
            
            return voice_info
        else:
            # The API returns HTTP 400 with the VoiceNotFound error code when the voice does not exist.
            try:
                result = response.json()
                if result.get("code") == "VoiceNotFound":
                    print(f"Voice not found: {voice_name}")
                    print(f"Error message: {result.get('message', 'Voice not found')}")
                    return None
            except ValueError:
                pass
            print(f"Request failed, status code: {response.status_code}")
            print(f"Response content: {response.text}")
            return None
    
    def main():
        # Example: Query a voice.
        voice_name = "myvoice"  # Replace with the actual name of the voice you want to query.
        
        print(f"Querying voice: {voice_name}")
        voice_info = query_voice(voice_name)
        
        if voice_info:
            print("\nVoice queried successfully!")
        else:
            print("\nFailed to query the voice or the voice does not exist.")
    
    if __name__ == "__main__":
        main()

    Java

    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    
    public class Main {
    
        public static void main(String[] args) {
            Main example = new Main();
            // Example: Query a voice.
            String voiceName = "myvoice"; // Replace with the actual name of the voice you want to query.
            System.out.println("Querying voice: " + voiceName);
            example.queryVoice(voiceName);
        }
    
        public void queryVoice(String voiceName) {
            // The API keys for the Singapore and China (Beijing) regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
            // If you have not configured the environment variable, replace the following line with your Model Studio API key: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
    
            // Create the JSON request body string.
            String jsonBody = "{\n" +
                    "    \"model\": \"qwen-voice-design\",\n" +
                    "    \"input\": {\n" +
                    "        \"action\": \"query\",\n" +
                    "        \"voice\": \"" + voiceName + "\"\n" +
                    "    }\n" +
                    "}";
    
            HttpURLConnection connection = null;
            try {
                // The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
                URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
                connection = (HttpURLConnection) url.openConnection();
    
                // Set the request method and headers.
                connection.setRequestMethod("POST");
                connection.setRequestProperty("Authorization", "Bearer " + apiKey);
                connection.setRequestProperty("Content-Type", "application/json");
                connection.setDoOutput(true);
                connection.setDoInput(true);
    
                // Send the request body.
                try (OutputStream os = connection.getOutputStream()) {
                    byte[] input = jsonBody.getBytes("UTF-8");
                    os.write(input, 0, input.length);
                    os.flush();
                }
    
                // Get the response.
                int responseCode = connection.getResponseCode();
                if (responseCode == HttpURLConnection.HTTP_OK) {
                    // Read the response content.
                    StringBuilder response = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            response.append(responseLine.trim());
                        }
                    }
    
                    // Parse the JSON response.
                    JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
    
                    // Check for error messages.
                    if (jsonResponse.has("code") && "VoiceNotFound".equals(jsonResponse.get("code").getAsString())) {
                        String errorMessage = jsonResponse.has("message") ?
                                jsonResponse.get("message").getAsString() : "Voice not found";
                        System.out.println("Voice not found: " + voiceName);
                        System.out.println("Error message: " + errorMessage);
                        return;
                    }
    
                    // Get the voice information.
                    JsonObject outputObj = jsonResponse.getAsJsonObject("output");
    
                    System.out.println("Successfully queried voice information:");
                    System.out.println("  Voice name: " + outputObj.get("voice").getAsString());
                    System.out.println("  Creation time: " + outputObj.get("gmt_create").getAsString());
                    System.out.println("  Modification time: " + outputObj.get("gmt_modified").getAsString());
                    System.out.println("  Language: " + outputObj.get("language").getAsString());
                    System.out.println("  Preview text: " + outputObj.get("preview_text").getAsString());
                    System.out.println("  Model: " + outputObj.get("target_model").getAsString());
                    System.out.println("  Voice description: " + outputObj.get("voice_prompt").getAsString());
    
                } else {
                    // Read the error response.
                    StringBuilder errorResponse = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            errorResponse.append(responseLine.trim());
                        }
                    }
    
                    System.out.println("Request failed, status code: " + responseCode);
                    System.out.println("Error response: " + errorResponse.toString());
                }
    
            } catch (Exception e) {
                System.err.println("An error occurred during the request: " + e.getMessage());
                e.printStackTrace();
            } finally {
                if (connection != null) {
                    connection.disconnect();
                }
            }
        }
    }
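As noted in the key parameters above, synthesis fails when the synthesis model does not match the voice's target_model. You can guard against this by validating the query response before calling a synthesis API. A minimal Python sketch follows; the helper name is illustrative, and it operates on the "output" object returned by the query action.

```python
def check_target_model(voice_info: dict, synthesis_model: str) -> None:
    """Raise ValueError if synthesis_model does not match the voice's target_model.

    voice_info is the "output" object of a query response, for example:
    {"voice": "yourVoice", "target_model": "qwen3-tts-vd-realtime-2026-01-15", ...}
    """
    target = voice_info.get("target_model")
    if target != synthesis_model:
        raise ValueError(
            f"Voice {voice_info.get('voice')!r} is bound to {target!r}, "
            f"but the synthesis call uses {synthesis_model!r}; synthesis would fail."
        )
```

Call this with the "output" object from query_voice and the model name you intend to pass to the speech synthesis API; a ValueError here is cheaper than a failed synthesis request.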

Delete a voice

Deletes a specified voice and releases the corresponding quota.

  • URL

    Chinese mainland:

    POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

    International:

    POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
  • Request headers

    Authorization (string, required): Authentication token, formatted as Bearer <your_api_key>. Replace <your_api_key> with your actual API key.

    Content-Type (string, required): Media type of the data transmitted in the request body. Fixed value: application/json.

  • Request body

    The request body includes all parameters. Optional fields can be omitted.

    Important

    model: Voice design model. Fixed to qwen-voice-design. Do not change this value.

    {
        "model": "qwen-voice-design",
        "input": {
            "action": "delete",
            "voice": "yourVoice"
        }
    }
  • Request parameters

    model (string, required): Voice design model. Fixed value: qwen-voice-design.

    action (string, required): Action type. Fixed value: delete.

    voice (string, required): The name of the voice to delete.

  • Response parameters

    Response example:

    {
        "output": {
            "voice": "yourVoice"
        },
        "usage": {},
        "request_id": "yourRequestId"
    }

    Key parameters:

    request_id (string): The request ID.

    voice (string): The name of the deleted voice.

  • Sample code

    Important

    model: Voice design model. Fixed to qwen-voice-design. Do not change this value.

    cURL

    If you have not set the API key as an environment variable, you must replace $DASHSCOPE_API_KEY in the example with your actual API key.

    # ======= Important =======
    # The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    # The API keys for the Singapore and China (Beijing) regions are different. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # === Delete this comment before you run the command ===
    
    curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-voice-design",
        "input": {
            "action": "delete",
            "voice": "yourVoice"
        }
    }'

    Python

    import requests
    import os
    
    def delete_voice(voice_name):
        """
        Deletes a specified voice.
        :param voice_name: The name of the voice.
        :return: True if the voice is deleted, or if it does not exist (the target is already gone); False if the operation fails.
        """
        # The API keys for the Singapore and China (Beijing) regions are different. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        # If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx"
        api_key = os.getenv("DASHSCOPE_API_KEY")
        
        # Prepare the request data.
        headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }
        
        data = {
            "model": "qwen-voice-design",
            "input": {
                "action": "delete",
                "voice": voice_name
            }
        }
        
        # The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
        # Send the request.
        response = requests.post(
            url,
            headers=headers,
            json=data
        )
        
        if response.status_code == 200:
            result = response.json()
            
            # Check for an error message.
            if "code" in result and "VoiceNotFound" in result["code"]:
                print(f"Voice does not exist: {voice_name}")
                print(f"Error message: {result.get('message', 'Voice not found')}")
                return True  # The operation is considered successful if the voice does not exist because the target is already gone.
            
            # Check if the deletion was successful.
            if "usage" in result:
                print(f"Voice deleted successfully: {voice_name}")
                print(f"Request ID: {result.get('request_id', 'N/A')}")
                return True
            else:
                print(f"The deletion operation returned an unexpected format: {result}")
                return False
        else:
            # The API returns HTTP 400 with the VoiceNotFound error code when the voice does not exist.
            try:
                result = response.json()
                if result.get("code") == "VoiceNotFound":
                    print(f"Voice does not exist: {voice_name}")
                    return True  # The target is already gone, so the deletion goal is met.
            except ValueError:
                pass
            print(f"Failed to delete the voice. Status code: {response.status_code}")
            print(f"Response content: {response.text}")
            return False
    
    def main():
        # Example: Delete a voice.
        voice_name = "myvoice"  # Replace with the actual name of the voice that you want to delete.
        
        print(f"Deleting voice: {voice_name}")
        success = delete_voice(voice_name)
        
        if success:
            print(f"\nDeletion of voice '{voice_name}' is complete!")
        else:
            print(f"\nFailed to delete voice '{voice_name}'!")
    
    if __name__ == "__main__":
        main()

    Java

    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    
    public class Main {
    
        public static void main(String[] args) {
            Main example = new Main();
            // Example: Delete a voice.
            String voiceName = "myvoice"; // Replace with the actual name of the voice that you want to delete.
            System.out.println("Deleting voice: " + voiceName);
            example.deleteVoice(voiceName);
        }
    
        public void deleteVoice(String voiceName) {
            // The API keys for the Singapore and China (Beijing) regions are different. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If you have not configured an environment variable, replace the following line with your Model Studio API key: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
    
            // Create the JSON request body string.
            String jsonBody = "{\n" +
                    "    \"model\": \"qwen-voice-design\",\n" +
                    "    \"input\": {\n" +
                    "        \"action\": \"delete\",\n" +
                    "        \"voice\": \"" + voiceName + "\"\n" +
                    "    }\n" +
                    "}";
    
            HttpURLConnection connection = null;
            try {
                // The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
                URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
                connection = (HttpURLConnection) url.openConnection();
    
                // Set the request method and headers.
                connection.setRequestMethod("POST");
                connection.setRequestProperty("Authorization", "Bearer " + apiKey);
                connection.setRequestProperty("Content-Type", "application/json");
                connection.setDoOutput(true);
                connection.setDoInput(true);
    
                // Send the request body.
                try (OutputStream os = connection.getOutputStream()) {
                    byte[] input = jsonBody.getBytes("UTF-8");
                    os.write(input, 0, input.length);
                    os.flush();
                }
    
                // Get the response.
                int responseCode = connection.getResponseCode();
                if (responseCode == HttpURLConnection.HTTP_OK) {
                    // Read the response content.
                    StringBuilder response = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            response.append(responseLine.trim());
                        }
                    }
    
                    // Parse the JSON response.
                    JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
    
                    // Check for an error message.
                    if (jsonResponse.has("code") && jsonResponse.get("code").getAsString().contains("VoiceNotFound")) {
                        String errorMessage = jsonResponse.has("message") ?
                                jsonResponse.get("message").getAsString() : "Voice not found";
                        System.out.println("Voice does not exist: " + voiceName);
                        System.out.println("Error message: " + errorMessage);
                        // The operation is considered successful if the voice does not exist because the target is already gone.
                    } else if (jsonResponse.has("usage")) {
                        // Check if the deletion was successful.
                        System.out.println("Voice deleted successfully: " + voiceName);
                        String requestId = jsonResponse.has("request_id") ?
                                jsonResponse.get("request_id").getAsString() : "N/A";
                        System.out.println("Request ID: " + requestId);
                    } else {
                        System.out.println("The deletion operation returned an unexpected format: " + response.toString());
                    }
    
                } else {
                    // Read the error response.
                    StringBuilder errorResponse = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            errorResponse.append(responseLine.trim());
                        }
                    }
    
                    System.out.println("Failed to delete the voice. Status code: " + responseCode);
                    System.out.println("Error response: " + errorResponse.toString());
                }
    
            } catch (Exception e) {
                System.err.println("An error occurred during the request: " + e.getMessage());
                e.printStackTrace();
            } finally {
                if (connection != null) {
                    connection.disconnect();
                }
            }
        }
    }

Speech synthesis

To synthesize audio with a custom voice generated by voice design, see Getting started: From voice design to speech synthesis.

The speech synthesis model for voice design—such as qwen3-tts-vd-realtime-2026-01-15—is a dedicated model. It supports only voices generated by voice design. It does not support system voices such as Chelsie, Serena, Ethan, or Cherry.
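As a minimal illustration of this constraint, a client-side guard (a hypothetical helper, not part of the Model Studio API) can reject system voices before a synthesis request is sent to the dedicated model:

```python
# System voices are supported only by the general TTS models, not by the
# dedicated voice-design synthesis models described above.
SYSTEM_VOICES = {"Chelsie", "Serena", "Ethan", "Cherry"}

def is_valid_custom_voice(voice_name: str) -> bool:
    """Return True if the voice can be passed to a voice-design
    synthesis model such as qwen3-tts-vd-realtime-2026-01-15.

    System voices are rejected because the dedicated model accepts
    only voices generated by voice design.
    """
    return voice_name not in SYSTEM_VOICES

print(is_valid_custom_voice("myvoice"))  # custom voice -> True
print(is_valid_custom_voice("Cherry"))   # system voice -> False
```

This check only covers the known system voice names; the service itself remains the authority on whether a given voice is usable.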

Voice quota and automatic cleanup rules

  • Quota limit: 1,000 voices per account.

    You can check the current count via the total_count field in the List voices response.

  • Automatic cleanup: Voices unused for over one year are automatically deleted.
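The quota check can be sketched as follows. The exact shape of the List voices response is an assumption here (a total_count field nested under output, mirroring the response style of the other operations in this document); adjust the lookup to match the actual response.

```python
VOICE_QUOTA = 1000  # per-account limit stated above

def remaining_voice_quota(list_response: dict) -> int:
    """Given a parsed List voices response containing a total_count
    field (response shape assumed, see lead-in), return how many
    more voices the account can still create."""
    total = list_response.get("output", {}).get("total_count", 0)
    return max(VOICE_QUOTA - total, 0)

# Hypothetical response fragment for illustration:
sample = {"output": {"total_count": 997}}
print(remaining_voice_quota(sample))  # -> 3
```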

Billing

Voice design and speech synthesis are billed separately.

  • Voice design: Creating a voice is billed at USD 0.2 per voice. Creation failures are not billed.

    Note

    Free quota details (available only in the Singapore region):

    • 10 free voice creations within 90 days after activating Alibaba Cloud Model Studio.

    • Failed creations do not consume free quota.

    • Deleting a voice does not restore free quota.

    • After the free quota is used up or the 90-day validity period expires, voice creation is billed at USD 0.2 per voice.

  • Speech synthesis using custom voices: Billed per character. For pricing details, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.
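The voice-design side of the billing rules above can be expressed as a simple estimator. This is an illustrative sketch, not an official billing tool: the free quota and per-voice price are taken from this section, and failed creations are excluded because they are not billed.

```python
def estimate_design_cost(successful_creations: int,
                         free_quota_remaining: int = 10,
                         price_per_voice: float = 0.2) -> float:
    """Estimate voice-design cost in USD using the rules above:
    failures are not billed, and only successful creations beyond
    the remaining free quota are charged. Pass free_quota_remaining=0
    if the 90-day validity period has expired."""
    billable = max(successful_creations - free_quota_remaining, 0)
    return round(billable * price_per_voice, 2)

print(estimate_design_cost(15))      # 5 billable voices -> 1.0
print(estimate_design_cost(8))       # within free quota -> 0.0
print(estimate_design_cost(15, 0))   # quota expired -> 3.0
```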

Error messages

If you encounter errors, see Error messages for troubleshooting.