Text-to-Speech

Convert text to natural-sounding speech, describing the desired voice in plain language. DashScope is the primary provider and ElevenLabs is the fallback.

POST /v1/speech/synthesize

Overview

This endpoint lets your agent convert text to audio. DashScope (Qwen3 TTS) is the primary provider and selects a voice from a natural-language description. ElevenLabs is the fallback and uses a predefined voice catalog. The response contains base64-encoded audio.

Parameters

text (string, required)
The text to convert to speech.

voice_description (string)
Natural-language description of the desired voice (DashScope only). Ignored when ElevenLabs is used.

voice_id (string)
ElevenLabs voice ID. If provided, forces the ElevenLabs provider.

provider (select, default: auto)
Forces a specific TTS provider. The default, auto, tries DashScope first and falls back to ElevenLabs.

Options: Auto (waterfall), DashScope, ElevenLabs
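
As a rough illustration of how the parameters above fit together, the sketch below assembles a request body in Python. The helper name `build_tts_payload` is hypothetical, and the exact string values accepted by `provider` are not stated on this page, so the sketch passes the value through unchanged rather than validating it.

```python
from typing import Optional


def build_tts_payload(
    text: str,
    voice_description: Optional[str] = None,
    voice_id: Optional[str] = None,
    provider: Optional[str] = None,
) -> dict:
    """Assemble the JSON body, leaving out any optional field that is unset."""
    if not text:
        raise ValueError("text is required")

    payload = {"text": text}
    optional_fields = {
        "voice_description": voice_description,  # used by DashScope
        "voice_id": voice_id,                    # forces ElevenLabs when present
        "provider": provider,                    # accepted values are an assumption
    }
    for name, value in optional_fields.items():
        if value is not None:
            payload[name] = value
    return payload


payload = build_tts_payload(
    "Hello, I am your AI assistant.",
    voice_description="A calm, professional female voice",
)
```

Omitting `provider` leaves the choice to the documented default (auto), which is probably what most callers want.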

Example Response

{
  "success": true,
  "data": {
    "audio_base64": "UklGRi...(base64 audio data)...",
    "format": "wav",
    "sample_rate": 24000,
    "text": "Hello, I am your AI assistant.",
    "voice_description": "A calm, professional female voice"
  },
  "metadata": {
    "provider_used": "dashscope",
    "providers_tried": ["dashscope"],
    "response_time_ms": 3200,
    "request_id": "req_tts_001"
  },
  "credits_used": 2
}
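
To turn a response like the one above into playable audio, the `audio_base64` field has to be decoded back into bytes. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and it assumes that a failed call sets `success` to false, which this page does not confirm.

```python
import base64
from typing import Tuple


def decode_tts_audio(response: dict) -> Tuple[bytes, str]:
    """Return (raw audio bytes, format) from a parsed response dict."""
    if not response.get("success"):
        # Assumption: failures are signalled by a falsy "success" field.
        raise RuntimeError("text-to-speech request did not succeed")
    data = response["data"]
    audio = base64.b64decode(data["audio_base64"])
    return audio, data["format"]


def save_tts_audio(response: dict, basename: str) -> str:
    """Write the decoded audio to '<basename>.<format>' and return the path."""
    audio, fmt = decode_tts_audio(response)
    path = f"{basename}.{fmt}"
    with open(path, "wb") as fh:
        fh.write(audio)
    return path


# Stand-in response with placeholder bytes instead of real audio.
sample_response = {
    "success": True,
    "data": {
        "audio_base64": base64.b64encode(b"RIFF-placeholder").decode("ascii"),
        "format": "wav",
    },
}
audio_bytes, audio_format = decode_tts_audio(sample_response)
```

Calling `save_tts_audio(response, "speech")` on a real response would then write `speech.wav` when `format` is `wav`.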

Get Started

Use this API through the O-mega platform. Create an API key in your dashboard, then call the endpoint with your key in the Authorization header.
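
The call described above might look like the sketch below, written with Python's standard `urllib`. Several details are assumptions rather than documented facts: the base URL `https://api.example.com` is a placeholder, the `Bearer` scheme in the Authorization header is a guess, and `build_tts_request` is a hypothetical helper name. Check your dashboard for the real host and header format.

```python
import json
import urllib.request

ENDPOINT_PATH = "/v1/speech/synthesize"


def build_tts_request(
    api_key: str,
    payload: dict,
    base_url: str = "https://api.example.com",  # placeholder host (assumption)
) -> urllib.request.Request:
    """Prepare a POST request carrying the API key in the Authorization header."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + ENDPOINT_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # scheme is an assumption
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_tts_request("YOUR_API_KEY", {"text": "Hello, I am your AI assistant."})

# Sending the request (not executed here, since it needs a real host and key):
# with urllib.request.urlopen(request) as resp:
#     result = json.loads(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the sketch easy to inspect without network access.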
