Speech-to-Text

Transcribe pre-recorded audio to text via Deepgram with confidence scores, word-level timing, and multi-language support.

POST /v1/speech/transcribe

Overview

Your agent transcribes pre-recorded audio to text via Deepgram (nova-2 model). Provide a URL to the audio file or base64-encoded audio data. The response contains the transcript, an overall confidence score, and per-word confidence and timing. For real-time streaming, use the WebSocket API.

Parameters

audio_url (string)
URL to audio file. Deepgram fetches it directly.

audio_base64 (string)
Base64-encoded audio data. Use this for local files.

language (string, default: en)
Language code for transcription.
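
As a rough illustration of how these parameters might be combined into a request body, the sketch below builds a JSON payload from either a remote URL or raw audio bytes. The helper name `build_transcribe_payload` is hypothetical, and treating the two audio parameters as mutually exclusive is an assumption; this page does not say what happens if both are sent.

```python
import base64
import json


def build_transcribe_payload(audio_url=None, audio_bytes=None, language="en"):
    """Build a JSON-serializable body for POST /v1/speech/transcribe.

    Hypothetical helper: requiring exactly one audio source is an
    assumption, not documented behavior.
    """
    if (audio_url is None) == (audio_bytes is None):
        raise ValueError("provide exactly one of audio_url or audio_bytes")

    payload = {"language": language}
    if audio_url is not None:
        payload["audio_url"] = audio_url
    else:
        # Base64 output is ASCII-safe, so it can be embedded in JSON directly.
        payload["audio_base64"] = base64.b64encode(audio_bytes).decode("ascii")
    return payload


# Remote file (placeholder URL).
remote = build_transcribe_payload(audio_url="https://example.com/sample.wav")

# Local audio: in practice the bytes would come from open(path, "rb").read().
local = build_transcribe_payload(audio_bytes=b"RIFF....WAVE", language="en")

print(json.dumps(remote, indent=2))
```

Passing bytes rather than a file path keeps the helper independent of the filesystem; reading the file is left to the caller.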

Example Response

{
  "success": true,
  "data": {
    "transcript": "Hello, this is a test recording.",
    "confidence": 0.98,
    "language": "en",
    "words": [
      {"word": "Hello", "start": 0.0, "end": 0.5, "confidence": 0.99}
    ],
    "duration_seconds": 3.2
  },
  "metadata": {
    "provider_used": "deepgram",
    "providers_tried": ["deepgram"],
    "response_time_ms": 1200,
    "request_id": "req_stt_001"
  },
  "credits_used": 2
}
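
A response shaped like the example above could be consumed along the following lines. This is a minimal sketch: the function name `summarize_transcription` and the 0.9 threshold are illustrative choices, and treating a false `success` field as the failure signal is an assumption, since error responses are not documented here.

```python
import json

# The example response from this page, used as sample input.
raw = """
{
  "success": true,
  "data": {
    "transcript": "Hello, this is a test recording.",
    "confidence": 0.98,
    "language": "en",
    "words": [
      {"word": "Hello", "start": 0.0, "end": 0.5, "confidence": 0.99}
    ],
    "duration_seconds": 3.2
  },
  "metadata": {
    "provider_used": "deepgram",
    "providers_tried": ["deepgram"],
    "response_time_ms": 1200,
    "request_id": "req_stt_001"
  },
  "credits_used": 2
}
"""


def summarize_transcription(response, min_confidence=0.9):
    """Pull out the transcript and flag words below a confidence threshold."""
    if not response.get("success"):
        # Assumed failure signal; the error payload shape is not documented.
        raise RuntimeError("transcription request did not succeed")

    data = response["data"]
    low = [w["word"] for w in data["words"] if w["confidence"] < min_confidence]
    return {
        "transcript": data["transcript"],
        "confidence": data["confidence"],
        "duration_seconds": data["duration_seconds"],
        "low_confidence_words": low,
        "credits_used": response["credits_used"],
    }


summary = summarize_transcription(json.loads(raw))
print(summary["transcript"])
```

Flagging low-confidence words is one way to use the per-word scores, for example to mark spans of a transcript for manual review.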

Get Started

Use this API through the O-mega platform. Create an API key in your dashboard, then call the endpoint with your key in the Authorization header.
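
The call described above might be assembled as in the sketch below, which builds the request without sending it. Several details are assumptions not stated on this page: the host in `BASE_URL` is a placeholder, the `Bearer` prefix in the Authorization header is a common convention rather than a confirmed requirement, and a JSON request body is assumed. The function name `build_transcribe_request` is hypothetical.

```python
import json
import urllib.request

# Placeholder host: the real base URL is not given on this page.
BASE_URL = "https://api.example.com"


def build_transcribe_request(api_key, payload):
    """Build (but do not send) a POST request to /v1/speech/transcribe."""
    return urllib.request.Request(
        BASE_URL + "/v1/speech/transcribe",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # "Bearer" scheme is an assumption; check your dashboard's instructions.
            "Authorization": "Bearer " + api_key,
            # JSON body is assumed.
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_transcribe_request(
    "YOUR_API_KEY",
    {"audio_url": "https://example.com/sample.wav", "language": "en"},
)

# To send it: urllib.request.urlopen(request), then json.load() the response body.
```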

Try Speech-to-Text

Test Speech-to-Text in the interactive playground. No setup required.
