🎙️ ElevenLabs ⏱ 6 min read 🗣️ Text to Speech

Text to Speech — Technical Guide

Type any text and hear it spoken in a natural AI voice — choose from thousands of voices in 30+ languages, create multi-voice dialogues, and control emotion, speed, and delivery style

Text to Speech turns written words into natural-sounding audio. Type what you want said, pick a voice from a library of thousands, and the AI generates speech that sounds like a real person, with natural rhythm, pauses, and expression. Language coverage depends on the model, from 29 languages on Multilingual v2 up to 74 on v3.

Four modes cover different needs. Create Speech generates audio from text with a single voice — the simplest and most common use. Speech with Timing adds character-level timestamps to the output, useful for syncing audio with subtitles or animations. Create Dialogue lets you assign different voices to different lines, producing a multi-voice conversation with up to 10 unique speakers. Dialogue with Timestamps combines multi-voice with timing data for precise sync workflows.
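Speech with Timing returns character-level timestamps, and a common next step is turning them into subtitles. Below is a minimal Python sketch, assuming the timing data arrives as three parallel lists (characters, start times, end times, all in seconds). That input shape and every name here (`Cue`, `words_from_chars`, `srt_time`, `to_srt`) are hypothetical illustrations, not the service's actual response format; adapt them to whatever your timing output contains.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle cue: a word and the time span it covers, in seconds."""
    text: str
    start: float
    end: float


def words_from_chars(chars, starts, ends):
    """Group character-level timestamps into word-level cues.

    Assumed (hypothetical) input: three parallel lists holding each
    character, its start time and its end time in seconds.
    """
    if not len(chars) == len(starts) == len(ends):
        raise ValueError("timing lists must have equal length")
    cues, word, w_start, w_end = [], "", 0.0, 0.0
    for ch, s, e in zip(chars, starts, ends):
        if ch.isspace():
            # Whitespace ends the current word, if there is one.
            if word:
                cues.append(Cue(word, w_start, w_end))
                word = ""
            continue
        if not word:
            w_start = s  # first character of a new word
        word += ch
        w_end = e  # keep extending the word's end time
    if word:
        cues.append(Cue(word, w_start, w_end))
    return cues


def srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def to_srt(cues):
    """Render cues as an SRT document, one numbered block per cue."""
    blocks = []
    for i, cue in enumerate(cues, start=1):
        blocks.append(f"{i}\n{srt_time(cue.start)} --> {srt_time(cue.end)}\n{cue.text}\n")
    return "\n".join(blocks)
```

For `list("Hi there")` with characters spaced 0.1 s apart, `words_from_chars` yields two cues, `Hi` (0.0 to 0.2 s) and `there` (0.3 to 0.8 s). One cue per word keeps the sketch short; real subtitles usually group several words per cue.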

Emotion and delivery control make the speech feel human. On the latest v3 model, audio tags let you insert direction directly into the text: mark a line as whispered or excited, or add a sigh, and the voice responds naturally. Speed and stability sliders fine-tune how fast the voice speaks and how consistent it stays.
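The two sliders map naturally to two numeric settings. This minimal Python sketch clamps values to the ranges this page lists (speed 0.5-2x, stability 0-1) before they are sent anywhere. The function name, the dictionary keys, and the 0.5 stability default are hypothetical placeholders; check the API reference you are calling for the real field names, defaults, and accepted ranges.

```python
def delivery_settings(speed: float = 1.0, stability: float = 0.5) -> dict:
    """Return speed/stability settings clamped to this page's ranges.

    Hypothetical helper: the key names and the 0.5 stability default
    are placeholders, not a documented API.
    """
    def clamp(value: float, low: float, high: float) -> float:
        # Pin value into the closed interval [low, high].
        return max(low, min(high, value))

    return {
        "speed": clamp(float(speed), 0.5, 2.0),          # playback rate, 0.5-2x
        "stability": clamp(float(stability), 0.0, 1.0),  # voice consistency, 0-1
    }
```

Clamping silently corrects out-of-range input; raising an error instead is the stricter choice if you would rather catch bad values early.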

The generated audio works standalone for podcasts, voiceovers, and narration, or feeds directly into other tools — use it as the audio input for Avatar (photo to talking video) or Lip Sync (make someone in a video speak it). This is how you give your AI character a voice across all their content.
✦ Best Results Tips
🎧 Preview Voices Before Generating
Browse the voice library and listen to previews before committing. Different voices excel at different content — some sound warm and conversational, others sound authoritative and professional. Find the one that matches your character.
✍️ Use Punctuation for Natural Pauses
Commas create short pauses, periods create longer ones, and an ellipsis creates a trailing hesitation. Write the text the way you want it spoken; punctuation is the easiest way to control rhythm and pacing.
🎭 Audio Tags for Emotion (v3 Only)
On the v3 model, insert tags like [excited], [whispers], [sigh] directly in your text to change the delivery mid-sentence. Click any tag pill on the page to insert it at your cursor position.
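Tag insertion is plain string editing. The Python sketch below re-creates the tag-pill behaviour (insert a tag at a cursor offset) and adds the reverse step, stripping tags from a script before reusing it on a non-v3 model, since the tags are v3-only. Both helpers are hypothetical, and the pattern assumes tags are words in square brackets like the examples above; widen it if your tags look different.

```python
import re

# Matches a bracketed tag such as [excited] or [whispers], plus any
# whitespace right after it, so removal does not leave double spaces.
TAG_PATTERN = re.compile(r"\[[a-z][a-z ]*\]\s*", re.IGNORECASE)


def insert_tag(text: str, cursor: int, tag: str) -> str:
    """Insert an audio tag like [sigh] at a character offset.

    Hypothetical re-creation of clicking a tag pill, not the page's code.
    """
    if not 0 <= cursor <= len(text):
        raise ValueError("cursor out of range")
    return f"{text[:cursor]}[{tag}] {text[cursor:]}"


def strip_tags(text: str) -> str:
    """Remove bracketed audio tags, e.g. before sending a v3 script
    to a model that does not support them."""
    return TAG_PATTERN.sub("", text).strip()
```

For example, `insert_tag("Hello there", 6, "whispers")` gives `"Hello [whispers] there"`, and `strip_tags("[excited] We won! [sigh] Finally.")` gives `"We won! Finally."`.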
💬 Dialogue Mode for Conversations
Use Create Dialogue when you need multiple voices — each line gets its own voice assignment. Up to 10 unique voices per generation. Perfect for podcast-style content, interviews, or character interactions.
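Before sending a dialogue, it helps to check the script against the voice limit locally. This Python sketch counts unique voices (the same voice may speak any number of lines) and rejects scripts that exceed the 10-voice limit stated here. `DialogueLine`, `build_dialogue`, and the voice IDs in the usage line are hypothetical; the real request format may differ.

```python
from dataclasses import dataclass

MAX_UNIQUE_VOICES = 10  # limit stated on this page


@dataclass
class DialogueLine:
    """One line of a dialogue script. Field names are hypothetical."""
    voice_id: str
    text: str


def build_dialogue(script):
    """Turn (voice_id, text) pairs into DialogueLine objects and check
    the unique-voice limit before anything is sent."""
    lines = [DialogueLine(voice_id=v, text=t.strip()) for v, t in script]
    if any(not line.text for line in lines):
        raise ValueError("every dialogue line needs text")
    # The limit applies to distinct voices, not to the number of lines.
    voices = {line.voice_id for line in lines}
    if len(voices) > MAX_UNIQUE_VOICES:
        raise ValueError(
            f"{len(voices)} unique voices; at most {MAX_UNIQUE_VOICES} allowed"
        )
    return lines
```

`build_dialogue([("host", "Welcome back."), ("guest", "Thanks for having me."), ("host", "Let's begin.")])` passes: three lines, two unique voices.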
Flash for Speed, Multilingual for Quality
Flash and Turbo models generate faster and cost less — great for drafts and testing. Multilingual v2 and v3 produce the most natural, expressive speech — use them for final content you plan to publish.
🔗 Feed Audio into Avatar or Lip Sync
Generate speech here, then use the audio file as input for Avatar (turn a photo into a talking video) or Lip Sync (make someone in an existing video speak it). This is the voice pipeline for your AI character.

Text to Speech — Available Models

Multilingual v2 (Default): eleven_multilingual_v2, 29 languages. Best quality for non-English; default for dubbing.
v3 (Latest): eleven_v3, 74 languages. Newest model.
Flash v2.5 (Fast): eleven_flash_v2_5, 32 languages. Ultra-fast and cost-efficient.
Turbo v2.5: eleven_turbo_v2_5, 32 languages. Low-latency streaming.
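The advice about drafts versus final content can be encoded as a small lookup. This Python sketch copies the model IDs and language counts from the model list; `pick_model` and its rules are a hypothetical reading of this page's tips (Flash for drafts, Turbo when streaming, Multilingual v2 as the default for final output, v3 whenever audio tags are needed), not an official selection policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    model_id: str
    languages: int
    note: str


# Values copied from the model list on this page.
MODELS = {
    "multilingual_v2": Model("eleven_multilingual_v2", 29, "best non-English quality, default for dubbing"),
    "v3": Model("eleven_v3", 74, "newest; the only model with audio tags"),
    "flash_v2_5": Model("eleven_flash_v2_5", 32, "ultra-fast, cost-efficient"),
    "turbo_v2_5": Model("eleven_turbo_v2_5", 32, "low-latency streaming"),
}


def pick_model(final: bool, need_audio_tags: bool = False, streaming: bool = False) -> str:
    """Hypothetical chooser following this page's tips."""
    if need_audio_tags:
        return MODELS["v3"].model_id  # tags are v3-only
    if not final:
        # Drafts: favour speed and cost.
        key = "turbo_v2_5" if streaming else "flash_v2_5"
        return MODELS[key].model_id
    return MODELS["multilingual_v2"].model_id  # the listed default
```

Keeping the table as data means a new model is one more dictionary entry rather than another branch.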
📥 You Give: text to speak, voice selection, emotion (optional), language
🎵 You Get: audio
Modes: Speech, Speech + Timing, Dialogue, Dialogue + Timing
Output formats: MP3, WAV, PCM, OPUS
🌍 Languages: up to 74 (model maximum)
📝 Max text per request: 5,000 characters
🗣️ Dialogue: 10 inputs, 10 voices
Speed: 0.5-2x (playback rate)
🎯 Stability: 0-1 (voice consistency)
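Text longer than the 5,000-character limit has to be sent in several requests. This Python sketch splits a script at sentence boundaries so each request tends to end on a natural pause, falling back to the last space (or a hard cut) for a single sentence longer than the limit. `chunk_text` is a hypothetical helper; note that it rejoins sentences with single spaces, so line breaks between sentences are not preserved.

```python
import re

MAX_CHARS = 5_000  # per-request limit listed on this page


def chunk_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into pieces no longer than `limit` characters,
    preferring breaks after sentence-ending punctuation."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A sentence longer than the limit is cut at the last space
        # that fits, or hard-cut at `limit` if there is no space.
        while len(sentence) > limit:
            cut = sentence.rfind(" ", 0, limit + 1)
            if cut <= 0:
                cut = limit
            head, sentence = sentence[:cut].rstrip(), sentence[cut:].lstrip()
            if current:
                chunks.append(current)  # flush first to keep order
                current = ""
            chunks.append(head)
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate  # sentence still fits in this chunk
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

For example, `chunk_text("One. Two. Three.", limit=10)` returns `["One. Two.", "Three."]`.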

💰 Text to Speech — Pricing

Failed jobs are automatically refunded
