
🗣️ Text to Speech

Type any text and hear it spoken in a natural AI voice — choose from thousands of voices in 30+ languages, create multi-voice dialogues, and control emotion, speed, and delivery style


Text to Speech turns written words into natural-sounding audio. Type what you want said, pick a voice from a library of thousands, and the AI generates speech that sounds like a real person, with natural rhythm, pauses, and expression. It supports over 30 languages.

Four modes cover different needs. Create Speech generates audio from text with a single voice — the simplest and most common use. Speech with Timing adds character-level timestamps to the output, useful for syncing audio with subtitles or animations. Create Dialogue lets you assign different voices to different lines, producing a multi-voice conversation with up to 10 unique speakers. Dialogue with Timestamps combines multi-voice with timing data for precise sync workflows.
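For subtitle sync, character-level timing is usually regrouped into word-level cues. The sketch below shows one way to do that. It assumes the timing data arrives as three parallel lists (characters, start times, end times, in seconds). That shape and every name here are illustrative assumptions, not something this page documents.

```python
from dataclasses import dataclass


@dataclass
class WordCue:
    text: str
    start: float  # seconds
    end: float    # seconds


def words_from_characters(chars, starts, ends):
    """Group per-character timings into per-word cues, splitting on whitespace."""
    if not (len(chars) == len(starts) == len(ends)):
        raise ValueError("chars, starts, and ends must have the same length")
    cues = []
    current = ""
    word_start = 0.0
    word_end = 0.0
    for ch, start, end in zip(chars, starts, ends):
        if ch.isspace():
            # Whitespace closes the word in progress, if there is one.
            if current:
                cues.append(WordCue(current, word_start, word_end))
                current = ""
            continue
        if not current:
            word_start = start  # first character of a new word
        current += ch
        word_end = end
    if current:
        cues.append(WordCue(current, word_start, word_end))
    return cues


# Hypothetical timing data for the text "Hi all" (values invented for illustration).
chars = list("Hi all")
starts = [0.00, 0.10, 0.20, 0.25, 0.35, 0.45]
ends = [0.10, 0.20, 0.25, 0.35, 0.45, 0.60]
cues = words_from_characters(chars, starts, ends)
for cue in cues:
    print(f"{cue.start:.2f}-{cue.end:.2f}  {cue.text}")
```

Each cue can then be written out as a subtitle line or used as a keyframe time in an animation tool.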

Emotion and delivery control make the speech feel human. On the latest v3 model, audio tags let you insert direction directly into the text — mark a word as whispered, excited, or sighed, and the voice responds naturally. Speed and stability sliders fine-tune how fast the voice speaks and how consistent it stays.
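Because audio tags apply only to v3, a script that reuses the same text across models may want to drop the tags for the others. The following is a minimal sketch under two assumptions: tags are lowercase words in square brackets, and a non-v3 model would not interpret them. The helper name is hypothetical.

```python
import re

# Assumed tag syntax: lowercase words in square brackets, e.g. [excited] or [whispers].
AUDIO_TAG = re.compile(r"\[[a-z][a-z ]*\]\s*")

V3_MODEL_ID = "eleven_v3"


def prepare_text(text: str, model_id: str) -> str:
    """Keep audio tags for v3; strip them for any other model."""
    if model_id == V3_MODEL_ID:
        return text
    return AUDIO_TAG.sub("", text).strip()


script = "[whispers] Keep it down. [sigh] Fine."
print(prepare_text(script, "eleven_v3"))
print(prepare_text(script, "eleven_multilingual_v2"))
```

Stripping before sending keeps one source script usable for both quick drafts and a final v3 pass.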

The generated audio works standalone for podcasts, voiceovers, and narration, or feeds directly into other tools — use it as the audio input for Avatar (photo to talking video) or Lip Sync (make someone in a video speak it). This is how you give your AI character a voice across all their content.

Available Models

Multilingual v2 (Default)

eleven_multilingual_v2

29 languages, best quality for non-English. Default for dubbing.

v3 (Latest)

eleven_v3

74 languages, newest model.

Flash v2.5 (Fast)

eleven_flash_v2_5

Ultra-fast, cost-efficient. 32 languages.

Turbo v2.5

eleven_turbo_v2_5

Low-latency streaming. 32 languages.
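The model list above can be kept as a small lookup table in a script. This sketch uses the model IDs and language counts exactly as listed on this page. The function names and the fallback behaviour are illustrative assumptions.

```python
# Model IDs and language counts as listed above.
MODELS = {
    "eleven_multilingual_v2": {"label": "Multilingual v2", "languages": 29},
    "eleven_v3": {"label": "v3", "languages": 74},
    "eleven_flash_v2_5": {"label": "Flash v2.5", "languages": 32},
    "eleven_turbo_v2_5": {"label": "Turbo v2.5", "languages": 32},
}

DEFAULT_MODEL = "eleven_multilingual_v2"


def resolve_model(model_id=None):
    """Return a known model ID, falling back to the default when none is given."""
    if model_id is None:
        return DEFAULT_MODEL
    if model_id not in MODELS:
        raise ValueError(f"unknown model {model_id!r}; expected one of {sorted(MODELS)}")
    return model_id


def models_with_at_least(languages):
    """List model IDs whose stated language count meets the requirement."""
    return sorted(mid for mid, info in MODELS.items() if info["languages"] >= languages)


print(resolve_model())
print(models_with_at_least(30))
```

Validating the ID up front catches typos such as a missing underscore before any audio is generated.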

Best results

🎧

Preview Voices Before Generating

Browse the voice library and listen to previews before committing. Different voices excel at different content — some sound warm and conversational, others sound authoritative and professional. Find the one that matches your character.

✍️

Use Punctuation for Natural Pauses

Commas create short pauses, periods create longer ones, and an ellipsis creates a trailing hesitation. Write the text the way you want it spoken: punctuation is the easiest way to control rhythm and pacing.

🎭

Audio Tags for Emotion (v3 Only)

On the v3 model, insert tags such as [excited], [whispers], or [sigh] directly in your text to change the delivery mid-sentence. Click any tag pill on the page to insert it at your cursor position.

💬

Dialogue Mode for Conversations

Use Create Dialogue when you need multiple voices — each line gets its own voice assignment. Up to 10 unique voices per generation. Perfect for podcast-style content, interviews, or character interactions.
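When a dialogue script is assembled programmatically, the 10-voice limit is easy to check before generating. Here is a minimal sketch. The payload shape (an "inputs" list of voice_id and text pairs) and the voice names are assumptions for illustration, not a format documented on this page.

```python
MAX_UNIQUE_VOICES = 10  # limit stated above


def build_dialogue(lines):
    """Turn (voice_id, text) pairs into a dialogue payload, enforcing the voice limit."""
    inputs = []
    voices = []
    for voice_id, text in lines:
        text = text.strip()
        if not text:
            continue  # skip empty lines rather than sending silence
        if voice_id not in voices:
            voices.append(voice_id)
        inputs.append({"voice_id": voice_id, "text": text})
    if len(voices) > MAX_UNIQUE_VOICES:
        raise ValueError(
            f"{len(voices)} unique voices used; the limit is {MAX_UNIQUE_VOICES}"
        )
    return {"inputs": inputs}


# Hypothetical voice IDs; real ones come from the voice library.
payload = build_dialogue([
    ("host_voice", "Welcome back to the show."),
    ("guest_voice", "Thanks for having me."),
    ("host_voice", "   "),
    ("host_voice", "Let's get started."),
])
print(len(payload["inputs"]))
```

Counting unique voices rather than lines matters here: a long two-person interview stays well within the limit no matter how many turns it has.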

⚡

Flash for Speed, Multilingual for Quality

Flash and Turbo models generate faster and cost less — great for drafts and testing. Multilingual v2 and v3 produce the most natural, expressive speech — use them for final content you plan to publish.

🔗

Feed Audio into Avatar or Lip Sync

Generate speech here, then use the audio file as input for Avatar (turn a photo into a talking video) or Lip Sync (make someone in an existing video speak it). This is the voice pipeline for your AI character.

Guides

Text to Speech: Technical Guide (🎵 Audio, 🎙️ ElevenLabs, 6 min read)

🗣️

Try Text to Speech

No subscription required. Pay only for what you create.
