Voxtral Text to Speech — Technical Guide

Generate natural multilingual speech with native French accent support and voice cloning

Voxtral Text to Speech converts written text into natural-sounding speech using Mistral AI. Type or paste up to 10,000 characters, select a voice, and the AI generates audio that sounds like a real person speaking — with natural rhythm, intonation, and breathing.
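If your script is longer than the 10,000-character limit, one option is to split it into several generations. The sketch below is only an illustration: `split_for_tts` is a hypothetical helper, not part of the tool, and it assumes sentences end with `.`, `!`, `?`, or `…` followed by whitespace.

```python
import re

MAX_CHARS = 10_000  # per-generation limit stated in this guide


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be pasted in as its own generation. Splitting at sentence ends keeps the pauses natural when the resulting audio files are joined.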

Three voice modes cover different needs. Preset Voices are professionally curated voices available to everyone; browse and preview them before choosing. My Voices lists your personally cloned voices, including character-specific voices you created with Voice Clone. Upload mode performs instant zero-shot cloning: drop in a 2 to 60 second audio reference and the AI mimics that voice for this generation only, without saving it permanently.
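Before uploading a reference clip, you may want to confirm it falls inside the 2 to 60 second window. The following is a minimal sketch that uses Python's standard `wave` module, so it only handles uncompressed WAV files. The helper names are hypothetical, and how the tool itself validates uploads is not described here.

```python
import wave

MIN_SECONDS = 2.0   # shortest reference clip accepted, per this guide
MAX_SECONDS = 60.0  # longest reference clip accepted, per this guide


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file (a path or a binary file object) in seconds."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference(duration: float) -> bool:
    """Check a clip duration against the 2 to 60 second reference window."""
    return MIN_SECONDS <= duration <= MAX_SECONDS
```

For example, `is_valid_reference(wav_duration_seconds("sample.wav"))` tells you whether a local file named `sample.wav` (a placeholder name) is worth uploading.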

Link a character to automatically see their dedicated voices first. The tool detects the language from your text automatically, with native-quality support for French, English, Spanish, German, Portuguese, Italian, Dutch, Hindi, and Arabic. French accent quality is particularly strong — Mistral is a French AI lab.

Output formats include MP3, WAV, FLAC, and Opus. A built-in cost estimator shows exactly how many credits the generation will cost before you submit, based on character count. Results save directly to your gallery and can be used as audio input for Avatar, Lip Sync, or the content pipeline dubbing workflow.
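Because cost depends on character count, you can approximate it yourself before opening the tool. The sketch below is an assumption-heavy illustration: the rate `CREDITS_PER_1K_CHARS` is a made-up placeholder, and rounding up to a whole credit is also assumed. The built-in estimator remains the authoritative figure.

```python
import math

# Hypothetical rate for illustration only; the real credit price is not stated in this guide.
CREDITS_PER_1K_CHARS = 5


def estimate_credits(text: str, rate_per_1k: float = CREDITS_PER_1K_CHARS) -> int:
    """Estimate credits from character count, rounding up to a whole credit (assumed)."""
    if not text:
        return 0
    return math.ceil(len(text) * rate_per_1k / 1000)
```

Replace the placeholder rate with the value shown by the in-app estimator for your account before relying on the result.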

This is the voice engine for giving your AI character a consistent, recognizable voice across all their content — social posts, videos, podcasts, and dubbed translations.

✦ Best Results Tips

🎧 Preview Preset Voices First

Listen to each preset voice before generating. Different voices suit different content: some sound warm and conversational, others professional and clear. Find the one that matches your character's personality.

✍️ Punctuation Controls Pacing

Commas create short pauses, periods create longer ones, and an ellipsis creates a trailing hesitation. Write the text exactly how you want it spoken. Punctuation is your primary tool for controlling rhythm and delivery.

🎤 Clone Your Character Voice

Use Voice Clone to create a permanent voice from a 2–60 second audio sample, then select it here under My Voices. Once cloned, your character speaks with the same voice every time — across all tools and languages.

⚡ Upload Mode for Quick Tests

Upload mode lets you test a voice reference without permanently cloning it. Drop in any audio clip and generate speech instantly. If you like the result, go to Voice Clone to save that voice permanently.

💰 Check the Cost Estimator

The cost estimator updates in real time as you type. Longer text costs more — if you are testing a prompt, try a short excerpt first to verify the voice sounds right before generating the full text.
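One way to pick a short test excerpt is to keep only the leading whole sentences that fit a small character budget. This is a rough sketch: `preview_excerpt` and its 300-character default are hypothetical choices, and the sentence detection is a simple punctuation match that will also cut at decimals such as "3.5".

```python
import re


def preview_excerpt(text: str, max_chars: int = 300) -> str:
    """Return the leading whole sentences of `text` that fit within `max_chars`."""
    text = text.strip()
    if len(text) <= max_chars:
        return text
    window = text[:max_chars]
    # Find the last sentence-ending punctuation mark inside the window.
    ends = [m.end() for m in re.finditer(r"[.!?…]", window)]
    if ends:
        return window[: ends[-1]]
    # No sentence boundary found: fall back to the last whole word.
    return window.rsplit(" ", 1)[0]
```

Generate the excerpt first, listen to the voice, and submit the full text only once the delivery sounds right.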

🔗 Feed Audio into Other Tools

Generated speech works as direct input for Avatar (photo to talking video), Lip Sync (make someone in a video speak), and the content pipeline dubbing system. This is the first step in the voice pipeline.

Voxtral TTS — Available Models

Voxtral Mini TTS (default)

Model ID: voxtral-mini-tts-2603

Mode: tts

Fast, high-quality TTS with native French accent support. Reported to beat ElevenLabs Flash v2.5 in human evaluations.

💰 Voxtral TTS — Pricing

Failed jobs are automatically refunded