🎬 KLING AI ⏱ 4 min read 🗣️ Avatar v2

Avatar v2 — Technical Guide

Turn any portrait photo into a talking video — upload a photo and provide audio or type what they should say, and the AI animates the face with natural movement and lip sync

Avatar v2 brings still photos to life. Upload a portrait of your character and provide audio, either by uploading a recording or by typing the words and letting the AI generate the voice. The result is a video in which the person in the photo appears to speak naturally, with realistic head movement, eye blinking, and closely synchronized lip movements.

This is different from Lip Sync, which requires an existing video. Avatar starts from a single still photograph. The AI adds all the movement — subtle head tilts, natural eye blinks, facial expression shifts, and precise mouth animation — creating a convincing talking-head video from nothing but a static image.

Two audio modes are available. Upload Audio lets you use any pre-recorded speech, such as a voiceover, podcast clip, or translated narration, in MP3, WAV, or M4A format. In Type Text mode you write the dialogue, choose a voice from the catalog, select a language (English or Chinese) and an emotion (neutral, happy, angry, sad, fearful, disgusted, or surprised), and adjust the speaking speed; the AI then generates the voice and the animation in one step.
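
The choice between the two audio modes can be sketched as a small data model. This is only an illustration of the rule that a job carries either an uploaded recording or typed text, never both. The class names `AvatarJob` and `TypeTextAudio`, every field name, and the `speed` convention are assumptions made for this sketch and are not taken from any Kling API.

```python
from dataclasses import dataclass
from typing import Optional

# Emotion names follow the TTS emotions listed in this guide. Every class and
# field name below is hypothetical and used for illustration only.
EMOTIONS = {"neutral", "happy", "angry", "sad", "fearful", "disgusted", "surprised"}


@dataclass
class TypeTextAudio:
    text: str                   # the words the character should say
    voice: str                  # hypothetical voice identifier from the catalog
    language: str = "English"
    emotion: str = "neutral"
    speed: float = 1.0          # assumed convention: 1.0 means normal speed


@dataclass
class AvatarJob:
    photo_path: str
    audio_path: Optional[str] = None      # set this for Upload Audio mode
    tts: Optional[TypeTextAudio] = None   # set this for Type Text mode
    prompt: str = ""                      # mood and gesture style, not speech

    def audio_mode(self) -> str:
        """Return the audio mode in use, or raise if the job is ambiguous."""
        if (self.audio_path is None) == (self.tts is None):
            raise ValueError("provide exactly one of audio_path or tts")
        if self.tts is None:
            return "upload_audio"
        if not self.tts.text.strip():
            raise ValueError("Type Text mode needs non-empty text")
        if self.tts.emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion: {self.tts.emotion}")
        return "type_text"
```

For example, `AvatarJob("portrait.jpg", audio_path="voice.mp3").audio_mode()` reports the upload mode, while a job with neither audio source raises a `ValueError` before anything is submitted.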

An optional prompt lets you guide the mood and gesture style — describe the expression, energy level, or emotion and the AI adjusts the head movement and facial animation to match. The result is a complete talking-head video ready for social media, customer support responses, training materials, product announcements, or personalized video messages.
✦ Best Results Tips
👤 Front-Facing Portrait with Good Lighting
Use a well-lit photo where the face is clearly visible from the front. Centered head, eyes looking at the camera, neutral or slight smile. Avoid sunglasses, masks, or heavy shadows across the face.
🎭 Prompt Controls Mood, Not Speech
The prompt field controls expression and gesture style — not what the person says. Write things like confident and energetic or calm and thoughtful. The actual speech comes from the audio file or typed text.
⌨️ Type Text for the Fastest Results
Type Text mode generates the voice and syncs the lips in one step — no need to record or find an audio file. Pick a voice, set the emotion, write the words, and the AI does the rest.
😊 Choose the Right Emotion
When using Type Text mode, the emotion setting changes how the voice sounds and how the face moves. Happy adds warmth and slight smiles, angry adds intensity, sad adds softness. Match the emotion to the content.
⏱️ Keep Audio Under 60 Seconds
Shorter audio clips produce the highest-quality animation. Under 60 seconds is ideal, because the AI keeps movement consistent throughout. Clips up to the 5-minute maximum are supported, but expression quality can drift as they get longer.
📐 Head and Shoulders Framing
The best results come from photos framed from the upper chest up. Showing too much of the body reduces face detail. Cropping too tightly leaves no room for natural head movement during animation.
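
The audio-length tip can be checked locally before uploading. The sketch below reads the duration of an uncompressed WAV file with Python's standard `wave` module and compares it with the 60-second recommendation and the 5-minute maximum stated in this guide. The function names and the three result labels are assumptions for illustration; MP3 and M4A files would need a different reader.

```python
import wave

RECOMMENDED_MAX_SECONDS = 60.0   # "keep audio under 60 seconds" tip
HARD_MAX_SECONDS = 300.0         # 5-minute maximum listed in this guide


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_duration(seconds: float) -> str:
    """Label a clip length against the guide's recommendation and limit."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    if seconds > HARD_MAX_SECONDS:
        return "too_long"
    if seconds > RECOMMENDED_MAX_SECONDS:
        return "allowed_but_may_drift"
    return "recommended"
```

A typical use would be `classify_duration(wav_duration_seconds("voiceover.wav"))`, trimming or splitting the recording when the result is anything other than `"recommended"`.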

Avatar v2 — Available Models

Avatar Standard (default)
Model: kling-v2-avatar
Mode: std
Natural lip-sync and expressive motion from portrait + audio.

Avatar Pro
Model: kling-v2-avatar
Mode: pro
Higher fidelity, smoother motion, improved expressivity.

📥 You Give: 🖼️ Character Photo, 🎤 Audio (TTS or Upload), 🎭 Expression Prompt
🎬 You Get: Video

Quality modes: Standard, Professional
TTS emotions: 😐 Neutral, 😊 Happy, 😠 Angry, 😢 Sad, 😨 Fearful, 🤢 Disgusted, 😲 Surprised
⏱️ Max duration: 5 min
🎤 Audio source: Upload (MP3/WAV/M4A) or TTS
🌐 TTS languages: English, Chinese
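
The model, mode, and input limits listed above can be gathered into one lookup so that a request is checked before it is sent. This is a minimal sketch: the values come from this guide, but the function `build_settings` and the dictionary keys it returns are hypothetical and do not describe a real Kling request format.

```python
from pathlib import Path
from typing import Optional

MODEL_ID = "kling-v2-avatar"                         # model name shown in this guide
MODES = {"standard": "std", "professional": "pro"}   # quality mode -> mode value
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}          # accepted upload formats
TTS_LANGUAGES = {"English", "Chinese"}               # TTS languages listed here


def build_settings(quality: str = "standard",
                   audio_file: Optional[str] = None,
                   tts_language: Optional[str] = None) -> dict:
    """Return a settings dict (hypothetical keys) or raise ValueError."""
    if quality not in MODES:
        raise ValueError(f"unknown quality mode: {quality}")
    if (audio_file is None) == (tts_language is None):
        raise ValueError("choose either an audio file or a TTS language")

    settings = {"model": MODEL_ID, "mode": MODES[quality]}
    if audio_file is not None:
        # Compare the extension case-insensitively, so "voice.M4A" is accepted.
        if Path(audio_file).suffix.lower() not in AUDIO_EXTENSIONS:
            raise ValueError(f"unsupported audio format: {audio_file}")
        settings["audio_source"] = "upload"
    else:
        if tts_language not in TTS_LANGUAGES:
            raise ValueError(f"unsupported TTS language: {tts_language}")
        settings["audio_source"] = "tts"
        settings["tts_language"] = tts_language
    return settings
```

Both quality modes map to the same model name and differ only in the mode value, which is why the sketch keeps a single `MODEL_ID` and a separate `MODES` table.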

💰 Avatar v2 — Pricing

Failed jobs are automatically refunded
The Avatar 2.0 feature lets you upload character images, add voiceovers, and describe the character's expressions to generate lifelike, dynamic avatar videos. The upgraded Avatar 2.0 supports content up to 5 minutes long.

Showcase Kling Avatar

Prompt Excited and joyful, the child raises her hands covered in paint, laughing and interacting with the colorful art supplies on the table, camera zooms in.
Prompt Selfie of a young lady with a bright smile, her eyes sparkling with excitement as she sits in the driver's seat. Very subtle handheld camera movement. No cars passing by. No distortions. Very natural movements.
Prompt With a joyful expression Santa laughs and interacts with the camera, gesturing with open hands wearing white gloves, exuding holiday cheer, surrounded by festive lights and decorations.
Prompt While talking, they excitedly shook their heads and swayed their bodies. Finally, they clenched their fists and decided to set off, jumping and skipping happily.
Prompt Put hands together in front of your chest, and finally hold them together and tell a story naturally.
Prompt He raised his hand to touch his glasses and then angrily pointed at the camera with his finger.
Prompt Patient and gentle explanations, occasionally glancing at the item in the hand, maintaining a smile, with natural movement.
Prompt Professional explanations, natural movements, and sometimes use gestures to assist in the explanation.
Prompt The singer sings earnestly, enjoying the stage with a smile, her body movements swaying naturally in coordination with the performance.
Prompt The female singer sings to the audience while looking confident, occasionally smiling at the camera, hand on the microphone, natural arm movements.
Prompt In a commercial advertisement, a person holds a product in one hand and speaks directly to the camera. The gesture is deliberate and confident.
Prompt The expression is intoxicated, emotions high, gently shaking the head. The snake around the neck moves as light reflects off its body, gradually zooming in on the face.
Prompt Smiling, swaying confidently while rapping, holding a microphone. Eyes focused on the audience, natural and fluid movements. Occasional head movements.
Prompt Confidently posing with a sultry gaze, the figure exudes an aura of mystery and allure, captivating the audience with every movement.
Prompt A teacher is speaking politely and earnestly.
Prompt Confidently holding a smartphone, standing in an empty street, exuding a mysterious aura with a slight smile.
Prompt The man is angry, shown in both facial expression and action.
Prompt Smiling warmly at the camera, she gently touches her necklace, exuding confidence and grace.
