video | Kling AI

🗣️ Avatar v2

Turn any portrait photo into a talking video. Upload a photo, then provide audio or type what the person should say, and the AI animates the face with natural movement and lip sync.



Avatar v2 brings still photos to life. Upload a portrait of your character and provide audio, either by uploading a recording or by typing the words and letting the AI generate the voice. The result is a video in which the person in the photo appears to speak naturally, with realistic head movement, eye blinks, and lip movements synchronized to the audio.

This differs from Lip Sync, which requires an existing video: Avatar starts from a single still photograph. The AI generates all of the movement (subtle head tilts, natural eye blinks, shifts in facial expression, and precise mouth animation) to create a convincing talking-head video from nothing but a static image.

Two audio modes are available. Upload Audio lets you use any pre-recorded speech, voiceover, podcast clip, or translated narration. Type Text mode lets you write the dialogue, choose a voice from the catalog, select a language and an emotion (happy, sad, angry, surprised, and more), and adjust the speaking speed; the AI then generates the voice and animates the face in one step.
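To make the difference between the two modes concrete, here is a minimal Python sketch of the inputs each mode needs. It is a hypothetical client-side data model for illustration, not Kling's API: the class names (UploadAudio, TypeText, AvatarJob), the field names, the describe helper, and the speed scale are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Union


# Hypothetical data model for illustration only; these names are NOT Kling's API.
@dataclass
class UploadAudio:
    """Upload Audio mode: a pre-recorded file supplies the speech."""
    audio_path: str


@dataclass
class TypeText:
    """Type Text mode: the AI voices the script, then animates the face."""
    text: str
    voice: str      # a voice chosen from the catalog
    language: str
    emotion: str    # e.g. "happy", "sad", "angry", "surprised"
    speed: float    # speaking-speed multiplier (hypothetical scale, 1.0 = normal)

    def __post_init__(self) -> None:
        # Reject inputs that could never produce speech.
        if not self.text.strip():
            raise ValueError("Type Text mode needs some dialogue to speak")
        if self.speed <= 0:
            raise ValueError("speed must be positive")


@dataclass
class AvatarJob:
    portrait_path: str
    audio: Union[UploadAudio, TypeText]  # exactly one audio mode per job
    prompt: Optional[str] = None         # mood and gesture style, never the spoken words


def describe(job: AvatarJob) -> str:
    """Summarize where the speech for this job comes from."""
    if isinstance(job.audio, UploadAudio):
        return f"upload:{job.audio.audio_path}"
    a = job.audio
    return f"tts:{a.voice}:{a.language}:{a.emotion}"
```

For example, AvatarJob("portrait.jpg", TypeText("Welcome aboard!", "voice_a", "en", "happy", 1.0), prompt="calm and thoughtful") describes a Type Text job. Modeling the modes as separate types means a single job cannot mix an uploaded file with typed dialogue, and it keeps the optional prompt clearly separate from the speech.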

An optional prompt lets you guide the mood and gesture style: describe the expression, energy level, or emotion, and the AI adjusts the head movement and facial animation to match. The result is a complete talking-head video suitable for social media, customer support responses, training materials, product announcements, or personalized video messages.

Available Models

Avatar Standard

kling-v2-avatar

Natural lip-sync and expressive motion from portrait + audio.

Avatar Pro

kling-v2-avatar

Higher fidelity, smoother motion, improved expressivity.

Best results

👤

Front-Facing Portrait with Good Lighting

Use a well-lit photo where the face is clearly visible from the front. Centered head, eyes looking at the camera, neutral or slight smile. Avoid sunglasses, masks, or heavy shadows across the face.

🎭

Prompt Controls Mood, Not Speech

The prompt field controls expression and gesture style, not what the person says. Write things like "confident and energetic" or "calm and thoughtful." The actual speech comes from the audio file or typed text.

⌨️

Type Text for the Fastest Results

Type Text mode generates the voice and syncs the lips in one step — no need to record or find an audio file. Pick a voice, set the emotion, write the words, and the AI does the rest.

😊

Choose the Right Emotion

When using Type Text mode, the emotion setting changes how the voice sounds and how the face moves. Happy adds warmth and slight smiles, angry adds intensity, sad adds softness. Match the emotion to the content.

⏱️

Keep Audio Under 60 Seconds

Shorter audio clips produce the highest-quality animation. Under 60 seconds is ideal, because the AI can keep movement natural and consistent throughout; longer clips can drift in expression quality.
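If your recordings are WAV files, Python's standard wave module is enough for a quick check against the 60-second guideline before uploading. This is a minimal sketch under that assumption: it reads only uncompressed WAV (MP3 or M4A would need a different library), and the helper names wav_duration_seconds and is_short_enough are hypothetical.

```python
import wave

MAX_SECONDS = 60.0  # the "keep audio under 60 seconds" guideline from this page


def wav_duration_seconds(path: str) -> float:
    """Return a WAV file's length in seconds (frame count / sample rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def is_short_enough(path: str, limit: float = MAX_SECONDS) -> bool:
    """True if the clip is under the recommended length."""
    return wav_duration_seconds(path) < limit


if __name__ == "__main__":
    import sys

    # Usage: python check_clips.py intro.wav outro.wav
    for clip in sys.argv[1:]:
        secs = wav_duration_seconds(clip)
        verdict = "ok" if secs < MAX_SECONDS else "consider trimming"
        print(f"{clip}: {secs:.1f}s ({verdict})")
```

Checking locally before uploading avoids spending a generation on a clip that is likely to drift; a longer script can be split into several clips under the limit.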

📐

Head and Shoulders Framing

The best results come from photos framed from the upper chest up. Including too much of the body reduces face detail, while too tight a crop leaves no room for natural head movement during animation.

