Talk-V2V (Lip-Sync) API

The Talk-V2V API takes an existing video and a separate audio file, then drives the speaker's mouth and motion in the video to match the audio — producing a lip-synced video.

🎯 Service Overview

Supported Features

Video-to-Video lip sync: drive an input video with new audio
Resolution: 480p / 720p
Aspect handling: stretch / crop / pad to fit target aspect ratio

Typical Use Cases

K-pop idol localization (re-voice an existing performance video)
K-beauty product reviews with new narration
Multi-language video reuse from a single source clip

📡 API Endpoints

Basic Information

Base URL:       https://api.kvid.ai
Authentication: api-key header
Content-Type:   application/json

Talk-V2V is asynchronous: submit a job to receive a job_id, poll the unified status endpoint until the job reaches completed or failed, then fetch the result.

Method	Path	Purpose
`POST`	`/ai/generation/talk-v2v/generate-async`	Submit a Talk-V2V job
`GET`	`/ai/generation/status?jobId={job_id}`	Check job status
`GET`	`/ai/generation/result?jobId={job_id}`	Fetch completed result

The api-key header identifies the user and their subscription. You don't need to include email or product_code in the request body or query string — the backend resolves both from the API key.

1. Submit a Talk-V2V job

import requests

url = "https://api.kvid.ai/ai/generation/talk-v2v/generate-async"
api_key = "YOUR_API_KEY"

# Both media inputs are passed as HTTPS URLs.
payload = {
    "input_video": "https://your-host.example/source.mp4",  # source video to re-drive
    "audio_file": "https://your-host.example/voice.mp3",    # audio that drives the lip sync
    "resolution": "720p",
    "image_size": { "width": 1280, "height": 720 },
    "keep_proportion": "crop",
    "frame_rate": 30,
    "audio_duration": 8.5,  # seconds
}
headers = {
    "api-key": api_key,
    "Content-Type": "application/json",
}

# Submit the job; the response carries the job_id used for polling.
response = requests.post(url, headers=headers, json=payload, timeout=30)
print(response.json())

Response

{
  "success": true,
  "data": {
    "job_id": "tlk_1777360165746_xyz789",
    "request_id": "req_abc",
    "status": "queued",
    "message": "Job submitted",
    "estimated_time": "60s",
    "credit_cost": 80
  }
}
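
The job_id in this response is the only value the later calls need. As a minimal sketch, the helper below pulls it out and fails loudly when success is false. SubmittedJob and parse_submit_response are hypothetical names introduced for illustration, and the shape of an error response is not documented on this page, so the failure branch is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SubmittedJob:
    """Subset of the submit response needed later (hypothetical helper type)."""
    job_id: str
    status: str
    credit_cost: int


def parse_submit_response(body: dict) -> SubmittedJob:
    """Extract job details from the JSON body returned by generate-async."""
    # Assumption: a rejected submission sets "success" to false. The error
    # payload is not documented here, so the whole body goes into the message.
    if not body.get("success"):
        raise RuntimeError(f"Talk-V2V submission failed: {body}")
    data = body["data"]
    return SubmittedJob(
        job_id=data["job_id"],
        status=data["status"],
        credit_cost=data["credit_cost"],
    )


# The sample response shown above.
sample = {
    "success": True,
    "data": {
        "job_id": "tlk_1777360165746_xyz789",
        "request_id": "req_abc",
        "status": "queued",
        "message": "Job submitted",
        "estimated_time": "60s",
        "credit_cost": 80,
    },
}
job = parse_submit_response(sample)
print(job.job_id, job.status, job.credit_cost)
```

In a real client, pass response.json() from the submit call instead of the sample dictionary.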

2. Check job status

import requests

api_key = "YOUR_API_KEY"
job_id = "tlk_1777360165746_xyz789"  # job_id returned by the submit call

url = f"https://api.kvid.ai/ai/generation/status?jobId={job_id}"
headers = {"api-key": api_key}

# One status check; repeat until the job reaches a final state.
response = requests.get(url, headers=headers, timeout=30)
print(response.json())

status is one of: queued, processing, completed, failed.
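
Because a job passes through queued and processing before it finishes, a client usually wraps the status call in a polling loop. The sketch below is one way to do that, not an official client. wait_for_job and fetch_status_http are hypothetical names, the 5-second interval and 600-second timeout are arbitrary defaults, and the status response body is not shown on this page, so reading the state from body["data"]["status"] is an assumption based on the submit and result responses.

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal state and return that state.

    fetch_status is any callable returning the parsed JSON body of the status
    endpoint. Assumption: the state is at body["data"]["status"].
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(job_id)["data"]["status"]
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status} after {timeout}s")
        sleep(poll_interval)


def fetch_status_http(job_id: str, api_key: str = "YOUR_API_KEY") -> dict:
    """The status request from the example above, wrapped for wait_for_job."""
    import requests  # imported here so the polling logic itself has no dependency

    response = requests.get(
        "https://api.kvid.ai/ai/generation/status",
        params={"jobId": job_id},
        headers={"api-key": api_key},
        timeout=30,
    )
    return response.json()


# Offline demonstration with a scripted sequence of states.
scripted = iter(["queued", "processing", "processing", "completed"])
final = wait_for_job(
    lambda _job_id: {"data": {"status": next(scripted)}},
    "tlk_1777360165746_xyz789",
    sleep=lambda _seconds: None,  # skip real waiting in the demo
)
print(final)  # completed
```

Against the live API, call wait_for_job(fetch_status_http, job_id). Taking the fetch function as a parameter keeps the loop testable without network access.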

3. Fetch the completed result

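import requests

api_key = "YOUR_API_KEY"
job_id = "tlk_1777360165746_xyz789"  # job_id of a job whose status is completed

url = f"https://api.kvid.ai/ai/generation/result?jobId={job_id}"
headers = {"api-key": api_key}

# Fetch the result metadata, including the URL of the rendered video.
response = requests.get(url, headers=headers, timeout=30)
print(response.json())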

Response

{
  "success": true,
  "data": {
    "job_id": "tlk_1777360165746_xyz789",
    "status": "completed",
    "result_url": "https://cdn.kvid.ai/videos/tlk_1777360165746_xyz789.mp4",
    "width": 1280,
    "height": 720,
    "type": "video/mp4",
    "used_credit": 80,
    "created_at": "2026-04-21T10:00:00Z"
  }
}
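
The result response points at the rendered file through result_url rather than returning the video itself. The following sketch shows one way to read that URL and save the file locally. result_video_url, local_filename, and download are hypothetical helpers, and the response returned for an unfinished or failed job is not documented here, so the error check is an assumption.

```python
import os
from urllib.parse import urlparse


def result_video_url(body: dict) -> str:
    """Return result_url from a completed result response."""
    data = body.get("data") or {}
    # Assumption: anything other than success=true with status "completed"
    # means there is no video to download yet.
    if not body.get("success") or data.get("status") != "completed":
        raise RuntimeError(f"result not available: {body}")
    return data["result_url"]


def local_filename(url: str) -> str:
    """Use the last path segment of the URL as the local file name."""
    return os.path.basename(urlparse(url).path)


def download(url: str, dest: str, chunk_size: int = 1 << 20) -> None:
    """Stream the video to disk in chunks instead of loading it into memory."""
    import requests  # imported here so the parsing helpers have no dependency

    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()
        with open(dest, "wb") as fh:
            for chunk in response.iter_content(chunk_size=chunk_size):
                fh.write(chunk)


# The sample response shown above, trimmed to the fields used here.
sample = {
    "success": True,
    "data": {
        "job_id": "tlk_1777360165746_xyz789",
        "status": "completed",
        "result_url": "https://cdn.kvid.ai/videos/tlk_1777360165746_xyz789.mp4",
    },
}
url = result_video_url(sample)
print(local_filename(url))  # tlk_1777360165746_xyz789.mp4
```

To save the file, call download(url, local_filename(url)) once the job has completed.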

📋 Schema

Request fields

Field	Type	Required	Description
`input_video`	string (URL)	✅	HTTPS URL of the source video
`audio_file`	string (URL)	✅	HTTPS URL of the audio that should drive the lip sync
`prompt`	string	–	Optional text prompt to guide style
`negative_prompt`	string	–	Things to avoid
`model`	string	–	Model identifier
`function`	string	–	Function identifier
`resolution`	string	–	`480p` / `720p`
`image_size.width` / `image_size.height`	integer	–	Output dimensions (alternative to `resolution`)
`keep_proportion`	string	–	How to handle aspect mismatches: `stretch` / `crop` / `pad`
`audio_duration`	float	–	Audio length in seconds — used to bound the output
`frame_rate`	integer	–	Output frames per second
`max_frames`	integer	–	Hard cap on output frame count
`steps`	integer	–	Sampling steps (higher = better quality, slower)
`cfg_scale`	float	–	Classifier-free guidance strength
`crf`	integer	–	Output video CRF (lower = higher quality, larger file)
`seed`	integer	–	Random seed; reuse the same value to get reproducible output
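
The audio_duration, frame_rate, and max_frames fields in the table together determine how many frames are rendered. The arithmetic below is a rough sketch of that relationship for planning purposes. It assumes the frame count is the duration times the frame rate, rounded up, with max_frames applied as a simple cap; the backend's actual rounding is not documented here and may differ. estimate_frame_count is a hypothetical helper.

```python
import math
from typing import Optional


def estimate_frame_count(
    audio_duration: float,
    frame_rate: int,
    max_frames: Optional[int] = None,
) -> int:
    """Estimate output frames: ceil(duration * fps), capped by max_frames."""
    frames = math.ceil(audio_duration * frame_rate)
    if max_frames is not None:
        frames = min(frames, max_frames)
    return frames


# Values from the submit example: 8.5 seconds of audio at 30 fps.
print(estimate_frame_count(8.5, 30))       # 255
print(estimate_frame_count(8.5, 30, 200))  # 200, the cap wins
```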

When a request goes through the SDK, the width / height shorthand is converted into the image_size: { width, height } object automatically. Over raw HTTP, send image_size directly.
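
For raw HTTP clients, the same shorthand conversion is easy to reproduce on the client side. The helper below is a sketch of that idea and is not the SDK's code: build_payload is a hypothetical name, and the validation rules are inferred only from the field table above (HTTPS URLs, the two resolution values, the three keep_proportion values).

```python
from typing import Any, Optional

ALLOWED_RESOLUTIONS = {"480p", "720p"}
ALLOWED_KEEP_PROPORTION = {"stretch", "crop", "pad"}


def build_payload(
    input_video: str,
    audio_file: str,
    *,
    resolution: Optional[str] = None,
    width: Optional[int] = None,
    height: Optional[int] = None,
    keep_proportion: Optional[str] = None,
    **extra: Any,
) -> dict:
    """Build a Talk-V2V request body, folding width/height into image_size."""
    for name, url in (("input_video", input_video), ("audio_file", audio_file)):
        if not url.startswith("https://"):
            raise ValueError(f"{name} must be an HTTPS URL, got {url!r}")
    if resolution is not None and resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if keep_proportion is not None and keep_proportion not in ALLOWED_KEEP_PROPORTION:
        raise ValueError(
            f"keep_proportion must be one of {sorted(ALLOWED_KEEP_PROPORTION)}"
        )
    if (width is None) != (height is None):
        raise ValueError("width and height must be given together")

    payload: dict = {"input_video": input_video, "audio_file": audio_file}
    if resolution is not None:
        payload["resolution"] = resolution
    if width is not None:
        # Raw HTTP expects the nested object, not top-level width/height.
        payload["image_size"] = {"width": width, "height": height}
    if keep_proportion is not None:
        payload["keep_proportion"] = keep_proportion
    # Remaining optional fields (frame_rate, seed, ...) pass through unchanged.
    payload.update({k: v for k, v in extra.items() if v is not None})
    return payload


payload = build_payload(
    "https://your-host.example/source.mp4",
    "https://your-host.example/voice.mp3",
    width=1280,
    height=720,
    keep_proportion="crop",
    frame_rate=30,
)
print(payload["image_size"])  # {'width': 1280, 'height': 720}
```

The returned dictionary can be passed as the json argument of requests.post in the submit example.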

💰 Pricing

Talk-V2V cost depends on output resolution and duration. See Pricing → Video Generation for the current rates.

⚠️ Limitations & Notes

Source video: best results when the speaker's face is clearly visible and roughly front-facing
Audio: clear, single-speaker audio works best
Duration: longer outputs cost proportionally more credits and take longer to render
Aspect: pick the keep_proportion value that matches your downstream use (crop for full-bleed output, pad to preserve the full frame)
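
To make the aspect note concrete, the sketch below computes the size the source frame takes under each keep_proportion mode before it is fitted into the target. It follows the conventional meaning of stretch, crop, and pad; the backend's exact rounding and alignment are not documented here, so treat the numbers as illustrative. fit_geometry is a hypothetical helper.

```python
from typing import Tuple


def fit_geometry(
    src_w: int, src_h: int, dst_w: int, dst_h: int, mode: str
) -> Tuple[int, int]:
    """Return the scaled source size (width, height) for a given mode.

    stretch: scale each axis independently to fill the target exactly.
    crop:    scale uniformly until the target is covered; overflow is cut off.
    pad:     scale uniformly until the source fits; the remainder is padding.
    """
    if mode == "stretch":
        return dst_w, dst_h
    scale_x, scale_y = dst_w / src_w, dst_h / src_h
    if mode == "crop":
        scale = max(scale_x, scale_y)
    elif mode == "pad":
        scale = min(scale_x, scale_y)
    else:
        raise ValueError(f"unknown keep_proportion mode: {mode!r}")
    return round(src_w * scale), round(src_h * scale)


# A 1920x1080 landscape source fitted into a 720x720 square target.
for mode in ("stretch", "crop", "pad"):
    print(mode, fit_geometry(1920, 1080, 720, 720, mode))
# stretch (720, 720)   -> distorted, nothing lost
# crop    (1280, 720)  -> 560 px of width cut off
# pad     (720, 405)   -> 315 px of vertical padding
```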

🔗 Related Links

Create an API key
Buy credits
Pricing
Video Generation API — text-to-video / image-to-video

📞 Support & Contact

Email: [email protected]
Discord: kvidAI Community
