Talk-V2V (Lip-Sync) API

The Talk-V2V API takes an existing video and a separate audio file, then drives the speaker's mouth and motion in the video to match the audio — producing a lip-synced video.

🎯 Service Overview

Supported Features

  • Video-to-Video lip sync: drive an input video with new audio
  • Resolution: 480p / 720p
  • Aspect handling: stretch / crop / pad to fit target aspect ratio

Typical Use Cases

  • K-pop idol localization (re-voice an existing performance video)
  • K-beauty product reviews with new narration
  • Multi-language video reuse from a single source clip

📡 API Endpoints

Basic Information

Base URL:       https://api.kvid.ai
Authentication: api-key header
Content-Type:   application/json

Talk-V2V is asynchronous — submit a job to receive a job_id, poll the unified status endpoint, then fetch the result.

Method  Path                                      Purpose
POST    /ai/generation/talk-v2v/generate-async    Submit a Talk-V2V job
GET     /ai/generation/status?jobId={job_id}      Check job status
GET     /ai/generation/result?jobId={job_id}      Fetch completed result

The api-key header identifies the user and their subscription. You don't need to include email or product_code in the request body or query string — the backend resolves both from the API key.

1. Submit a Talk-V2V job

import requests

url = "https://api.kvid.ai/ai/generation/talk-v2v/generate-async"
api_key = "YOUR_API_KEY"

payload = {
    # Both media inputs are passed as HTTPS URLs.
    "input_video": "https://your-host.example/source.mp4",
    "audio_file": "https://your-host.example/voice.mp3",
    "resolution": "720p",
    "image_size": {"width": 1280, "height": 720},
    "keep_proportion": "crop",
    "frame_rate": 30,
    "audio_duration": 8.5,  # seconds
}
headers = {
    "api-key": api_key,
    "Content-Type": "application/json",
}

# Submit the job; the response carries the job_id used for polling.
response = requests.post(url, headers=headers, json=payload)
print(response.json())

Response

{
  "success": true,
  "data": {
    "job_id": "tlk_1777360165746_xyz789",
    "request_id": "req_abc",
    "status": "queued",
    "message": "Job submitted",
    "estimated_time": "60s",
    "credit_cost": 80
  }
}
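
Before polling, it can help to pull job_id out of the submit response in one place. The following is a minimal sketch, not part of the API or any SDK: extract_job_id is a hypothetical helper name, and because the error response shape is not documented here, the helper simply raises with the whole body when success is not true.

```python
from typing import Any


def extract_job_id(body: dict[str, Any]) -> str:
    """Return data.job_id from a submit response, or raise if it is missing."""
    if not body.get("success"):
        # Assumption: the error shape is undocumented, so surface the whole body.
        raise RuntimeError(f"Talk-V2V submit was not successful: {body!r}")
    job_id = (body.get("data") or {}).get("job_id")
    if not job_id:
        raise RuntimeError(f"Submit response has no data.job_id: {body!r}")
    return job_id


# Sample body copied from the documented submit response.
sample = {
    "success": True,
    "data": {
        "job_id": "tlk_1777360165746_xyz789",
        "request_id": "req_abc",
        "status": "queued",
        "credit_cost": 80,
    },
}
print(extract_job_id(sample))
```

With a live request, the same helper would be called on response.json().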

2. Check job status

import requests

api_key = "YOUR_API_KEY"
job_id = "tlk_1777360165746_xyz789"

url = f"https://api.kvid.ai/ai/generation/status?jobId={job_id}"
headers = {"api-key": api_key}

response = requests.get(url, headers=headers)
print(response.json())

status is one of: queued, processing, completed, failed.
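
Since the job is asynchronous, a client typically repeats the status call until it sees completed or failed. The sketch below shows one way to do that with a fixed interval and an overall timeout. wait_for_job, the 5-second interval, and the 10-minute timeout are illustrative choices, not documented values, and the code assumes the status sits at data.status as it does in the submit response. The fetcher is injected so the loop can be exercised offline.

```python
import time
from typing import Any, Callable

# Statuses after which polling should stop (from the documented status list).
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], dict[str, Any]],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch(job_id) until a terminal status is seen or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        body = fetch(job_id)
        # Assumption: the status response nests status under "data".
        status = (body.get("data") or {}).get("status")
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still {status!r} after {timeout_s}s")
        sleep(interval_s)


# Offline demonstration: a scripted fetcher stands in for the HTTP call.
_scripted = iter(["queued", "processing", "completed"])


def fake_fetch(job_id: str) -> dict[str, Any]:
    return {"success": True, "data": {"job_id": job_id, "status": next(_scripted)}}


final = wait_for_job("tlk_demo", fake_fetch, sleep=lambda seconds: None)
print(final)
```

In real use, fetch would wrap the status request shown above, for example lambda j: requests.get(status_url, params={"jobId": j}, headers=headers).json(), where status_url is the status endpoint without its query string.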

3. Fetch the completed result

import requests

api_key = "YOUR_API_KEY"
job_id = "tlk_1777360165746_xyz789"

url = f"https://api.kvid.ai/ai/generation/result?jobId={job_id}"
headers = {"api-key": api_key}

response = requests.get(url, headers=headers)
print(response.json())

Response

{
  "success": true,
  "data": {
    "job_id": "tlk_1777360165746_xyz789",
    "status": "completed",
    "result_url": "https://cdn.kvid.ai/videos/tlk_1777360165746_xyz789.mp4",
    "width": 1280,
    "height": 720,
    "type": "video/mp4",
    "used_credit": 80,
    "created_at": "2026-04-21T10:00:00Z"
  }
}
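
Once the result is available, the usual next step is to save the file behind result_url. The sketch below is one possible approach using only the standard library. result_url_from and download are hypothetical helper names, and the code assumes result_url can be fetched with a plain GET without the api-key header, which this page does not confirm.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Any


def result_url_from(body: dict[str, Any]) -> str:
    """Return data.result_url from a completed result response, or raise."""
    data = body.get("data") or {}
    if not body.get("success") or data.get("status") != "completed":
        raise RuntimeError(f"Job is not completed: {body!r}")
    url = data.get("result_url")
    if not url:
        raise RuntimeError(f"Result response has no data.result_url: {body!r}")
    return url


def download(url: str, dest: Path, opener=urllib.request.urlopen) -> Path:
    """Stream the file at url to dest and return dest."""
    # Assumption: the CDN URL is readable without authentication headers.
    with opener(url) as src, open(dest, "wb") as out:
        shutil.copyfileobj(src, out)
    return dest


# Sample body trimmed from the documented result response.
sample = {
    "success": True,
    "data": {
        "job_id": "tlk_1777360165746_xyz789",
        "status": "completed",
        "result_url": "https://cdn.kvid.ai/videos/tlk_1777360165746_xyz789.mp4",
        "type": "video/mp4",
    },
}
print(result_url_from(sample))
```

A real download would then be download(result_url_from(body), Path("output.mp4")), where body is the parsed result response.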

📋 Schema

Request fields

Field                                 Type          Required  Description
input_video                           string (URL)            HTTPS URL of the source video
audio_file                            string (URL)            HTTPS URL of the audio that should drive the lip sync
prompt                                string                  Optional text prompt to guide style
negative_prompt                       string                  Things to avoid
model                                 string                  Model identifier
function                              string                  Function identifier
resolution                            string                  480p / 720p
image_size.width / image_size.height  integer                 Output dimensions (alternative to resolution)
keep_proportion                       string                  How to handle aspect mismatches: stretch / crop / pad
audio_duration                        float                   Audio length in seconds; used to bound the output
frame_rate                            integer                 Output frames per second
max_frames                            integer                 Hard cap on output frame count
steps                                 integer                 Sampling steps (higher = better quality, slower)
cfg_scale                             float                   Classifier-free guidance strength
crf                                   integer                 Output video CRF (lower = higher quality, larger file)
seed                                  integer                 Seed for reproducible output
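
The three timing fields (audio_duration, frame_rate, max_frames) can be related with simple arithmetic. The sketch below assumes the output length tracks the audio length, so the frame count is roughly audio_duration × frame_rate, capped by max_frames when that is set. This page does not spell out the backend's exact rule, so treat the helper as an estimate only; estimated_frames is a hypothetical name.

```python
import math
from typing import Optional


def estimated_frames(
    audio_duration: float, frame_rate: int, max_frames: Optional[int] = None
) -> int:
    """Estimate output frames as duration x fps, capped by max_frames if given."""
    # Assumption: one output frame per 1/frame_rate seconds of audio.
    frames = math.ceil(audio_duration * frame_rate)
    return frames if max_frames is None else min(frames, max_frames)


# Values from the submit example: 8.5 seconds of audio at 30 fps.
print(estimated_frames(8.5, 30))
print(estimated_frames(8.5, 30, max_frames=200))
```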

When you use the SDK, the width / height shorthand is converted into the image_size: { width, height } object automatically. In raw HTTP requests, send image_size directly.
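
For raw HTTP clients, a small builder can do the width / height to image_size conversion and catch obvious mistakes before a request is sent. This is a sketch under stated assumptions rather than the SDK's behavior: build_payload is a hypothetical helper, and the allowed values are taken from the field table above.

```python
from typing import Any, Optional

ALLOWED_RESOLUTIONS = {"480p", "720p"}
ALLOWED_KEEP_PROPORTION = {"stretch", "crop", "pad"}


def build_payload(
    input_video: str,
    audio_file: str,
    *,
    resolution: Optional[str] = None,
    width: Optional[int] = None,
    height: Optional[int] = None,
    keep_proportion: Optional[str] = None,
    **extra: Any,
) -> dict[str, Any]:
    """Build a Talk-V2V request body, nesting width/height under image_size."""
    for name, url in (("input_video", input_video), ("audio_file", audio_file)):
        if not url.startswith("https://"):
            raise ValueError(f"{name} must be an HTTPS URL")
    payload: dict[str, Any] = {"input_video": input_video, "audio_file": audio_file}
    if resolution is not None:
        if resolution not in ALLOWED_RESOLUTIONS:
            raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
        payload["resolution"] = resolution
    if (width is None) != (height is None):
        raise ValueError("width and height must be given together")
    if width is not None and height is not None:
        payload["image_size"] = {"width": int(width), "height": int(height)}
    if keep_proportion is not None:
        if keep_proportion not in ALLOWED_KEEP_PROPORTION:
            raise ValueError(
                f"keep_proportion must be one of {sorted(ALLOWED_KEEP_PROPORTION)}"
            )
        payload["keep_proportion"] = keep_proportion
    # Pass other fields (frame_rate, seed, ...) through, skipping unset ones.
    payload.update({key: value for key, value in extra.items() if value is not None})
    return payload


payload = build_payload(
    "https://your-host.example/source.mp4",
    "https://your-host.example/voice.mp3",
    resolution="720p",
    width=1280,
    height=720,
    keep_proportion="crop",
    frame_rate=30,
)
print(payload)
```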

💰 Pricing

Talk-V2V cost depends on output resolution and duration. See Pricing → Video Generation for the current rates.

⚠️ Limitations & Notes

  • Source video: best results when the speaker's face is clearly visible and roughly front-facing
  • Audio: clear, single-speaker audio works best
  • Duration: longer outputs cost proportionally more credits and take longer to render
  • Aspect: pick the keep_proportion value that matches your downstream use (crop for full-bleed output, pad to preserve the full frame)
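
To make the keep_proportion trade-off concrete, the sketch below computes what each mode would do to a source frame under the conventional meanings of the terms: stretch resizes each axis independently, crop scales to cover the target and trims the overflow, and pad scales to fit inside the target and fills the remainder. The backend's exact rounding and alignment are not documented here, so the numbers are illustrative; fit_geometry is a hypothetical helper.

```python
def fit_geometry(src_w: int, src_h: int, dst_w: int, dst_h: int, mode: str) -> dict:
    """Return scaled size plus total pixels cropped or padded per axis."""
    if mode == "stretch":
        # Each axis is resized independently, so the aspect ratio may distort.
        return {"scaled": (dst_w, dst_h), "cropped": (0, 0), "padded": (0, 0)}
    scale_w, scale_h = dst_w / src_w, dst_h / src_h
    if mode == "crop":
        scale = max(scale_w, scale_h)  # cover the target, overflow is trimmed
    elif mode == "pad":
        scale = min(scale_w, scale_h)  # fit inside the target, gaps are filled
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    w, h = round(src_w * scale), round(src_h * scale)
    if mode == "crop":
        return {"scaled": (w, h), "cropped": (w - dst_w, h - dst_h), "padded": (0, 0)}
    return {"scaled": (w, h), "cropped": (0, 0), "padded": (dst_w - w, dst_h - h)}


# A portrait 1080x1920 source placed into a landscape 1280x720 target.
for mode in ("stretch", "crop", "pad"):
    print(mode, fit_geometry(1080, 1920, 1280, 720, mode))
```

In this example crop would discard most of the frame height while pad would leave wide bars on the sides, which is why the choice depends on where the face sits in the source frame.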
