p-video-avatar

Generate talking head videos from a single image with text or audio-driven speech

Overview

p-video-avatar is Pruna's talking head video generation model. Provide a portrait image and either a speech script or an audio file, and the model generates a realistic video of the person speaking. It supports multiple voices, languages, and output resolutions.

Rate Limit: 50 requests per minute

Category: Video Generation

Pricing:

| Configuration | Price |
|---|---|
| 720p | $0.025 per second of output video |
| 1080p | $0.045 per second of output video |
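
Because pricing is per second of output video, a rough cost estimate is a single multiplication. The sketch below assumes billing is strictly linear, with no minimum charge or rounding, which this page does not state. `estimate_cost` and `PRICE_PER_SECOND` are illustrative names, not part of the API.

```python
from decimal import Decimal

# Per-second prices in USD, copied from the pricing table.
PRICE_PER_SECOND = {
    "720p": Decimal("0.025"),
    "1080p": Decimal("0.045"),
}


def estimate_cost(resolution, seconds):
    """Return an estimated charge in USD (assumes purely linear billing)."""
    if resolution not in PRICE_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution!r}")
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    # Going through str() avoids binary floating-point noise in the result.
    return PRICE_PER_SECOND[resolution] * Decimal(str(seconds))


print(estimate_cost("720p", 12))   # 0.300
print(estimate_cost("1080p", 12))  # 0.540
```

Under this assumption, a 12-second clip would cost about $0.30 at 720p and $0.54 at 1080p.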

Quickstart

Start by uploading your image:

curl -X POST "https://api.pruna.ai/v1/files" \
  -H "apikey: YOUR_API_KEY" \
  -F "content=@/path/to/portrait.jpg"

Note: Use -F (form) with @ prefix to upload a file from your local filesystem. The file path should be absolute or relative to your current directory.

Avatar with Voice Script (Synchronous)

curl -X POST 'https://api.pruna.ai/v1/predictions' \
  -H 'Content-Type: application/json' \
  -H 'apikey: YOUR_API_KEY' \
  -H 'Model: p-video-avatar' \
  -H 'Try-Sync: true' \
  -d '{
    "input": {
      "image": "https://your-url.com/portrait.jpg",
      "voice_script": "Hello, welcome to our product demo!",
      "voice": "Zephyr (Female)"
    }
  }'

Avatar with Voice Script (Asynchronous)

curl -X POST 'https://api.pruna.ai/v1/predictions' \
  -H 'Content-Type: application/json' \
  -H 'apikey: YOUR_API_KEY' \
  -H 'Model: p-video-avatar' \
  -d '{
    "input": {
      "image": "https://your-url.com/portrait.jpg",
      "voice_script": "Hello, welcome to our product demo!",
      "voice": "Zephyr (Female)",
      "voice_language": "English (US)"
    }
  }'

Avatar with Audio

curl -X POST 'https://api.pruna.ai/v1/predictions' \
  -H 'Content-Type: application/json' \
  -H 'apikey: YOUR_API_KEY' \
  -H 'Model: p-video-avatar' \
  -d '{
    "input": {
      "image": "https://your-url.com/portrait.jpg",
      "audio": "https://your-url.com/speech.mp3"
    }
  }'
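
The same requests can be assembled from Python. The following is a minimal sketch using only the standard library. It mirrors the headers and JSON body of the curl examples, and the helper name `build_prediction_request` is hypothetical. It builds the request without sending it, since the response format is not described here.

```python
import json
import urllib.request

API_URL = "https://api.pruna.ai/v1/predictions"


def build_prediction_request(api_key, inputs, sync=False):
    """Build (but do not send) a POST request for p-video-avatar."""
    headers = {
        "Content-Type": "application/json",
        "apikey": api_key,
        "Model": "p-video-avatar",
    }
    if sync:
        # The synchronous curl example adds this header; the others omit it.
        headers["Try-Sync"] = "true"
    body = json.dumps({"input": inputs}).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


req = build_prediction_request(
    "YOUR_API_KEY",
    {
        "image": "https://your-url.com/portrait.jpg",
        "audio": "https://your-url.com/speech.mp3",
    },
)
# Sending would be: urllib.request.urlopen(req)
```

Keeping construction separate from sending makes it easy to inspect or log the payload before any billable call is made.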

Parameters

Required Parameters
| Parameter | Type | Description |
|---|---|---|
| image | string | Input image (first frame). Supports jpg, jpeg, png, webp |

You must also provide either voice_script or audio (or both, in which case audio takes priority).
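
The rule above can be checked on the client before a request is sent. This is an illustrative sketch: `speech_source` is a hypothetical helper, and treating an empty `voice_script` as "not provided" is an assumption based on its default of `""`.

```python
def speech_source(inputs):
    """Return which input drives the speech: "audio" or "voice_script"."""
    if not inputs.get("image"):
        raise ValueError("image is required")
    if inputs.get("audio"):
        # Audio takes priority when both audio and voice_script are given.
        return "audio"
    if inputs.get("voice_script"):
        return "voice_script"
    raise ValueError("provide either voice_script or audio")


print(speech_source({
    "image": "https://your-url.com/portrait.jpg",
    "voice_script": "Hello!",
    "audio": "https://your-url.com/speech.mp3",
}))  # audio
```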

Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| seed | integer | random | Random seed. Set for reproducible generation |
| audio | string | - | URL of uploaded audio to condition video generation. If both audio and voice_script are provided, uploaded audio is used |
| voice | string | "Zephyr (Female)" | Voice for generated speech. Options: "Zephyr (Female)", "Puck (Male)", "Charon (Male)", "Kore (Female)", "Fenrir (Male)", "Leda (Female)", "Orus (Male)", "Aoede (Female)", "Callirrhoe (Female)", "Autonoe (Female)", "Enceladus (Male)", "Iapetus (Male)", "Umbriel (Male)", "Algenib (Male)", "Despina (Female)", "Erinome (Female)", "Laomedeia (Female)", "Achernar (Female)", "Algieba (Male)", "Schedar (Male)", "Gacrux (Female)", "Pulcherrima (Female)", "Achird (Male)", "Zubenelgenubi (Male)", "Vindemiatrix (Female)", "Sadachbia (Male)", "Sadaltager (Male)", "Sulafat (Female)", "Alnilam (Male)", "Rasalgethi (Male)" |
| resolution | string | "720p" | Resolution of the video. Options: "720p", "1080p" |
| video_prompt | string | "The person is talking." | Optional prompt for the video |
| voice_prompt | string | "Say the following." | Optional speaking style, tone, pacing or emotion instructions |
| voice_script | string | "" | Script for the person to say when no audio is uploaded |
| voice_language | string | "English (US)" | Output language. Options: "English (US)", "English (UK)", "Spanish", "French", "German", "Italian", "Portuguese (Brazil)", "Japanese", "Korean", "Hindi" |
| disable_safety_filter | boolean | true | Disable safety filter for prompts and input image. When disabled, prompts are not checked for unsafe content before generation |
| disable_prompt_upsampling | boolean | false | When true, skip the OpenRouter multimodal prompt upsampler and pass the raw user prompt to the video model |
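
Several of these parameters accept only a fixed set of values, so a client-side check can catch typos early. The sketch below covers `resolution`, `voice_language`, and `seed` using the options and defaults listed above. `check_options` is a hypothetical helper, and how the API itself responds to invalid values is not documented here.

```python
RESOLUTIONS = {"720p", "1080p"}
VOICE_LANGUAGES = {
    "English (US)", "English (UK)", "Spanish", "French", "German",
    "Italian", "Portuguese (Brazil)", "Japanese", "Korean", "Hindi",
}


def check_options(inputs):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    # Missing keys fall back to the documented defaults.
    resolution = inputs.get("resolution", "720p")
    if resolution not in RESOLUTIONS:
        problems.append(f"resolution must be one of {sorted(RESOLUTIONS)}")
    language = inputs.get("voice_language", "English (US)")
    if language not in VOICE_LANGUAGES:
        problems.append(f"unsupported voice_language: {language!r}")
    seed = inputs.get("seed")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if seed is not None and (isinstance(seed, bool) or not isinstance(seed, int)):
        problems.append("seed must be an integer")
    return problems


print(check_options({"resolution": "4k", "voice_language": "Spanish"}))
```

Returning a list rather than raising on the first error lets a caller report every problem in one pass.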