p-video-avatar
Generate talking head videos from a single image with text or audio-driven speech
p-video-avatar is Pruna's talking head video generation model. Provide a portrait image and either a speech script or an audio file, and the model generates a realistic video of the person speaking. It supports multiple voices, languages, and output resolutions.
Rate Limit: 50 requests per minute
Category: Video Generation
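One way to stay under the 50 requests per minute limit is a small client-side throttle. The sketch below is a generic sliding-window limiter, not part of any Pruna SDK. The class name `SlidingWindowLimiter` and its parameters are hypothetical, and how the server actually counts requests is not documented here.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side throttle: at most `max_calls` calls per `period` seconds."""

    def __init__(self, max_calls=50, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock    # injectable so the limiter can be tested
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def acquire(self):
        """Block until a call is allowed, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Window is full: wait until the oldest call expires.
            self._sleep(self.period - (now - self._calls[0]))
```

Calling `limiter.acquire()` before each request keeps a single process within the window. It does not coordinate across multiple processes or machines sharing one API key.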
Pricing:
| Configuration | Price |
|---|---|
| 720p | $0.025 per second of output video |
| 1080p | $0.045 per second of output video |
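Because pricing is per second of output video, a rough cost estimate is a single multiplication. The sketch below uses the two rates from the table. The function name `estimate_cost` is hypothetical, and how fractional seconds are rounded for billing is an assumption left open here.

```python
from decimal import Decimal

# Per-second prices in USD, copied from the pricing table above.
PRICE_PER_SECOND = {
    "720p": Decimal("0.025"),
    "1080p": Decimal("0.045"),
}


def estimate_cost(duration_seconds, resolution="720p"):
    """Return the estimated USD cost for a video of the given length."""
    if resolution not in PRICE_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution!r}")
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    # Decimal(str(...)) avoids binary floating-point rounding noise.
    return PRICE_PER_SECOND[resolution] * Decimal(str(duration_seconds))


print(estimate_cost(12, "720p"))   # 0.300
print(estimate_cost(12, "1080p"))  # 0.540
```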
curl -X POST "https://api.pruna.ai/v1/files" \
  -H "apikey: YOUR_API_KEY" \
  -F "content=@/path/to/portrait.jpg"
Note: Use -F (form) with the @ prefix to upload a file from your local filesystem. The file path can be absolute or relative to your current directory.
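The same upload can be prepared from Python with only the standard library. This is a minimal sketch that hand-builds the multipart/form-data body curl's -F produces. The endpoint, the `apikey` header, and the `content` field name come from the curl command above, while the helper name `build_upload_request` is hypothetical. The shape of the response is not documented in this section, so the sketch stops at building the request.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

FILES_URL = "https://api.pruna.ai/v1/files"


def build_upload_request(path, api_key):
    """Build a multipart/form-data POST with the file in the `content` field."""
    file_path = Path(path)
    boundary = uuid.uuid4().hex
    content_type = (mimetypes.guess_type(file_path.name)[0]
                    or "application/octet-stream")
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="content"; '
        f'filename="{file_path.name}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = head + file_path.read_bytes() + tail
    return urllib.request.Request(
        FILES_URL,
        data=body,
        method="POST",
        headers={
            "apikey": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Inspect the response body yourself to find the URL of the uploaded file.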
curl -X POST 'https://api.pruna.ai/v1/predictions' \
  -H 'Content-Type: application/json' \
  -H 'apikey: YOUR_API_KEY' \
  -H 'Model: p-video-avatar' \
  -H 'Try-Sync: true' \
  -d '{
    "input": {
      "image": "https://your-url.com/portrait.jpg",
      "voice_script": "Hello, welcome to our product demo!",
      "voice": "Zephyr (Female)"
    }
  }'
curl -X POST 'https://api.pruna.ai/v1/predictions' \
  -H 'Content-Type: application/json' \
  -H 'apikey: YOUR_API_KEY' \
  -H 'Model: p-video-avatar' \
  -d '{
    "input": {
      "image": "https://your-url.com/portrait.jpg",
      "voice_script": "Hello, welcome to our product demo!",
      "voice": "Zephyr (Female)",
      "voice_language": "English (US)"
    }
  }'
curl -X POST 'https://api.pruna.ai/v1/predictions' \
  -H 'Content-Type: application/json' \
  -H 'apikey: YOUR_API_KEY' \
  -H 'Model: p-video-avatar' \
  -d '{
    "input": {
      "image": "https://your-url.com/portrait.jpg",
      "audio": "https://your-url.com/speech.mp3"
    }
  }'
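The three prediction requests above differ only in the `input` object and in whether the `Try-Sync: true` header is sent. The sketch below builds the equivalent request with Python's standard library. The URL, header names, and body shape are taken from the curl commands, and the helper name `build_prediction_request` is hypothetical. The response format is not described in this section, so the sketch does not parse it.

```python
import json
import urllib.request

PREDICTIONS_URL = "https://api.pruna.ai/v1/predictions"


def build_prediction_request(api_key, inputs, try_sync=False):
    """Build a POST to /v1/predictions for the p-video-avatar model."""
    headers = {
        "Content-Type": "application/json",
        "apikey": api_key,
        "Model": "p-video-avatar",
    }
    if try_sync:
        # Mirrors the first curl example, which sends Try-Sync: true.
        headers["Try-Sync"] = "true"
    body = json.dumps({"input": inputs}).encode("utf-8")
    return urllib.request.Request(
        PREDICTIONS_URL, data=body, headers=headers, method="POST"
    )


# Audio-driven request, matching the last curl example.
request = build_prediction_request(
    "YOUR_API_KEY",
    {
        "image": "https://your-url.com/portrait.jpg",
        "audio": "https://your-url.com/speech.mp3",
    },
)
```

Sending it is a matter of `urllib.request.urlopen(request)`. That call is left out so the example runs without network access or a real API key.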
| Parameter | Type | Description |
|---|---|---|
| image | string | Input image (first frame). Supports jpg, jpeg, png, webp |
You must also provide either voice_script or audio (or both, in which case audio takes priority).
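The requirement above can be checked client-side before a request is sent. This is a minimal sketch of that rule only. The function name `resolve_speech_source` is hypothetical, and the server may apply further validation that is not described here.

```python
def resolve_speech_source(inputs):
    """Return which field will drive the speech: "audio" or "voice_script".

    Raises ValueError if `image` is missing or neither speech field is set.
    """
    if not inputs.get("image"):
        raise ValueError("image is required")
    # Audio takes priority when both fields are provided.
    if inputs.get("audio"):
        return "audio"
    if inputs.get("voice_script"):
        return "voice_script"
    raise ValueError("provide voice_script or audio")
```

An empty `voice_script` (its default value) counts as not provided, so a payload with only `image` is rejected.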
| Parameter | Type | Default | Description |
|---|---|---|---|
| seed | integer | random | Random seed. Set for reproducible generation |
| audio | string | - | URL of uploaded audio to condition video generation. If both audio and voice_script are provided, uploaded audio is used |
| voice | string | "Zephyr (Female)" | Voice for generated speech. Options: "Zephyr (Female)", "Puck (Male)", "Charon (Male)", "Kore (Female)", "Fenrir (Male)", "Leda (Female)", "Orus (Male)", "Aoede (Female)", "Callirrhoe (Female)", "Autonoe (Female)", "Enceladus (Male)", "Iapetus (Male)", "Umbriel (Male)", "Algenib (Male)", "Despina (Female)", "Erinome (Female)", "Laomedeia (Female)", "Achernar (Female)", "Algieba (Male)", "Schedar (Male)", "Gacrux (Female)", "Pulcherrima (Female)", "Achird (Male)", "Zubenelgenubi (Male)", "Vindemiatrix (Female)", "Sadachbia (Male)", "Sadaltager (Male)", "Sulafat (Female)", "Alnilam (Male)", "Rasalgethi (Male)" |
| resolution | string | "720p" | Resolution of the video. Options: "720p", "1080p" |
| video_prompt | string | "The person is talking." | Optional prompt for the video |
| voice_prompt | string | "Say the following." | Optional speaking style, tone, pacing or emotion instructions |
| voice_script | string | "" | Script for the person to say when no audio is uploaded |
| voice_language | string | "English (US)" | Output language. Options: "English (US)", "English (UK)", "Spanish", "French", "German", "Italian", "Portuguese (Brazil)", "Japanese", "Korean", "Hindi" |
| disable_safety_filter | boolean | true | Disables the safety filter for prompts and the input image. When true, prompts are not checked for unsafe content before generation |
| disable_prompt_upsampling | boolean | false | When true, skip the OpenRouter multimodal prompt upsampler and pass the raw user prompt to the video model |
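The optional parameters in the table can be checked locally for unknown names, wrong types, and values outside the listed options. The sketch below covers the types of all ten parameters and the options for `resolution` and `voice_language`. The `voice` list is left out for brevity, and the function name `validate_options` is hypothetical. It mirrors the table only and is not the server's validation logic.

```python
# Parameter names and types, copied from the table above.
OPTIONAL_PARAMS = {
    "seed": int,
    "audio": str,
    "voice": str,
    "resolution": str,
    "video_prompt": str,
    "voice_prompt": str,
    "voice_script": str,
    "voice_language": str,
    "disable_safety_filter": bool,
    "disable_prompt_upsampling": bool,
}

ALLOWED_VALUES = {
    "resolution": {"720p", "1080p"},
    "voice_language": {
        "English (US)", "English (UK)", "Spanish", "French", "German",
        "Italian", "Portuguese (Brazil)", "Japanese", "Korean", "Hindi",
    },
}


def validate_options(options):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    for name, value in options.items():
        expected = OPTIONAL_PARAMS.get(name)
        if expected is None:
            problems.append(f"unknown parameter: {name}")
            continue
        # bool is a subclass of int, so reject True/False for integer fields.
        wrong_type = (expected is int and isinstance(value, bool)) or not isinstance(value, expected)
        if wrong_type:
            problems.append(f"{name}: expected {expected.__name__}")
            continue
        allowed = ALLOWED_VALUES.get(name)
        if allowed is not None and value not in allowed:
            problems.append(f"{name}: unsupported value {value!r}")
    return problems
```

Returning a list rather than raising on the first error lets a caller report every problem in one pass.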