A hands-on guide to turning a still image into video via API — source-image prep, first/last-frame control, motion prompting, and the polling and callback patterns that hold up in production.

You have a still image — a product render, a signed-off key visual, a character sheet — and you need it to move. Not a new scene that looks roughly similar, but that exact frame, animated. That is what image-to-video (i2v) does, and it's a different integration from typing a prompt and hoping the model draws what you pictured. This is the hands-on version: prep the source image, control where the motion starts and ends, write a prompt that animates rather than re-describes, and handle the polling and callbacks that keep a multi-minute render from breaking your backend.
Still deciding whether to use i2v versus text-to-video? Start with Text to Video vs Image to Video API Workflow. This article assumes you've made that call and want to do i2v well.
Reach for i2v when the opening frame is non-negotiable. The model isn't inventing a subject; it takes yours and adds motion, so anything that has to stay on-brand or on-model is safe in a way text-to-video can't guarantee — product shots of the actual SKU, an approved hero turned into a header loop, a character sheet that keeps the face consistent, or a before→after where you pin a start and end image and let the model fill the transition. If the subject is flexible, text-to-video is usually less work. The whole point of i2v is that the still is fixed.
The i2v model market is crowded and the quality bar is high. The landscape is worth knowing as context, even though the callable workflow below runs on a focused set of models.
The pattern is the same across the strong models: first-frame control is table stakes, last-frame control is no longer exotic, and anchoring both ends is the most reliable way to stop a clip drifting by the final second. That principle is what the workflow below is built around. On HiAPI the two i2v models you call are Seedance 2.0 (ByteDance) and Wan 2.7 image-to-video (Alibaba).
This is the step people skip and then blame the model for. With i2v, your first frame is the ceiling on quality. Three things to get right before you send anything: the source image's resolution, its aspect ratio, and its framing. For aspect ratio, either set aspect_ratio to match the source or use the adaptive ratio, which matches the source dimensions automatically.
Then host it. The API takes the source as a URL (first_frame_url), not a file upload, so it has to be reachable over HTTPS: your bucket, a CDN, or signed object storage. A localhost path or a link the API can't fetch will fail the task.
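Because a URL the API can't fetch fails the task, it can be worth catching obvious mistakes before you submit. The sketch below is a local pre-flight check under stated assumptions: the function name validate_source_url is hypothetical, and it only inspects the URL's shape (https scheme, not localhost, not a private or loopback IP literal). It does not resolve DNS or fetch anything, so an expired signed URL or a private bucket object would still pass.

```python
import ipaddress
from urllib.parse import urlparse


def validate_source_url(url: str) -> list:
    """Return a list of problems with a source-image URL; empty means it looks usable.

    Hypothetical helper. Shape checks only: no DNS lookup and no fetch.
    """
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("scheme must be https")
    host = parsed.hostname  # lowercased by urlparse; None for e.g. file:///path
    if not host:
        problems.append("missing host")
        return problems
    if host == "localhost" or host.endswith(".localhost"):
        problems.append("localhost is not reachable from the API")
        return problems
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return problems  # an ordinary hostname; nothing more to check locally
    if ip.is_private or ip.is_loopback or ip.is_link_local:
        problems.append("private or loopback address is not reachable from the API")
    return problems
```

For example, validate_source_url("http://localhost:8000/product.jpg") returns both the scheme problem and the localhost problem, while the CDN URL used in the requests below returns an empty list.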
On HiAPI every model, video included, runs through one async task API. You create a task, you get a taskId back immediately, and the render happens server-side. Turning text-to-video into image-to-video is one field: add first_frame_url. Here is the minimal call, using Seedance 2.0:
```bash
curl -X POST https://api.hiapi.ai/v1/tasks \
  -H "Authorization: Bearer sk-<your-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seedance-2-0",
    "input": {
      "prompt": "Slow cinematic push-in, a soft light sweep travels across the surface, the subject holds steady and in focus",
      "first_frame_url": "https://cdn.your-domain.com/assets/product-front.jpg",
      "aspect_ratio": "16:9",
      "duration": 5,
      "resolution": "720p",
      "generate_audio": false
    }
  }'
```
That is the whole image-to-video request. The input fields that matter:
- first_frame_url: the still to animate from; its presence is what makes this i2v.
- aspect_ratio: match the source (or use adaptive).
- duration: seconds of output; longer costs more and gives motion more room to drift.
- resolution: 480p / 720p / 1080p; the main cost dial.
- generate_audio: turn off for silent loops and product shots.

Full request and response shapes are in the docs. Wan 2.7 image-to-video follows the same create-poll-download flow, but its input differs: it carries the source frame in a media array and outputs 720p or 1080p. Check the docs for its exact shape instead of reusing the Seedance body verbatim.
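The same Seedance request can be built and sent from Python with only the standard library. This is a minimal sketch: the helper names are hypothetical, the allowed resolutions come from the field list above, and the code does not guess where taskId sits in the create response, so it returns the parsed JSON and leaves that lookup to the docs.

```python
import json
import urllib.request

TASKS_URL = "https://api.hiapi.ai/v1/tasks"
RESOLUTIONS = {"480p", "720p", "1080p"}  # Seedance 2.0 options from the field list


def build_seedance_i2v(prompt: str, first_frame_url: str, *, aspect_ratio: str = "16:9",
                       duration: int = 5, resolution: str = "720p",
                       generate_audio: bool = False) -> dict:
    """Build a Seedance 2.0 image-to-video task body (hypothetical helper)."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
    if duration <= 0:
        raise ValueError("duration must be positive")
    return {
        "model": "seedance-2-0",
        "input": {
            "prompt": prompt,
            "first_frame_url": first_frame_url,  # this field is what makes it i2v
            "aspect_ratio": aspect_ratio,
            "duration": duration,
            "resolution": resolution,
            "generate_audio": generate_audio,
        },
    }


def create_task(body: dict, api_key: str) -> dict:
    """POST the body to /v1/tasks and return the parsed JSON response."""
    req = urllib.request.Request(
        TASKS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)  # the taskId is in here; see the docs for its exact path
```

Keeping body construction separate from the HTTP call makes the body easy to log, hash, or test without touching the network.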
A first frame controls where the clip starts and lets the model decide how it ends — which is exactly where drift creeps in, because the model is improvising the back half. If continuity through the final frame matters, pin it too:
```json
{
  "model": "seedance-2-0",
  "input": {
    "prompt": "The product rotates a quarter turn and settles, light resolving to a clean studio key",
    "first_frame_url": "https://cdn.your-domain.com/assets/product-front.jpg",
    "last_frame_url": "https://cdn.your-domain.com/assets/product-three-quarter.jpg",
    "aspect_ratio": "16:9",
    "duration": 5,
    "resolution": "720p"
  }
}
```
Now both endpoints are fixed and the model only fills the motion between them. This is the workhorse pattern for product reveals and before→after transitions, and the single most effective fix for "the clip looked great until the last second." Keep both frames on the same subject and framing — if the start and end images disagree wildly, the model has to invent a path between them and you are back to guessing.
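If your start and end stills come from different exports, a cheap guard is to confirm they share an aspect ratio before pinning both. The sketch below assumes you already know each image's pixel size (for example from your asset pipeline or an image library); with_last_frame and the 1% tolerance are hypothetical choices. Matching ratios says nothing about whether subject and framing agree, which remains a visual check.

```python
def with_last_frame(body: dict, last_frame_url: str, first_size: tuple, last_size: tuple,
                    tolerance: float = 0.01) -> dict:
    """Return a copy of body with last_frame_url set, refusing mismatched aspect ratios.

    first_size / last_size are (width, height) in pixels, supplied by the caller.
    """
    (fw, fh), (lw, lh) = first_size, last_size
    first_ratio, last_ratio = fw / fh, lw / lh
    if abs(first_ratio - last_ratio) / first_ratio > tolerance:
        raise ValueError(
            f"aspect ratios differ: first {fw}x{fh} ({first_ratio:.3f}) "
            f"vs last {lw}x{lh} ({last_ratio:.3f})"
        )
    # Copy rather than mutate, so the single-frame body stays reusable.
    return {**body, "input": {**body["input"], "last_frame_url": last_frame_url}}
```

A 1920x1080 first frame paired with a 1280x720 last frame passes, since both are 16:9; pairing it with a square 1080x1080 still raises before you spend a render.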
The prompt does a different job in i2v, and getting it wrong wastes renders. In text-to-video the prompt builds the whole scene. In image-to-video the model can already see the scene, because it's in the frame you gave it, so the prompt's job is to describe the change over time (camera movement and subject motion), not the contents of the frame.
A good i2v prompt reads like stage directions for a camera operator, not a scene description for a painter.
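One way to keep prompts in stage-direction form is to build them from named parts rather than free text. The MotionPrompt class below is a hypothetical convention, not something the API requires: it joins a camera move, the change over time, and what should hold still. Fed the parts of the first Seedance request above, it reproduces that request's prompt string.

```python
from dataclasses import dataclass


@dataclass
class MotionPrompt:
    """Hypothetical structure for an i2v prompt written as stage directions."""

    camera: str       # how the camera moves, e.g. "Slow cinematic push-in"
    motion: str       # what changes over time in the frame
    hold: str = ""    # what must stay fixed (optional)

    def render(self) -> str:
        if not self.motion.strip():
            raise ValueError("an i2v prompt needs a motion; the frame already supplies the contents")
        parts = (self.camera, self.motion, self.hold)
        return ", ".join(p.strip() for p in parts if p.strip())
```

The point of the structure is that there is no slot for re-describing the subject; if you find yourself wanting one, the source image is probably the thing to fix.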
Video is GPU-minutes, not milliseconds. The render is asynchronous on purpose so it never holds a connection open. After you create the task, you have two ways to know it finished.
Poll until it reaches a terminal state — fine for scripts and prototypes:
```bash
curl https://api.hiapi.ai/v1/tasks/<taskId> \
  -H "Authorization: Bearer sk-<your-key>"
# data.status: queued -> handling -> archiving -> success (or fail)
# on success, read data.output[0].url and download it promptly; the URL expires
```
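For scripts, the poll loop is worth writing with a deadline and a growing interval rather than a tight loop. A minimal sketch, assuming the GET response has the data.status and data.output shape shown in the comments above; the starting interval, 1.5x backoff, and 15-minute timeout are arbitrary placeholders, and fetch is injected so the loop can be exercised without a network.

```python
import json
import time
import urllib.request

TASKS_URL = "https://api.hiapi.ai/v1/tasks"
TERMINAL_STATES = {"success", "fail"}


def fetch_task(task_id: str, api_key: str) -> dict:
    """GET /v1/tasks/<taskId> and return the parsed JSON body."""
    req = urllib.request.Request(
        f"{TASKS_URL}/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_task(task_id, fetch, *, interval=5.0, max_interval=30.0, timeout=900.0,
                  sleep=time.sleep, clock=time.monotonic) -> dict:
    """Poll until data.status is terminal and return the task's data object.

    fetch(task_id) must return the parsed GET response. Raises TimeoutError
    if the task is still running once `timeout` seconds have passed.
    """
    deadline = clock() + timeout
    while True:
        data = fetch(task_id)["data"]
        if data["status"] in TERMINAL_STATES:
            return data  # on success, data["output"][0]["url"] is the clip
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} still {data['status']!r} after {timeout}s")
        sleep(interval)
        interval = min(interval * 1.5, max_interval)  # back off gently
```

Usage: data = wait_for_task(task_id, lambda t: fetch_task(t, api_key)), then check data["status"] before reading the output, since "fail" is also a terminal state.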
Take a callback for anything past a prototype. Instead of sitting in a poll loop, pass a callback and let HiAPI push you the terminal state:
```json
{
  "model": "seedance-2-0",
  "input": {
    "prompt": "...",
    "first_frame_url": "https://cdn.your-domain.com/assets/product-front.jpg",
    "aspect_ratio": "16:9",
    "duration": 5,
    "resolution": "720p"
  },
  "callback": { "url": "https://your-domain.com/hooks/hiapi", "when": "final" }
}
```
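On the receiving side, the callback endpoint should acknowledge quickly and tolerate seeing the same task twice, since webhook deliveries are commonly retried. The sketch below uses only the standard library and makes a loud assumption: that the callback body carries data.taskId and data.status in the same data envelope as the task GET response. Confirm the real payload in the docs. It also does not authenticate the sender, and the in-memory set of processed task IDs would be durable storage in production.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_callback(body: bytes, processed: set) -> str:
    """Decide what to do with one callback delivery; returns a short action label."""
    try:
        payload = json.loads(body)
    except ValueError:
        return "ignored"  # not JSON; don't crash the handler
    # ASSUMPTION: {"data": {"taskId": ..., "status": ..., "output": [...]}}. Check the docs.
    data = payload.get("data", {}) if isinstance(payload, dict) else {}
    task_id = data.get("taskId")
    status = data.get("status")
    if not task_id or status not in ("success", "fail"):
        return "ignored"
    if task_id in processed:
        return "duplicate"  # a retried delivery; the first one was already handled
    processed.add(task_id)
    # Hand real work (downloading, notifying) to a queue so the response stays fast.
    return "download" if status == "success" else "failed"


class CallbackHandler(BaseHTTPRequestHandler):
    processed: set = set()  # in-memory for the sketch only

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        action = handle_callback(self.rfile.read(length), self.processed)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(action.encode())


def serve(port: int = 8080) -> None:
    """Run the receiver; the callback url must route to this host over HTTPS."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

Keeping the decision logic in handle_callback, separate from the HTTP plumbing, lets you test duplicate handling without a running server.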
Either way, the output URL is time-limited. As soon as a task reaches success, download data.output[0].url to your own storage. Do not hotlink it into a page or hand it to a customer, because it will expire.
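Downloading is a plain streamed GET. A minimal sketch using only the standard library: it writes to a .part file and renames it only after the copy finishes, so a crash mid-download never leaves a truncated file under the final name. The download_output name, the .mp4 extension, and naming files by task ID are assumptions; in production you would more likely stream into your bucket than a local directory.

```python
import shutil
import urllib.request
from pathlib import Path


def download_output(url: str, dest_dir: Path, task_id: str) -> Path:
    """Stream a finished clip to dest_dir/<task_id>.mp4 and return the path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    final = dest_dir / f"{task_id}.mp4"  # assumption: MP4 output
    partial = final.with_name(final.name + ".part")
    with urllib.request.urlopen(url, timeout=60) as resp, open(partial, "wb") as out:
        shutil.copyfileobj(resp, out)  # streams in chunks, no full read into memory
    partial.replace(final)  # rename only after a complete copy
    return final
```

Call it from whatever handles the terminal state (the poll loop's return value or the callback's "download" action) rather than waiting until someone needs the clip.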
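A few habits separate a demo script from something you run at volume: take a callback instead of polling, make submission idempotent so the same job never starts a second render, download each output before its URL expires, and budget on usable clips at ship resolution rather than on raw renders.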
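One production habit worth showing in code is idempotent submission. A client-side approach, sketched below under stated assumptions, derives a key from the canonical JSON of model and input and records the taskId against it, so re-running the same job (a queue redelivery, a double click, a restarted batch) reuses the existing task instead of paying for a second render. The helper names are hypothetical and the dict stands in for a database table with a unique key. It cannot resolve the ambiguous case where the create call timed out before you saw a taskId; that would need a server-side mechanism, and none is described here, so check the docs.

```python
import hashlib
import json


def idempotency_key(body: dict) -> str:
    """Stable hash of the fields that define the render (model + input)."""
    core = {"model": body["model"], "input": body["input"]}
    canonical = json.dumps(core, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def submit_once(body: dict, submit, store: dict) -> str:
    """Return the taskId for body, calling submit(body) only the first time.

    submit is your create-task call returning a taskId; store maps key -> taskId.
    If submit raises, nothing is stored, so a later retry will submit again.
    """
    key = idempotency_key(body)
    if key not in store:
        store[key] = submit(body)
    return store[key]
```

The key deliberately ignores the callback block, so changing your hook URL does not make an identical render look new. With several workers, two could both miss the key at once; a unique constraint on the key column in a real store closes that gap.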
How is image-to-video different from text-to-video in the request?
One field. The task call is otherwise identical; adding first_frame_url (the source image) is what switches it from text-to-video to image-to-video. On HiAPI both run through the same POST /v1/tasks endpoint.
Can I control both the first and last frame?
Yes. first_frame_url pins the opening frame; adding last_frame_url pins the closing frame so the model only fills the motion between two fixed endpoints. Pinning both is the most reliable way to control a transition and prevent end-of-clip drift.
Does the Runway Gen-3/Gen-4 API do image-to-video?
Runway's Gen-3 and Gen-4 lines support image-to-video and are strong on cinematic control; that is the broader landscape. The callable workflow in this guide runs on Seedance 2.0 and Wan 2.7 image-to-video, which HiAPI hosts behind one task API.
How long does an image-to-video render take?
Minutes, not seconds; video is real compute. That is why the API is asynchronous: you create a task and either poll or receive a callback, so a long render never blocks your request.
Image-to-video is for when a specific still has to appear on screen as drawn and then move. The quality ceiling is set before you write a word of prompt — by the source image's resolution, aspect ratio, and framing — so prep it first and host it somewhere stable. Pin the first frame to control where the clip starts, add a last frame when the ending has to land, and write the prompt as motion and camera direction, not a re-description of a frame the model can already see. Then treat it as the async job it is: poll or, better, take a callback; make submission idempotent; download before the URL expires; budget on usable clips at ship resolution.
When you're ready, run a source image through the Seedance 2.0 model page or Wan 2.7 image-to-video, then wire the task API above, and check live per-second rates on the pricing page before you scale up. (Chasing a no-key option first? See free text-to-video options.)
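Budgeting on usable clips rather than raw renders is simple arithmetic. A minimal sketch, assuming a flat per-second rate at your ship resolution and a keep rate you measure yourself; the budget function is hypothetical and the numbers in the usage line are placeholders, not HiAPI prices, so substitute the live rates from the pricing page.

```python
import math


def budget(usable_clips: int, keep_rate: float, duration_s: float, rate_per_second: float):
    """Return (renders_needed, total_cost) to end up with `usable_clips` keepers.

    keep_rate is the fraction of renders you actually ship (measured, not guessed);
    rate_per_second is the price at the resolution you ship, not a draft tier.
    """
    if not 0 < keep_rate <= 1:
        raise ValueError("keep_rate must be in (0, 1]")
    if usable_clips < 0 or duration_s <= 0 or rate_per_second < 0:
        raise ValueError("counts, durations, and rates must be non-negative")
    renders = math.ceil(usable_clips / keep_rate)
    return renders, renders * duration_s * rate_per_second
```

With a placeholder rate of 0.10 per second, budget(10, 0.25, 5, 0.10) returns (40, 20.0): ten keepers at a one-in-four keep rate means paying for forty 5-second renders, four times what the raw per-clip price suggests.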
Key Takeaways