An agent calls the POST /v1/extract endpoint, or its MCP tool, with a video URL and the fields it needs. One call works the same across YouTube (incl. Shorts), TikTok, Reddit, Instagram and Pinterest: it returns metadata + insights, a transcript (captions when available, otherwise Whisper), and parametric frames (pick fps or exact timestamps) pushed to S3, with no per-platform scrapers. Agent-first: typed errors, refund-on-fail, result caching, and an MCP server.
Every platform needs its own scraper, rate limits differ, transcripts are a separate pipeline, and grabbing specific frames at exact times is painful. Agents end up maintaining brittle glue per site. FrameFetch is one interface over all of it.
curl -X POST https://framefetch.net/v1/extract \
-H "Authorization: Bearer <key>" -H "Content-Type: application/json" \
-d '{"url":"https://youtu.be/jNQXAC9IVRw",
"fields":["metadata","transcript","frames"],
"frames":{"mode":"fps","fps":1,"width":480}}'
Via MCP, add the server once and call the same extract as a tool:
{ "mcpServers": { "framefetch": {
"url": "https://framefetch.net/mcp",
"headers": { "Authorization": "Bearer YOUR_KEY" } } } }
Get a key (free $0.05 credit): POST https://framefetch.net/v1/keys {"email":"you@example.com"}
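With a key in hand, the curl request shown earlier can also be built from Python. The following is a minimal sketch using only the standard library; the helper name `build_extract_request` is hypothetical, and `YOUR_KEY` is a placeholder. It only constructs the request, so nothing is sent until the object is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

API_BASE = "https://framefetch.net"  # base URL from the curl example


def build_extract_request(api_key, video_url, fields, frames=None):
    """Build (but do not send) a POST /v1/extract request.

    Hypothetical helper: it mirrors the headers and JSON body of the
    curl example and adds nothing the docs do not show.
    """
    payload = {"url": video_url, "fields": list(fields)}
    if frames is not None:
        payload["frames"] = frames
    return urllib.request.Request(
        f"{API_BASE}/v1/extract",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_extract_request(
    "YOUR_KEY",
    "https://youtu.be/jNQXAC9IVRw",
    ["metadata", "transcript", "frames"],
    frames={"mode": "fps", "fps": 1, "width": 480},
)

# To actually call the API (needs a valid key and network access):
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any credit is spent.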
| Field | What it is |
|---|---|
| metadata + insights | title, author, duration, views/likes/comments |
| transcript | captions if present, else a Whisper transcription (source marked) |
| frames | by fps or exact timestamps; any size; jpg/png/webp; returned as S3 URLs with index + time |
| cost | exact per-call breakdown, with refund-on-fail |
| Endpoint | Use | Price |
|---|---|---|
| /v1/metadata | metadata + insights only (cheapest) | ≈ $0.002 |
| /v1/transcript | captions or Whisper | ≈ $0.0015 / audio-min |
| /v1/frames | frames only (needs a frames spec) | $0.00012 / frame |
| /v1/extract | any combination in one call | sum of the above |
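One way a client might choose between these endpoints is sketched below. The endpoint paths come from the table and the field names from the curl example; the selection rule itself (a dedicated endpoint for a single field, `/v1/extract` for anything more) and the helper name `pick_endpoint` are illustrative assumptions, not documented behaviour.

```python
# Endpoint paths taken from the endpoint table; the mapping from field
# name to endpoint is an assumption made for this sketch.
SINGLE_FIELD_ENDPOINTS = {
    "metadata": "/v1/metadata",
    "transcript": "/v1/transcript",
    "frames": "/v1/frames",
}


def pick_endpoint(fields):
    """Return the dedicated endpoint for one field, else /v1/extract."""
    unique = set(fields)
    if not unique:
        raise ValueError("at least one field is required")
    unknown = unique - set(SINGLE_FIELD_ENDPOINTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    if len(unique) == 1:
        return SINGLE_FIELD_ENDPOINTS[next(iter(unique))]
    return "/v1/extract"


print(pick_endpoint(["metadata"]))            # /v1/metadata
print(pick_endpoint(["metadata", "frames"]))  # /v1/extract
```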
Pay per call via x402 (USDC on Base, no account) or Stripe. A 3-min transcript ≈ $0.02; 60 frames @480px ≈ $0.08–0.12.
GET /v1/platforms returns the live platform support matrix.
For frames, set mode:"fps" with an fps, or pass exact timestamps, plus width/format. Frames land in S3 and return as URLs.
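The two ways of selecting frames can be captured in a small builder. This is a sketch under stated assumptions: only `"mode": "fps"`, `"fps"` and `"width"` appear in the curl example, while the `"timestamps"` mode value, the `"timestamps"` list (taken here to be seconds) and the `"format"` key are guesses at the request shape and should be checked against the live API. The helper name `frames_spec` is hypothetical.

```python
def frames_spec(fps=None, timestamps=None, width=None, image_format=None):
    """Build a frames spec for either fps mode or exact timestamps.

    Confirmed by the curl example: "mode": "fps", "fps", "width".
    ASSUMED, not confirmed: "mode": "timestamps", the "timestamps"
    list in seconds, and the "format" key.
    """
    if (fps is None) == (timestamps is None):
        raise ValueError("pass exactly one of fps or timestamps")
    if fps is not None:
        if fps <= 0:
            raise ValueError("fps must be positive")
        spec = {"mode": "fps", "fps": fps}
    else:
        # De-duplicate and sort so each requested time appears once.
        times = sorted(set(timestamps))
        if not times or times[0] < 0:
            raise ValueError("timestamps must be non-empty and non-negative")
        spec = {"mode": "timestamps", "timestamps": times}  # assumed keys
    if width is not None:
        spec["width"] = width
    if image_format is not None:
        # jpg/png/webp are the formats listed in the fields table.
        if image_format not in ("jpg", "png", "webp"):
            raise ValueError("format must be jpg, png or webp")
        spec["format"] = image_format  # assumed key name
    return spec
```

For example, `frames_spec(fps=1, width=480)` reproduces the spec from the curl example, while `frames_spec(timestamps=[12.5, 3], image_format="webp")` asks for two specific moments under the assumed key names.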