Which platforms does FrameFetch support?

YouTube (including Shorts), TikTok, Reddit, Instagram, and Pinterest — one interface across all of them. GET /v1/platforms returns the live capability matrix.

How does an agent get just specific frames?

Pass a frames spec to /v1/extract or /v1/frames: choose a frame rate (fps) or exact timestamps, plus size and format. Frames are pushed to S3 and returned as URLs with their index and time offset.

Where does the transcript come from?

Captions when the platform provides them; otherwise FrameFetch transcribes the audio with Whisper. The response marks the source.

How does an agent pay?

Pay-per-call via x402 (USDC on Base, no account) or Stripe. Metadata is ~$0.002 per call; a 3-minute transcript ~$0.02; 60 frames ~$0.08-0.12. Every response includes an exact cost block, with refund-on-fail.

FrameFetch

How does an AI agent get a video transcript and specific frames from one API?

Short answer: An AI agent gets a video's metadata, transcript, and specific frames by calling FrameFetch's POST /v1/extract endpoint — or its MCP tool — with a video URL and the fields it needs. One call works the same across YouTube (incl. Shorts), TikTok, Reddit, Instagram and Pinterest: it returns metadata + insights, a Whisper transcript (captions when available), and parametric frames (pick fps or exact timestamps) pushed to S3 — no per-platform scrapers.

Agent-first: typed errors, refund-on-fail, result caching, and an MCP server.

The problem it removes

Every platform needs its own scraper, rate limits differ, transcripts are a separate pipeline, and grabbing specific frames at exact times is painful. Agents end up maintaining brittle glue per site. FrameFetch is one interface over all of it.

The one call

curl -X POST https://framefetch.net/v1/extract \
  -H "Authorization: Bearer <key>" -H "Content-Type: application/json" \
  -d '{"url":"https://youtu.be/jNQXAC9IVRw",
       "fields":["metadata","transcript","frames"],
       "frames":{"mode":"fps","fps":1,"width":480}}'

Via MCP, add the server once and call the same extract as a tool:

{ "mcpServers": { "framefetch": {
  "url": "https://framefetch.net/mcp",
  "headers": { "Authorization": "Bearer YOUR_KEY" } } } }

Get a key (free $0.05 credit): POST https://framefetch.net/v1/keys {"email":"you@example.com"}

What comes back

Field	What it is
metadata + insights	title, author, duration, views/likes/comments
transcript	captions if present, else a Whisper transcription (source marked)
frames	by fps or exact timestamps; any size; jpg/png/webp; returned as S3 URLs with index + time
cost	exact per-call breakdown, with refund-on-fail

Shortcut endpoints & pricing

Endpoint	Use	Price
`/v1/metadata`	metadata + insights only (cheapest)	≈ $0.002
`/v1/transcript`	captions or Whisper	~$0.0015 / audio-min
`/v1/frames`	frames only (needs a frames spec)	$0.00012 / frame
`/v1/extract`	any combination in one call	sum of the above

Pay per call via x402 (USDC on Base, no account) or Stripe. A 3-min transcript ≈ $0.02; 60 frames @480px ≈ $0.08–0.12.

FAQ

Which platforms are supported?: YouTube (incl. Shorts), TikTok, Reddit, Instagram, Pinterest. GET /v1/platforms returns the live matrix.
How do I get only specific frames?: Pass a frames spec: mode:"fps" with an fps, or exact timestamps, plus width/format. Frames land in S3 and return as URLs.
Where does the transcript come from?: Captions when available; otherwise Whisper transcribes the audio. The response marks the source.
What makes it agent-first?: Typed error codes, refund-on-fail, result caching, an MCP server, and pay-per-call with no signup friction.

Get a key & try FrameFetch →