Video summarizer with Vercel AI SDK and Ittybit
Use Ittybit to extract the audio track from a video, then pipe it through the Vercel AI SDK to transcribe and summarize the content in a streaming response. The user sees the summary appear token-by-token while the model is still generating.
Install dependencies
npm install ai @ai-sdk/openai @ai-sdk/react
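The code in this guide reads two API keys from the environment: ITTYBIT_API_KEY, used by the Ittybit requests, and OPENAI_API_KEY, which @ai-sdk/openai reads by default. A missing key otherwise shows up later as an authentication error. The sketch below is one way to fail fast at startup. The helper names (missingEnvVars, assertEnv) are hypothetical and not part of either SDK.

```typescript
// Hypothetical startup check, not part of Ittybit or the AI SDK.
// ITTYBIT_API_KEY is read by the extraction code; OPENAI_API_KEY is the
// variable @ai-sdk/openai reads by default.
const REQUIRED_ENV_VARS = ['ITTYBIT_API_KEY', 'OPENAI_API_KEY'] as const;

// Returns the names of required variables that are unset or empty.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_ENV_VARS,
): string[] {
  return required.filter((name) => !env[name]);
}

// Throws one readable error listing every missing variable.
function assertEnv(env: Record<string, string | undefined>): void {
  const missing = missingEnvVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
}
```

On the server you would call assertEnv(process.env) once, for example at the top of the route module.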
Extract audio with Ittybit
Create a Server Action that POSTs an audio extraction job to Ittybit, then polls the job until the extracted audio file is ready.
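// app/actions.ts
'use server';
// Starts an Ittybit audio-extraction job for the video and resolves with the
// URL of the extracted MP3 once the job has finished. Exported so the API
// route can import it.
export async function extractAudio(videoUrl: string): Promise<string> {
  const res = await fetch('https://api.ittybit.com/jobs', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.ITTYBIT_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      input: videoUrl,
      kind: 'audio',
      options: { format: 'mp3' },
    }),
  });
  // Surface HTTP errors (bad key, invalid input) instead of polling an undefined id.
  if (!res.ok) {
    throw new Error(`Creating Ittybit job failed: ${res.status} ${await res.text()}`);
  }
  const task = await res.json();
  return pollTask(task.id);
}
// Checks the job every 2 seconds until it succeeds or fails. maxAttempts is an
// assumed cap (150 polls, about 5 minutes) so a stuck job cannot hang the request.
async function pollTask(taskId: string, maxAttempts = 150): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(`https://api.ittybit.com/jobs/${taskId}`, {
      headers: {
        Authorization: `Bearer ${process.env.ITTYBIT_API_KEY}`,
      },
      cache: 'no-store', // always read the current status, never a cached response
    });
    if (!res.ok) {
      throw new Error(`Polling task ${taskId} failed: ${res.status}`);
    }
    const task = await res.json();
    if (task.status === 'failed') {
      throw new Error(`Task ${taskId} failed`);
    }
    if (task.status === 'succeeded') {
      return task.output.url;
    }
    await new Promise((r) => setTimeout(r, 2000));
  }
  throw new Error(`Task ${taskId} did not finish after ${maxAttempts} polls`);
}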
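pollTask waits on one specific job. If you poll other long-running jobs as well, the loop can be factored into a generic helper with an explicit deadline. This is a sketch: pollUntil, PollOptions, and the default values are hypothetical, and the callback is assumed to return undefined to mean "not ready yet".

```typescript
// Hypothetical generic poller: calls `check` until it returns a value,
// throws, or the deadline passes. `undefined` means "not ready yet".
type PollOptions = {
  intervalMs?: number; // delay between attempts (assumed default: 2000 ms, as in pollTask)
  timeoutMs?: number; // give up after this long (assumed default: 5 minutes)
};

async function pollUntil<T>(
  check: () => Promise<T | undefined>,
  { intervalMs = 2000, timeoutMs = 5 * 60 * 1000 }: PollOptions = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const result = await check();
    if (result !== undefined) return result;
    // Stop before sleeping past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Polling timed out after ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Inside pollTask, the callback would fetch the job, throw when task.status is 'failed', return task.output.url when it is 'succeeded', and return undefined otherwise. Errors thrown by the callback propagate straight out of pollUntil, so a failed job is not retried.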
Stream the summary with the AI SDK
Create an API route that takes a video URL, extracts the audio, and uses streamText to generate a summary. The AI SDK handles chunked streaming back to the client.
// app/api/summarize/route.ts
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { extractAudio } from '../../actions';
export async function POST(req: Request) {
  // useChat posts { messages }; the video URL is the content of the newest message.
  const { messages } = await req.json();
  const videoUrl = messages?.[messages.length - 1]?.content;
  if (typeof videoUrl !== 'string' || videoUrl.length === 0) {
    return new Response('Missing video URL', { status: 400 });
  }
  // 1. Extract audio via Ittybit
  const audioUrl = await extractAudio(videoUrl);
  // 2. Stream a summary using the AI SDK.
  // Audio input needs an audio-capable model; plain gpt-4o does not accept audio files.
  const result = streamText({
    model: openai('gpt-4o-audio-preview'),
    messages: [
      {
        role: 'system',
        content:
          'You are a video summarizer. The user will attach an audio track ' +
          'extracted from a video. Listen to the audio and return a structured summary ' +
          'with a title, a one-sentence TL;DW, and timestamped chapters.',
      },
      {
        role: 'user',
        content: [
          {
            type: 'text',
            text: 'Summarize the content from this audio track:',
          },
          {
            type: 'file',
            data: new URL(audioUrl),
            mimeType: 'audio/mpeg',
          },
        ],
      },
    ],
  });
  return result.toDataStreamResponse();
}
The extractAudio function from the previous step must be exported from app/actions.ts for the route to import it. You can also inline it in the route file; it runs server-side either way.
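useChat does not post a videoUrl field. It posts a JSON body whose messages array ends with the user's message, so the video URL arrives as that message's content. Before handing it to Ittybit, it is worth checking that it is an http(s) URL. A minimal sketch, with videoUrlFromMessages and IncomingMessage as hypothetical names:

```typescript
// Minimal shape of the messages useChat posts; other fields are ignored.
type IncomingMessage = { role: string; content: string };

// Hypothetical helper: take the newest user message and require an http(s) URL.
function videoUrlFromMessages(messages: IncomingMessage[]): string {
  const lastUser = [...messages].reverse().find((m) => m.role === 'user');
  if (!lastUser) throw new Error('No user message in request');
  let url: URL;
  try {
    url = new URL(lastUser.content.trim());
  } catch {
    throw new Error('Message content is not a valid URL');
  }
  // Reject schemes such as file: or ftp: before they reach the extraction job.
  if (url.protocol !== 'https:' && url.protocol !== 'http:') {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  return url.toString();
}
```

In the route you would call it on the parsed body's messages and return new Response(message, { status: 400 }) when it throws.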
Client component with useChat
On the client, the useChat hook from the AI SDK manages the streaming connection and renders tokens as they arrive.
// app/summarize/page.tsx
'use client';
import { useChat } from '@ai-sdk/react';
import { useState, type FormEvent } from 'react';
export default function SummarizePage() {
  const [videoUrl, setVideoUrl] = useState('');
  const { messages, append, isLoading } = useChat({
    api: '/api/summarize',
  });
  function handleSubmit(e: FormEvent<HTMLFormElement>) {
    e.preventDefault();
    // useChat posts the message list to the API route; this message,
    // carrying the video URL, is the last entry.
    append({
      role: 'user',
      content: videoUrl,
    });
  }
  // Use the newest assistant message so a second submission shows the new
  // summary instead of the first one.
  const summary = [...messages].reverse().find((m) => m.role === 'assistant');
  return (
    <div>
      <form onSubmit={handleSubmit}>
        <input
          type="url"
          value={videoUrl}
          onChange={(e) => setVideoUrl(e.target.value)}
          placeholder="https://example.com/video.mp4"
          required
        />
        <button type="submit" disabled={isLoading}>
          {isLoading ? 'Summarizing...' : 'Summarize'}
        </button>
      </form>
      {summary && (
        <div>
          <h2>Summary</h2>
          <pre style={{ whiteSpace: 'pre-wrap' }}>{summary.content}</pre>
        </div>
      )}
    </div>
  );
}
How it fits together
- The user pastes a video URL and submits the form.
- useChat sends the URL to /api/summarize.
- The route handler calls Ittybit to extract the audio track and polls until the file is ready.
- The audio URL is passed to streamText, which sends the audio to the model as a file attachment.
- The model processes the audio and streams back a summary token-by-token.
- useChat updates the UI in real time as chunks arrive.
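useChat hides the wire format, but it can help to know what arrives. Assuming the AI SDK 4.x data stream protocol emitted by toDataStreamResponse, each line is a type code, a colon, and a JSON value, with text deltas under type 0 and errors under type 3. A hypothetical decoder for a client that does not use React might look like this; it buffers partial lines because network chunks do not respect line boundaries.

```typescript
// Hypothetical decoder, assuming the AI SDK 4.x data stream protocol:
// one part per line as `<type>:<JSON>`, text deltas under type `0`.
class DataStreamTextDecoder {
  private buffer = '';

  // Feed a raw chunk (which may end mid-line); returns text from complete lines.
  push(chunk: string): string {
    this.buffer += chunk;
    const lines = this.buffer.split('\n');
    this.buffer = lines.pop() ?? ''; // keep the trailing partial line for later
    let text = '';
    for (const line of lines) {
      const sep = line.indexOf(':');
      if (sep === -1) continue;
      const type = line.slice(0, sep);
      const payload = line.slice(sep + 1);
      if (type === '0') text += JSON.parse(payload) as string; // text delta
      if (type === '3') throw new Error(JSON.parse(payload) as string); // error part
      // Other part types (step start/finish, data, tool calls) are ignored here.
    }
    return text;
  }
}
```

You would feed it strings read from response.body.pipeThrough(new TextDecoderStream()) and append each returned piece to the page.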
The Ittybit extraction step runs on every request, even when the same video was summarized before. To avoid re-extracting audio for the same video, cache the audio URL keyed by the input video URL.
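A minimal in-process sketch, assuming a single long-lived server: memoizeByKey (hypothetical) wraps an async function so repeated calls with the same video URL reuse one result, and concurrent requests share the in-flight job. Failures are evicted so the next request can retry. On serverless platforms each instance has its own memory, so a shared store would be needed for the cache to help across requests.

```typescript
// Hypothetical memoizer: one call to `fn` per key for the life of the process.
// Caches the Promise itself so concurrent callers share one in-flight job.
function memoizeByKey<T>(fn: (key: string) => Promise<T>): (key: string) => Promise<T> {
  const cache = new Map<string, Promise<T>>();
  return (key) => {
    const hit = cache.get(key);
    if (hit) return hit;
    const pending = fn(key).catch((err: unknown) => {
      cache.delete(key); // don't cache failures; let the next request retry
      throw err;
    });
    cache.set(key, pending);
    return pending;
  };
}
```

In the route you would create const cachedExtractAudio = memoizeByKey(extractAudio) at module level and call it in place of extractAudio.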
See also
- API: POST /jobs with kind: "audio" (extract audio from video)
- Extract audio from video: audio extraction options and formats
- Process uploads in Next.js: file uploads with Server Actions and webhooks
- Summarize video with GPT-4 Vision: frame-based summarization with OpenAI