# Video summarizer with Vercel AI SDK and Ittybit

Extract audio with Ittybit and stream AI-generated summaries using the Vercel AI SDK

Use Ittybit to extract the audio track from a video, then pipe it through the Vercel AI SDK to transcribe and summarize the content in a streaming response. The user sees the summary appear token-by-token while the model is still generating.

## Install dependencies

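The examples below use the AI SDK 4.x APIs (`toDataStreamResponse`, and `useChat` with `append` and `isLoading`), so pin the matching major versions:

```bash
npm install ai@4 @ai-sdk/openai@1 @ai-sdk/react@1
```

Set `ITTYBIT_API_KEY` and `OPENAI_API_KEY` in `.env.local`. The `openai` provider reads `OPENAI_API_KEY` automatically.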

## Extract audio with Ittybit

Create a Server Action that starts an Ittybit job with `kind: 'audio'`, then polls the job until the extracted audio file is ready.

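```typescript
// app/actions.ts
'use server';

const POLL_INTERVAL_MS = 2000;
const MAX_POLLS = 90; // stop waiting after roughly 3 minutes

// Start an Ittybit audio-extraction job and wait for the output file URL.
export async function extractAudio(videoUrl: string): Promise<string> {
  const res = await fetch('https://api.ittybit.com/jobs', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.ITTYBIT_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      input: videoUrl,
      kind: 'audio',
      options: { format: 'mp3' },
    }),
  });
  if (!res.ok) {
    throw new Error(`Ittybit job creation failed: ${res.status} ${await res.text()}`);
  }

  const task = await res.json();
  return pollTask(task.id);
}

// Poll the job until it succeeds or fails, with an upper bound on attempts.
async function pollTask(taskId: string): Promise<string> {
  for (let attempt = 0; attempt < MAX_POLLS; attempt++) {
    const res = await fetch(`https://api.ittybit.com/jobs/${taskId}`, {
      headers: {
        Authorization: `Bearer ${process.env.ITTYBIT_API_KEY}`,
      },
      cache: 'no-store', // never reuse a cached status response
    });
    if (!res.ok) {
      throw new Error(`Ittybit status check failed: ${res.status}`);
    }
    const task = await res.json();

    if (task.status === 'failed') {
      throw new Error(`Task ${taskId} failed`);
    }
    if (task.status === 'succeeded') {
      return task.output.url;
    }

    await new Promise((r) => setTimeout(r, POLL_INTERVAL_MS));
  }
  throw new Error(`Task ${taskId} did not finish after ${MAX_POLLS} polls`);
}
```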
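The loop above polls at a fixed two-second interval. If you would rather back off as a job runs longer, a generic helper like the one below could drive the polling instead. This is a hypothetical sketch, not part of Ittybit or the AI SDK: `pollWithBackoff`, its option names, and the default values are illustrative assumptions.

```typescript
// Hypothetical polling helper with exponential backoff and a hard attempt limit.
// `check` returns the final value when ready, or `undefined` to keep waiting.

type PollOptions = {
  maxAttempts?: number; // give up after this many checks
  initialDelayMs?: number; // wait after the first unsuccessful check
  maxDelayMs?: number; // cap on the wait between checks
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
};

const defaultSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function pollWithBackoff<T>(
  check: () => Promise<T | undefined>,
  {
    maxAttempts = 30,
    initialDelayMs = 1000,
    maxDelayMs = 10_000,
    sleep = defaultSleep,
  }: PollOptions = {},
): Promise<T> {
  let delay = initialDelayMs;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await check();
    if (result !== undefined) return result;
    if (attempt < maxAttempts) {
      await sleep(delay);
      // Double the wait each time, but never exceed maxDelayMs.
      delay = Math.min(delay * 2, maxDelayMs);
    }
  }
  throw new Error(`Gave up after ${maxAttempts} attempts`);
}
```

Inside `pollTask`, the `check` callback would fetch the job, throw when `status` is `failed`, return `task.output.url` when it is `succeeded`, and return `undefined` otherwise.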

## Stream the summary with the AI SDK

Create an API route that takes a video URL, extracts the audio, and uses `streamText` to generate a summary. The AI SDK handles chunked streaming back to the client.

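```typescript
// app/api/summarize/route.ts
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';
import { extractAudio } from '../../actions';

// Extraction plus generation can outlast the default function timeout.
// The highest value you can set depends on your hosting plan.
export const maxDuration = 300;

export async function POST(req: Request) {
  // useChat posts { messages }; the video URL is the latest user message.
  const { messages }: { messages: { role: string; content: string }[] } =
    await req.json();
  const videoUrl = messages[messages.length - 1]?.content;
  if (!videoUrl) {
    return new Response('Missing video URL', { status: 400 });
  }

  // 1. Extract audio via Ittybit
  const audioUrl = await extractAudio(videoUrl);

  // 2. Stream a summary using the AI SDK
  const result = streamText({
    // Audio input needs an audio-capable model; plain gpt-4o does not accept audio.
    model: openai('gpt-4o-audio-preview'),
    messages: [
      {
        role: 'system',
        content:
          'You are a video summarizer. The user will attach an audio track ' +
          'extracted from a video. Listen to the audio and return a structured summary ' +
          'with a title, a one-sentence TL;DW, and timestamped chapters.',
      },
      {
        role: 'user',
        content: [
          { type: 'text', text: 'Summarize the content from this audio track:' },
          { type: 'file', data: new URL(audioUrl), mimeType: 'audio/mpeg' },
        ],
      },
    ],
  });

  return result.toDataStreamResponse();
}
```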

Export `extractAudio` from `app/actions.ts` so this route can import it, or inline it in the route file. Either way it runs only on the server, so `ITTYBIT_API_KEY` never reaches the browser.
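`useChat` posts the whole conversation as `{ messages }`, and the URL comes straight from a text input, so it is worth checking before it is forwarded to Ittybit. The helper below is a hypothetical sketch: the name `getVideoUrlFromMessages` and the http(s)-only rule are assumptions, not part of either SDK. It picks the newest user message and rejects anything that does not parse as an absolute `http:` or `https:` URL.

```typescript
// Hypothetical helper: pull the video URL out of a useChat request body
// and reject anything that is not an absolute http(s) URL.

type ChatMessage = { role: string; content: string };

export function getVideoUrlFromMessages(messages: ChatMessage[]): string {
  // useChat sends the whole conversation; the newest user turn is last.
  const lastUser = [...messages].reverse().find((m) => m.role === 'user');
  if (!lastUser) {
    throw new Error('No user message in request');
  }

  let url: URL;
  try {
    url = new URL(lastUser.content.trim());
  } catch {
    throw new Error('Message is not a valid URL');
  }

  // Only allow web URLs; schemes like file: or data: are refused.
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  return url.toString();
}
```

In the route, you would call it on the parsed `messages` and return a `400` response if it throws.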

## Client component with useChat

On the client, the `useChat` hook from the AI SDK manages the streaming connection and renders tokens as they arrive.

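```tsx
// app/summarize/page.tsx
'use client';

import { useChat } from '@ai-sdk/react';
import { useState, type FormEvent } from 'react';

export default function SummarizePage() {
  const [videoUrl, setVideoUrl] = useState('');

  // useChat posts { messages } to the route and streams the reply into `messages`.
  const { messages, append, isLoading, error } = useChat({
    api: '/api/summarize',
  });

  function handleSubmit(e: FormEvent<HTMLFormElement>) {
    e.preventDefault();
    // The route reads the video URL from this user message.
    append({ role: 'user', content: videoUrl });
  }

  // Use the latest assistant message so a second run replaces the first summary.
  const summary = [...messages].reverse().find((m) => m.role === 'assistant');

  return (
    <div>
      <form onSubmit={handleSubmit}>
        <input
          type="url"
          value={videoUrl}
          onChange={(e) => setVideoUrl(e.target.value)}
          placeholder="https://example.com/video.mp4"
          required
        />
        <button type="submit" disabled={isLoading}>
          {isLoading ? 'Summarizing...' : 'Summarize'}
        </button>
      </form>

      {error && <p role="alert">Something went wrong: {error.message}</p>}

      {summary && (
        <div>
          <h2>Summary</h2>
          <pre style={{ whiteSpace: 'pre-wrap' }}>{summary.content}</pre>
        </div>
      )}
    </div>
  );
}
```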
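The system prompt asks for timestamped chapters but does not pin down their format. If you tighten it to request one chapter per line, such as `00:00 Introduction` or `1:02:15 - Q&A` (an assumption, not something the model guarantees), a small parser can turn the finished summary into data you can render as a list. `parseChapters` and the `Chapter` shape are hypothetical; lines that do not start with a timestamp, such as the title and TL;DW, are skipped.

```typescript
// Hypothetical chapter parser. Assumes the prompt asks for one chapter per
// line in a form like "00:00 Introduction", "- [03:45] Demo", or "1:02:15 - Q&A".
// Lines that do not start with a timestamp (title, TL;DW, blanks) are skipped.

export type Chapter = {
  timestamp: string; // as written by the model, e.g. "03:45"
  seconds: number; // offset from the start of the video
  title: string;
};

// Optional bullet, optional [brackets], M:SS / MM:SS / H:MM:SS, optional "-" or ":" separator.
const CHAPTER_LINE = /^\s*(?:[-*]\s*)?\[?((?:\d+:)?\d{1,2}:\d{2})\]?\s*(?:[-:]\s*)?(.+?)\s*$/;

function toSeconds(stamp: string): number {
  // "03:45" -> 225, "1:02:15" -> 3735
  return stamp.split(':').reduce((total, part) => total * 60 + Number(part), 0);
}

export function parseChapters(text: string): Chapter[] {
  const chapters: Chapter[] = [];
  for (const line of text.split('\n')) {
    const match = CHAPTER_LINE.exec(line);
    if (match) {
      chapters.push({ timestamp: match[1], seconds: toSeconds(match[1]), title: match[2] });
    }
  }
  return chapters;
}
```

Run it on `summary.content` once `isLoading` is false; while the stream is still arriving, a half-written timestamp can be misread.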

## How it fits together

1. The user pastes a video URL and submits the form.
2. `useChat` sends the URL to `/api/summarize` as the content of a user message.
3. The route handler calls Ittybit to extract the audio track and polls until the file is ready.
4. The audio URL is passed to `streamText` as a `file` content part, alongside the summarization prompt.
5. The model processes the audio and streams back a summary token-by-token.
6. `useChat` updates the UI in real time as chunks arrive.

As written, the route starts a new Ittybit extraction on every request, even for a video it has already processed. To avoid re-extracting audio for the same video, cache the audio URL keyed by the input video URL.
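A minimal sketch of that cache, assuming a single long-lived server process and that Ittybit output URLs stay valid long enough to reuse. `createAudioUrlCache` is a hypothetical helper that wraps any `extract` function, such as `extractAudio`. It stores the in-flight promise rather than the finished URL, so two requests for the same video share one job, and it evicts failures so they can be retried.

```typescript
// Hypothetical in-memory cache around an audio-extraction function, keyed by
// the input video URL. Promises are cached so concurrent requests for the same
// video share one extraction; failed extractions are removed so they can retry.

export function createAudioUrlCache(
  extract: (videoUrl: string) => Promise<string>,
) {
  const cache = new Map<string, Promise<string>>();

  return function getAudioUrl(videoUrl: string): Promise<string> {
    const cached = cache.get(videoUrl);
    if (cached) return cached;

    const pending = extract(videoUrl).catch((err) => {
      // Do not cache failures; the next request should try again.
      cache.delete(videoUrl);
      throw err;
    });
    cache.set(videoUrl, pending);
    return pending;
  };
}
```

In the route you would create it once at module scope, for example `const getAudioUrl = createAudioUrlCache(extractAudio);`, and call `getAudioUrl(videoUrl)` in place of `extractAudio(videoUrl)`. An in-memory `Map` does not survive cold starts or span multiple serverless instances; for that, back the cache with a shared store.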

## See also

- [API `POST /jobs`](/api/create-job) with `kind: "audio"` -- extract audio from video
- [Extract audio from video](/guides/extract-audio-from-video) -- audio extraction options and formats
- [Process uploads in Next.js](/guides/process-uploads-in-nextjs) -- file uploads with Server Actions and webhooks
- [Summarize video with GPT-4 Vision](/guides/summarize-video-with-gpt4-vision) -- frame-based summarization with OpenAI