# Video summary pipeline in Langflow with Ittybit

Build a no-code video-to-summary pipeline using Langflow and Ittybit audio extraction

Drop a video URL into Langflow and get a written summary back -- no code, no deployments. The flow extracts audio with Ittybit, transcribes it, feeds the transcript through a prompt template, and runs it through an LLM to produce a summary. Five nodes, all wired together on the canvas.

## Prerequisites

- A [Langflow](https://www.langflow.org/) instance (cloud or local)
- An Ittybit API key
- An OpenAI API key (or any LLM provider Langflow supports)

## The flow

The pipeline is five nodes connected left to right:

1. **Webhook Trigger** -- receives a video URL to kick off the pipeline
2. **Ittybit Audio Extraction** -- strips the audio track from the video
3. **OpenAI Whisper Transcription** -- transcribes the audio to text
4. **Prompt Template** -- wraps the transcript in summarization instructions
5. **LLM Summarizer** -- generates the final summary

Each node's output feeds directly into the next node's input. No branching, no conditionals.

## Node 1: Webhook Trigger

Add a **Webhook** component as the entry point. This gives you a URL you can POST to from any external system -- a CMS, a Slack bot, a cron job.

The incoming payload should include the video URL:

```json
{
  "video_url": "https://example.com/uploads/meeting-recording.mp4"
}
```

Connect the Webhook's output to the Ittybit component's `input_url` field.

## Node 2: Ittybit Audio Extraction

Langflow doesn't have a built-in Ittybit node, so you create a custom component. Open the code editor for a new custom component and paste this:

```python

from langflow.custom import Component
from langflow.io import MessageTextInput, Output, SecretStrInput
from langflow.schema import Data

class IttybitAudioExtractor(Component):
    display_name = "Ittybit Audio Extraction"
    description = "Extract audio from a video file using the Ittybit Tasks API"
    icon = "audio-lines"

    inputs = [
        SecretStrInput(
            name="api_key",
            display_name="API Key",
            info="Your Ittybit API key",
            required=True,
        ),
        MessageTextInput(
            name="input_url",
            display_name="Video URL",
            info="URL of the source video file",
            required=True,
        ),
    ]

    outputs = [
        Output(display_name="Audio URL", name="audio_url", method="run"),
    ]

    def run(self) -> Data:
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
        }

        # Create the audio extraction task
        res = requests.post(
            "https://api.ittybit.com/jobs",
            headers=headers,
            json={
                "input": self.input_url,
                "kind": "audio",
            },
        )
        res.raise_for_status()
        task = res.json()
        task_id = task["id"]

        # Poll until the task completes
        deadline = time.time() + 300
        while time.time() < deadline:
            res = requests.get(
                f"https://api.ittybit.com/jobs/{task_id}",
                headers=headers,
            )
            res.raise_for_status()
            data = res.json()

            if data["status"] == "completed":
                return Data(data={"audio_url": data["output"]["url"]})
            if data["status"] == "error":
                raise RuntimeError(
                    f"Task {task_id} failed: {data.get('error', 'unknown error')}"
                )

            time.sleep(2)

        raise TimeoutError(f"Task {task_id} did not complete within 300s")
```

The component POSTs to `POST /jobs` with `kind: "audio"`, then polls `GET /jobs/:id` every 2 seconds until the audio file is ready. The output is the CDN URL for the extracted audio.

Wire the `audio_url` output to the transcription node's input.

## Node 3: OpenAI Whisper Transcription

Add Langflow's built-in **OpenAI Whisper** component (under the Speech-to-Text category). Configure it:

- **OpenAI API Key** -- your key
- **Audio URL** -- connect this to the Ittybit component's `audio_url` output

This node sends the audio to Whisper and returns the full transcript as text.

If your Langflow version doesn't include a Whisper node, you can use a second custom component that calls the OpenAI audio transcription endpoint directly:

```python

from langflow.custom import Component
from langflow.io import MessageTextInput, Output, SecretStrInput
from langflow.schema import Data

class WhisperTranscriber(Component):
    display_name = "Whisper Transcriber"
    description = "Transcribe audio using OpenAI Whisper API"
    icon = "message-square"

    inputs = [
        SecretStrInput(
            name="openai_api_key",
            display_name="OpenAI API Key",
            required=True,
        ),
        MessageTextInput(
            name="audio_url",
            display_name="Audio URL",
            info="URL of the audio file to transcribe",
            required=True,
        ),
    ]

    outputs = [
        Output(display_name="Transcript", name="transcript", method="run"),
    ]

    def run(self) -> Data:
        # Download the audio file
        audio_res = requests.get(self.audio_url)
        audio_res.raise_for_status()

        # Send to Whisper
        res = requests.post(
            "https://api.openai.com/v1/audio/transcriptions",
            headers={"Authorization": f"Bearer {self.openai_api_key}"},
            files={"file": ("audio.mp3", audio_res.content, "audio/mpeg")},
            data={"model": "whisper-1"},
        )
        res.raise_for_status()
        return Data(data={"transcript": res.json()["text"]})
```

Connect the `transcript` output to the prompt template.

## Node 4: Prompt Template

Add a **Prompt** component. This wraps the raw transcript in instructions for the LLM. Set the template to:

```text
Summarize the following transcript from a video recording.
Return a structured summary with:
- A one-line TL;DW
- 3-5 key points as bullet points
- Any action items mentioned

Transcript:
{transcript}
```

Map the `transcript` variable to the incoming transcript text from the previous node.

Connect the prompt output to the LLM node.

## Node 5: LLM Summarizer

Add an **OpenAI** model component (or whichever LLM provider you prefer). Configure it:

- **Model** -- `gpt-4o` or `gpt-4o-mini` for lower cost
- **Temperature** -- `0.3` for consistent summaries

Connect the prompt template output to the model's input. The LLM output is your finished summary.

Add a **Chat Output** or **Text Output** node at the end to display or return the result.

## Connections summary

```text
Webhook Trigger
  └─ video_url ──▶ Ittybit Audio Extraction (input_url)
                      └─ audio_url ──▶ Whisper Transcription (audio_url)
                                          └─ transcript ──▶ Prompt Template (transcript)
                                                              └─ formatted prompt ──▶ LLM Summarizer
                                                                                        └─ summary ──▶ Output
```

## Triggering the flow

Once the flow is deployed, POST to the webhook URL:

<CodeGroup labels={["curl", "TypeScript", "Python"]}>
```bash
curl -X POST https://your-langflow-instance/api/v1/webhook/YOUR_FLOW_ID \
  -H "Content-Type: application/json" \
  -d '{"video_url": "https://example.com/uploads/meeting-recording.mp4"}'
```

```typescript
const res = await fetch('https://your-langflow-instance/api/v1/webhook/YOUR_FLOW_ID', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    video_url: 'https://example.com/uploads/meeting-recording.mp4',
  }),
});
const summary = await res.json();
console.log(summary);
```

```python

res = requests.post(
    "https://your-langflow-instance/api/v1/webhook/YOUR_FLOW_ID",
    json={"video_url": "https://example.com/uploads/meeting-recording.mp4"},
)
print(res.json())
```

</CodeGroup>

## Variations

**Swap the LLM.** Replace the OpenAI node with Anthropic, Ollama, or any provider Langflow supports. The rest of the flow stays the same.

**Add language detection.** Insert a second prompt template between transcription and summarization that detects the language and translates to English before summarizing.

**Batch processing.** Use Langflow's loop component to iterate over a list of video URLs, running each through the same pipeline.

## See also

- [Custom Ittybit component for Langflow](/guides/custom-component-for-langflow) -- the general-purpose Ittybit node
- [Extract audio from video](/guides/extract-audio-from-video) -- audio extraction options and formats
- [Summarize video with GPT-4 Vision](/guides/summarize-video-with-gpt4-vision) -- frame-based summarization approach