Video summary pipeline in Langflow with Ittybit


Drop a video URL into Langflow and get a written summary back, with no separate service to build or deploy. The only code involved is a custom component or two that you paste into the canvas. The flow extracts audio with Ittybit, transcribes it, wraps the transcript in a prompt template, and passes it to an LLM to produce a summary. Five nodes, all wired together on the canvas.

Prerequisites

A Langflow instance (cloud or local)
An Ittybit API key
An OpenAI API key (or any LLM provider Langflow supports)

The flow

The pipeline is five nodes connected left to right:

Webhook Trigger — receives a video URL to kick off the pipeline
Ittybit Audio Extraction — strips the audio track from the video
OpenAI Whisper Transcription — transcribes the audio to text
Prompt Template — wraps the transcript in summarization instructions
LLM Summarizer — generates the final summary

Each node’s output feeds directly into the next node’s input. No branching, no conditionals.

Node 1: Webhook Trigger

Add a Webhook component as the entry point. This gives you a URL you can POST to from any external system — a CMS, a Slack bot, a cron job.

The incoming payload should include the video URL:

{
  "video_url": "https://example.com/uploads/meeting-recording.mp4"
}

Connect the Webhook’s output to the Ittybit component’s input_url field.
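If the webhook hands the whole JSON payload downstream rather than the bare URL, the video_url field has to be pulled out and checked before it reaches the Ittybit component. A minimal sketch of that step in plain Python, assuming the payload arrives either as a JSON string or as an already-parsed dict; the helper name extract_video_url is hypothetical and not part of Langflow:

```python
import json
from typing import Union
from urllib.parse import urlparse


def extract_video_url(payload: Union[str, dict]) -> str:
    """Return the video_url field from a webhook payload, or raise ValueError."""
    # Accept either a raw JSON string or an already-parsed dict.
    # json.JSONDecodeError is a subclass of ValueError, so bad JSON
    # surfaces as the same exception type as a missing field.
    body = json.loads(payload) if isinstance(payload, str) else payload
    if not isinstance(body, dict):
        raise ValueError("payload must be a JSON object")

    url = body.get("video_url")
    if not isinstance(url, str) or not url.strip():
        raise ValueError("payload is missing a 'video_url' string")

    url = url.strip()
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"video_url is not an http(s) URL: {url!r}")
    return url
```

Rejecting bad input here keeps a malformed request from turning into a failed Ittybit job several nodes later.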

Node 2: Ittybit Audio Extraction

Langflow doesn’t have a built-in Ittybit node, so you create a custom component. Open the code editor for a new custom component and paste this:

import time

import requests
from langflow.custom import Component
from langflow.io import MessageTextInput, Output, SecretStrInput
from langflow.schema import Data


class IttybitAudioExtractor(Component):
    display_name = "Ittybit Audio Extraction"
    description = "Extract audio from a video file using the Ittybit Tasks API"
    icon = "audio-lines"

    inputs = [
        SecretStrInput(
            name="api_key",
            display_name="API Key",
            info="Your Ittybit API key",
            required=True,
        ),
        MessageTextInput(
            name="input_url",
            display_name="Video URL",
            info="URL of the source video file",
            required=True,
        ),
    ]

    outputs = [
        Output(display_name="Audio URL", name="audio_url", method="run"),
    ]

    def run(self) -> Data:
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
        }

        # Create the audio extraction task
        res = requests.post(
            "https://api.ittybit.com/jobs",
            headers=headers,
            json={
                "input": self.input_url,
                "kind": "audio",
            },
            timeout=30,  # fail fast instead of hanging on a stalled connection
        )
        res.raise_for_status()
        task = res.json()
        task_id = task["id"]

        # Poll until the task completes
        deadline = time.time() + 300
        while time.time() < deadline:
            res = requests.get(
                f"https://api.ittybit.com/jobs/{task_id}",
                headers=headers,
                timeout=30,  # a hung request would otherwise bypass the 300s deadline
            )
            res.raise_for_status()
            data = res.json()

            if data["status"] == "completed":
                return Data(data={"audio_url": data["output"]["url"]})
            if data["status"] == "error":
                raise RuntimeError(
                    f"Task {task_id} failed: {data.get('error', 'unknown error')}"
                )

            time.sleep(2)

        raise TimeoutError(f"Task {task_id} did not complete within 300s")

The component sends POST /jobs with kind: "audio", then polls GET /jobs/:id every 2 seconds until the audio file is ready, giving up after 300 seconds. The output is the CDN URL for the extracted audio.

Wire the audio_url output to the transcription node’s input.
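The polling logic in the component can also be pulled out into a small helper so it can be exercised without calling the API. This is a sketch, not part of the Ittybit or Langflow APIs: poll_until_done and its parameters are hypothetical names, and the only status values it knows about are the "completed" and "error" strings used in the component above.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    timeout_s: float = 300,
    interval_s: float = 2,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status() until it reports "completed".

    Raises RuntimeError on "error" and TimeoutError once timeout_s has elapsed.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        data = fetch_status()
        if data["status"] == "completed":
            return data
        if data["status"] == "error":
            raise RuntimeError(f"job failed: {data.get('error', 'unknown error')}")
        # Any other status is treated as "still working": wait and ask again.
        sleep(interval_s)
    raise TimeoutError(f"job did not complete within {timeout_s}s")
```

Inside the component, fetch_status would be a small function that performs the GET request and returns res.json(). Using a monotonic clock for the deadline avoids surprises if the system time changes mid-run.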

Node 3: OpenAI Whisper Transcription

Add Langflow’s built-in OpenAI Whisper component (under the Speech-to-Text category). Configure it:

OpenAI API Key — your key
Audio URL — connect this to the Ittybit component’s audio_url output

This node sends the audio to Whisper and returns the full transcript as text.

If your Langflow version doesn’t include a Whisper node, you can use a second custom component that calls the OpenAI audio transcription endpoint directly:

import requests
from langflow.custom import Component
from langflow.io import MessageTextInput, Output, SecretStrInput
from langflow.schema import Data


class WhisperTranscriber(Component):
    display_name = "Whisper Transcriber"
    description = "Transcribe audio using OpenAI Whisper API"
    icon = "message-square"

    inputs = [
        SecretStrInput(
            name="openai_api_key",
            display_name="OpenAI API Key",
            required=True,
        ),
        MessageTextInput(
            name="audio_url",
            display_name="Audio URL",
            info="URL of the audio file to transcribe",
            required=True,
        ),
    ]

    outputs = [
        Output(display_name="Transcript", name="transcript", method="run"),
    ]

    def run(self) -> Data:
        # Download the audio file
        audio_res = requests.get(self.audio_url, timeout=60)
        audio_res.raise_for_status()

        # Send to Whisper (the filename and MIME type assume the audio is an MP3)
        res = requests.post(
            "https://api.openai.com/v1/audio/transcriptions",
            headers={"Authorization": f"Bearer {self.openai_api_key}"},
            files={"file": ("audio.mp3", audio_res.content, "audio/mpeg")},
            data={"model": "whisper-1"},
            timeout=300,  # transcription of long recordings can take a while
        )
        res.raise_for_status()
        return Data(data={"transcript": res.json()["text"]})

Connect the transcript output to the prompt template.
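The custom transcriber above hard-codes audio.mp3 and audio/mpeg for the upload, which assumes the extracted audio is always an MP3. If the audio might arrive in another format, one option is to derive the upload filename and content type from the URL. The sketch below is illustrative only: the function name and the small CONTENT_TYPES table are assumptions, not something Ittybit or OpenAI prescribes.

```python
import posixpath
from typing import Tuple
from urllib.parse import urlparse

# Hypothetical mapping for illustration; extend it to match the formats you produce.
CONTENT_TYPES = {
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".m4a": "audio/mp4",
    ".ogg": "audio/ogg",
}


def upload_name_and_type(audio_url: str, default_ext: str = ".mp3") -> Tuple[str, str]:
    """Return (filename, content_type) for a multipart upload of audio_url."""
    # Look only at the URL path so query strings such as ?token=... are ignored.
    path = urlparse(audio_url).path
    ext = posixpath.splitext(path)[1].lower()
    if ext not in CONTENT_TYPES:
        # Unknown or missing extension: fall back to the default.
        ext = default_ext
    return f"audio{ext}", CONTENT_TYPES[ext]
```

The result can replace the first and third items of the tuple passed as files={"file": (...)} in the component.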

Node 4: Prompt Template

Add a Prompt component. This wraps the raw transcript in instructions for the LLM. Set the template to:

Summarize the following transcript from a video recording.
Return a structured summary with:
- A one-line TL;DW
- 3-5 key points as bullet points
- Any action items mentioned

Transcript:
{transcript}

Map the transcript variable to the incoming transcript text from the previous node.

Connect the prompt output to the LLM node.
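To see what the LLM actually receives, the template substitution can be reproduced in a few lines of Python. This is a rough sketch of the idea rather than Langflow's own implementation; build_prompt and the max_chars cutoff are hypothetical, with the cutoff standing in for whatever limit suits your model's context window.

```python
TEMPLATE = """Summarize the following transcript from a video recording.
Return a structured summary with:
- A one-line TL;DW
- 3-5 key points as bullet points
- Any action items mentioned

Transcript:
{transcript}"""


def build_prompt(transcript: str, max_chars: int = 40_000) -> str:
    """Fill the template, trimming very long transcripts to max_chars characters."""
    text = transcript.strip()
    if not text:
        raise ValueError("transcript is empty")
    if len(text) > max_chars:
        # Mark the cut so the model knows the transcript is incomplete.
        text = text[:max_chars].rstrip() + "\n[transcript truncated]"
    # str.format only parses the template, so braces inside the transcript are safe.
    return TEMPLATE.format(transcript=text)
```

For very long recordings, truncation loses content; summarizing in chunks is a common alternative but is outside the scope of this flow.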

Node 5: LLM Summarizer

Add an OpenAI model component (or whichever LLM provider you prefer). Configure it:

Model — gpt-4o or gpt-4o-mini for lower cost
Temperature — 0.3 for consistent summaries

With the prompt template output connected to the model’s input, the LLM output is your finished summary.

Add a Chat Output or Text Output node at the end to display or return the result.

Connections summary

Webhook Trigger
  └─ video_url ──▶ Ittybit Audio Extraction (input_url)
                      └─ audio_url ──▶ Whisper Transcription (audio_url)
                                          └─ transcript ──▶ Prompt Template (transcript)
                                                              └─ formatted prompt ──▶ LLM Summarizer
                                                                                        └─ summary ──▶ Output

Triggering the flow

Once the flow is deployed, POST to the webhook URL. The same request is shown below with curl, JavaScript, and Python:

curl -X POST https://your-langflow-instance/api/v1/webhook/YOUR_FLOW_ID \
  -H "Content-Type: application/json" \
  -d '{"video_url": "https://example.com/uploads/meeting-recording.mp4"}'

const res = await fetch('https://your-langflow-instance/api/v1/webhook/YOUR_FLOW_ID', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    video_url: 'https://example.com/uploads/meeting-recording.mp4',
  }),
});
const data = await res.json();
console.log(data);

import requests

res = requests.post(
    "https://your-langflow-instance/api/v1/webhook/YOUR_FLOW_ID",
    json={"video_url": "https://example.com/uploads/meeting-recording.mp4"},
)
print(res.json())

Variations

Swap the LLM. Replace the OpenAI node with Anthropic, Ollama, or any provider Langflow supports. The rest of the flow stays the same.

Add language detection. Insert a second prompt template and LLM node between transcription and summarization that detect the transcript’s language and translate it to English before summarizing.

Batch processing. Use Langflow’s loop component to iterate over a list of video URLs, running each through the same pipeline.
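Batching can also be driven from outside Langflow by calling the webhook once per video. The sketch below takes that client-side route and does not use the loop component; trigger_batch is a hypothetical helper, and the post argument is expected to behave like requests.post (it is injected so the function can be tried out without a network).

```python
from typing import Callable, Iterable, List, Tuple


def trigger_batch(
    video_urls: Iterable[str],
    webhook_url: str,
    post: Callable[..., object],
) -> List[Tuple[str, bool, str]]:
    """POST each video URL to the webhook; return (url, ok, error_message) per item."""
    results = []
    for url in video_urls:
        try:
            res = post(webhook_url, json={"video_url": url}, timeout=30)
            res.raise_for_status()
            results.append((url, True, ""))
        except Exception as exc:
            # Record the failure and keep going so one bad URL does not stop the batch.
            results.append((url, False, str(exc)))
    return results
```

A typical call would be trigger_batch(urls, "https://your-langflow-instance/api/v1/webhook/YOUR_FLOW_ID", requests.post), followed by a retry pass over any entries where ok is False.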