Video summary pipeline in Langflow with Ittybit
Drop a video URL into Langflow and get a written summary back — no code, no deployments. The flow extracts audio with Ittybit, transcribes it, feeds the transcript through a prompt template, and runs it through an LLM to produce a summary. Five nodes, all wired together on the canvas.
Prerequisites
- A Langflow instance (cloud or local)
- An Ittybit API key
- An OpenAI API key (or any LLM provider Langflow supports)
The flow
The pipeline is five nodes connected left to right:
- Webhook Trigger — receives a video URL to kick off the pipeline
- Ittybit Audio Extraction — strips the audio track from the video
- OpenAI Whisper Transcription — transcribes the audio to text
- Prompt Template — wraps the transcript in summarization instructions
- LLM Summarizer — generates the final summary
Each node’s output feeds directly into the next node’s input. No branching, no conditionals.
Node 1: Webhook Trigger
Add a Webhook component as the entry point. This gives you a URL you can POST to from any external system — a CMS, a Slack bot, a cron job.
The incoming payload should include the video URL:
{
"video_url": "https://example.com/uploads/meeting-recording.mp4"
}
Connect the Webhook’s output to the Ittybit component’s input_url field.
Node 2: Ittybit Audio Extraction
Langflow doesn’t have a built-in Ittybit node, so you create a custom component. Open the code editor for a new custom component and paste this:
import time
import requests
from langflow.custom import Component
from langflow.io import MessageTextInput, Output, SecretStrInput
from langflow.schema import Data
class IttybitAudioExtractor(Component):
display_name = "Ittybit Audio Extraction"
description = "Extract audio from a video file using the Ittybit Tasks API"
icon = "audio-lines"
inputs = [
SecretStrInput(
name="api_key",
display_name="API Key",
info="Your Ittybit API key",
required=True,
),
MessageTextInput(
name="input_url",
display_name="Video URL",
info="URL of the source video file",
required=True,
),
]
outputs = [
Output(display_name="Audio URL", name="audio_url", method="run"),
]
def run(self) -> Data:
headers = {
"Authorization": f"Bearer {self.api_key}",
"Content-Type": "application/json",
}
# Create the audio extraction task
res = requests.post(
"https://api.ittybit.com/jobs",
headers=headers,
json={
"input": self.input_url,
"kind": "audio",
},
)
res.raise_for_status()
task = res.json()
task_id = task["id"]
# Poll until the task completes
deadline = time.time() + 300
while time.time() < deadline:
res = requests.get(
f"https://api.ittybit.com/jobs/{task_id}",
headers=headers,
)
res.raise_for_status()
data = res.json()
if data["status"] == "completed":
return Data(data={"audio_url": data["output"]["url"]})
if data["status"] == "error":
raise RuntimeError(
f"Task {task_id} failed: {data.get('error', 'unknown error')}"
)
time.sleep(2)
raise TimeoutError(f"Task {task_id} did not complete within 300s")
The component POSTs to POST /jobs with kind: "audio", then polls GET /jobs/:id every 2 seconds until the audio file is ready. The output is the CDN URL for the extracted audio.
Wire the audio_url output to the transcription node’s input.
Node 3: OpenAI Whisper Transcription
Add Langflow’s built-in OpenAI Whisper component (under the Speech-to-Text category). Configure it:
- OpenAI API Key — your key
- Audio URL — connect this to the Ittybit component’s
audio_urloutput
This node sends the audio to Whisper and returns the full transcript as text.
If your Langflow version doesn’t include a Whisper node, you can use a second custom component that calls the OpenAI audio transcription endpoint directly:
import requests
from langflow.custom import Component
from langflow.io import MessageTextInput, Output, SecretStrInput
from langflow.schema import Data
class WhisperTranscriber(Component):
display_name = "Whisper Transcriber"
description = "Transcribe audio using OpenAI Whisper API"
icon = "message-square"
inputs = [
SecretStrInput(
name="openai_api_key",
display_name="OpenAI API Key",
required=True,
),
MessageTextInput(
name="audio_url",
display_name="Audio URL",
info="URL of the audio file to transcribe",
required=True,
),
]
outputs = [
Output(display_name="Transcript", name="transcript", method="run"),
]
def run(self) -> Data:
# Download the audio file
audio_res = requests.get(self.audio_url)
audio_res.raise_for_status()
# Send to Whisper
res = requests.post(
"https://api.openai.com/v1/audio/transcriptions",
headers={"Authorization": f"Bearer {self.openai_api_key}"},
files={"file": ("audio.mp3", audio_res.content, "audio/mpeg")},
data={"model": "whisper-1"},
)
res.raise_for_status()
return Data(data={"transcript": res.json()["text"]})
Connect the transcript output to the prompt template.
Node 4: Prompt Template
Add a Prompt component. This wraps the raw transcript in instructions for the LLM. Set the template to:
Summarize the following transcript from a video recording.
Return a structured summary with:
- A one-line TL;DW
- 3-5 key points as bullet points
- Any action items mentioned
Transcript:
{transcript}
Map the transcript variable to the incoming transcript text from the previous node.
Connect the prompt output to the LLM node.
Node 5: LLM Summarizer
Add an OpenAI model component (or whichever LLM provider you prefer). Configure it:
- Model —
gpt-4oorgpt-4o-minifor lower cost - Temperature —
0.3for consistent summaries
Connect the prompt template output to the model’s input. The LLM output is your finished summary.
Add a Chat Output or Text Output node at the end to display or return the result.
Connections summary
Webhook Trigger
└─ video_url ──▶ Ittybit Audio Extraction (input_url)
└─ audio_url ──▶ Whisper Transcription (audio_url)
└─ transcript ──▶ Prompt Template (transcript)
└─ formatted prompt ──▶ LLM Summarizer
└─ summary ──▶ Output
Triggering the flow
Once the flow is deployed, POST to the webhook URL:
curl -X POST https://your-langflow-instance/api/v1/webhook/YOUR_FLOW_ID \
-H "Content-Type: application/json" \
-d '{"video_url": "https://example.com/uploads/meeting-recording.mp4"}'const res = await fetch('https://your-langflow-instance/api/v1/webhook/YOUR_FLOW_ID', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
video_url: 'https://example.com/uploads/meeting-recording.mp4',
}),
});
const summary = await res.json();
console.log(summary);import requests
res = requests.post(
"https://your-langflow-instance/api/v1/webhook/YOUR_FLOW_ID",
json={"video_url": "https://example.com/uploads/meeting-recording.mp4"},
)
print(res.json()) Variations
Swap the LLM. Replace the OpenAI node with Anthropic, Ollama, or any provider Langflow supports. The rest of the flow stays the same.
Add language detection. Insert a second prompt template between transcription and summarization that detects the language and translates to English before summarizing.
Batch processing. Use Langflow’s loop component to iterate over a list of video URLs, running each through the same pipeline.
See also
- Custom Ittybit component for Langflow — the general-purpose Ittybit node
- Extract audio from video — audio extraction options and formats
- Summarize video with GPT-4 Vision — frame-based summarization approach