Podcast RAG chatbot with Langflow and Ittybit


Podcast back catalogs are full of insight that nobody can find. Listeners resort to scrubbing through hours of audio or hoping the host wrote decent show notes. A RAG pipeline fixes this — ingest episodes through Ittybit to normalize the audio and pull a transcript, chunk and embed that text, store it in a vector DB, and let users ask questions in natural language. Langflow makes the whole thing visual, and the Ittybit custom component handles the media processing without leaving the canvas.

The flow

The Langflow flow chains six nodes, with a separate Chat Input node feeding user questions into step 5:

  1. Ittybit Podcast Ingest: normalizes the raw podcast upload to a consistent format and extracts a transcript
  2. Text Splitter: chunks the transcript into overlapping segments for embedding
  3. Embedding Model: converts each chunk into a vector (OpenAI text-embedding-3-small or any model Langflow supports)
  4. Chroma Vector Store: stores the embeddings with episode metadata for retrieval
  5. Retrieval QA Chain: takes a user question, finds the most relevant chunks, and passes them to an LLM
  6. Chat Output: returns the answer with source references

The first node is the only custom piece. Everything else uses Langflow’s built-in components.

Install dependencies

You need requests and chromadb available in your Langflow environment:

pip install requests chromadb

Create the Ittybit podcast component

This custom component submits two tasks to the Ittybit API: the first normalizes the audio to MP3, and the second transcribes the normalized file so the transcript is as clean as possible. It polls each task until it completes, then outputs the transcript text.

import time

import requests
from langflow.custom import Component
from langflow.io import MessageTextInput, Output, SecretStrInput
from langflow.schema import Data


class IttybitPodcastIngest(Component):
    display_name = "Ittybit Podcast Ingest"
    description = "Normalize podcast audio and extract a transcript via Ittybit"
    icon = "mic"

    inputs = [
        SecretStrInput(
            name="api_key",
            display_name="API Key",
            info="Your Ittybit API key",
            required=True,
        ),
        MessageTextInput(
            name="input_url",
            display_name="Podcast URL",
            info="URL of the raw podcast audio file",
            required=True,
        ),
    ]

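    outputs = [
        Output(display_name="Transcript", name="transcript", method="build_transcript"),
    ]

    # Named build_transcript rather than run so it cannot shadow a method
    # that Langflow's Component base class may define for its own use.
    def build_transcript(self) -> Data: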
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
        }

        # Step 1: Normalize audio to MP3
        audio_res = requests.post(
            "https://api.ittybit.com/jobs",
            headers=headers,
            json={
                "input": self.input_url,
                "kind": "audio",
                "options": {
                    "format": "mp3",
                    "quality": "high",
                },
            },
            timeout=30,  # fail fast instead of hanging on a stalled connection
        )
        audio_res.raise_for_status()
        audio_task = audio_res.json()
        audio_result = self._poll(audio_task["id"], headers)
        audio_url = audio_result["output"]["url"]

        # Step 2: Transcribe the normalized audio
        transcript_res = requests.post(
            "https://api.ittybit.com/jobs",
            headers=headers,
            json={
                "input": audio_url,
                "kind": "transcript",
            },
            timeout=30,  # fail fast instead of hanging on a stalled connection
        )
        transcript_res.raise_for_status()
        transcript_task = transcript_res.json()
        transcript_result = self._poll(transcript_task["id"], headers)

        return Data(data={
            "text": transcript_result["output"]["text"],
            "audio_url": audio_url,
            "episode_url": self.input_url,
        })

    def _poll(
        self,
        task_id: str,
        headers: dict,
        timeout: int = 600,
        interval: int = 3,
    ) -> dict:
        deadline = time.time() + timeout
        while time.time() < deadline:
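            res = requests.get(
                f"https://api.ittybit.com/jobs/{task_id}",
                headers=headers,
                # Without a per-request timeout, one stalled request could
                # block past the overall polling deadline.
                timeout=30,
            )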
            res.raise_for_status()
            data = res.json()

            if data["status"] == "completed":
                return data
            if data["status"] == "error":
                raise RuntimeError(
                    f"Task {task_id} failed: {data.get('error', 'unknown error')}"
                )

            time.sleep(interval)

        raise TimeoutError(f"Task {task_id} did not complete within {timeout}s")

Paste this into Langflow’s custom component editor. When the flow runs, the node takes a raw podcast URL (WAV, M4A, whatever the host uploaded) and outputs a Data object containing the transcript text, the normalized audio URL, and the original episode URL.
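
The polling pattern inside the component can be tried out without any network access. The sketch below restates the same deadline-and-sleep loop as a standalone function. The names poll_until_done and fetch are hypothetical, introduced only for illustration, and the "processing" status in the fake responses is a placeholder: the loop treats anything other than "completed" or "error" as still running.

```python
import time
from typing import Callable


def poll_until_done(
    fetch: Callable[[], dict],
    timeout: float = 600.0,
    interval: float = 3.0,
) -> dict:
    """Call fetch() until it reports completion, failure, or the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        data = fetch()
        status = data.get("status")
        if status == "completed":
            return data
        if status == "error":
            raise RuntimeError(f"Task failed: {data.get('error', 'unknown error')}")
        # Any other status is treated as "still running".
        time.sleep(interval)
    raise TimeoutError(f"Task did not complete within {timeout}s")


# Fake task (illustrative data): two in-progress responses, then completion.
_responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "completed", "output": {"text": "hello"}},
])
result = poll_until_done(lambda: next(_responses), timeout=5, interval=0.01)
print(result["output"]["text"])
```

Passing the fetch step in as a callable keeps the loop testable. In the component, the equivalent of fetch is the GET request against the task URL.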

Wire the Langflow flow

With the Ittybit component saved, build the rest of the flow using built-in Langflow nodes.

Text Splitter

Connect the Ittybit component’s Transcript output to a Recursive Character Text Splitter node. Configure it with:

  • Chunk Size: 1000
  • Chunk Overlap: 200
  • Separator: \n\n

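The 200-character overlap repeats the tail of each chunk at the head of the next, so a sentence that straddles a chunk boundary is still likely to appear intact in one of the chunks and match a related question.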
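
To see what these settings do, here is a minimal sketch of fixed-window chunking with overlap. It is a simplification and not Langflow's splitter: the recursive splitter prefers to break at separators such as \n\n, whereas this hypothetical chunk_text helper cuts at fixed character offsets. It only illustrates how chunk size and overlap interact.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows that share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each window starts this far after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Illustrative 2500-character "transcript" made of repeating digits.
transcript = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(transcript)
print(len(chunks), [len(c) for c in chunks])
# The last 200 characters of one chunk open the next one.
print(chunks[0][-200:] == chunks[1][:200])
```

With a step of 800 characters (1000 minus 200), a 2500-character transcript yields three chunks, and each chunk begins with the final 200 characters of its predecessor.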

Embedding Model

Wire the splitter output into an OpenAI Embeddings node (or whichever embedding model you prefer). Set the model to text-embedding-3-small and provide your OpenAI API key.

Chroma Vector Store

Connect the embeddings to a Chroma node. Set the collection name to podcasts. Chroma runs in-process by default — no server needed for development. For production, point the node at a persistent Chroma instance.
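
If you ever populate the collection from your own code rather than through the Chroma node, each chunk needs an ID and some metadata tying it back to its episode. The sketch below builds those as parallel lists, which is the shape Chroma's Python client accepts in collection.add. The make_records helper, the ID scheme, and the metadata keys are assumptions made for illustration. Deriving IDs from the episode URL means re-ingesting an episode produces the same IDs instead of new ones.

```python
import hashlib


def make_records(episode_url: str, chunks: list[str]) -> dict:
    """Build parallel ids/documents/metadatas lists for one episode."""
    # Short, stable key derived from the URL (hypothetical ID scheme).
    episode_key = hashlib.sha1(episode_url.encode("utf-8")).hexdigest()[:12]
    return {
        "ids": [f"{episode_key}-{i}" for i in range(len(chunks))],
        "documents": list(chunks),
        "metadatas": [
            {"episode_url": episode_url, "chunk_index": i}
            for i in range(len(chunks))
        ],
    }


records = make_records(
    "https://example.com/podcasts/ep-01.wav",
    ["first chunk of the transcript", "second chunk of the transcript"],
)
print(records["ids"])
print(records["metadatas"][1])
```

Storing episode_url in the metadata is what lets the answer cite the episode a chunk came from.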

Retrieval QA Chain

Add a Retrieval QA node. Wire the Chroma node as the retriever and connect an LLM node (GPT-4o, Claude, or any chat model) as the language model. Set the chain type to stuff, which inserts every retrieved chunk into a single prompt; with 1000-character chunks this usually fits within the model's context window.
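
Conceptually, the chain does two things: it ranks stored chunks by similarity to the question, then places the best ones into one prompt. The sketch below shows that idea with toy three-dimensional vectors and plain cosine similarity. It is not Langflow's or Chroma's implementation; the function names, the prompt wording, and the vectors are all made up for illustration, and real embeddings have far more dimensions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: list[dict], k: int = 2) -> list[str]:
    """Return the k chunk texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        index, key=lambda item: cosine(query_vec, item["vector"]), reverse=True
    )
    return [item["text"] for item in ranked[:k]]


def stuff_prompt(question: str, chunks: list[str]) -> str:
    """'Stuff' strategy: put every retrieved chunk into a single prompt."""
    context = "\n\n".join(chunks)
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"


# Toy index (illustrative vectors, not real embeddings).
index = [
    {"text": "We compared fine-tuning with RAG.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Our sponsor this week is a mattress company.", "vector": [0.0, 0.2, 0.9]},
    {"text": "RAG keeps answers grounded in sources.", "vector": [0.8, 0.3, 0.1]},
]
query_vec = [1.0, 0.2, 0.0]
selected = top_k(query_vec, index, k=2)
prompt = stuff_prompt("What did the guests say about fine-tuning vs RAG?", selected)
print(prompt)
```

The sponsor chunk points in a different direction from the query vector, so it ranks last and never reaches the prompt.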

Chat Input and Output

Add a Chat Input node connected to the Retrieval QA chain’s question input, and a Chat Output node connected to the chain’s answer output. This gives you a conversational interface.

The complete flow looks like:

Ittybit Podcast Ingest → Text Splitter → Embeddings → Chroma
                                                        │
Chat Input → Retrieval QA Chain ◄───────────────────────┘
                     │
                     ▼
                Chat Output

The top path runs once per episode to build the index. The bottom path runs on every user question.

Ingest multiple episodes

To load a full back catalog, use a simple script that feeds episode URLs into the Ittybit component via the Langflow API:

import requests

LANGFLOW_URL = "http://localhost:7860/api/v1/run"
FLOW_ID = "your-flow-id"

episodes = [
    "https://example.com/podcasts/ep-01.wav",
    "https://example.com/podcasts/ep-02.m4a",
    "https://example.com/podcasts/ep-03.mp3",
]

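for url in episodes:
    res = requests.post(
        f"{LANGFLOW_URL}/{FLOW_ID}",
        json={
            "input_value": url,
            "output_type": "chat",
            "input_type": "chat",
            "tweaks": {
                # Assumption: this key must match the component's ID in
                # your flow, which may differ from the class name.
                "IttybitPodcastIngest": {
                    "input_url": url,
                },
            },
        },
        # Each run waits on two polled Ittybit tasks, so allow a long timeout.
        timeout=1300,
    )
    print(f"{url}: {res.status_code}")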

Each episode gets normalized, transcribed, chunked, embedded, and stored. Once the index is built, the chat path is ready for questions.
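
For a large back catalog, one failed episode should not abort the whole batch. The sketch below separates building the request payload from sending it, and collects successes and failures into a report. The names build_ingest_payload, ingest_all, and fake_post are hypothetical; fake_post stands in for a real requests.post call so the example runs offline. The default component_id is copied from the script above and is an assumption about your flow.

```python
from typing import Callable


def build_ingest_payload(url: str, component_id: str = "IttybitPodcastIngest") -> dict:
    """Build the run payload for one episode URL."""
    return {
        "input_value": url,
        "output_type": "chat",
        "input_type": "chat",
        "tweaks": {component_id: {"input_url": url}},
    }


def ingest_all(urls: list[str], post: Callable[[dict], int]) -> dict:
    """Send each URL through `post` and sort the URLs by outcome."""
    report = {"ok": [], "failed": []}
    for url in urls:
        try:
            status = post(build_ingest_payload(url))
        except Exception:
            status = None  # network errors count as failures, not crashes
        report["ok" if status == 200 else "failed"].append(url)
    return report


def fake_post(payload: dict) -> int:
    # Stand-in for requests.post(...).status_code; pretends .m4a files fail.
    return 500 if payload["input_value"].endswith(".m4a") else 200


report = ingest_all(
    [
        "https://example.com/podcasts/ep-01.wav",
        "https://example.com/podcasts/ep-02.m4a",
        "https://example.com/podcasts/ep-03.mp3",
    ],
    fake_post,
)
print(report)
```

The failed list can be fed straight back into ingest_all for a retry pass.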

Query the chatbot

With episodes indexed, ask questions through the Chat Input node or the Langflow API:

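# Reuses requests, LANGFLOW_URL, and FLOW_ID from the ingestion script above.
res = requests.post(
    f"{LANGFLOW_URL}/{FLOW_ID}",
    json={
        "input_value": "What did the guests say about fine-tuning vs RAG?",
        "output_type": "chat",
        "input_type": "chat",
    },
    timeout=120,
)
res.raise_for_status()  # surface HTTP errors before indexing into the body
print(res.json()["outputs"][0]["outputs"][0]["results"]["message"]["text"])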

The retrieval chain pulls the most relevant transcript chunks from Chroma, passes them as context to the LLM, and returns a grounded answer with references to specific episodes.
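
The answer sits several levels deep in the response, so a missing key or an empty list raises an exception. A defensive sketch, assuming the response shape used in the query above: the hypothetical extract_answer helper follows the same path and returns None instead of raising when a level is missing. The sample payload is made up to mirror that shape.

```python
from typing import Any, Optional


def extract_answer(payload: Any) -> Optional[str]:
    """Follow outputs[0].outputs[0].results.message.text; None if any step is missing."""
    path = ["outputs", 0, "outputs", 0, "results", "message", "text"]
    node = payload
    for key in path:
        try:
            node = node[key]
        except (KeyError, IndexError, TypeError):
            return None
    return node if isinstance(node, str) else None


# Illustrative payload only; the text is a placeholder, not real output.
sample = {
    "outputs": [
        {"outputs": [{"results": {"message": {"text": "example answer"}}}]}
    ]
}
print(extract_answer(sample))
print(extract_answer({"outputs": []}))
```

Returning None lets the caller decide how to handle an unexpected response, for example by logging the raw body.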

See also