# Podcast RAG chatbot with Langflow and Ittybit

Build a podcast Q&A chatbot using Ittybit audio processing, a Langflow RAG pipeline, and vector search

Podcast back catalogs are full of insight that nobody can find. Listeners resort to scrubbing through hours of audio or hoping the host wrote decent show notes. A RAG pipeline fixes this -- ingest episodes through Ittybit to normalize the audio and pull a transcript, chunk and embed that text, store it in a vector DB, and let users ask questions in natural language. Langflow makes the whole thing visual, and the Ittybit custom component handles the media processing without leaving the canvas.

## The flow

The Langflow flow has six nodes wired together in sequence:

1. **Ittybit Podcast Ingest** -- the custom component; normalizes the raw podcast upload to a consistent format and extracts a transcript
2. **Text Splitter** -- chunks the transcript into overlapping segments for embedding
3. **Embedding Model** -- converts each chunk into a vector (OpenAI `text-embedding-3-small` or any model Langflow supports)
4. **Chroma Vector Store** -- stores the embeddings with episode metadata for retrieval
5. **Retrieval QA Chain** -- takes a user question, finds the most relevant chunks, and passes them to an LLM
6. **Chat Output** -- returns the answer with source references

The first node is the only custom piece. Everything else uses Langflow's built-in components.

## Install dependencies

You need `requests` and `chromadb` available in your Langflow environment:

```bash
pip install requests chromadb
```

## Create the Ittybit podcast component

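This custom component submits two tasks to the Ittybit API in sequence: an `audio` task that normalizes the upload to MP3, then a `transcript` task that runs on the normalized file. It polls each task until it completes and outputs the transcript text. Normalizing first means the transcription step always receives a consistent input format.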

```python

import time

import requests

from langflow.custom import Component
from langflow.io import MessageTextInput, Output, SecretStrInput
from langflow.schema import Data

class IttybitPodcastIngest(Component):
    display_name = "Ittybit Podcast Ingest"
    description = "Normalize podcast audio and extract a transcript via Ittybit"
    icon = "mic"

    inputs = [
        SecretStrInput(
            name="api_key",
            display_name="API Key",
            info="Your Ittybit API key",
            required=True,
        ),
        MessageTextInput(
            name="input_url",
            display_name="Podcast URL",
            info="URL of the raw podcast audio file",
            required=True,
        ),
    ]

    outputs = [
        Output(display_name="Transcript", name="transcript", method="run"),
    ]

    def run(self) -> Data:
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
        }

        # Step 1: Normalize audio to MP3
        audio_res = requests.post(
            "https://api.ittybit.com/jobs",
            headers=headers,
            timeout=30,  # avoid hanging indefinitely on a stalled connection
            json={
                "input": self.input_url,
                "kind": "audio",
                "options": {
                    "format": "mp3",
                    "quality": "high",
                },
            },
        )
        audio_res.raise_for_status()
        audio_task = audio_res.json()
        audio_result = self._poll(audio_task["id"], headers)
        audio_url = audio_result["output"]["url"]

        # Step 2: Transcribe the normalized audio
        transcript_res = requests.post(
            "https://api.ittybit.com/jobs",
            headers=headers,
            timeout=30,  # avoid hanging indefinitely on a stalled connection
            json={
                "input": audio_url,
                "kind": "transcript",
            },
        )
        transcript_res.raise_for_status()
        transcript_task = transcript_res.json()
        transcript_result = self._poll(transcript_task["id"], headers)

        return Data(data={
            "text": transcript_result["output"]["text"],
            "audio_url": audio_url,
            "episode_url": self.input_url,
        })

    def _poll(
        self,
        task_id: str,
        headers: dict,
        timeout: int = 600,
        interval: int = 3,
    ) -> dict:
        deadline = time.time() + timeout
        while time.time() < deadline:
            res = requests.get(
                f"https://api.ittybit.com/jobs/{task_id}",
                headers=headers,
                timeout=30,  # a hung request would otherwise bypass the deadline
            )
            res.raise_for_status()
            data = res.json()

            if data["status"] == "completed":
                return data
            if data["status"] == "error":
                raise RuntimeError(
                    f"Task {task_id} failed: {data.get('error', 'unknown error')}"
                )

            time.sleep(interval)

        raise TimeoutError(f"Task {task_id} did not complete within {timeout}s")
```

Paste this into Langflow's custom component editor. When the flow runs, the node takes a raw podcast URL (WAV, M4A, whatever the host uploaded) and outputs a `Data` object containing the transcript text, the normalized audio URL, and the original episode URL.
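The polling logic in `_poll` can be checked without network access. The following is a minimal sketch, not part of the component: `poll_until_done` is a hypothetical helper that takes the fetch function as a parameter, and the `"processing"` status is an assumed placeholder for any non-terminal state. Only the `"completed"` and `"error"` statuses come from the component above.

```python
import time
from typing import Callable


def poll_until_done(
    fetch: Callable[[], dict],
    timeout: float = 600.0,
    interval: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch() until it reports a terminal status or the deadline passes."""
    deadline = clock() + timeout
    while clock() < deadline:
        data = fetch()
        status = data.get("status")
        if status == "completed":
            return data
        if status == "error":
            raise RuntimeError(f"Task failed: {data.get('error', 'unknown error')}")
        sleep(interval)
    raise TimeoutError(f"Task did not complete within {timeout}s")


# Simulated API: two in-progress responses (assumed status name), then success.
responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "completed", "output": {"text": "hello"}},
])

# Pass a no-op sleep so the simulation finishes immediately.
result = poll_until_done(lambda: next(responses), sleep=lambda _: None)
print(result["output"]["text"])
```

Injecting `fetch`, `sleep`, and `clock` keeps the loop testable; in the component itself the fetch is the `requests.get` call against the task URL.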

## Wire the Langflow flow

With the Ittybit component saved, build the rest of the flow using built-in Langflow nodes.

### Text Splitter

Connect the Ittybit component's `Transcript` output to a **Recursive Character Text Splitter** node. Configure it with:

- **Chunk Size:** 1000
- **Chunk Overlap:** 200
- **Separator:** `\n\n`

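The 200-character overlap repeats the end of each chunk at the start of the next, so a short passage that straddles a chunk boundary still appears whole in at least one chunk and can still match a question about it.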
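To see what these two settings do, here is a simplified sketch of fixed-size chunking with overlap. It is an illustration only: `split_with_overlap` is a hypothetical function, and it ignores separators, whereas the Recursive Character Text Splitter tries to break on the configured separator before falling back to character counts.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks where each chunk repeats the tail of the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# A 2000-character stand-in for a transcript.
transcript = "".join(str(i % 10) for i in range(2000))
chunks = split_with_overlap(transcript)

print(len(chunks))
# The last 200 characters of one chunk are the first 200 of the next.
print(chunks[0][-200:] == chunks[1][:200])
```

With a chunk size of 1000 and an overlap of 200, the window advances 800 characters at a time.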

### Embedding Model

Wire the splitter output into an **OpenAI Embeddings** node (or whichever embedding model you prefer). Set the model to `text-embedding-3-small` and provide your OpenAI API key.

### Chroma Vector Store

Connect the embeddings to a **Chroma** node. Set the collection name to `podcasts`. Chroma runs in-process by default -- no server needed for development. For production, point the node at a persistent Chroma instance.
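Conceptually, the vector store answers one question: which stored chunks are closest to the query embedding? The sketch below shows that idea with a plain linear scan and cosine similarity. It is not how Chroma is implemented, the three-dimensional vectors are made-up toy values, and `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: list[tuple[list[float], dict]], k: int = 2):
    """Return the k (score, metadata) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), meta) for vec, meta in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy index: each entry pairs an embedding with episode metadata.
index = [
    ([1.0, 0.0, 0.0], {"episode_url": "https://example.com/podcasts/ep-01.wav"}),
    ([0.0, 1.0, 0.0], {"episode_url": "https://example.com/podcasts/ep-02.m4a"}),
    ([0.9, 0.1, 0.0], {"episode_url": "https://example.com/podcasts/ep-03.mp3"}),
]

hits = top_k([1.0, 0.0, 0.0], index, k=2)
for score, meta in hits:
    print(round(score, 3), meta["episode_url"])
```

Storing the episode URL alongside each vector is what lets the chatbot point back to a specific episode when it answers.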

### Retrieval QA Chain

Add a **Retrieval QA** node. Wire the Chroma node as the retriever and connect an LLM node (GPT-4o, Claude, or any chat model) as the language model. Set the chain type to `stuff`, which places all retrieved chunks into a single prompt. With 1000-character chunks this usually fits comfortably in the model's context window.
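The `stuff` strategy amounts to concatenating the retrieved chunks into one context block ahead of the question. A rough sketch of that assembly follows; the prompt wording, the `max_chars` budget, and the `build_stuff_prompt` name are assumptions for illustration, not the template Langflow uses.

```python
def build_stuff_prompt(question: str, chunks: list[str], max_chars: int = 12000) -> str:
    """Concatenate retrieved chunks into a single prompt, stopping at a character budget."""
    context_parts = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        block = f"[{i}] {chunk.strip()}"
        # Drop remaining chunks rather than overflow the budget.
        if used + len(block) > max_chars:
            break
        context_parts.append(block)
        used += len(block)

    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_stuff_prompt(
    "What did the guests say about fine-tuning vs RAG?",
    ["Guest A prefers RAG for fresh data.", "Guest B fine-tunes for tone."],
)
print(prompt)
```

Numbering the chunks gives the LLM something to cite, which helps when the answer should reference its sources.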

### Chat Input and Output

Add a **Chat Input** node connected to the Retrieval QA chain's question input, and a **Chat Output** node connected to the chain's answer output. This gives you a conversational interface.

The complete flow looks like:

```
Chat Input ──┐
             ▼
Ittybit Podcast Ingest → Text Splitter → Embeddings → Chroma
                                                        │
Chat Input (query) → Retrieval QA Chain ◄───────────────┘
                         │
                         ▼
                    Chat Output
```

The top path runs once per episode to build the index. The bottom path runs on every user question.

## Ingest multiple episodes

To load a full back catalog, use a simple script that feeds episode URLs into the Ittybit component via the Langflow API:

```python

import requests

LANGFLOW_URL = "http://localhost:7860/api/v1/run"
FLOW_ID = "your-flow-id"

episodes = [
    "https://example.com/podcasts/ep-01.wav",
    "https://example.com/podcasts/ep-02.m4a",
    "https://example.com/podcasts/ep-03.mp3",
]

for url in episodes:
    res = requests.post(
        f"{LANGFLOW_URL}/{FLOW_ID}",
        json={
            "input_value": url,
            "output_type": "chat",
            "input_type": "chat",
            "tweaks": {
                # Replace with the component's ID from your flow if it differs
                "IttybitPodcastIngest": {
                    "input_url": url,
                },
            },
        },
    )
    print(f"{url}: {res.status_code}")
```

Each episode gets normalized, transcribed, chunked, embedded, and stored. Once the index is built, the chat path is ready for questions.
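For a long back catalog, one failed episode should not stop the whole run. The sketch below shows one way to collect failures for a later retry. `ingest_all` is a hypothetical helper, and `post` stands in for whatever function sends a URL to the flow and returns the HTTP status code.

```python
from typing import Callable


def ingest_all(urls: list[str], post: Callable[[str], int]) -> list[tuple[str, str]]:
    """Send every URL through post() and return (url, reason) for each failure."""
    failed = []
    for url in urls:
        try:
            status = post(url)
        except Exception as exc:  # network errors, timeouts, etc.
            failed.append((url, str(exc)))
            continue
        if status >= 400:
            failed.append((url, f"HTTP {status}"))
    return failed


# Fake sender for demonstration: the second episode fails with a server error.
def fake_post(url: str) -> int:
    return 500 if url.endswith("ep-02.m4a") else 200


failures = ingest_all(
    [
        "https://example.com/podcasts/ep-01.wav",
        "https://example.com/podcasts/ep-02.m4a",
        "https://example.com/podcasts/ep-03.mp3",
    ],
    fake_post,
)
print(failures)
```

In a real run, `post` would wrap the `requests.post` call to the flow and return `res.status_code`.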

## Query the chatbot

With episodes indexed, ask questions through the Chat Input node or the Langflow API:

```python
# Reuses LANGFLOW_URL and FLOW_ID from the ingestion script
res = requests.post(
    f"{LANGFLOW_URL}/{FLOW_ID}",
    json={
        "input_value": "What did the guests say about fine-tuning vs RAG?",
        "output_type": "chat",
        "input_type": "chat",
    },
)
print(res.json()["outputs"][0]["outputs"][0]["results"]["message"]["text"])
```

The retrieval chain pulls the most relevant transcript chunks from Chroma, passes them as context to the LLM, and returns a grounded answer with references to specific episodes.
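The nested lookup used to print the answer raises an exception if the flow fails or the response has a different shape. A defensive variant is sketched below. `extract_answer` is a hypothetical helper, and the sample payload is a hand-written stand-in that mirrors only the keys used in the query example, not a full Langflow response.

```python
from typing import Optional


def extract_answer(payload: dict) -> Optional[str]:
    """Return the chat message text from a run response, or None if it is missing."""
    try:
        return payload["outputs"][0]["outputs"][0]["results"]["message"]["text"]
    except (KeyError, IndexError, TypeError):
        return None


# Hand-written sample mirroring the path used in the query example.
sample = {
    "outputs": [
        {"outputs": [{"results": {"message": {"text": "They favored RAG for fresh data."}}}]}
    ]
}

print(extract_answer(sample))
print(extract_answer({"detail": "flow not found"}))
```

Returning `None` instead of raising lets the caller decide whether to retry, log the raw response, or show a fallback message.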

## See also

- [Custom Ittybit component for Langflow](/guides/custom-component-for-langflow) -- the general-purpose media processing component
- [Podcast to blog post with OpenAI](/guides/podcast-to-blog-post-with-openai) -- transcript to written content
- [Podcast search with Supabase](/guides/podcast-search-with-supabase) -- full-text search alternative to vector search
- [Ittybit Task API reference](/reference/tasks)