Podcast search with Supabase and Ittybit


Podcast episodes are long and hard to browse. If your users can’t search for what was said, they won’t find it. This guide wires up Supabase Storage, Ittybit audio processing, and Postgres full-text search. Every uploaded episode is transcribed automatically and becomes searchable as soon as processing finishes.

Architecture

Creator uploads a raw audio file to Supabase Storage
A database webhook fires an Edge Function on insert
The Edge Function creates an Ittybit audio task to normalize the file to MP3
A second task transcribes the audio
Ittybit sends a webhook as each task succeeds or fails
A receiving Edge Function stores the processed URL and transcript in Postgres
A tsvector column enables instant full-text search across all episodes

Create the episodes table

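The search_vector column is a generated tsvector computed from the title and transcript, so it always reflects the current values of both columns. The GIN index lets the @@ match look up episodes by term instead of scanning every row, which keeps searches fast as the catalog grows.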

create table public.episodes (
  id uuid primary key default gen_random_uuid(),
  title text not null,
  storage_path text not null,
  source_url text not null,
  audio_task_id text,
  transcript_task_id text,
  status text default 'pending',
  audio_url text,
  transcript text,
  search_vector tsvector generated always as (
    to_tsvector('english', coalesce(title, '') || ' ' || coalesce(transcript, ''))
  ) stored,
  duration_seconds numeric,
  created_at timestamptz default now(),
  updated_at timestamptz default now()
);

create index idx_episodes_search on episodes using gin(search_vector);
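The coalesce calls in the generated column matter because, in SQL, concatenating anything with NULL yields NULL. A freshly inserted episode has no transcript yet. Without coalesce, `title || ' ' || transcript` would be NULL, to_tsvector would return NULL, and the episode would not match any search, not even by title.

The TypeScript sketch below models that rule, with `null` standing in for SQL NULL. It only illustrates the expression and is not code the pipeline runs.

```typescript
// Models SQL string concatenation, with `null` standing in for SQL NULL:
// in Postgres, `a || b` is NULL whenever either operand is NULL.
function sqlConcat(...parts: (string | null)[]): string | null {
  return parts.some((p) => p === null) ? null : parts.join("");
}

// Models SQL coalesce(value, fallback): the first non-NULL argument.
function coalesce(value: string | null, fallback: string): string {
  return value ?? fallback;
}

// The text fed to to_tsvector without coalesce.
function searchDocumentUnsafe(title: string, transcript: string | null): string | null {
  return sqlConcat(title, " ", transcript);
}

// The text fed to to_tsvector as written in the episodes table.
function searchDocument(title: string, transcript: string | null): string | null {
  return sqlConcat(coalesce(title, ""), " ", coalesce(transcript, ""));
}

// A new episode has a title but no transcript yet.
const withoutCoalesce = searchDocumentUnsafe("Scaling Kubernetes", null); // null
const withCoalesce = searchDocument("Scaling Kubernetes", null); // "Scaling Kubernetes "
```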

Edge Function: dispatch processing

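When a file lands in the podcasts bucket, this function creates two Ittybit tasks: one to normalize the audio and one to transcribe it. Both tasks read the original upload, so neither waits for the other. getPublicUrl only returns a working URL if the podcasts bucket is public. For a private bucket, pass Ittybit a signed URL from createSignedUrl instead.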

// supabase/functions/process-episode/index.ts
import { serve } from "https://deno.land/std@0.177.0/http/server.ts";
import { createClient } from "https://esm.sh/@supabase/supabase-js@2";

// Small helper for JSON responses
const json = (body: unknown, status = 200) =>
  new Response(JSON.stringify(body), {
    status,
    headers: { "Content-Type": "application/json" },
  });

serve(async (req) => {
  const payload = await req.json();
  const record = payload.record;

  // The storage.objects trigger fires for every bucket; only handle podcasts
  if (!record || record.bucket_id !== "podcasts") {
    return json({ skipped: true });
  }

  const supabase = createClient(
    Deno.env.get("SUPABASE_URL")!,
    Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!,
  );

  // Build the public URL for the uploaded file (requires a public bucket)
  const { data: urlData } = supabase.storage
    .from(record.bucket_id)
    .getPublicUrl(record.name);

  const sourceUrl = urlData.publicUrl;

  // "shows/ep-42_intro.wav" -> "ep 42 intro": drop folders and extension
  const fileName = record.name.split("/").pop() ?? record.name;
  const title = fileName.replace(/\.[^.]+$/, "").replace(/[-_]/g, " ");

  // Insert the episode row in the processing state
  const { data: episode, error: insertError } = await supabase
    .from("episodes")
    .insert({
      title,
      storage_path: `${record.bucket_id}/${record.name}`,
      source_url: sourceUrl,
      status: "processing",
    })
    .select()
    .single();

  if (insertError || !episode) {
    return json({ error: insertError?.message ?? "Insert failed" }, 500);
  }

  const headers = {
    Authorization: `Bearer ${Deno.env.get("ITTYBIT_API_KEY")}`,
    "Content-Type": "application/json",
  };

  // Task 1: Normalize audio to MP3
  const audioRes = await fetch("https://api.ittybit.com/jobs", {
    method: "POST",
    headers,
    body: JSON.stringify({
      input: sourceUrl,
      kind: "audio",
      options: {
        format: "mp3",
        quality: "high",
      },
      metadata: {
        episode_id: episode.id,
        callback_type: "audio",
      },
    }),
  });

  // Task 2: Transcribe the audio
  const transcriptRes = await fetch("https://api.ittybit.com/jobs", {
    method: "POST",
    headers,
    body: JSON.stringify({
      input: sourceUrl,
      kind: "transcript",
      metadata: {
        episode_id: episode.id,
        callback_type: "transcript",
      },
    }),
  });

  // If either task could not be created, no webhook will ever complete it
  if (!audioRes.ok || !transcriptRes.ok) {
    await supabase
      .from("episodes")
      .update({ status: "failed", updated_at: new Date().toISOString() })
      .eq("id", episode.id);
    return json({ error: "Ittybit job creation failed" }, 502);
  }

  const audioTask = await audioRes.json();
  const transcriptTask = await transcriptRes.json();

  // Store task IDs
  await supabase
    .from("episodes")
    .update({
      audio_task_id: audioTask.id,
      transcript_task_id: transcriptTask.id,
    })
    .eq("id", episode.id);

  return json({ ok: true });
});

# Test the audio processing task manually
curl -X POST https://api.ittybit.com/jobs \
  -H "Authorization: Bearer $ITTYBIT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "https://your-project.supabase.co/storage/v1/object/public/podcasts/ep-42.wav",
    "kind": "audio",
    "options": {
      "format": "mp3",
      "quality": "high"
    }
  }'
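The two job bodies differ only in kind, options, and the metadata.callback_type that the webhook handler later routes on. A pure helper keeps that pairing in one place and is easy to unit test.

This is a sketch: the field names (input, kind, options, metadata) are copied from the requests above. `buildJobRequests` and `titleFromObjectName` are hypothetical helpers, not part of any Ittybit or Supabase SDK.

```typescript
// Hypothetical helpers mirroring the request bodies sent by process-episode.
type CallbackType = "audio" | "transcript";

interface JobRequest {
  input: string;
  kind: CallbackType;
  options?: Record<string, string>;
  metadata: { episode_id: string; callback_type: CallbackType };
}

// Derive a readable title from a storage object name such as
// "season-2/ep-42_intro.wav": drop any folder prefix and the extension,
// then turn dashes and underscores into spaces.
function titleFromObjectName(name: string): string {
  const file = name.split("/").pop() ?? name;
  return file.replace(/\.[^.]+$/, "").replace(/[-_]/g, " ");
}

// Build both job bodies for one episode. The webhook handler relies on
// metadata.episode_id and metadata.callback_type coming back unchanged.
function buildJobRequests(sourceUrl: string, episodeId: string): JobRequest[] {
  return [
    {
      input: sourceUrl,
      kind: "audio",
      options: { format: "mp3", quality: "high" },
      metadata: { episode_id: episodeId, callback_type: "audio" },
    },
    {
      input: sourceUrl,
      kind: "transcript",
      metadata: { episode_id: episodeId, callback_type: "transcript" },
    },
  ];
}

// Usage: each element becomes the JSON body of one POST to the jobs endpoint.
const jobs = buildJobRequests("https://example.com/ep-42.wav", "episode-uuid");
```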

Wire up the database webhook

In the Supabase Dashboard, go to Database > Webhooks and create a new webhook:

Table: storage.objects
Events: INSERT
Type: Supabase Edge Function
Function: process-episode

A webhook on storage.objects fires for uploads to every bucket. Have the function check record.bucket_id and return early for anything outside the podcasts bucket.
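A function-side version of that bucket check is sketched below. It assumes the fields Supabase sends in a database webhook payload (type, schema, table, record) and a bucket named podcasts. `isPodcastUpload` is a hypothetical helper name.

```typescript
// Fields of a Supabase database webhook payload used by this check.
interface StorageWebhookPayload {
  type: "INSERT" | "UPDATE" | "DELETE";
  schema: string;
  table: string;
  record: { bucket_id: string; name: string } | null;
}

// Hypothetical guard: only new objects in the podcasts bucket should
// start processing; everything else is ignored.
function isPodcastUpload(payload: StorageWebhookPayload, bucket = "podcasts"): boolean {
  return (
    payload.type === "INSERT" &&
    payload.schema === "storage" &&
    payload.table === "objects" &&
    payload.record?.bucket_id === bucket
  );
}
```

In the handler, return early with a 2xx response when this returns false, so uploads to other buckets never create episode rows or Ittybit jobs.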

Edge Function: receive Ittybit webhook

This function handles callbacks for both the audio and transcript tasks. It checks which type arrived via metadata.callback_type and updates the appropriate columns. Once both are done, the episode status moves to completed.

// supabase/functions/ittybit-webhook/index.ts
import { serve } from "https://deno.land/std@0.177.0/http/server.ts";
import { createClient } from "https://esm.sh/@supabase/supabase-js@2";

serve(async (req) => {
  const payload = await req.json();

  const supabase = createClient(
    Deno.env.get("SUPABASE_URL")!,
    Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!,
  );

  const episodeId = payload.metadata?.episode_id;
  const callbackType = payload.metadata?.callback_type;
  if (!episodeId || !callbackType) {
    return new Response("Missing metadata", { status: 400 });
  }

  if (payload.status === "failed") {
    await supabase
      .from("episodes")
      .update({ status: "failed", updated_at: new Date().toISOString() })
      .eq("id", episodeId);
    return new Response(JSON.stringify({ ok: true }), {
      headers: { "Content-Type": "application/json" },
    });
  }

  // Update the appropriate fields based on callback type
  const updates: Record<string, unknown> = {
    updated_at: new Date().toISOString(),
  };

  if (callbackType === "audio") {
    updates.audio_url = payload.output?.url;
    updates.duration_seconds = payload.output?.duration;
  }

  if (callbackType === "transcript") {
    updates.transcript = payload.output?.text;
  }

  await supabase.from("episodes").update(updates).eq("id", episodeId);

  // Check if both tasks are now complete
  const { data: episode } = await supabase
    .from("episodes")
    .select("audio_url, transcript")
    .eq("id", episodeId)
    .single();

  if (episode?.audio_url && episode?.transcript) {
    await supabase
      .from("episodes")
      .update({ status: "completed" })
      .eq("id", episodeId);
  }

  return new Response(JSON.stringify({ ok: true }), {
    headers: { "Content-Type": "application/json" },
  });
});
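The branching in this handler can be pulled into a pure function. That makes it easy to check that an episode reaches completed regardless of which callback arrives first.

The sketch assumes only the payload fields the handler above already reads: status, metadata, output.url, output.duration, and output.text. Ittybit's exact payload shape should be confirmed against its webhook documentation. `planUpdate`, `isComplete`, and `applyCallback` are hypothetical names, and an in-memory row stands in for Postgres.

```typescript
type CallbackType = "audio" | "transcript";

// Fields the handler reads from an Ittybit webhook (assumed shape).
interface IttybitCallback {
  status?: string;
  metadata?: { episode_id?: string; callback_type?: CallbackType };
  output?: { url?: string; duration?: number; text?: string };
}

// Subset of the episodes row touched by callbacks.
interface EpisodeRow {
  status: string;
  audio_url: string | null;
  duration_seconds: number | null;
  transcript: string | null;
}

// Column changes for one callback, or null if it cannot be routed.
function planUpdate(cb: IttybitCallback): Partial<EpisodeRow> | null {
  const type = cb.metadata?.callback_type;
  if (!cb.metadata?.episode_id || !type) return null;
  if (cb.status === "failed") return { status: "failed" };
  if (type === "audio") {
    return { audio_url: cb.output?.url ?? null, duration_seconds: cb.output?.duration ?? null };
  }
  return { transcript: cb.output?.text ?? null };
}

// Same rule as the handler: completed once both outputs are stored.
function isComplete(row: EpisodeRow): boolean {
  return Boolean(row.audio_url && row.transcript);
}

// Apply one callback, then run the completion check, mimicking the
// handler's update-then-select sequence.
function applyCallback(row: EpisodeRow, cb: IttybitCallback): EpisodeRow {
  const update = planUpdate(cb);
  if (!update) return row; // unroutable: the handler answers 400
  const next: EpisodeRow = { ...row, ...update };
  if (update.status === "failed") return next; // failures skip the completion check
  return isComplete(next) ? { ...next, status: "completed" } : next;
}
```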

# Register your webhook endpoint in the Ittybit dashboard
# or via the API:
curl -X POST https://api.ittybit.com/webhooks \
  -H "Authorization: Bearer $ITTYBIT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-project.supabase.co/functions/v1/ittybit-webhook",
    "events": ["job.succeeded", "job.failed"]
  }'

Search endpoint

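Create an Edge Function that calls a search_episodes Postgres function through supabase.rpc. That function queries the search_vector column. ts_rank scores each match against the query so results can be ordered by relevance, and ts_headline returns a snippet with the matching terms highlighted.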

// supabase/functions/search-episodes/index.ts
import { serve } from "https://deno.land/std@0.177.0/http/server.ts";
import { createClient } from "https://esm.sh/@supabase/supabase-js@2";

serve(async (req) => {
  const { searchParams } = new URL(req.url);
  const query = searchParams.get("q")?.trim();
  if (!query) {
    return new Response(JSON.stringify({ episodes: [] }), {
      headers: { "Content-Type": "application/json" },
    });
  }

  const supabase = createClient(
    Deno.env.get("SUPABASE_URL")!,
    Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!,
  );

  // Call the search_episodes Postgres function (defined below)
  const { data: episodes, error } = await supabase.rpc("search_episodes", {
    search_query: query,
  });

  if (error) {
    return new Response(JSON.stringify({ error: error.message }), {
      status: 500,
      headers: { "Content-Type": "application/json" },
    });
  }

  return new Response(JSON.stringify({ episodes }), {
    headers: { "Content-Type": "application/json" },
  });
});

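# Search for episodes mentioning "kubernetes"
# (Edge Functions require a valid JWT by default; the anon key works)
curl "https://your-project.supabase.co/functions/v1/search-episodes?q=kubernetes" \
  -H "Authorization: Bearer $SUPABASE_ANON_KEY"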
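From application code, the endpoint is an ordinary GET with the query in q. The sketch below builds the URL with URLSearchParams so spaces and punctuation are encoded. It also types the response after the columns that search_episodes returns.

The project URL is a placeholder. The Authorization header assumes the function keeps the default JWT check, which the anon key satisfies. Calling it directly from a browser would also need CORS headers on the function's responses, which the function above does not set. `buildSearchUrl` and `searchEpisodes` are hypothetical helper names.

```typescript
// Row shape returned by the search_episodes SQL function.
interface SearchResult {
  id: string;
  title: string;
  audio_url: string | null;
  duration_seconds: number | null;
  headline: string;
  rank: number;
}

// Build the GET URL; URLSearchParams encodes spaces, quotes, and so on.
function buildSearchUrl(projectUrl: string, query: string): string {
  const url = new URL("/functions/v1/search-episodes", projectUrl);
  url.searchParams.set("q", query);
  return url.toString();
}

// Call the endpoint with the anon key as the bearer token.
async function searchEpisodes(
  projectUrl: string,
  anonKey: string,
  query: string,
): Promise<SearchResult[]> {
  const res = await fetch(buildSearchUrl(projectUrl, query), {
    headers: { Authorization: `Bearer ${anonKey}` },
  });
  if (!res.ok) throw new Error(`Search failed with status ${res.status}`);
  const body = (await res.json()) as { episodes: SearchResult[] | null };
  return body.episodes ?? [];
}
```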

The search_episodes function lives in Postgres:

create or replace function search_episodes(search_query text)
returns table (
  id uuid,
  title text,
  audio_url text,
  duration_seconds numeric,
  headline text,
  rank real
) language sql as $$
  select
    e.id,
    e.title,
    e.audio_url,
    e.duration_seconds,
    ts_headline('english', e.transcript, plainto_tsquery('english', search_query),
      'StartSel=<mark>, StopSel=</mark>, MaxWords=35, MinWords=15'
    ) as headline,
    ts_rank(e.search_vector, plainto_tsquery('english', search_query)) as rank
  from episodes e
  where e.search_vector @@ plainto_tsquery('english', search_query)
    and e.status = 'completed'
  order by rank desc
  limit 20;
$$;
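ts_headline inserts the StartSel/StopSel markers into the raw transcript text but does not HTML-escape the rest of the snippet. Rendering headline as HTML would therefore also render any markup that happens to appear in a transcript.

One option, sketched below, is to split the snippet on the markers and escape each piece before re-adding the highlight tags. The sketch assumes the StartSel=<mark>, StopSel=</mark> options used above and a transcript that contains no literal <mark> tags. `splitHeadline` and `headlineToSafeHtml` are hypothetical helpers.

```typescript
interface HeadlinePart {
  text: string;
  highlighted: boolean;
}

// Splitting on either marker alternates plain and highlighted text:
// "a <mark>b</mark> c" -> ["a ", "b", " c"]. Empty pieces are dropped
// after the index-based flag is assigned, so parity stays correct.
function splitHeadline(headline: string): HeadlinePart[] {
  return headline
    .split(/<mark>|<\/mark>/)
    .map((text, i) => ({ text, highlighted: i % 2 === 1 }))
    .filter((part) => part.text.length > 0);
}

// Escape the five HTML-significant characters.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Rebuild the snippet with only our own <mark> tags left unescaped.
function headlineToSafeHtml(headline: string): string {
  return splitHeadline(headline)
    .map((p) => (p.highlighted ? `<mark>${escapeHtml(p.text)}</mark>` : escapeHtml(p.text)))
    .join("");
}
```

Rendering the parts as text nodes (for example, one span or mark element per part) avoids HTML strings entirely. The escaping helper is only needed when the snippet must be emitted as markup.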

Deploy

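supabase functions deploy process-episode
supabase functions deploy ittybit-webhook --no-verify-jwt
supabase functions deploy search-episodes

Ittybit calls ittybit-webhook without a Supabase JWT, so that function is deployed with --no-verify-jwt. Because the endpoint is then publicly callable, consider verifying that incoming requests really come from Ittybit, as described in Ittybit's webhook documentation.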

Set your secrets:

supabase secrets set ITTYBIT_API_KEY=your_ittybit_api_key

SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY are available automatically in Edge Functions.

Test the pipeline

Upload an episode and watch it progress:

select id, title, status, audio_url is not null as has_audio,
       transcript is not null as has_transcript
from episodes
order by created_at desc
limit 5;

Once the status reads completed, search for something mentioned in the episode:

select title, ts_headline('english', transcript,
  plainto_tsquery('english', 'machine learning'),
  'StartSel=<mark>, StopSel=</mark>, MaxWords=35, MinWords=15'
) as snippet
from episodes
where search_vector @@ plainto_tsquery('english', 'machine learning')
order by ts_rank(search_vector, plainto_tsquery('english', 'machine learning')) desc;