Split audio by silence detection

โ€œPodstackโ€ (a podcast hosting platform like Transistor) lets creators upload full episode recordings and automatically split them into chapters at natural pauses.

API

ittybit audio \
  -i https://podstack-app.com/uploads/ep-201-raw.wav \
  --split silence \
  --silence_threshold -35 \
  --silence_min_duration 1.5 \
  --format mp3

// JavaScript (Node.js 18+, ES module for top-level await): submit the same task over HTTP
const task = {
  input: 'https://podstack-app.com/uploads/ep-201-raw.wav',
  kind: 'audio',
  options: {
    split: 'silence',
    silence_threshold: -35,
    silence_min_duration: 1.5,
    format: 'mp3',
  },
};

const res = await fetch('https://api.ittybit.com/jobs', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.ITTYBIT_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify(task),
});
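const data = await res.json();

# Python: the same request using the requests library
import os

import requests

# Read the API key from the environment, as the other examples do
api_key = os.environ["ITTYBIT_API_KEY"]

task = {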
    "input": "https://podstack-app.com/uploads/ep-201-raw.wav",
    "kind": "audio",
    "options": {
        "split": "silence",
        "silence_threshold": -35,
        "silence_min_duration": 1.5,
        "format": "mp3",
    },
}

res = requests.post(
    "https://api.ittybit.com/jobs",
    headers={"Authorization": f"Bearer {api_key}"},
    json=task,
)
data = res.json()

# curl: the same request from the shell
TASK='{
  "input": "https://podstack-app.com/uploads/ep-201-raw.wav",
  "kind": "audio",
  "options": {
    "split": "silence",
    "silence_threshold": -35,
    "silence_min_duration": 1.5,
    "format": "mp3"
  }
}'

curl -X POST https://api.ittybit.com/jobs \
  -H "Authorization: Bearer $ITTYBIT_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$TASK"

The task returns multiple output files, one per detected segment. Each output includes start and end timestamps from the original recording.
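As an illustration of how those timestamps might be used, the sketch below turns a list of outputs into chapter markers for show notes. The response shape is not documented here, so the `start` and `end` keys (seconds from the beginning of the recording) are hypothetical field names; adjust them to match the actual response.

```python
def format_timestamp(seconds):
    """Format a time offset in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def chapter_markers(outputs):
    """Build one 'HH:MM:SS Chapter N' line per output, ordered by start time.

    `outputs` is a list of dicts with hypothetical "start" and "end" keys.
    """
    ordered = sorted(outputs, key=lambda item: item["start"])
    return [
        f"{format_timestamp(item['start'])} Chapter {number}"
        for number, item in enumerate(ordered, start=1)
    ]


# Made-up example data, not a real API response
example_outputs = [
    {"start": 754.2, "end": 1600.0},
    {"start": 0.0, "end": 752.9},
]
for line in chapter_markers(example_outputs):
    print(line)
```

Sorting by start time keeps the markers in order even if the outputs arrive out of sequence.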

CLI

ittybit audio \
  -i ep-201-raw.wav \
  -o chapters/ \
  --split silence \
  --silence_threshold -35 \
  --silence_min_duration 1.5

Output files are named sequentially: chapters/001.mp3, chapters/002.mp3, etc.

Silence threshold guide

The silence_threshold is in dBFS (decibels relative to full scale). Lower (more negative) values are stricter: they require deeper silence to trigger a split.

| Content type | Threshold (dBFS) | Min duration | Notes |
| --- | --- | --- | --- |
| Speech / podcast | -35 to -30 | 1.5 s | Speakers pause between topics |
| Audiobook | -40 to -35 | 2.0 s | Chapter breaks are longer, quieter |
| Live recording | -25 to -20 | 1.0 s | Background noise raises the floor |
| Music with gaps | -50 to -40 | 3.0 s | Only split on true silence between tracks |

Start with -35 dBFS and 1.5 seconds for speech content. If you get too many splits, lower the threshold or increase the minimum duration. If you get too few, raise the threshold or shorten the minimum duration.
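To make the two parameters concrete, here is a minimal sketch of threshold-based silence detection. It is not ittybit's implementation, only one common approach: measure the RMS level of short windows in dBFS and report runs that stay below the threshold for at least the minimum duration. The function names, the 50 ms window, and the synthetic test signal are all assumptions made for illustration.

```python
import math


def dbfs(samples):
    """RMS level in dBFS of float samples in the range [-1.0, 1.0]."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def find_silences(samples, sample_rate, threshold_db=-35.0,
                  min_duration=1.5, window=0.05):
    """Return (start, end) times in seconds of runs quieter than
    threshold_db that last at least min_duration seconds."""
    step = max(1, int(sample_rate * window))
    silences = []
    run_start = None
    for i in range(0, len(samples), step):
        quiet = dbfs(samples[i:i + step]) < threshold_db
        t = i / sample_rate
        if quiet and run_start is None:
            run_start = t  # a quiet run begins
        elif not quiet and run_start is not None:
            if t - run_start >= min_duration:
                silences.append((run_start, t))
            run_start = None
    # Close a quiet run that reaches the end of the recording
    if run_start is not None:
        end = len(samples) / sample_rate
        if end - run_start >= min_duration:
            silences.append((run_start, end))
    return silences


def tone(amplitude, seconds, sample_rate, freq=50.0):
    """Synthetic sine tone, used here only as stand-in audio."""
    count = int(seconds * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq * i / sample_rate)
            for i in range(count)]


rate = 1000
# One second of "speech", two seconds of low background noise, one more second of "speech"
noisy = tone(0.5, 1.0, rate) + tone(0.05, 2.0, rate) + tone(0.5, 1.0, rate)
print(find_silences(noisy, rate, threshold_db=-35.0))
print(find_silences(noisy, rate, threshold_db=-25.0))
```

The background noise in this example sits at roughly -29 dBFS, so a -35 threshold never registers it as silence while -25 does. This is why recordings with a raised noise floor call for a higher threshold.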

Minimum segment length

Avoid tiny fragments by setting a minimum segment duration:

ittybit audio \
  -i ep-201-raw.wav \
  -o chapters/ \
  --split silence \
  --silence_threshold -35 \
  --silence_min_duration 1.5 \
  --min_segment 30

This discards any segment shorter than 30 seconds, which is useful for filtering out coughs, mic bumps, or false positives.
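The effect of a minimum segment length can be sketched in a few lines. The example assumes cut points fall at the midpoint of each detected silence, which is an illustrative choice and not documented ittybit behaviour. The function names and sample values are hypothetical.

```python
def segments_from_silences(silences, total_duration):
    """Turn (start, end) silence spans into (start, end) segments,
    cutting at the midpoint of each silence (an assumption)."""
    cuts = [0.0] + [(start + end) / 2 for start, end in silences] + [total_duration]
    return list(zip(cuts, cuts[1:]))


def drop_short(segments, min_segment=30.0):
    """Keep only segments at least min_segment seconds long."""
    return [(start, end) for start, end in segments if end - start >= min_segment]


# Made-up silences: the second follows the first by only 8 seconds (e.g. a cough between pauses)
silences = [(100.0, 102.0), (110.0, 112.0), (300.0, 303.0)]
segments = segments_from_silences(silences, total_duration=600.0)
print(segments)
print(drop_short(segments, min_segment=30.0))
```

Here the 10-second fragment between the first two cuts is dropped and the three longer segments are kept.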

Split and convert

Combine splitting with format conversion. Upload a WAV, get MP3 chapters:

ittybit audio \
  -i ep-201-raw.wav \
  -o chapters/ \
  --split silence \
  --silence_threshold -35 \
  --silence_min_duration 1.5 \
  --format mp3 \
  --quality high