Intelligence
Speech
Overview
The speech task transcribes spoken audio from a video or audio file into structured text with speaker identification, language detection, and time-aligned segments.
Creating a speech task
The speech task has no configurable options — all analysis is returned by default.
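As a minimal sketch, the request body for a speech task could be built like this. The field names (`url`, `kind`) and the task identifier `"speech"` are assumptions for illustration, not confirmed API details; note that no options object is needed since the task is not configurable.

```python
import json


def build_speech_task_payload(file_url: str) -> str:
    """Build the JSON body for a speech task request.

    The speech task has no configurable options, so the payload
    only names the (assumed) task kind and the input file URL.
    """
    payload = {
        "url": file_url,   # assumed input field name
        "kind": "speech",  # assumed task identifier
    }
    return json.dumps(payload)


body = build_speech_task_payload("https://example.com/podcast.mp3")
print(body)
```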
Output
When the task succeeds, the output includes:
| Field | Type | Description |
|---|---|---|
| text | string[] | Transcript text segments |
| languages | string[] | Detected languages (ISO 639-1 codes) |
| speakers | integer | Number of distinct speakers detected |
| timeline | array | Time-coded transcript segments, each with start, end, text, and speaker |
The output JSON file contains the full transcript data.
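As an illustration of the shape described in the table above, here is a hypothetical output for a short two-speaker clip. The field names follow the table; the values and speaker labels are invented for the example.

```python
import json

# Hypothetical speech task output; values are invented for illustration.
example_output = json.dumps({
    "text": ["Welcome to the show.", "Thanks for having me."],
    "languages": ["en"],
    "speakers": 2,
    "timeline": [
        {"start": 0.0, "end": 1.8,
         "text": "Welcome to the show.", "speaker": "speaker_1"},
        {"start": 1.9, "end": 3.4,
         "text": "Thanks for having me.", "speaker": "speaker_2"},
    ],
})

data = json.loads(example_output)
print(data["speakers"], data["languages"])
```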
Supported inputs
Speech tasks work with:
- Audio files (.mp3, .m4a, .wav, .ogg, .aac, .flac)
- Video files with embedded audio (.mp4, .mov, .webm)
Common use cases
- Video and podcast transcription
- Generating subtitles or captions
- Searchable transcripts and AI summaries
- Creating text-based chapter markers
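For the subtitle use case above, the time-coded timeline segments map directly onto the SRT format. A sketch of that conversion, using the segment fields from the output table (the sample data is invented for illustration):

```python
def _srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def timeline_to_srt(timeline: list[dict]) -> str:
    """Render time-coded transcript segments as an SRT file body."""
    blocks = []
    for i, seg in enumerate(timeline, start=1):
        blocks.append(
            f"{i}\n"
            f"{_srt_timestamp(seg['start'])} --> {_srt_timestamp(seg['end'])}\n"
            f"{seg['text']}\n"
        )
    return "\n".join(blocks)


segments = [
    {"start": 0.0, "end": 1.8,
     "text": "Welcome to the show.", "speaker": "speaker_1"},
    {"start": 1.9, "end": 3.4,
     "text": "Thanks for having me.", "speaker": "speaker_2"},
]
srt = timeline_to_srt(segments)
print(srt)
```

The speaker field is ignored here; a caption workflow that needs speaker labels could prepend `seg['speaker']` to each text line.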