AI Transcription & Summarization

⏱ Processing time
On this CPU-only server, expect roughly 1–3 minutes of processing per minute of media.
Simple mode is faster. Differentiated (Beta) mode is slower because it adds speaker separation.

Tip: keep clips short when testing. Use Simple first unless you really need speaker labels.
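
As a rough planning aid, the 1–3 minutes-per-minute range above can be turned into a quick estimate. This is only a sketch: the function name is made up for illustration and is not part of the app, and real times will vary with the mode and the options selected.

```python
def estimate_processing_minutes(media_minutes: float) -> tuple[float, float]:
    """Return a (low, high) processing-time estimate in minutes.

    Applies the 1-3 minutes-per-media-minute range quoted for this
    CPU-only server. Hypothetical helper, not part of the app.
    """
    if media_minutes < 0:
        raise ValueError("media_minutes must be non-negative")
    low_factor, high_factor = 1.0, 3.0  # range quoted in the text above
    return media_minutes * low_factor, media_minutes * high_factor


low, high = estimate_processing_minutes(10)
print(f"A 10-minute clip: roughly {low:.0f}-{high:.0f} minutes of processing")
```

Differentiated mode is the slower option, so its runs are more likely to land toward the high end of the range.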

Media

🎥 Video Transcription

Simple: transcript and summary only. Differentiated: also separates speakers when possible.

Video file

Best results with clear speech and .mp4 files.

Translations (optional)

Adds a translated summary and translated lines under each segment.
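
To picture "translated lines under each segment", here is a minimal sketch of one possible layout. The Segment class, its field names, and the exact text format are assumptions made for illustration; the app's actual output may look different.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One transcript segment (hypothetical structure, for illustration)."""
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    text: str     # original transcript line
    translations: dict[str, str] = field(default_factory=dict)  # language code -> text


def render_segment(seg: Segment) -> str:
    """Render the original line, then each translation indented beneath it."""
    lines = [f"[{seg.start:.1f}s-{seg.end:.1f}s] {seg.text}"]
    for lang, translated in sorted(seg.translations.items()):
        lines.append(f"    ({lang}) {translated}")
    return "\n".join(lines)


seg = Segment(0.0, 2.5, "Hello everyone.", {"es": "Hola a todos."})
print(render_segment(seg))
```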

Video file

Use for multi-speaker content, interviews, podcasts.

Translations (optional)

Diarization combined with translations is the slowest path. Use it only when needed.

Media

🎧 Audio Transcription

Works best with mono, clean speech. Music-heavy tracks will transcribe less accurately.

Audio file

Recommended: .wav, 16 kHz mono if possible.
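
If your recording is in another format, one way to produce a 16 kHz mono .wav before uploading is to run ffmpeg locally. The sketch below assumes ffmpeg is installed on your own machine; the helper names and the output file naming are made up for illustration and have nothing to do with how the app handles uploads.

```python
import subprocess
from pathlib import Path


def output_path_for(src: str) -> str:
    """Pick an output name next to the source file (naming is an assumption)."""
    p = Path(src)
    return str(p.with_name(p.stem + "_16k_mono.wav"))


def build_ffmpeg_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that writes 16 kHz mono audio to dst."""
    return [
        "ffmpeg",
        "-y",            # overwrite dst if it already exists
        "-i", src,       # input file
        "-vn",           # drop any video stream
        "-ac", "1",      # one audio channel (mono)
        "-ar", "16000",  # 16 kHz sample rate
        dst,
    ]


def convert_to_wav(src: str) -> str:
    """Run the conversion and return the path of the new .wav file."""
    dst = output_path_for(src)
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
    return dst


# Example (requires ffmpeg on PATH and an existing input file):
# convert_to_wav("meeting.mp3")  # writes meeting_16k_mono.wav
```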

Translations (optional)

Audio file

Use for meetings, podcasts, multi-speaker audio.

Translations (optional)

Text

📝 Text Summarization

Upload a .txt file to get a concise summary. Works well for meeting notes, blog posts, and similar documents.

Online

▶️ YouTube Ingest

Paste a full YouTube URL. Simple mode prefers captions; Differentiated runs diarization on the audio.

YouTube URL

Translations (optional)

Uses English captions when available, otherwise downloads and transcribes audio.
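
The captions-first behaviour described above can be sketched as a simple fallback. This is not the app's actual code: the function name and the two callables passed in are hypothetical placeholders standing in for whatever caption lookup and transcription steps the server really uses.

```python
from typing import Callable, Optional


def simple_mode_transcript(
    url: str,
    fetch_english_captions: Callable[[str], Optional[str]],
    download_and_transcribe: Callable[[str], str],
) -> tuple[str, str]:
    """Return (transcript, source), preferring existing English captions.

    Both callables are hypothetical stand-ins supplied by the caller.
    """
    captions = fetch_english_captions(url)
    if captions:  # captions exist and are non-empty: skip the slow path
        return captions, "captions"
    # No usable captions: fall back to downloading and transcribing the audio
    return download_and_transcribe(url), "audio"


# Usage with stub functions in place of real network and model calls
text, source = simple_mode_transcript(
    "https://www.youtube.com/watch?v=EXAMPLE",
    fetch_english_captions=lambda u: None,               # pretend no captions exist
    download_and_transcribe=lambda u: "transcribed audio",
)
print(source, "->", text)
```

Skipping the download and transcription step whenever captions exist is what makes this mode the faster one for YouTube links.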

YouTube URL

Translations (optional)

Always downloads audio and runs diarization. Slowest, but best for multi-speaker videos.