AI Transcription & Summarization

Upload video, audio, or text — or paste a YouTube link — to get transcripts and summaries.

⏱ Processing time
On this CPU-only server, expect roughly 1–3 minutes of processing per minute of media.
Simple is faster. Differentiated (Beta) is slower because it also separates speakers.
Tip: keep clips short when testing. Use Simple first unless you really need speaker labels.
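As a rough illustration of the rule of thumb above, the sketch below turns a clip length into an expected processing range. The function name `estimate_processing_minutes` is hypothetical and not part of this app. It applies only the stated 1–3x factor and does not model the extra cost of Differentiated mode, for which no figure is given.

```python
def estimate_processing_minutes(media_minutes: float) -> tuple[float, float]:
    """Return a (low, high) processing estimate in minutes.

    Assumes the 1-3 minutes of processing per minute of media quoted
    for the CPU-only server; actual times will vary.
    """
    if media_minutes < 0:
        raise ValueError("media_minutes must be non-negative")
    low_factor, high_factor = 1.0, 3.0  # rule-of-thumb bounds from the text
    return media_minutes * low_factor, media_minutes * high_factor


if __name__ == "__main__":
    low, high = estimate_processing_minutes(10)
    print(f"A 10-minute clip may take about {low:.0f}-{high:.0f} minutes.")
```

Under this assumption, a 10-minute clip would take somewhere between 10 and 30 minutes, which is why short clips are recommended for testing.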
Media
🎥 Video Transcription

Simple: transcript and summary only. Differentiated: also separates speakers when possible.

Best results with clear speech and .mp4 files.
Translations (optional)
Adds a translated summary and translated lines under each segment.
Differentiated: use for multi-speaker content such as interviews and podcasts.
Translations (optional)
Diarization plus translation is the slowest path. Use it only when needed.
Media
🎧 Audio Transcription

Works best with clean, mono speech. Music-heavy tracks will transcribe less accurately.

Recommended: .wav, 16 kHz mono if possible.
Translations (optional)
Differentiated: use for meetings, podcasts, and other multi-speaker audio.
Translations (optional)
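To get a file into the recommended .wav, 16 kHz mono format before uploading, one option is to convert it locally. The sketch below assumes ffmpeg is installed and on your PATH; the helper names `build_ffmpeg_command` and `convert_to_wav_16k_mono` are hypothetical and not part of this app.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst):
    """Build an ffmpeg command that writes a 16 kHz mono WAV file."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", str(src),  # input file (audio or video)
        "-vn",           # ignore any video stream
        "-ac", "1",      # one audio channel (mono)
        "-ar", "16000",  # 16 kHz sample rate
        str(dst),
    ]


def convert_to_wav_16k_mono(src, dst=None):
    """Convert src with ffmpeg and return the output path."""
    src = Path(src)
    # Default output name sits next to the input, e.g. talk.mp3 -> talk_16k_mono.wav
    dst = Path(dst) if dst else src.with_name(src.stem + "_16k_mono.wav")
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
    return dst


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(convert_to_wav_16k_mono(sys.argv[1]))
```

Run it as `python convert.py talk.mp3` (the script name is arbitrary) and upload the resulting .wav file.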
Text
📝 Text Summarization

Upload a .txt file to get a concise summary. Works well for meeting notes, blog posts, and similar documents.

Online
▶️ YouTube Ingest

Paste a full YouTube URL. Simple mode prefers captions; Differentiated runs diarization on the audio.

Translations (optional)
Simple: uses English captions when available; otherwise downloads and transcribes the audio.
Translations (optional)
Differentiated: always downloads the audio and runs diarization. Slowest, but best for multi-speaker videos.
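The two YouTube paths described above can be sketched as a small decision function. This is an illustration only, not the app's actual code: `ingest_youtube` and the three callables passed into it (`fetch_english_captions`, `transcribe_audio`, `diarize_audio`) are hypothetical stand-ins for whatever the server uses to fetch captions, transcribe, and diarize.

```python
from typing import Callable, Optional, Tuple


def ingest_youtube(
    url: str,
    mode: str,
    fetch_english_captions: Callable[[str], Optional[str]],
    transcribe_audio: Callable[[str], str],
    diarize_audio: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (path_taken, text) for a YouTube URL.

    mode is "simple" or "differentiated". All three callables are
    hypothetical placeholders supplied by the caller.
    """
    if mode == "differentiated":
        # Differentiated never uses captions: it always works from the audio.
        return "diarization", diarize_audio(url)
    if mode == "simple":
        captions = fetch_english_captions(url)
        if captions:
            # Fast path: captions exist, so no audio download is needed.
            return "captions", captions
        # Fallback: no English captions, so download and transcribe the audio.
        return "transcription", transcribe_audio(url)
    raise ValueError(f"unknown mode: {mode!r}")
```

Passing the three steps in as functions keeps the sketch independent of any particular download or speech library.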