v1.1 · GPU Accelerated

WhisperX

Audio transcription API with word-level timestamps and speaker diarization.


Primary Endpoint
POST /transcribe

Upload an audio file and receive a full transcript with word-level timestamps and speaker labels. The async endpoint returns a job ID to poll; use /transcribe/sync to block until the result is ready.

Parameter            Default         Description
file                 required        Audio file to transcribe
model                large-v3        Whisper model: tiny, base, small, medium, large-v2, large-v3
language             auto            ISO 639-1 code (en, de, …); leave empty to auto-detect
speaker_count        auto            Expected number of speakers (1–20); improves diarization accuracy
enable_diarization   true            Identify and label individual speakers
response_format      verbose_json    verbose_json with segments, or json for plain text
Supported formats: .mp3 .wav .m4a .ogg .webm .mp4 .aac .wma
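The parameters above can be checked client-side before uploading, so malformed requests fail fast. A minimal sketch in Python, assuming the allowed values from the table; the helper name `build_transcribe_form` and the ValueError-based validation are illustrative, not part of the API:

```python
# Hypothetical client-side helper that validates transcribe parameters
# against the table above and builds the multipart form fields.
# Parameter names and defaults come from the docs; the helper is a sketch.

ALLOWED_MODELS = {"tiny", "base", "small", "medium", "large-v2", "large-v3"}
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg",
                      ".webm", ".mp4", ".aac", ".wma"}

def build_transcribe_form(filename, model="large-v3", language=None,
                          speaker_count=None, enable_diarization=True,
                          response_format="verbose_json"):
    """Return form fields for POST /transcribe, or raise ValueError."""
    ext = filename[filename.rfind("."):].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {ext}")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model: {model}")
    if speaker_count is not None and not (1 <= speaker_count <= 20):
        raise ValueError("speaker_count must be between 1 and 20")
    form = {
        "model": model,
        "enable_diarization": "true" if enable_diarization else "false",
        "response_format": response_format,
    }
    if language:
        form["language"] = language  # ISO 639-1 hint; omit to auto-detect
    if speaker_count is not None:
        form["speaker_count"] = str(speaker_count)
    return form
```

The returned dict maps directly onto `-F` fields in the curl examples below, with the file itself attached separately.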
Quick Start
# async — returns job ID to poll
curl -X POST "https://transcription-api-v2.lmparsing.cloud/transcribe" \
  -H "X-Api-Key: YOUR_KEY" \
  -F "file=@interview.mp3"

# synchronous — blocks until done
curl -X POST "https://transcription-api-v2.lmparsing.cloud/transcribe/sync" \
  -H "X-Api-Key: YOUR_KEY" \
  -F "file=@meeting.wav" \
  -F "model=large-v3"

# poll job status
curl "https://transcription-api-v2.lmparsing.cloud/jobs/{job_id}" \
  -H "X-Api-Key: YOUR_KEY"
Capabilities

GPU Accelerated

Batched inference on NVIDIA GPUs with VRAM-aware scheduling.

Speaker Diarization

Identifies who said what using pyannote speaker segmentation.

Word Timestamps

Precise start and end times for every word in the transcript.

6 Model Sizes

From tiny (39M parameters) for speed to large-v3 (1.5B) for accuracy.

Async Job Queue

Submit and poll, or block for the result. Backpressure and VRAM tracking built in.

Auto Language

Detects the spoken language automatically, or pin it with a hint.