General-purpose speech recognition model. Transcribes and translates audio to text in multiple languages.
Specifications
Context: 128K
Input: audio
Output: text
Pricing
Input Audio: $0.01/minute
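As a rough illustration of the per-minute audio price, the sketch below estimates the cost of transcribing a clip of a given length. The function name `estimate_transcription_cost` is hypothetical, and the pro-rata billing by the second is an assumption; the page does not say whether usage is rounded up to whole minutes.

```python
from decimal import Decimal

# Price taken from the pricing line: $0.01 per minute of input audio.
PRICE_PER_MINUTE = Decimal("0.01")


def estimate_transcription_cost(duration_seconds: float) -> Decimal:
    """Estimate the cost in USD for an audio clip of the given duration.

    Assumption: billing is pro-rata (no rounding up to whole minutes).
    The result is rounded to four decimal places.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    return (minutes * PRICE_PER_MINUTE).quantize(Decimal("0.0001"))


if __name__ == "__main__":
    # A one-hour recording (3600 seconds).
    print(estimate_transcription_cost(3600))
```

If the provider rounds each request up to the next full minute, the estimate would need a ceiling on `minutes` before multiplying.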
Similar Models
Price: $0.17 / $0.66 per M
Context: 128K, Max: 16K, Availability: n/a, TPS: n/a
A cost-efficient audio-capable model that accepts text, audio, and image inputs and can generate text and audio outputs.
Price: $2.75 / $11.00 per M
Context: 128K, Max: 16K, Availability: n/a, TPS: n/a
GPT-4o with native audio input and output capabilities for real-time speech-to-speech conversations.