Comprehensive guide to understanding latencies across ASR, AI models, and TTS systems in Vodex
Metric | Value |
---|---|
Base Latency | 250ms |
Processing Type | Streaming |
Best For | Real-time conversations, interactive scenarios |
### Google ASR
Metric | Value |
---|---|
Processing Latency | 2 seconds |
VAD Time | 300ms (fixed) |
Total Latency | ~2.3 seconds |
### OpenAI Whisper
Metric | Value |
---|---|
Processing Latency | 700ms |
VAD Time | 300ms (fixed) |
Total Latency | ~1 second |
### ElevenLabs Whisper
Metric | Value |
---|---|
Processing Latency | 400ms |
VAD Time | 300ms (fixed) |
Total Latency | ~700ms |
### Azure Speech Services
Metric | Value |
---|---|
Processing Latency | 800ms |
VAD Time | 300ms (fixed) |
Total Latency | ~1.1 seconds |
### Genesis Echo
Metric | Value |
---|---|
Processing Latency | 400ms |
VAD Time | 300ms (fixed) |
Total Latency | ~700ms |
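The ASR totals above all follow the same pattern: total latency is the provider's processing latency plus the fixed 300ms VAD window. A minimal sketch of that arithmetic, using the values from the tables above (the dictionary and function names are illustrative, not a Vodex API):

```python
# Total ASR latency = provider processing latency + fixed VAD window (300 ms).
VAD_MS = 300  # fixed voice-activity-detection time from the tables above

ASR_PROCESSING_MS = {
    "Google ASR": 2000,
    "OpenAI Whisper": 700,
    "ElevenLabs Whisper": 400,
    "Azure Speech Services": 800,
    "Genesis Echo": 400,
}

def total_asr_latency_ms(provider: str) -> int:
    """Approximate end-of-speech-to-transcript latency for one provider."""
    return ASR_PROCESSING_MS[provider] + VAD_MS

print(total_asr_latency_ms("OpenAI Whisper"))  # 1000, i.e. ~1 second
```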
### Spark Series

Model | First Token Latency | Status | Best For |
---|---|---|---|
Spark | 1.5s | ✅ Stable | Complex reasoning, detailed conversations |
Spark Flash | 400ms | ✅ Stable | Balanced performance, standard interactions |
Spark Flash Lite | 200ms | ✅ Stable | Quick responses, simple tasks |
### GPT-4 Series
Model | First Token Latency | Status | Characteristics |
---|---|---|---|
GPT-4o | 800ms | ✅ Stable | Multimodal capabilities, advanced reasoning |
GPT-4o Mini | 500ms | ✅ Stable | Efficient multimodal processing |
GPT-4.1 Mini | 500ms | ✅ Stable | Enhanced efficiency, cost-effective |
### GPT-5 Series (Next Generation)
Model | First Token Latency | Status | Characteristics |
---|---|---|---|
GPT-5 Nano | 350ms | ⚠️ Latency issues | Ultra-efficient processing, limited context |
GPT-5 Mini | 200ms | ⚠️ Latency issues | Fast responses, simplified reasoning |
GPT-5 Chat | 500ms | ✅ Stable | Optimized for conversations |
### Specialized Models
Model | First Token Latency | Status | Characteristics |
---|---|---|---|
OpenAI GSS | 1.2s | ✅ Stable | Specialized processing, high accuracy |
### Llama Series

Model | First Token Latency | Status | Characteristics |
---|---|---|---|
Llama 3.3 70B | 450ms | ✅ Stable | Large parameter model, excellent reasoning |
Llama 4 Maverick | 450ms | ✅ Stable | Next-generation open source |
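When picking a model, a practical first filter is stability status plus first-token latency budget. A hedged sketch of that selection logic, built from the figures in the tables above (the `MODELS` table and `fastest_stable` helper are illustrative, not part of any Vodex API):

```python
# (first_token_latency_ms, is_stable) per model, taken from the tables above.
MODELS = {
    "Spark": (1500, True),
    "Spark Flash": (400, True),
    "Spark Flash Lite": (200, True),
    "GPT-4o": (800, True),
    "GPT-4o Mini": (500, True),
    "GPT-4.1 Mini": (500, True),
    "GPT-5 Nano": (350, False),   # latency issues
    "GPT-5 Mini": (200, False),   # latency issues
    "GPT-5 Chat": (500, True),
    "OpenAI GSS": (1200, True),
    "Llama 3.3 70B": (450, True),
    "Llama 4 Maverick": (450, True),
}

def fastest_stable(budget_ms: int) -> list[str]:
    """Stable models whose first-token latency fits the budget, fastest first."""
    fits = [(ms, name) for name, (ms, stable) in MODELS.items()
            if stable and ms <= budget_ms]
    return [name for ms, name in sorted(fits)]

print(fastest_stable(500))  # Spark Flash Lite leads at 200 ms
```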
Model | First Chunk Latency | Quality | Best For |
---|---|---|---|
Turbo 2 | 250ms | High | Professional conversations, customer service |
Turbo 2.5 | 250ms | Enhanced | Premium voice quality, sales calls |
### RimeLabs
Metric | Value |
---|---|
First Chunk Latency | 350ms |
Quality | High |
Availability | Contact support |
### Google UR Realistic
Metric | Value |
---|---|
First Chunk Latency | 400ms |
Quality | Ultra-realistic |
Availability | Contact support |
### Azure Cognitive Services
Metric | Value |
---|---|
First Chunk Latency | 800ms |
Quality | Enterprise-grade |
Availability | Contact support |
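Putting the three stages together: the voice-to-voice response time for one conversational turn is roughly the ASR total latency plus the model's first-token latency plus the TTS first-chunk latency. This simple sum is an approximation that ignores network overhead, so real-world numbers will run somewhat higher (the function name below is illustrative):

```python
def estimated_turn_latency_ms(asr_ms: int, first_token_ms: int,
                              first_chunk_ms: int) -> int:
    """Rough voice-to-voice latency for one turn.

    Assumes the three stages run sequentially and ignores network
    overhead, so this is a lower-bound estimate.
    """
    return asr_ms + first_token_ms + first_chunk_ms

# ElevenLabs Whisper (~700 ms) + Spark Flash Lite (200 ms) + Turbo 2 (250 ms)
print(estimated_turn_latency_ms(700, 200, 250))  # 1150 ms
```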
Tuning latency is an iterative process:

- Baseline Measurement
- Optimization Testing
- Continuous Monitoring
The main optimization levers map onto the three stages of the pipeline:

- ASR Optimization
- Model Selection
- TTS Configuration
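The baseline-measurement step can be sketched with a simple timing wrapper (illustrative only; the lambda below is a stand-in for whichever pipeline stage you are profiling):

```python
import math
import statistics
import time

def measure_latency_ms(fn, runs: int = 10) -> dict:
    """Time `fn` over several runs and return baseline statistics in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    ordered = sorted(samples)
    return {
        "median_ms": statistics.median(samples),
        # Nearest-rank 95th percentile.
        "p95_ms": ordered[math.ceil(0.95 * len(ordered)) - 1],
    }

# Example: profile a stand-in stage (a 10 ms sleep) to establish a baseline.
baseline = measure_latency_ms(lambda: time.sleep(0.01))
print(baseline)
```

Re-running the same wrapper after each configuration change gives the before/after comparison that the optimization-testing step calls for.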