What you’ll learn: Detailed breakdown of latencies across all Vodex system components including ASR, AI models, and TTS providers with performance characteristics and optimization notes.
Understanding latency is crucial for optimizing your AI calling campaigns. This guide provides comprehensive latency information for all system components to help you make informed decisions about your configuration.

ASR (Automatic Speech Recognition) Latencies

Speech recognition is the first step in processing user input. Different ASR providers offer varying latency characteristics based on their processing approach.

Streaming ASRs

Real-time speech recognition that processes audio as it’s being spoken:
Ultra-Low Latency Processing
Metric | Value
Base Latency | 250ms
Processing Type | Streaming
Best For | Real-time conversations, interactive scenarios
Characteristics:
  • Fastest streaming ASR option
  • Optimized for conversational AI
  • Minimal delay in speech recognition
Streaming ASR Timing: For streaming ASRs, we apply an adaptive end-of-speech window of either 300ms or 1 second, chosen based on user activity, to avoid cutting users off mid-sentence. A sketch of this pattern follows.
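As a concrete illustration, here is a minimal sketch of adaptive end-of-speech timing, assuming partial transcripts arrive on a queue from a streaming ASR. The `is_mid_sentence` heuristic and the queue-based feed are illustrative assumptions, not Vodex internals:

```python
import queue

SHORT_SILENCE_S = 0.3   # window when the user sounds finished
LONG_SILENCE_S = 1.0    # window when the user sounds mid-sentence

def is_mid_sentence(text: str) -> bool:
    # Hypothetical heuristic: trailing connectives or fillers suggest more speech is coming.
    return text.rstrip().lower().endswith((",", "and", "but", "uh", "um"))

def finalize_turn(events: "queue.Queue[str]") -> str:
    """Return the final transcript once the adaptive silence window elapses.

    `events` is assumed to carry partial-transcript strings from a streaming ASR.
    """
    text = ""
    while True:
        window = LONG_SILENCE_S if is_mid_sentence(text) else SHORT_SILENCE_S
        try:
            text = events.get(timeout=window)   # new partial arrived: keep listening
        except queue.Empty:
            return text                         # silence window elapsed: turn complete
```

The design choice here is that the longer 1-second window only applies when the latest partial transcript suggests the user is still talking, so responsive turns stay fast without clipping slower speakers.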

Non-Streaming ASRs

Batch processing ASRs that analyze complete audio segments:
Non-Streaming ASR Note: All non-streaming ASRs include a fixed 300ms Voice Activity Detection (VAD) window to ensure complete speech is captured before processing begins; the sketch below illustrates the pattern.
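For comparison, here is a minimal sketch of that fixed-VAD pattern: buffer short audio frames and dispatch the utterance to a batch ASR only after 300ms of trailing silence. The 20ms frame size, RMS loudness check, and threshold are illustrative assumptions, not Vodex internals:

```python
# Frame-based VAD sketch: wait for 300ms of trailing silence before
# handing buffered audio to a non-streaming (batch) ASR.

FRAME_MS = 20
VAD_SILENCE_MS = 300
SILENCE_FRAMES = VAD_SILENCE_MS // FRAME_MS   # 15 consecutive quiet frames

def rms(frame: bytes) -> float:
    """Crude loudness estimate over 16-bit little-endian PCM samples."""
    samples = memoryview(frame).cast("h")
    return (sum(s * s for s in samples) / max(len(samples), 1)) ** 0.5

def capture_utterance(frames, silence_threshold: float = 500.0) -> bytes:
    """Return buffered audio once 300ms of silence follows some speech."""
    buffered = bytearray()
    quiet = 0
    heard_speech = False
    for frame in frames:                       # `frames` yields raw PCM chunks
        buffered.extend(frame)
        if rms(frame) < silence_threshold:
            quiet += 1
        else:
            quiet = 0
            heard_speech = True
        if heard_speech and quiet >= SILENCE_FRAMES:
            return bytes(buffered)             # ready for the batch ASR
    return bytes(buffered)
```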

AI Model Latencies

Time to first token (TTFT) is the critical metric for conversational responsiveness.
Important: Listed latencies represent time to first token; generating the complete response typically adds roughly 300ms after the first token arrives.
Latency vs Capability Trade-off: Lower-latency models are typically smaller LLMs that cannot handle longer prompts, complex rules, or extensive context. Choose based on your specific use case requirements. A simple way to measure TTFT is shown below.
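Since TTFT drives perceived responsiveness, it is worth measuring for each model you evaluate. Below is a minimal, provider-agnostic sketch; `tokens` stands in for whatever streaming iterator your LLM client returns and is an assumption, not a documented API:

```python
import time
from typing import Iterable, Tuple

def measure_ttft(tokens: Iterable[str]) -> Tuple[float, float]:
    """Return (time_to_first_token_s, total_s) for one streamed completion.

    Create the token stream immediately before calling so `start`
    reflects the moment the request was issued.
    """
    start = time.monotonic()
    ttft = None
    for _ in tokens:                          # lazily consumes the token stream
        if ttft is None:
            ttft = time.monotonic() - start   # first token arrived
    total = time.monotonic() - start
    return (ttft if ttft is not None else total), total
```

Run this over your real prompts: a model that answers a short prompt in 200ms may behave quite differently with the extensive context noted in the trade-off above.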

Vodex Optimized Models

Specially tuned models optimized for voice conversations:
Google Gemini-Based Architecture
Model | First Token Latency | Status | Best For
Spark | 1.5s | ✅ Stable | Complex reasoning, detailed conversations
Spark Flash | 400ms | ✅ Stable | Balanced performance, standard interactions
Spark Flash Lite | 200ms | ✅ Stable | Quick responses, simple tasks
Performance Characteristics:
  • Optimized for voice conversations
  • Excellent reasoning capabilities
  • Consistent performance across scenarios

OpenAI Models

Industry-leading AI models with varying performance characteristics:

Open Source & Alternative Models

High-performance alternatives to proprietary models:
Open Source Excellence
Model | First Token Latency | Status | Characteristics
Llama 3.3 70B | 450ms | ✅ Stable | Large parameter model, excellent reasoning
Llama 4 Maverick | 450ms | ✅ Stable | Next-generation open source
Advantages:
  • Open source flexibility
  • Privacy-focused processing
  • Customizable implementations
  • Cost-effective scaling

TTS (Text-to-Speech) Latencies

Time to first audio chunk is the critical metric for maintaining conversation flow.
TTS Streaming: All TTS providers support streaming, so audio playback can begin as soon as the first chunk is available, reducing perceived latency; see the sketch below.
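The following minimal sketch shows why streaming reduces perceived latency: playback starts on the first chunk rather than after full synthesis. `chunks` and `play` are placeholders for your TTS client's audio stream and audio sink; both are assumptions, not a documented Vodex API:

```python
import time
from typing import Callable, Iterable

def stream_and_play(chunks: Iterable[bytes], play: Callable[[bytes], None]) -> float:
    """Play audio chunks as they arrive; return first-chunk latency in seconds."""
    start = time.monotonic()
    first_chunk_s = float("nan")
    got_first = False
    for chunk in chunks:
        if not got_first:
            first_chunk_s = time.monotonic() - start   # perceived TTS latency
            got_first = True
        play(chunk)   # playback begins before synthesis finishes
    return first_chunk_s
```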

ElevenLabs (Default Provider)

Premium AI voice synthesis with multiple performance tiers:
High-Quality Voice Synthesis
Model | First Chunk Latency | Quality | Best For
Turbo 2 | 250ms | High | Professional conversations, customer service
Turbo 2.5 | 250ms | Enhanced | Premium voice quality, sales calls
Characteristics:
  • Premium voice quality
  • Natural emotional expression
  • Multiple voice personalities
  • Excellent for professional use
ElevenLabs Selection: Use Flash series for real-time interactions where speed is critical, and Turbo series for professional scenarios where voice quality is paramount.

Alternative TTS Providers

Additional voice synthesis options for specialized needs:

Latency Optimization Strategies

Configuration Recommendations

Ultra-Low Latency Setup

Recommended Configuration:
  • ASR: Alpha Echo V2 (250ms)
  • Model: Spark Flash Lite (200ms) or GPT-5 Mini (200ms)*
  • TTS: ElevenLabs Flash 2.5 (95ms)
Total Pipeline Latency: ~545ms (250ms + 200ms + 95ms), as worked through in the sketch below
Limitations: Ultra-low latency models cannot handle complex prompts, extensive context, or sophisticated rules. They are best for simple, straightforward interactions only.
*GPT-5 Mini is currently experiencing latency issues
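To sanity-check a configuration before deploying it, you can total the per-component figures from this guide. A minimal sketch using the numbers above; the dict layout is illustrative, not a Vodex configuration schema:

```python
# Latency budget for the ultra-low latency setup described above.
ULTRA_LOW_LATENCY_MS = {
    "asr_first_result": 250,   # Alpha Echo V2 (streaming)
    "llm_first_token": 200,    # Spark Flash Lite
    "tts_first_chunk": 95,     # ElevenLabs Flash 2.5
}

total_ms = sum(ULTRA_LOW_LATENCY_MS.values())
print(f"Estimated time to first audible response: ~{total_ms}ms")   # ~545ms
# A non-streaming ASR would add its fixed 300ms VAD window on top.
```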

Performance Monitoring

1. Baseline Measurement
Establish performance baselines:
  • Monitor end-to-end conversation latency
  • Track component-specific performance
  • Document peak and average response times

2. Optimization Testing
A/B test candidate configurations:
  • Compare different ASR providers
  • Test various AI model combinations
  • Evaluate TTS provider performance

3. Continuous Monitoring
Track performance on an ongoing basis (a minimal monitoring sketch follows these steps):
  • Set up latency alerts
  • Monitor for performance degradation
  • Track user experience metrics
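As a starting point for the alerting step above, here is a minimal sketch of percentile-based latency tracking. The 200-call window and 800ms p95 threshold are illustrative assumptions; calibrate them against your own baseline:

```python
from collections import deque
from statistics import quantiles

WINDOW = 200            # most recent calls to keep
P95_ALERT_MS = 800.0    # hypothetical alert threshold; tune to your baseline

samples = deque(maxlen=WINDOW)

def record_latency(ms: float) -> None:
    """Record one end-to-end latency sample and flag tail-latency degradation."""
    samples.append(ms)
    if len(samples) >= 20:                     # wait for a minimal sample size
        p95 = quantiles(samples, n=20)[-1]     # 95th percentile of the window
        if p95 > P95_ALERT_MS:
            print(f"ALERT: p95 latency {p95:.0f}ms exceeds {P95_ALERT_MS:.0f}ms")
```

Alerting on p95 rather than the mean is deliberate: conversational latency problems usually show up first in the tail, not the average.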

Technical Considerations

Latency Factors

Network Latency

External Factors
  • Geographic distance to servers
  • Network congestion and routing
  • Internet service provider performance
  • CDN and edge server optimization

Processing Load

System Performance
  • Server capacity and utilization
  • Concurrent request handling
  • Model loading and initialization
  • Resource allocation efficiency

Audio Quality

Input Characteristics
  • Audio sample rate and quality
  • Background noise levels
  • Speaker clarity and volume
  • Connection stability


Next Steps

Ready to optimize your latency? Use this comprehensive guide to select the optimal configuration for your specific use case and performance requirements.
Recommended Actions:
  1. Assess your requirements - Determine if you need real-time, balanced, or high-quality processing
  2. Test configurations - Experiment with different component combinations
  3. Monitor performance - Track latency metrics in production
  4. Optimize iteratively - Continuously refine based on real-world performance
Need help with configuration? Contact support@vodex.ai for personalized latency optimization assistance.

Configure these latency-related settings in your Vodex dashboard:

ASR Configuration

Speech Recognition Settings
Configure ASR providers and settings in Call Settings and Advanced Settings.
  • Choose between streaming and non-streaming ASR
  • Configure language codes and detection
  • Set up custom ASR parameters

AI Model Selection

Model Configuration
Select and configure AI models in Call Settings.
  • Choose from Vodex optimized models
  • Configure OpenAI, Llama, and other providers
  • Balance latency vs capability requirements

Voice/TTS Settings

Text-to-Speech Configuration
Configure voice providers and settings in Call Settings.
  • Select ElevenLabs Turbo or Flash series
  • Configure alternative TTS providers
  • Optimize voice quality vs speed