Codec & VAD
Audio codec selection and voice activity detection configuration
Audio Codecs
Telepath supports two audio codecs for SIP transport. The choice between them affects audio quality, bandwidth, and compatibility.
G.711 (Narrowband)
Technical Specifications:
- Sampling Rate: 8 kHz
- Bit Rate: 64 kbps
- Audio Bandwidth: ~3.4 kHz (300–3400 Hz passband)
- Latency: ~1–2ms encoding
- Variants: PCMU (µ-law, common in North America) and PCMA (A-law, common in Europe and Asia)
Best For: Legacy carriers with limited codec support, cost-sensitive applications, and regions where PCMU/PCMA is standard. G.711 runs at the same 64 kbps as G.722, so it does not save bandwidth relative to G.722.
G.722 (Wideband / HD Voice)
Technical Specifications:
- Sampling Rate: 16 kHz
- Bit Rate: 64 kbps (ADPCM compressed)
- Audio Bandwidth: ~7 kHz (50–7000 Hz, wideband)
- Latency: ~1–2ms encoding
Benefits: Roughly double the audio frequency range of G.711 (about 7 kHz versus 3.4 kHz) gives clearer, more natural sound, which helps speech recognition and speaker recognition, at the same 64 kbps bit rate as G.711.
Recommendation: Use G.722 for all AI voice applications. The wider frequency range helps AI models understand speech better while using the same network bandwidth as G.711.
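The "same bit rate" claim can be checked with a little arithmetic. The sketch below is illustrative only: the 20 ms packetization interval and the 40-byte IPv4 + UDP + RTP header size are assumptions, not documented Telepath settings, and the function names are hypothetical.

```python
# Illustrative arithmetic only. The 20 ms frame size and 40-byte header
# (20 IPv4 + 8 UDP + 12 RTP, no extensions) are assumptions.

def payload_bytes(bit_rate_bps: int, frame_ms: int) -> int:
    """Encoded audio bytes carried in one RTP packet."""
    return bit_rate_bps * frame_ms // 1000 // 8


def on_wire_kbps(bit_rate_bps: int, frame_ms: int = 20, header_bytes: int = 40) -> float:
    """Approximate IP-level bandwidth for one direction of a call."""
    packets_per_second = 1000 / frame_ms
    packet_bytes = payload_bytes(bit_rate_bps, frame_ms) + header_bytes
    return packet_bytes * 8 * packets_per_second / 1000


# G.711: 8,000 samples/s at 8 bits per sample.
g711_bps = 8_000 * 8

# G.722: 16,000 samples/s, compressed by sub-band ADPCM to an average
# of 4 bits per input sample in its 64 kbps mode.
g722_bps = 16_000 * 4

if __name__ == "__main__":
    for name, bps in (("G.711", g711_bps), ("G.722", g722_bps)):
        print(name, bps // 1000, "kbps codec,",
              payload_bytes(bps, 20), "bytes per 20 ms packet,",
              on_wire_kbps(bps), "kbps on the wire")
```

Under these assumptions both codecs produce identical packet sizes, so switching from G.711 to G.722 changes audio quality without changing network load.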
Codec Selection Guide
By Carrier
| Carrier | Recommended | Fallback |
|---|---|---|
| Twilio | G.722 | G.711 PCMU |
| Telnyx | G.722 | G.711 PCMU |
| Vonage | G.722 | G.711 PCMU |
| Bandwidth | G.711 PCMU | G.722 |
| SignalWire | G.722 | G.711 PCMU |
| Plivo | G.722 | G.711 PCMU |
By AI Provider
| AI Provider | Recommended | Why |
|---|---|---|
| OpenAI Realtime | G.722 | Wider frequency range aids speech recognition |
| ElevenLabs | G.722 | Clearer input improves response quality |
| Custom WebSocket | G.722 | Better for most AI models |
Codec Negotiation
Telepath automatically negotiates the best available codec with your carrier, preferring G.722 and falling back gracefully to G.711 if unavailable. No manual selection is needed.
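For intuition, preference-ordered codec selection can be sketched as below. This is not Telepath's implementation; the preference order and function names are assumptions for illustration. The payload type numbers (0 for PCMU, 8 for PCMA, 9 for G722) are the static assignments from RFC 3551.

```python
from typing import List, Optional, Sequence

# Static RTP payload types from RFC 3551.
PAYLOAD_TYPES = {9: "G722", 0: "PCMU", 8: "PCMA"}

# Hypothetical preference order: wideband first, then the G.711 variants.
PREFERENCE = ("G722", "PCMU", "PCMA")


def offered_codecs(sdp: str) -> List[str]:
    """Return the known codec names listed on the first m=audio line."""
    for line in sdp.splitlines():
        if line.startswith("m=audio"):
            # Layout: m=audio <port> <proto> <fmt> <fmt> ...
            formats = line.split()[3:]
            return [PAYLOAD_TYPES[int(f)] for f in formats
                    if f.isdigit() and int(f) in PAYLOAD_TYPES]
    return []


def choose_codec(sdp: str, preference: Sequence[str] = PREFERENCE) -> Optional[str]:
    """Pick the most preferred codec that the remote side offered."""
    offered = set(offered_codecs(sdp))
    for name in preference:
        if name in offered:
            return name
    return None  # no codec in common


if __name__ == "__main__":
    offer = "v=0\r\nm=audio 49170 RTP/AVP 0 8 9 101\r\n"
    print(choose_codec(offer))  # G.722 is offered, so it wins over PCMU
```

A carrier that omits payload type 9 from its offer would fall through to PCMU, which mirrors the fallback column in the carrier table.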
Voice Activity Detection (VAD)
VAD automatically detects when someone is speaking and handles silence intelligently, enabling natural conversation flow and barge-in support.
Benefits
- Natural conversation flow — AI knows when to start listening and when the caller has finished speaking
- Barge-in support — callers can interrupt the AI agent mid-sentence; VAD detects it instantly
- Reduced latency — real-time turn-boundary detection; no fixed timeouts
- Background noise filtering — distinguishes speech from ambient noise for cleaner audio
How Telepath’s VAD Works
Outbound VAD (AI Agent Speaking)
- AI finishes speaking; silence is detected
- VAD identifies end-of-turn
- System switches to listening mode
- VAD detects incoming caller audio
- Audio is forwarded to the AI agent for processing
Inbound VAD (Caller Speaking)
- Caller speaks; VAD detects audio energy above threshold
- Audio is collected in real-time
- If AI is currently speaking, barge-in is triggered
- Audio is forwarded to the AI immediately
- AI agent handles the interruption
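The inbound flow above can be approximated with a simple energy-threshold check per audio frame. This is a minimal sketch, not Telepath's detector: the -35 dBFS threshold, the 16-bit PCM input format, and the function names are all assumptions chosen for illustration, and a production VAD would typically adapt the threshold to the noise floor.

```python
import math
from typing import Sequence


def rms_dbfs(samples: Sequence[int]) -> float:
    """RMS level of 16-bit PCM samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square / (32768 ** 2))


def classify_frame(samples: Sequence[int], agent_speaking: bool,
                   threshold_dbfs: float = -35.0) -> str:
    """Return 'silence', 'speech', or 'barge_in' for one inbound frame.

    The threshold is a hypothetical fixed value for this sketch.
    """
    if rms_dbfs(samples) < threshold_dbfs:
        return "silence"
    # Caller energy while the agent is talking counts as an interruption.
    return "barge_in" if agent_speaking else "speech"


if __name__ == "__main__":
    loud = [8000, -8000] * 80   # 160 samples with clear energy
    quiet = [10] * 160          # near-silent frame
    print(classify_frame(loud, agent_speaking=True))
    print(classify_frame(quiet, agent_speaking=True))
```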
VAD Parameters
Telepath ships with sensible defaults, but you can fine-tune its behavior:
- End-of-Turn Timeout — default 800ms of silence; adjustable 400–2000ms. Lower = more aggressive (interrupts sooner); higher = more patient.
- Speech Start Threshold — default automatic. Controls how quickly VAD registers speech start.
- Noise Level Adaptation — default enabled. Adjusts sensitivity to ambient environment.
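To make the end-of-turn timeout concrete, here is a small sketch of how a silence timer could decide that the caller has finished. The class name, the 20 ms frame length, and the reset behavior are assumptions for illustration; only the 800 ms default comes from the parameters above.

```python
class EndOfTurnDetector:
    """Signals end of turn after `timeout_ms` of continuous silence that
    follows speech. Hypothetical helper, not a Telepath API."""

    def __init__(self, timeout_ms: int = 800, frame_ms: int = 20):
        self.timeout_ms = timeout_ms
        self.frame_ms = frame_ms      # assumed frame length
        self.heard_speech = False
        self.silence_ms = 0

    def update(self, is_speech: bool) -> bool:
        """Feed one frame's speech flag; return True when the turn has ended."""
        if is_speech:
            self.heard_speech = True
            self.silence_ms = 0       # any speech restarts the silence timer
            return False
        if not self.heard_speech:
            return False              # leading silence never ends a turn
        self.silence_ms += self.frame_ms
        if self.silence_ms >= self.timeout_ms:
            # Reset so the detector is ready for the next turn.
            self.heard_speech = False
            self.silence_ms = 0
            return True
        return False


if __name__ == "__main__":
    detector = EndOfTurnDetector(timeout_ms=800, frame_ms=20)
    frames = [True] * 10 + [False] * 40   # 200 ms of speech, 800 ms of silence
    ended_at = [i for i, f in enumerate(frames) if detector.update(f)]
    print(ended_at)  # index of the frame on which the turn ended
```

With a lower timeout the detector fires after fewer silent frames, which is why small values risk cutting off callers who pause mid-sentence.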
Configuring VAD
Via Dashboard
1. Open your connection settings in the Telepath Dashboard
2. Go to Advanced → VAD Configuration
3. Adjust end-of-turn timeout, sensitivity level, and noise adaptation
4. Save and test with a live call
Via API
{
  "vad_config": {
    "end_of_turn_timeout_ms": 800,
    "sensitivity": "adaptive",
    "noise_adaptation": true,
    "min_speech_duration_ms": 100
  }
}
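Before sending a configuration like the one above, it can help to validate it client-side. The sketch below builds the same JSON body and enforces the documented 400–2000 ms range for end_of_turn_timeout_ms. The function name is hypothetical, the non-negative check on min_speech_duration_ms is an assumption, and how the payload is submitted (endpoint, method, authentication) is not covered by this page, so it is left out.

```python
import json


def build_vad_config(end_of_turn_timeout_ms: int = 800,
                     sensitivity: str = "adaptive",
                     noise_adaptation: bool = True,
                     min_speech_duration_ms: int = 100) -> str:
    """Return a JSON body matching the vad_config example.

    Only the timeout range (400-2000 ms) comes from the documentation;
    the non-negative duration check is an assumption.
    """
    if not 400 <= end_of_turn_timeout_ms <= 2000:
        raise ValueError("end_of_turn_timeout_ms must be between 400 and 2000")
    if min_speech_duration_ms < 0:
        raise ValueError("min_speech_duration_ms must not be negative")
    return json.dumps({
        "vad_config": {
            "end_of_turn_timeout_ms": end_of_turn_timeout_ms,
            "sensitivity": sensitivity,
            "noise_adaptation": noise_adaptation,
            "min_speech_duration_ms": min_speech_duration_ms,
        }
    }, indent=2)


if __name__ == "__main__":
    # A more patient setting for callers who pause often.
    print(build_vad_config(end_of_turn_timeout_ms=1200))
```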
Testing VAD Settings
Test Natural Pauses
- Call your agent and speak, then pause for 1–2 seconds
- Observe whether the agent responds at the right moment
- Increase end_of_turn_timeout_ms if the agent interrupts too soon; decrease it if responses feel sluggish
Test Interruptions
- Let the AI agent speak a full sentence
- Interrupt mid-sentence
- Verify the agent immediately receives your speech and stops speaking
Test Noise Handling
- Call from a noisy environment
- Verify the agent still understands your speech
- If spurious interruptions occur, enable noise adaptation or increase the speech start threshold
Audio Quality Optimization
Best Practices
- Network — use wired connections when possible; target packet loss <1% and jitter <50ms
- Carrier — enable G.722 if available; use UDP or TLS (both work); optimize for your region
- AI Agent — use the latest model versions; keep API credentials current; test with a variety of speakers
- Monitoring — check the dashboard codec report; review VAD decisions in SIP traces; track audio quality metrics
Troubleshooting Audio Issues
Poor Clarity
- Confirm which codec is in use (check dashboard)
- Switch to G.722 if currently on G.711
- Verify AI provider credentials and model version
- Test with different phone hardware
Frequent Spurious Interruptions
- Increase end_of_turn_timeout_ms (try 1000–1200ms)
- Enable noise adaptation
- Test from a quieter environment
Delayed Responses
- Check AI provider latency in the dashboard metrics breakdown
- Verify codec negotiation completed successfully
- Review carrier-side network conditions
Advanced Codec Topics
Codec Transcoding
If your carrier only supports G.711 but you want G.722, Telepath can transcode between formats with minimal added latency (~5–10ms). Transcoding cannot restore audio above roughly 3.4 kHz that the G.711 leg never carried, so for full wideband quality, request that your carrier enable G.722 or switch to a carrier that supports it natively.
Performance Metrics
Monitor codec performance in the dashboard:
- Codec Used — which codec was actually negotiated for each call
- Packet Loss — percentage of lost RTP packets
- Jitter — audio timing variance in milliseconds
- MOS — Mean Opinion Score; an estimated audio quality rating on a scale from 1 (bad) to 5 (excellent)
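For readers who want to reproduce these metrics from their own RTP captures, the sketch below shows one common way to derive them. It is not how the Telepath dashboard computes its figures: the function names are hypothetical, the loss calculation assumes sequence numbers do not wrap, and the jitter estimate follows the interarrival jitter formula from RFC 3550. The pass/fail targets are the ones listed under Best Practices.

```python
from typing import Iterable, List, Tuple


def packet_loss_percent(received_seq: List[int]) -> float:
    """Loss percentage from received RTP sequence numbers.

    Assumes no 16-bit sequence wraparound within the list.
    """
    if not received_seq:
        return 0.0
    expected = max(received_seq) - min(received_seq) + 1
    lost = expected - len(set(received_seq))
    return 100.0 * lost / expected


def interarrival_jitter_ms(packets: Iterable[Tuple[float, int]],
                           clock_rate: int = 8000) -> float:
    """RFC 3550 jitter estimate in milliseconds.

    packets: (arrival_time_seconds, rtp_timestamp) pairs in arrival order.
    The RTP clock rate is 8000 for G.711, and also for G.722 per RFC 3551.
    """
    jitter = 0.0
    prev_transit = None
    for arrival_s, rtp_ts in packets:
        transit = arrival_s * clock_rate - rtp_ts  # in timestamp units
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0          # RFC 3550 smoothing
        prev_transit = transit
    return jitter / clock_rate * 1000.0


def meets_targets(loss_pct: float, jitter_ms: float) -> bool:
    """Targets from Best Practices: packet loss <1% and jitter <50ms."""
    return loss_pct < 1.0 and jitter_ms < 50.0


if __name__ == "__main__":
    seqs = [1, 2, 3, 5, 6, 7, 8, 9, 10]            # packet 4 never arrived
    arrivals = [(0.0, 0), (0.03, 160)]             # second packet 10 ms late
    print(packet_loss_percent(seqs), interarrival_jitter_ms(arrivals))
```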
FAQ
Should I always use G.722?
Yes, if your carrier supports it. Use G.711 only when required for compatibility with legacy infrastructure.
Can I change codec mid-call?
No. Codec is negotiated at call setup. To change, the call must be terminated and re-established.
How does VAD handle music or hold tones?
Adaptive VAD learns from conversation patterns and typically handles hold music appropriately, but you may need to tune the sensitivity if you experience false triggers.
Can I disable VAD?
VAD is essential for natural, low-latency conversation. Disabling it is not recommended and not supported in standard configurations.