AI Voice Agent Providers
Integration guide for OpenAI, ElevenLabs, and custom WebSocket endpoints
Overview
Telepath supports three types of AI voice agent providers, each suited to different use cases:
- OpenAI Realtime API — Low-latency conversational AI powered by GPT-4o
- ElevenLabs Conversational AI — High-quality voice synthesis with customizable agents
- Custom WebSocket — Bring your own AI backend via a standard WebSocket interface
OpenAI Realtime
The OpenAI Realtime API uses the gpt-4o-realtime-preview model to provide sub-200ms conversational responses over a WebSocket stream. It handles speech recognition, language understanding, and speech synthesis in a single integrated pipeline.
Info: The OpenAI Realtime API typically achieves response times under 200ms, making it one of the fastest AI voice solutions available.
Prerequisites
- An OpenAI account with access to the Realtime API
- An API key with gpt-4o-realtime-preview model access enabled
- Sufficient OpenAI credits or a paid plan
Getting Your OpenAI API Key
Log in to platform.openai.com
Navigate to API Keys in the left sidebar
Click Create new secret key and give it a descriptive name (e.g., telepath-prod)
Copy the generated key immediately — it will not be shown again
Confirm your account has access to the gpt-4o-realtime-preview model under Model access
Configuration in Telepath
In the Telepath Dashboard, create or edit a connection
Select OpenAI Realtime API as the provider
Paste your OpenAI API key
Optionally add a system prompt to define your agent's role
Example System Prompt
You are a friendly customer service agent for Acme Corp. Your name is Alex.
Help customers with order inquiries, returns, and product questions.
Always be polite, concise, and professional.
If you cannot answer a question, offer to escalate to a human agent.
Pricing
OpenAI Realtime API is billed per minute of audio processed. Refer to the OpenAI pricing page for current rates. Costs are separate from Telepath's own pricing.
Best Practices
- Keep system prompts concise — shorter prompts reduce time-to-first-token latency
- Use G.722 codec for better audio quality and improved speech recognition accuracy
- Monitor AI latency in the Telepath dashboard to detect degradation
- Rotate your OpenAI API key periodically and update the Telepath connection
ElevenLabs Conversational AI
ElevenLabs Conversational AI combines ultra-realistic voice synthesis with a configurable agent system. Agents can be trained on custom knowledge bases and configured with specific personas and behaviors.
Tip: ElevenLabs agents can be trained on custom knowledge bases, making them ideal for domain-specific use cases such as healthcare, legal, or technical support.
Prerequisites
- An ElevenLabs account (Creator plan or higher for Conversational AI)
- An ElevenLabs API key
- At least one Conversational AI agent created in the ElevenLabs dashboard
Getting Your ElevenLabs API Key
Log in to elevenlabs.io
Click your profile icon in the top-right corner
Select Profile & API Key from the dropdown
Copy your API key from the API Key section
Creating an Agent in ElevenLabs
Navigate to Conversational AI → Agents in the ElevenLabs dashboard
Click Create Agent and configure:
- System Prompt: Your agent's persona and instructions
- First Message: What the agent says when a call is answered
- Knowledge Base: Upload documents or URLs for the agent to reference
- Voice: Choose from ElevenLabs' library of voices
Save the agent and copy the Agent ID from the agent's detail page
Optionally note the Voice ID if you want to specify a voice explicitly in Telepath
Configuration in Telepath
In the Telepath Dashboard, create or edit a connection
Select ElevenLabs Conversational AI as the provider
Enter your ElevenLabs API key and Agent ID
Optionally enter a Voice ID to override the agent's default voice
Voice Selection
ElevenLabs offers a wide range of voices suited to different contexts:
- Professional: Calm, authoritative voices ideal for corporate or healthcare use
- Casual: Warm, conversational voices for retail or consumer applications
- Accents: Regional accents (British, Australian, American, etc.) for localized deployments
Pricing
ElevenLabs Conversational AI is billed per minute of conversation. See the ElevenLabs pricing page for current rates.
Best Practices
- Use a concise first message to minimize time-to-first-audio
- Keep your agent's knowledge base focused — large documents increase lookup latency
- Test different voices with real users before production deployment
- Enable G.722 codec to take full advantage of ElevenLabs' 16kHz audio output
Custom WebSocket
The Custom WebSocket provider lets you connect any AI backend that implements the Telepath audio WebSocket protocol. This is the most flexible option and supports any speech recognition, language model, or synthesis stack.
WebSocket Protocol Requirements
- Your endpoint must accept WebSocket connections over wss:// (TLS required in production)
- Your server must handle binary audio frames in the Telepath audio format
- Your server must respond with binary audio frames within the latency budget
- Your server must handle connection lifecycle events (open, close, error)
Audio Format
- Encoding: Raw PCM (linear16)
- Sample Rate: 8kHz (G.711) or 16kHz (G.722)
- Channels: Mono
- Bit Depth: 16-bit signed little-endian
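As a sanity check, the expected size of an audio buffer follows directly from these parameters. A small helper (the 20ms frame duration here is an illustrative choice, not a Telepath requirement):

```python
def frame_bytes(sample_rate_hz: int, frame_ms: int) -> int:
    """Size in bytes of one mono 16-bit PCM frame.

    bytes = sample_rate * duration_in_seconds * bytes_per_sample * channels
    """
    bytes_per_sample = 2  # 16-bit signed little-endian
    return sample_rate_hz * frame_ms // 1000 * bytes_per_sample

print(frame_bytes(8000, 20))   # G.711 narrowband: 320 bytes per 20ms frame
print(frame_bytes(16000, 20))  # G.722 wideband: 640 bytes per 20ms frame
```

If the buffers your server receives or sends are not a multiple of these sizes, the audio format is likely misconfigured.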
Connection Flow
Telepath receives an inbound SIP call and opens a WebSocket connection to your endpoint
Telepath sends a JSON session.start message with call metadata (caller ID, connection ID, codec)
Telepath begins streaming binary audio frames (caller audio) to your endpoint
Your endpoint processes the audio and streams binary audio frames (AI response audio) back to Telepath
When the call ends, Telepath sends a JSON session.end message and closes the WebSocket connection
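For orientation, a session.start payload might look like the following. The field names mirror those read in the implementation example below (call_id, codec, caller_id, plus the connection ID mentioned above); the exact key names and values here are illustrative, so confirm them against your connection's actual messages:

```json
{
  "type": "session.start",
  "call_id": "call_abc123",
  "connection_id": "conn_xyz789",
  "caller_id": "+15551234567",
  "codec": "g722"
}
```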
Implementation Example
import asyncio
import json

import websockets


async def handle_call(websocket):
    """Handle an incoming Telepath WebSocket connection."""
    async for message in websocket:
        if isinstance(message, str):
            # JSON control messages arrive as text frames
            data = json.loads(message)
            if data["type"] == "session.start":
                print(f"Call started: {data['call_id']}")
                print(f"Codec: {data['codec']}")
                print(f"Caller: {data['caller_id']}")
            elif data["type"] == "session.end":
                print("Call ended")
        else:
            # Binary frames carry raw PCM audio from the caller
            audio_response = await your_ai_pipeline(message)
            if audio_response:
                await websocket.send(audio_response)


async def your_ai_pipeline(audio_bytes):
    """Process audio through your AI stack and return response audio."""
    # 1. Speech-to-text
    # 2. LLM inference
    # 3. Text-to-speech
    # Return raw PCM audio bytes (same format as the input)
    pass


async def main():
    async with websockets.serve(handle_call, "0.0.0.0", 8765):
        await asyncio.Future()  # run until cancelled


asyncio.run(main())
Configuration in Telepath
In the Telepath Dashboard, create or edit a connection
Select Custom WebSocket as the provider
Enter your public wss:// endpoint URL
Optionally add custom HTTP headers for authentication (e.g., Authorization: Bearer <token>)
Error Handling
Your WebSocket endpoint should handle these error conditions gracefully:
- Connection timeout: If Telepath cannot connect within 5 seconds, the call will be rejected
- Audio processing errors: If your endpoint fails to respond, the call will experience silence
- WebSocket disconnect: Telepath will attempt to reconnect up to 3 times before ending the call
Warning: Your WebSocket endpoint must use wss:// (TLS). Plain ws:// connections are not supported in production to ensure audio confidentiality.
Performance Optimization
- Deploy your WebSocket server in the same region as your Telepath connection to minimize RTT
- Use streaming speech-to-text and text-to-speech to begin responding before full utterance detection
- Process audio in chunks of 20–40ms for optimal balance between latency and processing overhead
- Use connection keep-alive to avoid cold-start penalties on new calls
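The 20–40ms chunking guideline above can be sketched as a simple frame splitter (pure stdlib; the 16kHz rate and 20ms duration are illustrative defaults):

```python
def iter_frames(pcm: bytes, sample_rate_hz: int = 16000, frame_ms: int = 20):
    """Yield fixed-size chunks of mono 16-bit PCM audio.

    A trailing partial chunk is zero-padded (silence) to a full frame so
    the consumer always sees uniform frame sizes.
    """
    frame_size = sample_rate_hz * frame_ms // 1000 * 2  # 2 bytes per sample
    for start in range(0, len(pcm), frame_size):
        chunk = pcm[start:start + frame_size]
        if len(chunk) < frame_size:
            chunk += b"\x00" * (frame_size - len(chunk))
        yield chunk

# One second of 16 kHz audio (32,000 bytes) splits into 50 frames of 640 bytes
frames = list(iter_frames(b"\x00" * 32000))
print(len(frames), len(frames[0]))  # 50 640
```

Smaller frames lower latency but raise per-message overhead; larger frames do the opposite, which is why the 20–40ms range is a reasonable middle ground.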
Comparison
| Feature | OpenAI | ElevenLabs | Custom |
|---|---|---|---|
| Setup Difficulty | Easy | Easy | Complex |
| Cost | Medium | Medium–High | Variable |
| Customization | Moderate | High | Full control |
| Response Time | <200ms | 200–500ms | Varies |
| Reliability | High | High | Depends on impl. |
| Language Support | 100+ languages | 30+ languages | Your choice |
Switching Providers
You can switch an existing connection to a different provider without interrupting your SIP trunk configuration. Follow these steps to migrate safely:
Create a new Telepath connection with the target provider configured
Test the new connection thoroughly using a test phone number
Update your carrier's origination URI to point to the new connection's SIP username
Monitor call quality for 24–48 hours after the switch
Once satisfied, deactivate and delete the old connection
Info: You can run multiple connections in parallel using different SIP usernames. This makes it safe to A/B test providers without any downtime or risk to production traffic.
Troubleshooting
Connection Fails
- Verify your API key is valid and not expired
- Confirm your account has access to the required model or plan tier
- For custom WebSocket: check that your endpoint is publicly accessible and accepts wss:// connections
Poor Audio Quality
- Switch from G.711 to G.722 if your provider supports wideband audio
- Check network latency between Telepath and your AI provider's servers
- Ensure your carrier is sending clean audio without excessive packet loss
High Latency
- Review per-component latency in the Telepath dashboard's call detail view
- If AI Latency is elevated, consider switching providers or upgrading your plan tier
- For custom WebSocket: profile your audio processing pipeline for bottlenecks
For comprehensive troubleshooting steps, see the Troubleshooting guide.