AI Voice Agent Providers

Integration guide for OpenAI, ElevenLabs, and custom WebSocket endpoints

Overview

Telepath supports three types of AI voice agent providers, each suited to different use cases:

  • OpenAI Realtime API — Low-latency conversational AI powered by GPT-4o
  • ElevenLabs Conversational AI — High-quality voice synthesis with customizable agents
  • Custom WebSocket — Bring your own AI backend via a standard WebSocket interface

OpenAI Realtime

The OpenAI Realtime API uses the gpt-4o-realtime-preview model to provide sub-200ms conversational responses over a WebSocket stream. It handles speech recognition, language understanding, and speech synthesis in a single integrated pipeline.

Info: The OpenAI Realtime API typically achieves response times under 200ms, making it one of the fastest AI voice solutions available.
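
Telepath manages the Realtime WebSocket for you, so you normally only supply the API key. For reference, a minimal sketch of the handshake parameters involved — the URL, model name, and OpenAI-Beta header below follow OpenAI's published Realtime beta documentation and may change:

```python
import json

# Realtime sessions are opened against this WebSocket URL (per OpenAI's
# beta docs; subject to change as the API evolves).
REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

def build_realtime_headers(api_key: str) -> dict:
    """WebSocket handshake headers for a Realtime session."""
    return {
        "Authorization": f"Bearer {api_key}",
        "OpenAI-Beta": "realtime=v1",
    }

def build_session_update(instructions: str) -> str:
    """JSON event that sets the system prompt once the socket is open."""
    return json.dumps({
        "type": "session.update",
        "session": {"instructions": instructions},
    })
```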

Prerequisites

  • An OpenAI account with access to the Realtime API
  • An API key with gpt-4o-realtime-preview model access enabled
  • Sufficient OpenAI credits or a paid plan

Getting Your OpenAI API Key

  1. Log in to the OpenAI platform (platform.openai.com) and navigate to API Keys in the left sidebar

  2. Click Create new secret key and give it a descriptive name (e.g., telepath-prod)

  3. Copy the generated key immediately — it will not be shown again

  4. Confirm your account has access to the gpt-4o-realtime-preview model under Model access

Configuration in Telepath

  1. In the Telepath Dashboard, create or edit a connection

  2. Select OpenAI Realtime API as the provider

  3. Paste your OpenAI API key

  4. Optionally add a system prompt to define your agent's role

Example System Prompt

text
You are a friendly customer service agent for Acme Corp. Your name is Alex.
Help customers with order inquiries, returns, and product questions.
Always be polite, concise, and professional.
If you cannot answer a question, offer to escalate to a human agent.

Pricing

OpenAI Realtime API is billed per minute of audio processed. Refer to the OpenAI pricing page for current rates. Costs are separate from Telepath's own pricing.

Best Practices

  • Keep system prompts concise — shorter prompts reduce time-to-first-token latency
  • Use G.722 codec for better audio quality and improved speech recognition accuracy
  • Monitor AI latency in the Telepath dashboard to detect degradation
  • Rotate your OpenAI API key periodically and update the Telepath connection

ElevenLabs Conversational AI

ElevenLabs Conversational AI combines ultra-realistic voice synthesis with a configurable agent system. Agents can be trained on custom knowledge bases and configured with specific personas and behaviors.

Tip: ElevenLabs agents can be trained on custom knowledge bases, making them ideal for domain-specific use cases such as healthcare, legal, or technical support.

Prerequisites

  • An ElevenLabs account (Creator plan or higher for Conversational AI)
  • An ElevenLabs API key
  • At least one Conversational AI agent created in the ElevenLabs dashboard

Getting Your ElevenLabs API Key

  1. Log in to elevenlabs.io

  2. Click your profile icon in the top-right corner

  3. Select Profile & API Key from the dropdown

  4. Copy your API key from the API Key section

Creating an Agent in ElevenLabs

  1. Navigate to Conversational AI → Agents in the ElevenLabs dashboard

  2. Click Create Agent and configure:

    • System Prompt: Your agent's persona and instructions
    • First Message: What the agent says when a call is answered
    • Knowledge Base: Upload documents or URLs for the agent to reference
    • Voice: Choose from ElevenLabs' library of voices
  3. Save the agent and copy the Agent ID from the agent's detail page

  4. Optionally note the Voice ID if you want to specify a voice explicitly in Telepath

Configuration in Telepath

  1. In the Telepath Dashboard, create or edit a connection

  2. Select ElevenLabs Conversational AI as the provider

  3. Enter your ElevenLabs API key and Agent ID

  4. Optionally enter a Voice ID to override the agent's default voice

Voice Selection

ElevenLabs offers a wide range of voices suited to different contexts:

  • Professional: Calm, authoritative voices ideal for corporate or healthcare use
  • Casual: Warm, conversational voices for retail or consumer applications
  • Accents: Regional accents (British, Australian, American, etc.) for localized deployments

Pricing

ElevenLabs Conversational AI is billed per minute of conversation. See the ElevenLabs pricing page for current rates.

Best Practices

  • Use a concise first message to minimize time-to-first-audio
  • Keep your agent's knowledge base focused — large documents increase lookup latency
  • Test different voices with real users before production deployment
  • Enable G.722 codec to take full advantage of ElevenLabs' 16kHz audio output

Custom WebSocket

The Custom WebSocket provider lets you connect any AI backend that implements the Telepath audio WebSocket protocol. This is the most flexible option and supports any speech recognition, language model, or synthesis stack.

WebSocket Protocol Requirements

  • Your endpoint must accept WebSocket connections over wss:// (TLS required in production)
  • Your server must handle binary audio frames in the Telepath audio format
  • Your server must respond with binary audio frames within the latency budget
  • Your server must handle connection lifecycle events (open, close, error)

Audio Format

  • Encoding: Raw PCM (linear16)
  • Sample Rate: 8kHz (G.711) or 16kHz (G.722)
  • Channels: Mono
  • Bit Depth: 16-bit signed little-endian
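
These parameters determine the size of each binary frame. As a quick sanity check, assuming 20 ms frames (the duration recommended in the performance notes later in this guide):

```python
def frame_bytes(sample_rate_hz: int, frame_ms: int = 20,
                bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Bytes per audio frame: rate * sample width * channels * duration."""
    return sample_rate_hz * bytes_per_sample * channels * frame_ms // 1000

# 8 kHz (G.711):  320 bytes per 20 ms frame
# 16 kHz (G.722): 640 bytes per 20 ms frame
```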

Connection Flow

  1. Telepath receives an inbound SIP call and opens a WebSocket connection to your endpoint

  2. Telepath sends a JSON session.start message with call metadata (caller ID, connection ID, codec)

  3. Telepath begins streaming binary audio frames (caller audio) to your endpoint

  4. Your endpoint processes the audio and streams binary audio frames (AI response audio) back to Telepath

  5. When the call ends, Telepath sends a JSON session.end message and closes the WebSocket connection
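
The JSON control messages in steps 2 and 5 can be handled with a small parser. The field names here (call_id, caller_id, codec) match the implementation example that follows; the exact schema is defined by Telepath:

```python
import json

VALID_TYPES = ("session.start", "session.end")

def parse_control_message(raw: str) -> dict:
    """Decode a Telepath JSON control message and validate its type."""
    event = json.loads(raw)
    if event.get("type") not in VALID_TYPES:
        raise ValueError(f"unexpected control message: {event.get('type')}")
    return event
```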

Implementation Example

python
import asyncio
import json

import websockets

async def handle_call(websocket):
    """Handle an incoming Telepath WebSocket connection.

    Note: recent versions of the websockets library pass only the connection
    object to the handler; the older (websocket, path) signature is removed.
    """
    async for message in websocket:
        if isinstance(message, str):
            # Handle JSON control messages
            data = json.loads(message)
            if data["type"] == "session.start":
                print(f"Call started: {data['call_id']}")
                print(f"Codec: {data['codec']}")
                print(f"Caller: {data['caller_id']}")
            elif data["type"] == "session.end":
                print("Call ended")
        else:
            # Handle binary audio frames
            # message is raw PCM audio from the caller
            audio_response = await your_ai_pipeline(message)
            await websocket.send(audio_response)

async def your_ai_pipeline(audio_bytes):
    """Process audio through your AI stack and return response audio."""
    # 1. Speech-to-text
    # 2. LLM inference
    # 3. Text-to-speech
    # Return raw PCM audio bytes
    raise NotImplementedError

async def main():
    async with websockets.serve(handle_call, "0.0.0.0", 8765):
        await asyncio.Future()  # run until cancelled

asyncio.run(main())

Configuration in Telepath

  1. In the Telepath Dashboard, create or edit a connection

  2. Select Custom WebSocket as the provider

  3. Enter your public wss:// endpoint URL

  4. Optionally add custom HTTP headers for authentication (e.g., Authorization: Bearer <token>)
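
If you add an Authorization header, your server should verify it on each new connection. A minimal sketch, assuming a static bearer token and a plain dict of request headers:

```python
import hmac

def is_authorized(headers: dict, expected_token: str) -> bool:
    """Check an Authorization: Bearer <token> header against a known token."""
    value = headers.get("Authorization", "")
    if not value.startswith("Bearer "):
        return False
    # Constant-time comparison to avoid timing side channels
    return hmac.compare_digest(value[len("Bearer "):], expected_token)
```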

Error Handling

Your WebSocket endpoint should handle these error conditions gracefully:

  • Connection timeout: If Telepath cannot connect within 5 seconds, the call will be rejected
  • Audio processing errors: If your endpoint fails to respond, the call will experience silence
  • WebSocket disconnect: Telepath will attempt to reconnect up to 3 times before ending the call

Warning: Your WebSocket endpoint must use wss:// (TLS). Plain ws:// connections are not supported in production to ensure audio confidentiality.

Performance Optimization

  • Deploy your WebSocket server in the same region as your Telepath connection to minimize RTT
  • Use streaming speech-to-text and text-to-speech to begin responding before full utterance detection
  • Process audio in chunks of 20–40ms for optimal balance between latency and processing overhead
  • Use connection keep-alive to avoid cold-start penalties on new calls
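
The chunking recommendation above can be sketched as a simple frame splitter, assuming 16 kHz, 16-bit mono PCM:

```python
def iter_frames(pcm: bytes, sample_rate_hz: int = 16000, frame_ms: int = 20):
    """Split raw 16-bit mono PCM into fixed-size frames for streaming.

    Yields only complete frames; a trailing partial frame is held back
    (in a real pipeline you would buffer it until more audio arrives).
    """
    frame_size = sample_rate_hz * 2 * frame_ms // 1000  # bytes per frame
    for offset in range(0, len(pcm) - frame_size + 1, frame_size):
        yield pcm[offset:offset + frame_size]
```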

Comparison

Feature            OpenAI           ElevenLabs        Custom
Setup Difficulty   Easy             Easy              Complex
Cost               Medium           Medium–High       Variable
Customization      Moderate         High              Full control
Response Time      <200ms           200–500ms         Varies
Reliability        High             High              Depends on impl.
Language Support   100+ languages   30+ languages     Your choice

Switching Providers

You can switch an existing connection to a different provider without interrupting your SIP trunk configuration. Follow these steps to migrate safely:

  1. Create a new Telepath connection with the target provider configured

  2. Test the new connection thoroughly using a test phone number

  3. Update your carrier's origination URI to point to the new connection's SIP username

  4. Monitor call quality for 24–48 hours after the switch

  5. Once satisfied, deactivate and delete the old connection

Info: You can run multiple connections in parallel using different SIP usernames. This makes it safe to A/B test providers without any downtime or risk to production traffic.

Troubleshooting

Connection Fails

  • Verify your API key is valid and not expired
  • Confirm your account has access to the required model or plan tier
  • For custom WebSocket: check that your endpoint is publicly accessible and accepts wss:// connections

Poor Audio Quality

  • Switch from G.711 to G.722 if your provider supports wideband audio
  • Check network latency between Telepath and your AI provider's servers
  • Ensure your carrier is sending clean audio without excessive packet loss

High Latency

  • Review per-component latency in the Telepath dashboard's call detail view
  • If AI Latency is elevated, consider switching providers or upgrading your plan tier
  • For custom WebSocket: profile your audio processing pipeline for bottlenecks

For comprehensive troubleshooting steps, see the Troubleshooting guide.