xAI's Grok Realtime API provides speech-to-speech conversation with under 700 ms of latency. Unlike a traditional voice AI pipeline (STT → LLM → TTS), Grok processes audio directly in a single model.
Setup
1. Add xAI API Key
Navigate to the Integrations → API Keys tab and add your xAI API key.
When creating or editing an assistant with xAI configured:
Provider: Select "xAI Realtime"
Voice: ara (or another available voice)
Note: When using xAI Realtime, separate STT/TTS providers are ignored.
Pricing
xAI Realtime uses BYOK (bring your own key) pricing:
€0.07/minute when using your xAI API key
Direct billing to your xAI account
No markup on xAI API usage
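As a rough worked example of the per-minute rate above, the sketch below estimates the fee for a call of a given length. Only the €0.07/minute figure comes from this page. The function name, the per-second proration, and the rounding to whole cents are assumptions, and xAI's own API charges (billed directly to your xAI account) are not included.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rate quoted on this page for xAI Realtime with your own xAI key.
RATE_EUR_PER_MINUTE = Decimal("0.07")


def estimate_fee_eur(call_seconds: int) -> Decimal:
    """Estimate the per-minute fee for one call.

    Assumption: usage is prorated per second and rounded to whole cents.
    Actual billing granularity may differ (for example, rounding up to
    full minutes), so treat the result as an approximation.
    """
    if call_seconds < 0:
        raise ValueError("call_seconds must be non-negative")
    minutes = Decimal(call_seconds) / Decimal(60)
    fee = minutes * RATE_EUR_PER_MINUTE
    return fee.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # A 10-minute call: 10 * 0.07 = 0.70
    print(estimate_fee_eur(600))
```

Using `Decimal` rather than `float` avoids binary rounding artifacts when summing many small per-call amounts.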
Differences from Traditional Mode
| Feature | Traditional (STT+LLM+TTS) | xAI Realtime |
|---|---|---|
| Latency | ~1-2 seconds | <700ms |
| Providers | 3 separate | Single (xAI) |
| Voice Quality | Depends on TTS provider | Native to model |
| Custom Tools | Supported via llm_config.tools | Check xAI docs for support |
| BYOK Keys Required | 3 keys (STT, LLM, TTS) | 1 key (xAI) |
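The key and provider rows of the table can be sketched as a small helper. This is illustrative only: the provider identifier `xai_realtime`, the config field names (`provider`, `stt`, `tts`), and both function names are hypothetical and are not taken from the platform's API.

```python
# Hypothetical identifier for the "xAI Realtime" provider option.
REALTIME_PROVIDER = "xai_realtime"


def required_byok_keys(provider: str) -> list[str]:
    """Return the key types a mode needs, following the comparison table."""
    if provider == REALTIME_PROVIDER:
        return ["xai"]  # single provider, single key
    return ["stt", "llm", "tts"]  # traditional pipeline: one key per stage


def effective_config(config: dict) -> dict:
    """Return the settings that actually apply for the chosen provider.

    In realtime mode, separate STT/TTS settings are ignored, so this
    sketch drops them. The field names are assumptions.
    """
    if config.get("provider") != REALTIME_PROVIDER:
        return dict(config)
    return {k: v for k, v in config.items() if k not in ("stt", "tts")}


if __name__ == "__main__":
    cfg = {"provider": REALTIME_PROVIDER, "voice": "ara", "tts": {"voice": "x"}}
    print(required_byok_keys(cfg["provider"]))
    print(effective_config(cfg))
```

A check like this can surface a misconfiguration early, for example an assistant that still carries TTS settings that will have no effect in realtime mode.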
Limitations
Custom system prompts may behave differently than they do with OpenAI models
Tool calling support depends on xAI API capabilities
Voice selection is limited to xAI's available voices
API Reference
Modified at 2026-03-17 10:59:50