xAI Grok Integration

Use xAI Grok Realtime API for speech-to-speech conversation with sub-700ms latency.

xAI's Grok Realtime API provides speech-to-speech conversation with latency under 700 ms. Unlike a traditional voice AI pipeline (STT → LLM → TTS), Grok processes audio directly in a single model.

Setup

1. Add xAI API Key

Navigate to the Integrations → API Keys tab and add your xAI API key, or register it through the BYOK API:

curl -X POST https://api.hmsovereign.com/api/v1/byok \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "xai",
    "api_key": "xai-..."
  }'
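
To keep the xAI key out of your shell history, the same request can be wrapped in a small script that reads both keys from environment variables. This is a minimal sketch, not an official client: the function names and the PLATFORM_API_KEY and XAI_API_KEY variable names are assumptions chosen for illustration, and the payload builder assumes the key contains no characters that need JSON escaping. Only the endpoint, headers, and body fields come from the request shown above.

```shell
# Sketch only: function and variable names are hypothetical.

# Build the JSON body for the BYOK endpoint.
# Assumes neither argument contains quotes or backslashes.
byok_payload() {
  printf '{"provider": "%s", "api_key": "%s"}' "$1" "$2"
}

# Send the xAI key using the endpoint and headers from the curl example.
# Returns non-zero without calling the API if either variable is unset.
register_xai_key() {
  if [ -z "${PLATFORM_API_KEY:-}" ] || [ -z "${XAI_API_KEY:-}" ]; then
    echo "Set PLATFORM_API_KEY and XAI_API_KEY first" >&2
    return 1
  fi
  curl -sS --fail -X POST https://api.hmsovereign.com/api/v1/byok \
    -H "Authorization: Bearer ${PLATFORM_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$(byok_payload xai "${XAI_API_KEY}")"
}
```

After exporting both variables, running register_xai_key sends the request. The --fail flag makes curl exit with a non-zero status on an HTTP error, so the call can be checked in a script.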

2. Configure Assistant

When creating or editing an assistant with xAI configured:

  • Provider: Select "xAI Realtime"
  • Model: grok-realtime-v1
  • Voice: ara (or other available voices)

Note: When using xAI Realtime, separate STT/TTS providers are ignored.

Pricing

xAI Realtime uses BYOK pricing:

  • €0.07/minute when using your xAI API key
  • Direct billing to your xAI account
  • No markup on xAI API usage
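
As a rough worked example, the per-minute rate can be turned into a quick estimate. This sketch assumes usage is counted in whole minutes (the billing granularity is not stated here) and covers only the listed per-minute rate. Anything xAI bills directly to your xAI account is not included. The function name is hypothetical.

```shell
# Sketch: estimate cost at the listed rate of EUR 0.07 per minute.
# Works in whole cents to avoid floating-point arithmetic in the shell.
RATE_CENTS_PER_MINUTE=7

# Hypothetical helper: takes whole minutes, prints the cost in euros.
estimate_cost_eur() {
  minutes=$1
  cents=$((minutes * RATE_CENTS_PER_MINUTE))
  printf '%d.%02d\n' $((cents / 100)) $((cents % 100))
}

estimate_cost_eur 1000   # prints 70.00
```

At this rate, 1,000 minutes of conversation comes to €70.00.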

Differences from Traditional Mode

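| Feature            | Traditional (STT+LLM+TTS)      | xAI Realtime               |
|--------------------|--------------------------------|----------------------------|
| Latency            | ~1-2 seconds                   | <700 ms                    |
| Providers          | 3 separate                     | Single (xAI)               |
| Voice Quality      | Depends on TTS provider        | Native to model            |
| Custom Tools       | Supported via llm_config.tools | Check xAI docs for support |
| BYOK Keys Required | 3 keys (STT, LLM, TTS)         | 1 key (xAI)                |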

Limitations

  • Custom system prompts may behave differently than they do with OpenAI models
  • Tool calling support depends on xAI API capabilities
  • Voice selection limited to xAI's available voices

API Reference

See BYOK API Reference for managing xAI API keys.
