All prices are in USD. These are the API costs charged by providers, not HMS Sovereign pricing to customers.
Speech-to-Text (STT)#
Deepgram#
| Model | Price per Minute |
|---|---|
| Nova 3 (Multilingual) | $0.0092 |
| Nova 3 (Monolingual) | $0.0077 |
| Nova 2 | $0.0058 |
| Nova 1 | $0.0058 |
| Enhanced | $0.0165 |
| Base | $0.0145 |
Note: Prices are for the Pay-As-You-Go tier. The Growth tier is ~17% cheaper.
Gladia#
| Model | Price per Hour |
|---|---|
| Solaria (Async) | $0.61 |
| Solaria (Real-time) | $0.75 |
Converted to per minute: ~$0.0102/min (async), $0.0125/min (real-time)
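The per-minute figures follow from dividing the hourly price by 60. A minimal sketch using the Gladia prices from the table above; the function name `per_minute` is illustrative and not part of any provider SDK.

```python
def per_minute(price_per_hour: float) -> float:
    """Convert an hourly price in USD to a per-minute price."""
    return price_per_hour / 60


# Hourly prices copied from the Gladia table (USD).
gladia_per_hour = {
    "Solaria (Async)": 0.61,
    "Solaria (Real-time)": 0.75,
}

for model, hourly in gladia_per_hour.items():
    print(f"{model}: ${per_minute(hourly):.4f}/min")
```

The same conversion makes Gladia directly comparable with the per-minute Deepgram prices.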
Language Models (LLM)#
OpenAI#
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-5 Mini | $0.25 | $2.00 |
| GPT-4.1 | $2.00 | $8.00 |
| GPT-4.1 Mini | $0.40 | $1.60 |
| GPT-4.1 Nano | $0.10 | $0.40 |
| GPT-4o | $2.50 | $10.00 |
| GPT-4o (2024-05-13) | $5.00 | $15.00 |
| GPT-4o Mini | $0.15 | $0.60 |
| GPT-4 Turbo | $10.00 | $30.00 |
| GPT-4 | $30.00 | $60.00 |
| GPT-4 32K | $60.00 | $120.00 |
| GPT-3.5 Turbo | $0.50 | $1.50 |
| GPT-3.5 Turbo 16K | $3.00 | $4.00 |
Recommended for voice assistants: GPT-5 Mini (best value), GPT-4o Mini (fastest), GPT-4.1 Mini (balanced)
Mistral#
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Mistral Large | $0.50 | $1.50 |
| Mistral Medium | $0.40 | $2.00 |
| Mistral Small | $0.10 | $0.30 |
| Ministral 8B | $0.15 | $0.15 |
| Ministral 3B | $0.10 | $0.10 |
| Codestral | $0.30 | $0.90 |
| Mixtral 8x7B | $0.70 | $0.70 |
| Mixtral 8x22B | $2.00 | $6.00 |
Recommended for voice assistants: Mistral Small (fast + cheap), Mistral Medium (balanced)
xAI (Grok)#
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Grok 4.1 Fast | $0.20 | $0.50 |
| Grok 4 Fast | $0.20 | $0.50 |
| Grok Code Fast 1 | $0.20 | $1.50 |
| Grok 4 (0709) | $3.00 | $15.00 |
| Grok 3 Mini | $0.30 | $0.50 |
| Grok 3 | $3.00 | $15.00 |
Realtime API (Speech-to-Speech):
| Model | Price |
|---|---|
| Grok Realtime v1 | $0.05/min ($3.00/hr) |
Recommended: Grok 4.1 Fast (best value), Grok Realtime (for S2S)
Text-to-Speech (TTS)#
ElevenLabs#
Prices per 1,000 characters. Based on Creator tier ($22/mo).
| Model | Price per 1K chars |
|---|---|
| Flash v2.5 | $0.11 |
| Turbo v2.5 | $0.11 |
| Eleven v3 | $0.22 |
| Multilingual v2 | $0.22 |
| Monolingual v1 | $0.22 |
| Tier (price per month) | Flash/Turbo per 1K chars | Multilingual per 1K chars |
|---|---|---|
| Free | N/A | $0.17 |
| Starter ($5) | $0.08 | $0.17 |
| Creator ($22) | $0.11 | $0.22 |
| Pro ($99) | $0.10 | $0.20 |
| Scale ($330) | $0.08 | $0.17 |
| Business ($1,320) | $0.06 | $0.12 |
Recommended: Flash v2.5 (fastest, cheapest), Multilingual v2 (best quality)
Inworld#
Prices per 1,000,000 characters (On-demand tier).
| Model | Price per 1M chars | Per 1K chars |
|---|---|---|
| TTS 1.5 Mini | $5.00 | $0.005 |
| TTS 1.5 Max | $10.00 | $0.01 |
| TTS 1 | $5.00 | $0.005 |
| TTS 1 Max | $10.00 | $0.01 |
Note: Inworld is roughly 11x to 22x cheaper than ElevenLabs Flash (Creator tier), depending on the model. At 650 chars/min:
- Inworld 1.5-Mini: $0.00325/min
- Inworld 1.5-Max: $0.0065/min
- ElevenLabs Flash: $0.0715/min
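The per-minute TTS figures are the per-1K-character price multiplied by the speaking rate. A minimal sketch, assuming the 650 chars/min rate quoted above; `tts_cost_per_minute` is a hypothetical helper name.

```python
def tts_cost_per_minute(price_per_1k_chars: float, chars_per_minute: float = 650) -> float:
    """USD cost of synthesizing one minute of speech at the given character rate."""
    return price_per_1k_chars * chars_per_minute / 1000


# Per-1K-character prices taken from the tables above (USD).
prices_per_1k = {
    "Inworld 1.5-Mini": 0.005,
    "Inworld 1.5-Max": 0.01,
    "ElevenLabs Flash": 0.11,  # Creator tier
}

costs = {name: tts_cost_per_minute(price) for name, price in prices_per_1k.items()}
for name, cost in costs.items():
    print(f"{name}: ${cost:.5f}/min")

# How many times more ElevenLabs Flash costs than each Inworld model.
for name in ("Inworld 1.5-Mini", "Inworld 1.5-Max"):
    ratio = costs["ElevenLabs Flash"] / costs[name]
    print(f"ElevenLabs Flash vs {name}: {ratio:.0f}x")
```

Because the speaking rate multiplies every price equally, the ratio between providers does not depend on the chars/min assumption.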
Cost Estimation per Minute of Voice Conversation#
Typical conversation metrics (based on real call data):
- LLM: ~500 input tokens and ~200 output tokens per turn, ~10 turns = 5,000 input + 2,000 output tokens
- TTS: ~1,200 characters (measured from an actual 62s call)
Example: Budget Setup (Deepgram Nova 3 + GPT-5 Mini + ElevenLabs Flash)#
| Component | Usage | Cost |
|---|---|---|
| STT | 1 min | $0.0077 |
| LLM Input | 5K tokens | $0.00125 |
| LLM Output | 2K tokens | $0.004 |
| TTS | 1.2K chars | $0.132 |
| Total | | ~$0.145/min |
Example: Quality Setup (Deepgram Nova 3 + GPT-4o + ElevenLabs Multilingual v2)#
| Component | Usage | Cost |
|---|---|---|
| STT | 1 min | $0.0077 |
| LLM Input | 5K tokens | $0.0125 |
| LLM Output | 2K tokens | $0.02 |
| TTS | 1.2K chars | $0.264 |
| Total | | ~$0.304/min |
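The Budget and Quality totals can be reproduced from the unit prices listed earlier. The sketch below assumes LLM prices are per 1M tokens and TTS prices are per 1K characters (Creator tier), and uses the typical usage of 5,000 input tokens, 2,000 output tokens, and 1,200 characters per minute; the `Setup` class and `cost_per_minute` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setup:
    stt_per_min: float       # USD per minute of audio
    llm_in_per_1m: float     # USD per 1M input tokens
    llm_out_per_1m: float    # USD per 1M output tokens
    tts_per_1k_chars: float  # USD per 1K characters


def cost_per_minute(setup: Setup, input_tokens: int = 5_000,
                    output_tokens: int = 2_000, tts_chars: int = 1_200) -> float:
    """Provider cost in USD for one minute of conversation."""
    stt = setup.stt_per_min
    llm = (input_tokens * setup.llm_in_per_1m
           + output_tokens * setup.llm_out_per_1m) / 1_000_000
    tts = tts_chars * setup.tts_per_1k_chars / 1_000
    return stt + llm + tts


# Deepgram Nova 3 (Monolingual) + GPT-5 Mini + ElevenLabs Flash v2.5
budget = Setup(0.0077, 0.25, 2.00, 0.11)
# Deepgram Nova 3 (Monolingual) + GPT-4o + ElevenLabs Multilingual v2
quality = Setup(0.0077, 2.50, 10.00, 0.22)

print(f"Budget:  ${cost_per_minute(budget):.3f}/min")
print(f"Quality: ${cost_per_minute(quality):.3f}/min")
```

Swapping in other rows from the pricing tables, for example an Inworld TTS price, gives a quick estimate for alternative stacks.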
Example: Grok Realtime (Speech-to-Speech)#
| Component | Usage | Cost |
|---|---|---|
| S2S | 1 min | $0.05 |
| Total | | $0.05/min |
Pricing Strategy Notes#
Current HMS Sovereign pricing:
- BYOK: €0.07/min (orchestration only)
- Platform keys: €0.30/min (flat rate, includes provider costs)
Margin at €0.30/min with Budget Setup:
- Provider cost: $0.145 (~€0.134)
- Margin: ~55%
Margin at €0.30/min with Quality Setup:
- Provider cost: $0.304 (~€0.281)
- Margin: ~6% (BARELY PROFITABLE!)
Margin at €0.30/min with Grok Realtime:
- Provider cost: $0.05 (~€0.046)
- Margin: ~85%
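Margin here is taken as (price minus provider cost) divided by price, which reproduces the ~6% figure for the Quality Setup. A minimal sketch; the exchange rate of 0.924 EUR per USD is an assumption inferred from the conversions above ($0.145 to €0.134), not a quoted rate, and `margin` is a hypothetical helper.

```python
# Assumption: EUR per USD, inferred from the $0.145 -> EUR 0.134 conversion above.
USD_TO_EUR = 0.924


def margin(price_eur: float, provider_cost_usd: float,
           usd_to_eur: float = USD_TO_EUR) -> float:
    """Gross margin as a fraction of the customer price."""
    cost_eur = provider_cost_usd * usd_to_eur
    return (price_eur - cost_eur) / price_eur


# Provider costs per minute in USD, from the examples above.
provider_costs = {
    "Budget Setup": 0.145,
    "Quality Setup": 0.304,
    "Grok Realtime": 0.05,
}

for name, cost_usd in provider_costs.items():
    print(f"{name}: {margin(0.30, cost_usd):.0%} margin at EUR 0.30/min")
```

Since prices are charged in EUR and providers bill in USD, the margin moves with the exchange rate; passing a different `usd_to_eur` shows how sensitive the Quality Setup is to it.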
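Warning: ElevenLabs is the dominant cost driver, at roughly 87-91% of provider cost in the Budget and Quality examples. With Multilingual v2, margins are razor thin at €0.30/min. Consider:
1. Higher pricing for premium voices
2. Restricting platform keys to Flash models only
3. Moving to Business tier ($0.06/1K for Flash, $0.12/1K for Multilingual) to cut TTS costs by about 45% versus Creator tier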
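The effect of a tier change on TTS cost can be checked directly from the ElevenLabs tier table. A minimal sketch, assuming the 1,200 characters per minute used in the estimates above; it ignores the monthly subscription fee, and the helper names are hypothetical.

```python
TTS_CHARS_PER_MIN = 1_200  # from the typical conversation metrics

# USD per 1K characters, copied from the ElevenLabs tier table.
tier_prices = {
    "Creator": {"flash": 0.11, "multilingual": 0.22},
    "Business": {"flash": 0.06, "multilingual": 0.12},
}


def tts_cost(tier: str, model: str, chars: int = TTS_CHARS_PER_MIN) -> float:
    """USD TTS cost per minute of conversation for a tier and model family."""
    return tier_prices[tier][model] * chars / 1_000


def saving(model: str) -> float:
    """Fractional TTS saving when moving from Creator to Business tier."""
    return 1 - tier_prices["Business"][model] / tier_prices["Creator"][model]


for model in ("flash", "multilingual"):
    print(f"{model}: ${tts_cost('Creator', model):.3f} -> "
          f"${tts_cost('Business', model):.3f} per min "
          f"({saving(model):.0%} saving)")
```

Whether the saving outweighs the higher subscription fee depends on monthly volume, which this sketch does not model.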
Modified at 2026-03-17 10:59:50