Web Calls#
Web calls let your users speak directly to an AI assistant from their browser, with no phone number required. The browser connects via WebRTC and the assistant runs exactly like it does for phone calls.
We're working on official SDKs for React, Vue, and vanilla JavaScript. In the meantime, this guide provides everything you need to build a working integration using the open-source livekit-client and @livekit/components-react packages.
How It Works#
User clicks "Talk to AI"
↓
Your backend → POST https://assistant-api.hmsovereign.com/v1/web-calls (Bearer <org_api_key>)
↓
HMS validates key, creates voice room, dispatches assistant → returns { token, server_url }
↓
Your backend passes token + server_url to the browser
↓
Browser connects via WebRTC (microphone audio) using the token
↓
AI assistant picks up → full STT → LLM → TTS pipeline
↓
Call ends → summary, transcript, credits deducted, webhook fired
The call appears in your Calls dashboard with direction: "web" and is billed at the same per-minute rate as phone calls.
Never expose your API key to the browser. Always proxy web call requests through your own backend server. The browser only ever receives the short-lived token.
API Reference#
Create Web Call#
POST https://assistant-api.hmsovereign.com/v1/web-calls
Authentication#
| Header | Value |
|---|---|
| Authorization | Bearer YOUR_API_KEY |
| Content-Type | application/json |
Your organization API key is found in the HMS Sovereign dashboard under Settings > API Keys. The org_id is automatically derived from the key, so you do not need to pass it.
Request Body#
| Field | Type | Required | Description |
|---|---|---|---|
| assistant_id | string (uuid) | No* | Saved assistant to use. Must belong to your organization. |
| assistant_override | object | No | Partial field overrides applied on top of assistant_id (hybrid mode). Requires assistant_id. |
| assistant | object | No* | Full inline assistant config (transient mode). Cannot be combined with assistant_override. |
\* At least one of assistant_id or assistant must be provided.
Reference mode (use a saved assistant as-is):
{
  "assistant_id": "17a0cb75-fa09-4bdd-9a44-92a70d829c88"
}
Hybrid mode (saved assistant with partial overrides):
{
  "assistant_id": "17a0cb75-fa09-4bdd-9a44-92a70d829c88",
  "assistant_override": {
    "first_message": "Custom greeting!",
    "llm_config": {
      "messages": [{ "role": "system", "content": "You are a sales assistant." }]
    }
  }
}
Transient mode (full inline config, no saved assistant required):
{
  "assistant": {
    "stt_config": { "provider": "deepgram", "model": "nova-3", "language": "en" },
    "llm_config": {
      "provider": "openai",
      "model": "gpt-4.1-mini",
      "messages": [{ "role": "system", "content": "You are a helpful assistant." }]
    },
    "tts_config": { "provider": "elevenlabs", "voice_id": "your-voice-id" },
    "first_message": "Hello! How can I help you?"
  }
}
Available fields in assistant / assistant_override: stt_config, llm_config, tts_config, first_message, business_name, name, analysis_plan, autonomous_silence_handling, gdpr_mode, webhook_url, webhook_secret, webhook_events, metadata.
Transient mode requires stt_config, llm_config, and tts_config inside assistant.
Use the metadata field inside assistant_override (or assistant in transient mode) to attach arbitrary key-value data to a web call. HMS Sovereign passes it through unchanged to all webhook payloads under message.assistant.metadata.
This is the correct way to correlate a web call with your own users, sessions, or records. For example, pass a user_id so your webhook handler knows which user the call belongs to.
assistant-request does not fire for web calls. Unlike inbound phone calls, where you can inject metadata dynamically in the assistant-request response, web calls have no pre-call webhook. All metadata must be passed at session creation time.
{
  "assistant_id": "17a0cb75-fa09-4bdd-9a44-92a70d829c88",
  "assistant_override": {
    "metadata": {
      "user_id": "usr_8f3a2b1c",
      "session_id": "ses_9d4e3c2b",
      "plan": "pro"
    }
  }
}
The metadata object appears as-is in every webhook fired for that call:
{
  "message": {
    "type": "end-of-call-report",
    "call": { "id": "3f2a1b4c-...", "type": "web_call" },
    "assistant": {
      "metadata": {
        "user_id": "usr_8f3a2b1c",
        "session_id": "ses_9d4e3c2b",
        "plan": "pro"
      }
    }
  }
}
Metadata keys are passed through exactly as you send them — HMS Sovereign does not convert them to snake_case or any other format. If your webhook handler expects user_id, send user_id. If it expects UserID, send UserID.
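As an illustration of the correlation step, a webhook handler could read the pass-through metadata as sketched below. The `WebhookPayload` interface only models the fields shown in the example payload above, and `extractUserId` is a hypothetical helper name, not part of any HMS Sovereign SDK.

```typescript
// Only the webhook fields used here are modeled; real payloads carry more.
interface WebhookPayload {
  message: {
    type: string;
    call?: { id: string; type: string };
    assistant?: { metadata?: Record<string, unknown> };
  };
}

// Returns the user_id attached at session creation, or null when it is
// missing or not a string. Keys are read exactly as they were sent.
export function extractUserId(payload: WebhookPayload): string | null {
  const userId = payload.message.assistant?.metadata?.["user_id"];
  return typeof userId === "string" ? userId : null;
}
```

Returning `null` instead of throwing lets the handler decide how to treat calls that were created without metadata.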
Configuration Modes#
| Mode | When to use |
|---|---|
| Reference | Use a saved assistant exactly as configured in the dashboard |
| Hybrid | Use a saved assistant but override specific fields per-call (e.g. dynamic first message, custom system prompt) |
| Transient | Fully define the assistant inline — useful for dynamic or ephemeral assistants not saved in the dashboard |
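The three modes can be checked on your own backend before the request is sent. The sketch below mirrors the documented 400 error messages. `resolveMode` and the type names are illustrative, and treating a body that contains both assistant_id and assistant as transient is an assumption, since the API's behavior for that combination is not described here.

```typescript
type AssistantConfig = Record<string, unknown>;

interface WebCallRequest {
  assistant_id?: string;
  assistant_override?: AssistantConfig;
  assistant?: AssistantConfig;
}

type Mode = "reference" | "hybrid" | "transient";

const REQUIRED_TRANSIENT_KEYS = ["stt_config", "llm_config", "tts_config"];

// Client-side pre-check; the API remains the source of truth.
export function resolveMode(body: WebCallRequest): Mode {
  if (body.assistant !== undefined && body.assistant_override !== undefined) {
    throw new Error("Cannot provide both assistant and assistant_override");
  }
  if (body.assistant_override !== undefined) {
    if (body.assistant_id === undefined) {
      throw new Error("assistant_override requires assistant_id");
    }
    return "hybrid";
  }
  if (body.assistant !== undefined) {
    const assistant = body.assistant;
    const missing = REQUIRED_TRANSIENT_KEYS.filter((key) => !(key in assistant));
    if (missing.length > 0) {
      throw new Error("Transient assistant must provide: " + missing.join(", "));
    }
    // Assumption: an inline assistant takes precedence over assistant_id.
    return "transient";
  }
  if (body.assistant_id !== undefined) {
    return "reference";
  }
  throw new Error("Must provide assistant_id, assistant, or both");
}
```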
Success Response — 200 OK#
{
  "success": true,
  "call_id": "3f2a1b4c-5d6e-7f8a-9b0c-1d2e3f4a5b6c",
  "room_name": "web-3f2a1b4c",
  "token": "<jwt>",
  "server_url": "wss://rtc.hmsovereign.com"
}
| Field | Type | Description |
|---|---|---|
| success | boolean | Always true on 200 |
| call_id | string (uuid) | Unique call ID, shown in your Calls dashboard |
| room_name | string | Voice room name |
| token | string | Short-lived JWT (5-minute TTL). Pass this to the browser to connect |
| server_url | string | WebSocket URL for the voice server (always wss://) |
Important: the token expires after 5 minutes whether or not the user joins. Once the call starts, it runs until the user hangs up or the 5-minute max duration is hit.
Error Responses#
| Status | Detail | Meaning |
|---|---|---|
| 400 | Must provide assistant_id, assistant, or both | No configuration provided |
| 400 | assistant_override requires assistant_id | Override provided without a base assistant |
| 400 | Cannot provide both assistant and assistant_override | Ambiguous config mode |
| 400 | Transient assistant must provide: stt_config, llm_config, tts_config | Transient mode missing required configs |
| 401 | Invalid API key | API key not recognized |
| 402 | Insufficient credits to start web call | Organization has no balance |
| 403 | Assistant not found or does not belong to this organization | Invalid assistant_id for this org |
| 429 | Maximum 3 concurrent web calls per organization | Too many active calls; the user must wait |
| 500 | Failed to create web call session: ... | Server error |
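A backend proxy may want to translate these statuses into a next step before answering the browser. The mapping below is only one possible reading of the table: the `NextStep` labels are invented for this sketch, and grouping every unlisted status with 500 is an assumption.

```typescript
type NextStep =
  | "fix_request"
  | "check_api_key"
  | "add_credits"
  | "check_assistant_id"
  | "wait_for_free_slot"
  | "retry_or_report";

// Maps a documented error status to a suggested follow-up action.
export function nextStepForStatus(status: number): NextStep {
  switch (status) {
    case 400:
      return "fix_request"; // invalid or ambiguous assistant configuration
    case 401:
      return "check_api_key";
    case 402:
      return "add_credits";
    case 403:
      return "check_assistant_id";
    case 429:
      return "wait_for_free_slot"; // concurrent web call limit reached
    default:
      return "retry_or_report"; // 500 and anything not listed in the table
  }
}
```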
Limits#
| Limit | Value |
|---|---|
| Max concurrent web calls per organization | 3 |
| Max call duration | 5 minutes |
| Token TTL (time to join) | 5 minutes |
| Room auto-deleted if nobody joins | 60 seconds |
Integration Guide#
1. Set Up Your Backend
Create an endpoint that proxies requests to the HMS Sovereign API. This keeps your API key secure on the server — the browser only ever receives the short-lived token.
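A minimal sketch of the server-side call such an endpoint could make is shown below. It uses the reference-mode request body and returns only the two fields the browser needs. The function name `createWebCallSession` and the injected `post` parameter are assumptions made to keep the example self-contained; in Node 18+ the global `fetch` could be passed as `post`.

```typescript
const WEB_CALLS_URL = "https://assistant-api.hmsovereign.com/v1/web-calls";

// Minimal view of an HTTP client, so the sketch does not depend on a runtime.
interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type HttpPost = (
  url: string,
  init: { method: "POST"; headers: Record<string, string>; body: string },
) => Promise<HttpResponse>;

export interface BrowserSession {
  token: string;
  server_url: string;
}

// Creates a web call with the org API key and strips the response down to
// what the browser needs. The API key never leaves the server.
export async function createWebCallSession(
  apiKey: string,
  assistantId: string,
  post: HttpPost,
): Promise<BrowserSession> {
  const res = await post(WEB_CALLS_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ assistant_id: assistantId }),
  });
  if (!res.ok) {
    throw new Error(`Web call creation failed with status ${res.status}`);
  }
  const data = (await res.json()) as BrowserSession;
  return { token: data.token, server_url: data.server_url };
}
```

Your own route handler would call this function and send the returned object to the browser as JSON.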
2. Install the Client SDK
Install livekit-client in your frontend project to connect to the voice session (for example, with npm install livekit-client).
For React projects, also install the React components library, @livekit/components-react.
3. Connect to the Voice Session
Fetch the token from your backend and connect to the voice room.
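The connection step could look roughly like the sketch below. It is typed against a small structural interface, `VoiceRoom`, which the `Room` class from livekit-client is assumed to satisfy (`connect`, `localParticipant.setMicrophoneEnabled`, `disconnect`). `startCall` and `fetchSession` are hypothetical names; `fetchSession` stands for a request to your own backend endpoint.

```typescript
// Minimal structural view of the room methods used below.
interface VoiceRoom {
  connect(url: string, token: string): Promise<void>;
  localParticipant: { setMicrophoneEnabled(enabled: boolean): Promise<unknown> };
  disconnect(): Promise<void>;
}

interface Session {
  token: string;
  server_url: string;
}

// Fetches a session from your backend, joins the voice room, and publishes
// the microphone. Resolves to a function that hangs up the call.
export async function startCall(
  room: VoiceRoom,
  fetchSession: () => Promise<Session>,
): Promise<() => Promise<void>> {
  const session = await fetchSession();
  await room.connect(session.server_url, session.token);
  // This is the point where the browser asks for microphone permission.
  await room.localParticipant.setMicrophoneEnabled(true);
  return () => room.disconnect();
}
```

With livekit-client this would be called as `startCall(new Room(), ...)`. Playing the assistant's voice additionally requires attaching the subscribed remote audio track to the page, which this sketch leaves out.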
Live Transcription#
The assistant automatically publishes real-time transcriptions for both user speech (STT output) and assistant speech (synchronized with TTS playback) via the lk.transcription text stream topic. This is enabled by default and needs no configuration.
How It Works#
| Source | Description |
|---|---|
| User speech | The assistant runs STT and publishes the recognized text. Interim results arrive first (lk.transcription_final: "false"), followed by the final result ("true"). |
| Assistant speech | The assistant's text is synchronized word-by-word with audio playback. If the assistant is interrupted, the transcription is truncated to match what was actually spoken. |
Each speech segment has a unique lk.segment_id. Interim and final results share the same ID, so you can replace interim entries with the final version.
Vanilla JavaScript#
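Without a UI framework, the replace-by-segment logic can live in two small pure functions, sketched here in TypeScript. `toEntry` and `upsertSegment` are illustrative names; the attribute keys are the ones described above.

```typescript
interface TranscriptEntry {
  id: string;
  role: "user" | "assistant";
  text: string;
  isFinal: boolean;
}

// Builds an entry from a received text stream. fallbackId is used when the
// stream carries no lk.segment_id attribute.
export function toEntry(
  text: string,
  attributes: Record<string, string> | undefined,
  fallbackId: string,
  isLocalSpeaker: boolean,
): TranscriptEntry {
  return {
    id: attributes?.["lk.segment_id"] ?? fallbackId,
    role: isLocalSpeaker ? "user" : "assistant",
    text,
    isFinal: attributes?.["lk.transcription_final"] === "true",
  };
}

// Replaces the entry that shares the segment ID, or appends a new one.
// Returns a new array and leaves the input untouched.
export function upsertSegment(
  entries: TranscriptEntry[],
  next: TranscriptEntry,
): TranscriptEntry[] {
  const index = entries.findIndex((entry) => entry.id === next.id);
  if (index < 0) return [...entries, next];
  const copy = [...entries];
  copy[index] = next;
  return copy;
}
```

Inside a handler registered with `room.registerTextStreamHandler("lk.transcription", ...)`, the entry would be built from `await reader.readAll()`, `reader.info.attributes`, `reader.info.id`, and a comparison of `participantInfo.identity` with `room.localParticipant.identity`, then passed to `upsertSegment` before re-rendering.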
React Example#
Use useRoomContext from @livekit/components-react inside a <LiveKitRoom> to access the room:
"use client";
import { useEffect, useRef, useState } from "react";
import { useRoomContext } from "@livekit/components-react";

interface TranscriptEntry {
  id: string;
  role: "user" | "assistant";
  text: string;
  isFinal: boolean;
}

export function LiveTranscript() {
  const room = useRoomContext();
  const [entries, setEntries] = useState<TranscriptEntry[]>([]);
  const scrollRef = useRef<HTMLDivElement>(null);

  useEffect(() => {
    // Register a handler for the transcription topic.
    room.registerTextStreamHandler(
      "lk.transcription",
      async (reader, participantInfo) => {
        const text = await reader.readAll();
        const attrs = reader.info.attributes ?? {};
        const isFinal = attrs["lk.transcription_final"] === "true";
        const segmentId = attrs["lk.segment_id"] ?? reader.info.id;
        const isUser = participantInfo.identity === room.localParticipant.identity;
        const role = isUser ? "user" : "assistant";
        // Replace the interim entry with the same segment ID, or append.
        setEntries((prev) => {
          const existing = prev.findIndex((e) => e.id === segmentId);
          const entry: TranscriptEntry = { id: segmentId, role, text, isFinal };
          if (existing >= 0) {
            const updated = [...prev];
            updated[existing] = entry;
            return updated;
          }
          return [...prev, entry];
        });
      }
    );
    // Unregister on cleanup so a remount can register a fresh handler.
    return () => {
      room.unregisterTextStreamHandler("lk.transcription");
    };
  }, [room]);

  // Keep the newest entry scrolled into view.
  useEffect(() => {
    scrollRef.current?.scrollTo(0, scrollRef.current.scrollHeight);
  }, [entries]);

  return (
    <div ref={scrollRef} style={{ maxHeight: 300, overflowY: "auto" }}>
      {entries.map((entry) => (
        <div key={entry.id} style={{ opacity: entry.isFinal ? 1 : 0.5 }}>
          <strong>{entry.role === "assistant" ? "Assistant" : "You"}:</strong> {entry.text}
        </div>
      ))}
    </div>
  );
}
Place <LiveTranscript /> inside your <LiveKitRoom> component so it has access to the room context:
<LiveKitRoom token={session.token} serverUrl={session.server_url} connect audio onDisconnected={endCall}>
  <CallInterface onHangUp={endCall} />
  <LiveTranscript />
  <RoomAudioRenderer />
</LiveKitRoom>
Tool/function calls are not published over the transcription stream. They appear in the post-call transcript via webhooks only.
Handling Microphone Permissions#
The browser will prompt for microphone access when connecting with audio={true}. If the user denies permission, a MediaDeviceFailure error is raised on the onError callback:
<LiveKitRoom
  token={session.token}
  serverUrl={session.server_url}
  connect={true}
  audio={true}
  video={false}
  onDisconnected={endCall}
  onError={(err) => {
    // MediaDeviceFailure is thrown when microphone access is denied
    setError("Microphone access was denied. Please allow microphone access and try again.");
    endCall();
  }}
>
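Because onError can receive errors other than a denied microphone, it may help to inspect the error before choosing a message. The sketch below classifies by the exception names that the browser's getUserMedia API defines (NotAllowedError, NotFoundError, NotReadableError). Whether the error passed to onError keeps these names is an assumption, and `classifyMicError` is a hypothetical helper.

```typescript
type MicIssue = "denied" | "no_device" | "device_unavailable" | "other";

// Classifies an error by the DOMException names used by getUserMedia.
export function classifyMicError(err: { name?: string }): MicIssue {
  switch (err.name) {
    case "NotAllowedError":
      return "denied"; // the user or a browser policy blocked access
    case "NotFoundError":
      return "no_device"; // no microphone is connected
    case "NotReadableError":
      return "device_unavailable"; // hardware or OS-level failure
    default:
      return "other"; // not a recognized microphone error
  }
}
```

The permission-denied message would then be shown only for `"denied"`, with a generic connection error for `"other"`.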
Webhooks#
Web calls fire the same webhook events as phone calls. The call.type field is "web_call" instead of "inbound_phone_call".
status-update: call started#
{
  "message": {
    "type": "status-update",
    "status": "in-progress",
    "call": {
      "id": "3f2a1b4c-...",
      "type": "web_call",
      "status": "in-progress"
    }
  }
}
end-of-call-report#
{
  "message": {
    "type": "end-of-call-report",
    "call": {
      "id": "3f2a1b4c-...",
      "type": "web_call",
      "status": "ended"
    },
    "endedReason": "user_hangup",
    "durationSeconds": 47,
    "summary": "The user asked about pricing...",
    "transcript": [ "..." ]
  }
}
End Reasons#
| Value | Meaning |
|---|---|
| user_hangup | Browser disconnected or user clicked hang up |
| agent_hangup | Assistant called the end_call tool |
| max_duration | 5-minute hard limit reached |
| error | Unexpected assistant error |
| config_error | Assistant configuration is invalid |
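A webhook handler might branch on endedReason to separate normal endings from ones that need attention. The sketch below models only the end-of-call-report fields shown above. `describeEnd` and its result shape are illustrative, and flagging any unrecognized reason as needing attention is an assumption.

```typescript
// Only the end-of-call-report fields used here are modeled.
interface EndOfCallReport {
  message: {
    type: string;
    call: { id: string; type: string; status?: string };
    endedReason: string;
    durationSeconds: number;
  };
}

interface EndSummary {
  callId: string;
  seconds: number;
  hitLimit: boolean;
  needsAttention: boolean;
}

// Reasons that describe an ordinary end of the call.
const NORMAL_REASONS: ReadonlySet<string> = new Set([
  "user_hangup",
  "agent_hangup",
  "max_duration",
]);

export function describeEnd(report: EndOfCallReport): EndSummary {
  const { call, endedReason, durationSeconds } = report.message;
  return {
    callId: call.id,
    seconds: durationSeconds,
    hitLimit: endedReason === "max_duration",
    // error, config_error, and any unrecognized value are flagged.
    needsAttention: !NORMAL_REASONS.has(endedReason),
  };
}
```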
Billing#
Web calls are billed at the same per-minute rate as phone calls. Usage appears in your dashboard under the Calls section with direction: "web".
FAQ#
Next Steps#