Web Calls

Web calls let your users speak directly to an AI assistant from their browser — no phone number required. The browser connects via WebRTC and the assistant runs exactly like it does for phone calls.

Note: We're working on official SDKs for React, Vue, and vanilla JavaScript. In the meantime, this guide provides everything you need to build a working integration using the open-source livekit-client and @livekit/components-react packages.

How It Works

1. The user clicks "Talk to AI".
2. Your backend sends POST https://assistant-api.hmsovereign.com/v1/web-calls with the header Authorization: Bearer <org_api_key>.
3. HMS Sovereign validates the key, creates a voice room, dispatches the assistant, and returns { token, server_url }.
4. Your backend passes token and server_url to the browser.
5. The browser connects via WebRTC (microphone audio) using the token.
6. The AI assistant picks up and runs the full STT → LLM → TTS pipeline.
7. The call ends: the summary and transcript are produced, credits are deducted, and the webhook fires.

The call appears in your Calls dashboard with direction: "web" and is billed at the same per-minute rate as phone calls.

Warning (Security): Never expose your API key to the browser. Always proxy web call requests through your own backend server. The browser only ever receives the short-lived token.


API Reference

Create Web Call

POST https://assistant-api.hmsovereign.com/v1/web-calls

Authentication

| Header | Value |
| --- | --- |
| Authorization | Bearer YOUR_API_KEY |
| Content-Type | application/json |

Your organization API key is found in the HMS Sovereign dashboard under Settings > API Keys. The org_id is automatically derived from the key — you do not need to pass it.

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| assistant_id | string (uuid) | No* | Saved assistant to use. Must belong to your organization. |
| assistant_override | object | No | Partial field overrides applied on top of assistant_id (hybrid mode). Requires assistant_id. |
| assistant | object | No* | Full inline assistant config (transient mode). Cannot be combined with assistant_override. |

*At least one of assistant_id or assistant is required. See configuration modes below.

Reference mode — use a saved assistant as-is:

{
  "assistant_id": "17a0cb75-fa09-4bdd-9a44-92a70d829c88"
}

Hybrid mode — saved assistant with partial overrides:

{
  "assistant_id": "17a0cb75-fa09-4bdd-9a44-92a70d829c88",
  "assistant_override": {
    "first_message": "Custom greeting!",
    "llm_config": {
      "messages": [{ "role": "system", "content": "You are a sales assistant." }]
    }
  }
}

Transient mode — full inline config, no saved assistant required:

{
  "assistant": {
    "stt_config": { "provider": "deepgram", "model": "nova-3", "language": "en" },
    "llm_config": {
      "provider": "openai",
      "model": "gpt-4.1-mini",
      "messages": [{ "role": "system", "content": "You are a helpful assistant." }]
    },
    "tts_config": { "provider": "elevenlabs", "voice_id": "your-voice-id" },
    "first_message": "Hello! How can I help you?"
  }
}

Available fields in assistant / assistant_override: stt_config, llm_config, tts_config, first_message, business_name, name, analysis_plan, autonomous_silence_handling, gdpr_mode, webhook_url, webhook_secret, webhook_events, metadata.

Transient mode requires stt_config, llm_config, and tts_config inside assistant.


Passing Custom Metadata

Use the metadata field inside assistant_override (or assistant in transient mode) to attach arbitrary key-value data to a web call. HMS Sovereign passes it through unchanged to all webhook payloads under message.assistant.metadata.

This is the correct way to correlate a web call with your own users, sessions, or records — for example, passing a user_id so your webhook handler knows which user the call belongs to.

Note: assistant-request does not fire for web calls. Unlike inbound phone calls — where you can inject metadata dynamically in the assistant-request response — web calls have no pre-call webhook. All metadata must be passed at session creation time.

{
  "assistant_id": "17a0cb75-fa09-4bdd-9a44-92a70d829c88",
  "assistant_override": {
    "metadata": {
      "user_id": "usr_8f3a2b1c",
      "session_id": "ses_9d4e3c2b",
      "plan": "pro"
    }
  }
}

The metadata object appears as-is in every webhook fired for that call:

{
  "message": {
    "type": "end-of-call-report",
    "call": { "id": "3f2a1b4c-...", "type": "web_call" },
    "assistant": {
      "metadata": {
        "user_id": "usr_8f3a2b1c",
        "session_id": "ses_9d4e3c2b",
        "plan": "pro"
      }
    }
  }
}

Warning (Key casing is preserved): Metadata keys are passed through exactly as you send them. HMS Sovereign does not convert them to snake_case or any other format. If your webhook handler expects user_id, send user_id. If it expects UserID, send UserID.
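
The round trip described in this section can be sketched in Python as follows. This is a minimal illustration, not part of the HMS Sovereign API: the helper names build_hybrid_body and get_call_metadata are hypothetical, and the webhook payload is simulated locally using the shape shown above.

```python
from typing import Any


def build_hybrid_body(assistant_id: str, metadata: dict[str, Any]) -> dict[str, Any]:
    """Build a hybrid-mode request body that attaches metadata to the call."""
    return {
        "assistant_id": assistant_id,
        "assistant_override": {"metadata": dict(metadata)},
    }


def get_call_metadata(payload: dict[str, Any]) -> dict[str, Any]:
    """Return message.assistant.metadata from a webhook payload, or {} if absent."""
    message = payload.get("message") or {}
    assistant = message.get("assistant") or {}
    # Keys come back exactly as they were sent, so no case conversion is done here.
    return dict(assistant.get("metadata") or {})


body = build_hybrid_body(
    "17a0cb75-fa09-4bdd-9a44-92a70d829c88",
    {"user_id": "usr_8f3a2b1c", "plan": "pro"},
)

# Simulated webhook payload, shaped like the end-of-call-report example above.
webhook = {
    "message": {
        "type": "end-of-call-report",
        "call": {"id": "3f2a1b4c-...", "type": "web_call"},
        "assistant": {"metadata": body["assistant_override"]["metadata"]},
    }
}
print(get_call_metadata(webhook)["user_id"])  # usr_8f3a2b1c
```

Returning an empty dict when the field is missing keeps the handler safe for calls that were created without metadata.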

Configuration Modes

| Mode | When to use |
| --- | --- |
| Reference | Use a saved assistant exactly as configured in the dashboard |
| Hybrid | Use a saved assistant but override specific fields per-call (e.g. dynamic first message, custom system prompt) |
| Transient | Fully define the assistant inline; useful for dynamic or ephemeral assistants not saved in the dashboard |
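
To catch mistakes before sending a request, the mode rules can be mirrored locally. The Python sketch below is an assumption-laden illustration: detect_mode is a hypothetical helper, the order of the checks is a guess, and the server remains the source of truth. This page does not say how a body carrying both assistant_id and assistant (without assistant_override) is resolved, so the sketch reports that case as "unspecified" rather than guessing.

```python
REQUIRED_TRANSIENT_KEYS = ("stt_config", "llm_config", "tts_config")


def detect_mode(body: dict) -> str:
    """Return "reference", "hybrid", "transient", or "unspecified".

    Raises ValueError with the documented 400 detail text when the body
    breaks one of the documented rules.
    """
    has_id = "assistant_id" in body
    has_override = "assistant_override" in body
    has_inline = "assistant" in body

    if has_inline and has_override:
        raise ValueError("Cannot provide both assistant and assistant_override")
    if has_override and not has_id:
        raise ValueError("assistant_override requires assistant_id")
    if not has_id and not has_inline:
        raise ValueError("Must provide assistant_id, assistant, or both")

    if has_inline and has_id:
        # Not covered by the mode descriptions, so no mode is claimed here.
        return "unspecified"
    if has_inline:
        missing = [k for k in REQUIRED_TRANSIENT_KEYS if k not in body["assistant"]]
        if missing:
            raise ValueError(
                "Transient assistant must provide: " + ", ".join(REQUIRED_TRANSIENT_KEYS)
            )
        return "transient"
    return "hybrid" if has_override else "reference"


print(detect_mode({"assistant_id": "17a0cb75-fa09-4bdd-9a44-92a70d829c88"}))  # reference
```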

Success Response — 200 OK

{
  "success": true,
  "call_id": "3f2a1b4c-5d6e-7f8a-9b0c-1d2e3f4a5b6c",
  "room_name": "web-3f2a1b4c",
  "token": "<jwt>",
  "server_url": "wss://rtc.hmsovereign.com"
}

| Field | Type | Description |
| --- | --- | --- |
| success | boolean | Always true on 200 |
| call_id | string (uuid) | Unique call ID; appears in your calls dashboard |
| room_name | string | Voice room name |
| token | string | Short-lived JWT (5-minute TTL); pass this to the browser to connect |
| server_url | string | WebSocket URL for the voice server (always wss://) |

Important: the token expires after 5 minutes whether or not the user joins. Once the call starts, it runs until the user hangs up or the 5-minute max duration is hit.

Error Responses

| Status | Detail | Meaning |
| --- | --- | --- |
| 400 | Must provide assistant_id, assistant, or both | No configuration provided |
| 400 | assistant_override requires assistant_id | Override provided without a base assistant |
| 400 | Cannot provide both assistant and assistant_override | Ambiguous config mode |
| 400 | Transient assistant must provide: stt_config, llm_config, tts_config | Transient mode missing required configs |
| 401 | Invalid API key | API key not recognized |
| 402 | Insufficient credits to start web call | Organization has no balance |
| 403 | Assistant not found or does not belong to this organization | Invalid assistant_id for this org |
| 429 | Maximum 3 concurrent web calls per organization | Too many active calls; user must wait |
| 500 | Failed to create web call session: ... | Server error |
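
A backend proxy may want to translate these statuses into something safe to show end users. The Python sketch below is one possible mapping. The ErrorAdvice type, the advise function, the retry decisions, and the message wording are all assumptions for illustration. Only the status codes come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorAdvice:
    retryable: bool      # whether trying again later might succeed (assumed policy)
    user_message: str    # text safe to show in the browser


def advise(status: int) -> ErrorAdvice:
    """Map a web-call error status to a retry hint and a user-facing message."""
    if status == 429:
        return ErrorAdvice(True, "Too many active calls. Please try again shortly.")
    if status == 402:
        return ErrorAdvice(False, "Insufficient credits.")
    if status in (400, 401, 403):
        # Configuration or credential problems: retrying the same request will not help.
        return ErrorAdvice(False, "The call could not be started. Please contact support.")
    if status >= 500:
        return ErrorAdvice(True, "Something went wrong. Please try again.")
    return ErrorAdvice(False, "Failed to start call.")


print(advise(429).retryable)  # True
```

Keeping 400, 401, and 403 behind a generic message avoids leaking configuration details to the browser, while the detail string can still be logged on the server.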

Limits

| Limit | Value |
| --- | --- |
| Max concurrent web calls per organization | 3 |
| Max call duration | 5 minutes |
| Token TTL (time to join) | 5 minutes |
| Room auto-deleted if nobody joins | 60 seconds |

Integration Guide

Set Up Your Backend

Create an endpoint that proxies requests to the HMS Sovereign API. This keeps your API key secure on the server — the browser only ever receives the short-lived token.

Node.js / Express

const express = require("express");
const app = express();
app.use(express.json());

const HMS_API_KEY  = process.env.HMS_API_KEY;        // org API key — never expose to browser
const HMS_ASSISTANT_ID = process.env.HMS_ASSISTANT_ID;  // your assistant UUID

app.post("/api/start-call", async (req, res) => {
  try {
    // Reference mode (most common):
    const body = { assistant_id: HMS_ASSISTANT_ID };

    // Hybrid mode — override specific fields per-call:
    // const body = {
    //   assistant_id: HMS_ASSISTANT_ID,
    //   assistant_override: { first_message: "Welcome! How can I help?", llm_config: { ... } }
    // };

    // Transient mode — full inline config, no saved assistant:
    // const body = {
    //   assistant: {
    //     stt_config: { provider: "deepgram", model: "nova-3", language: "en" },
    //     llm_config: { provider: "openai", model: "gpt-4.1-mini", messages: [...] },
    //     tts_config: { provider: "elevenlabs", voice_id: "..." },
    //     first_message: "Hello!"
    //   }
    // };

    const response = await fetch("https://assistant-api.hmsovereign.com/v1/web-calls", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${HMS_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    });

    const data = await response.json();

    if (!response.ok) {
      return res.status(response.status).json({ error: data.detail });
    }

    // Only pass token and server_url to the browser — never the API key
    res.json({ token: data.token, server_url: data.server_url });
  } catch (err) {
    res.status(500).json({ error: "Failed to start call" });
  }
});

Python / FastAPI

import os

import httpx
from fastapi import FastAPI, HTTPException

app = FastAPI()

HMS_API_KEY  = os.environ["HMS_API_KEY"]        # org API key — never expose to browser
HMS_ASSISTANT_ID = os.environ["HMS_ASSISTANT_ID"]  # your assistant UUID

@app.post("/api/start-call")
async def start_call():
    # Reference mode (most common):
    body = {"assistant_id": HMS_ASSISTANT_ID}

    # Hybrid mode — override specific fields per-call:
    # body = {"assistant_id": HMS_ASSISTANT_ID, "assistant_override": {"first_message": "Welcome!"}}

    # Transient mode — full inline config:
    # body = {"assistant": {"stt_config": {...}, "llm_config": {...}, "tts_config": {...}}}

    async with httpx.AsyncClient() as client:
        r = await client.post(
            "https://assistant-api.hmsovereign.com/v1/web-calls",
            headers={"Authorization": f"Bearer {HMS_API_KEY}"},
            json=body,
        )

    if r.status_code != 200:
        raise HTTPException(
            status_code=r.status_code,
            detail=r.json().get("detail")
        )

    data = r.json()
    # Only pass token and server_url to the browser — never the API key
    return {"token": data["token"], "server_url": data["server_url"]}

Next.js API Route

// app/api/start-call/route.ts
import { NextResponse } from "next/server";

const HMS_API_KEY  = process.env.HMS_API_KEY!;        // org API key — never expose to browser
const HMS_ASSISTANT_ID = process.env.HMS_ASSISTANT_ID!;  // your assistant UUID

export async function POST() {
  // Reference mode (most common):
  const body = { assistant_id: HMS_ASSISTANT_ID };

  // Hybrid mode — override specific fields per-call:
  // const body = {
  //   assistant_id: HMS_ASSISTANT_ID,
  //   assistant_override: { first_message: "Welcome!", llm_config: { ... } }
  // };

  // Transient mode — full inline config:
  // const body = {
  //   assistant: {
  //     stt_config: { provider: "deepgram", model: "nova-3", language: "en" },
  //     llm_config: { provider: "openai", model: "gpt-4.1-mini", messages: [...] },
  //     tts_config: { provider: "elevenlabs", voice_id: "..." },
  //     first_message: "Hello!"
  //   }
  // };

  const response = await fetch("https://assistant-api.hmsovereign.com/v1/web-calls", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${HMS_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });

  const data = await response.json();

  if (!response.ok) {
    return NextResponse.json({ error: data.detail }, { status: response.status });
  }

  // Only pass token and server_url to the browser — never the API key
  return NextResponse.json({ token: data.token, server_url: data.server_url });
}

Install the Client SDK

Install livekit-client in your frontend project to connect to the voice session:

npm

npm install livekit-client

yarn

yarn add livekit-client

pnpm

pnpm add livekit-client

For React projects, also install the React components library:

npm install @livekit/components-react @livekit/components-styles

Connect to the Voice Session

Fetch the token from your backend and connect to the voice room.

Vanilla JavaScript

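import { Room, RoomEvent } from "livekit-client";

async function startCall() {
  // 1. Get token from your backend
  const res = await fetch("/api/start-call", { method: "POST" });
  if (!res.ok) {
    throw new Error(`Failed to start call (HTTP ${res.status})`);
  }
  const { token, server_url } = await res.json();

  const room = new Room();

  // 2. Register handlers before connecting so no early events are missed
  //    (the assistant may already be publishing audio when the browser joins)
  room.on(RoomEvent.TrackSubscribed, (track) => {
    if (track.kind === "audio") {
      // attach() creates an <audio> element that plays the assistant's voice
      const element = track.attach();
      document.body.appendChild(element);
    }
  });

  room.on(RoomEvent.TrackUnsubscribed, (track) => {
    // Remove the <audio> elements created by attach()
    track.detach().forEach((element) => element.remove());
  });

  room.on(RoomEvent.Disconnected, () => {
    console.log("Call ended");
  });

  // 3. Connect to the voice room
  await room.connect(server_url, token);

  // 4. Enable microphone
  await room.localParticipant.setMicrophoneEnabled(true);

  return room;
}

function endCall(room) {
  room.disconnect();
}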

React

"use client";

import { useState, useCallback } from "react";
import {
  LiveKitRoom,
  useVoiceAssistant,
  RoomAudioRenderer,
} from "@livekit/components-react";
import "@livekit/components-styles";

interface CallSession {
  token: string;
  server_url: string;
}

export function VoiceCallButton({ agentId }: { agentId: string }) {
  const [session, setSession] = useState<CallSession | null>(null);
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState<string | null>(null);

  const startCall = useCallback(async () => {
    setLoading(true);
    setError(null);

    try {
      const res = await fetch("/api/start-call", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ assistant_id: agentId }),
      });
      const data = await res.json();

      if (!res.ok) {
        // The Express and Next.js proxies above return { error }; FastAPI returns { detail }
        const msg =
          res.status === 429 ? "Too many active calls. Please try again shortly."
          : res.status === 402 ? "Insufficient credits."
          : data.error ?? data.detail ?? "Failed to start call.";
        setError(msg);
        return;
      }

      setSession(data);
    } catch {
      setError("Network error — please try again.");
    } finally {
      setLoading(false);
    }
  }, [agentId]);

  const endCall = useCallback(() => setSession(null), []);

  if (session) {
    return (
      <LiveKitRoom
        token={session.token}
        serverUrl={session.server_url}
        connect={true}
        audio={true}
        video={false}
        onDisconnected={endCall}
      >
        <CallInterface onHangUp={endCall} />
        <RoomAudioRenderer />
      </LiveKitRoom>
    );
  }

  return (
    <div>
      {error && <p style={{ color: "red" }}>{error}</p>}
      <button onClick={startCall} disabled={loading}>
        {loading ? "Connecting..." : "Talk to AI"}
      </button>
    </div>
  );
}

function CallInterface({ onHangUp }: { onHangUp: () => void }) {
  const { state } = useVoiceAssistant();

  return (
    <div>
      <p>
        {state === "connecting" && "Connecting..."}
        {state === "listening" && "Assistant is listening"}
        {state === "thinking" && "Assistant is thinking..."}
        {state === "speaking" && "Assistant is speaking"}
      </p>
      <button onClick={onHangUp}>Hang Up</button>
    </div>
  );
}

Live Transcription

The assistant automatically publishes real-time transcriptions for both user speech (STT output) and assistant speech (synchronized with TTS playback) via the lk.transcription text stream topic. This is enabled by default — no configuration needed.

How It Works

| Source | Description |
| --- | --- |
| User speech | The assistant runs STT and publishes the recognized text. Interim results arrive first (lk.transcription_final: "false"), followed by the final result ("true"). |
| Assistant speech | The assistant's text is synchronized word-by-word with audio playback. If the assistant is interrupted, the transcription is truncated to match what was actually spoken. |

Each speech segment has a unique lk.segment_id. Interim and final results share the same ID, so you can replace interim entries with the final version.

Vanilla JavaScript

room.registerTextStreamHandler("lk.transcription", async (reader, participantInfo) => {
  const text = await reader.readAll();
  const attrs = reader.info.attributes;
  const isFinal = attrs["lk.transcription_final"] === "true";
  const segmentId = attrs["lk.segment_id"] ?? reader.info.id;

  // participantInfo.identity is the actual speaker —
  // user speech is published with the user's identity, assistant speech with the assistant's identity
  const isUser = participantInfo.identity === room.localParticipant.identity;
  const role = isUser ? "user" : "assistant";

  console.log(`[${role}] ${text}`, { isFinal, segmentId });
});

React Example

Use useRoomContext from @livekit/components-react inside a <LiveKitRoom> to access the room:

"use client";

import { useEffect, useRef, useState } from "react";
import { useRoomContext } from "@livekit/components-react";

interface TranscriptEntry {
  id: string;
  role: "user" | "assistant";
  text: string;
  isFinal: boolean;
}

export function LiveTranscript() {
  const room = useRoomContext();
  const [entries, setEntries] = useState<TranscriptEntry[]>([]);
  const scrollRef = useRef<HTMLDivElement>(null);

  useEffect(() => {
    room.registerTextStreamHandler(
      "lk.transcription",
      async (reader, participantInfo) => {
        const text = await reader.readAll();
        const attrs = reader.info.attributes;
        const isFinal = attrs["lk.transcription_final"] === "true";
        const segmentId = attrs["lk.segment_id"] ?? reader.info.id;

        const isUser = participantInfo.identity === room.localParticipant.identity;
        const role = isUser ? "user" : "assistant";

        setEntries((prev) => {
          // Interim and final results share a segment ID, so replace in place
          const existing = prev.findIndex((e) => e.id === segmentId);
          const entry: TranscriptEntry = { id: segmentId, role, text, isFinal };
          if (existing >= 0) {
            const updated = [...prev];
            updated[existing] = entry;
            return updated;
          }
          return [...prev, entry];
        });
      }
    );

    // registerTextStreamHandler does not return a cleanup function, and registering
    // the same topic twice throws, so unregister by topic when the effect is torn down
    return () => {
      room.unregisterTextStreamHandler("lk.transcription");
    };
  }, [room]);

  useEffect(() => {
    scrollRef.current?.scrollTo(0, scrollRef.current.scrollHeight);
  }, [entries]);

  return (
    <div ref={scrollRef} style={{ maxHeight: 300, overflowY: "auto" }}>
      {entries.map((entry) => (
        <div key={entry.id} style={{ opacity: entry.isFinal ? 1 : 0.5 }}>
          <strong>{entry.role === "assistant" ? "Assistant" : "You"}:</strong> {entry.text}
        </div>
      ))}
    </div>
  );
}

Place <LiveTranscript /> inside your <LiveKitRoom> component so it has access to the room context:

<LiveKitRoom token={session.token} serverUrl={session.server_url} connect audio onDisconnected={endCall}>
  <CallInterface onHangUp={endCall} />
  <LiveTranscript />
  <RoomAudioRenderer />
</LiveKitRoom>

Note: Tool/function calls are not published over the transcription stream. They appear in the post-call transcript via webhooks only.


Handling Microphone Permissions

The browser will prompt for microphone access when connecting with audio={true}. If the user denies permission, a MediaDeviceFailure error is raised on the onError callback:

<LiveKitRoom
  token={session.token}
  serverUrl={session.server_url}
  connect={true}
  audio={true}
  video={false}
  onDisconnected={endCall}
  onError={(err) => {
    // Microphone denial surfaces here, but so do other connection errors;
    // inspect err before showing a microphone-specific message in production
    setError("Microphone access was denied. Please allow microphone access and try again.");
    endCall();
  }}
>

Webhooks

Web calls fire the same webhook events as phone calls, except assistant-request, which does not fire for web calls. The call.type field is "web_call" instead of "inbound_phone_call".

status-update — call started

{
  "message": {
    "type": "status-update",
    "status": "in-progress",
    "call": {
      "id": "3f2a1b4c-...",
      "type": "web_call",
      "status": "in-progress"
    }
  }
}

end-of-call-report

{
  "message": {
    "type": "end-of-call-report",
    "call": {
      "id": "3f2a1b4c-...",
      "type": "web_call",
      "status": "ended"
    },
    "end_reason": "user_hangup",
    "duration_seconds": 47,
    "summary": "The user asked about pricing...",
    "messages": [ "..." ],
    "assistant": {
      "metadata": { "..." }
    }
  }
}

End Reasons

| Value | Meaning |
| --- | --- |
| user_hangup | Browser disconnected or user clicked hang up |
| agent_hangup | Assistant called the end_call tool |
| max_duration | 5-minute hard limit reached |
| error | Unexpected assistant error |
| config_error | Assistant configuration is invalid |
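
A webhook handler might reduce an end-of-call-report to the few fields it needs and flag calls that ended abnormally. The Python sketch below is illustrative only: summarize_web_call is a hypothetical helper, and grouping error and config_error as failures is an assumption rather than documented behavior. The field names follow the example payload above.

```python
from typing import Any, Optional

# Assumed grouping: these end reasons are treated as failures worth alerting on.
FAILURE_REASONS = {"error", "config_error"}


def summarize_web_call(payload: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return a compact summary for a web-call end-of-call-report, else None."""
    message = payload.get("message") or {}
    call = message.get("call") or {}
    if message.get("type") != "end-of-call-report" or call.get("type") != "web_call":
        return None  # some other event, or not a web call

    reason = message.get("end_reason")
    return {
        "call_id": call.get("id"),
        "end_reason": reason,
        "duration_seconds": message.get("duration_seconds"),
        "failed": reason in FAILURE_REASONS,
        "hit_time_limit": reason == "max_duration",
    }


report = {
    "message": {
        "type": "end-of-call-report",
        "call": {"id": "3f2a1b4c-...", "type": "web_call", "status": "ended"},
        "end_reason": "user_hangup",
        "duration_seconds": 47,
    }
}
print(summarize_web_call(report)["failed"])  # False
```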

See Webhooks Overview for webhook setup and configuration.


Billing

Web calls are billed at the same per-minute rate as phone calls. Usage appears in your dashboard under the Calls section with direction: "web".


FAQ

Do I need to run my own voice infrastructure?

No. HMS Sovereign provides fully managed voice infrastructure. Your users connect to our servers using the short-lived token returned by the API.

Can I use this on mobile browsers?

Yes. Web calls work on any browser that supports WebRTC, including mobile Safari and Chrome on iOS/Android.

What happens if the user loses internet connection?

The call ends automatically. A disconnection is detected server-side and triggers the end-of-call-report webhook with end_reason: "error".

Can I customize the assistant per-call?

Yes. Use hybrid mode to override specific fields (e.g. first message, system prompt) on top of a saved assistant, or use transient mode to define the full assistant configuration inline — no saved assistant required. See the configuration modes section above.

Are official SDKs coming?

Yes — we're actively building first-party SDKs for React, Vue, and vanilla JavaScript. This guide will be updated when they're available.

