Rate Limits

API rate limits, response headers, and best practices for staying within limits.

HMS Sovereign applies rate limits to ensure fair usage and maintain service quality for all users.

Current Limits

Limit Type	Rate	Scope
API Requests	100 requests/minute	Per API key
Call Control	10 commands/minute	Per active call

Rate Limit Headers

Every API response includes headers to help you track your usage:

Header	Description
`X-RateLimit-Limit`	Maximum requests allowed per window
`X-RateLimit-Remaining`	Requests remaining in current window
`X-RateLimit-Reset`	Unix timestamp when the limit resets
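
As a rough sketch, the three headers can be read into one object along with the number of seconds left until the window resets. The helper name `readRateLimit` is hypothetical, and the code assumes `X-RateLimit-Reset` is expressed in seconds, which is consistent with the sample value shown on this page:

```javascript
// Hypothetical helper (not part of any SDK): parses the rate limit headers.
// `headers` is anything with a get(name) method, such as fetch's response.headers.
function readRateLimit(headers, nowMs = Date.now()) {
  const toInt = (name) => {
    const raw = headers.get(name);
    const value = raw == null ? NaN : Number.parseInt(raw, 10);
    return Number.isNaN(value) ? null : value; // null when missing or malformed
  };

  const limit = toInt('X-RateLimit-Limit');
  const remaining = toInt('X-RateLimit-Remaining');
  const reset = toInt('X-RateLimit-Reset'); // ASSUMPTION: Unix time in seconds

  const nowSeconds = Math.floor(nowMs / 1000);
  const secondsUntilReset = reset === null ? null : Math.max(0, reset - nowSeconds);

  return { limit, remaining, reset, secondsUntilReset };
}
```

Calling `readRateLimit(response.headers)` after each request gives you a single place to decide whether to slow down.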

Exceeding the Limit

When you exceed the rate limit, the API returns:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1702479600
Retry-After: 45

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Too many requests. Please retry after 45 seconds."
  }
}
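
One way to surface this response in client code is to turn it into a typed error that carries the wait time. The sketch below relies only on the status code, the `Retry-After` header, and the `error.message` field shown above. The names `RateLimitError` and `rateLimitErrorFrom` are illustrative, not part of the API:

```javascript
// Illustrative error type carrying the server's suggested wait.
class RateLimitError extends Error {
  constructor(message, retryAfterSeconds) {
    super(message);
    this.name = 'RateLimitError';
    this.retryAfterSeconds = retryAfterSeconds; // null if the header was absent
  }
}

// Returns a RateLimitError for a 429 response, or null for any other status.
// `headers` needs a get(name) method; `body` is the already-parsed JSON body.
function rateLimitErrorFrom(status, headers, body) {
  if (status !== 429) return null;

  const seconds = Number.parseInt(headers.get('Retry-After') ?? '', 10);
  const message = body?.error?.message ?? 'Too many requests.';

  return new RateLimitError(message, Number.isNaN(seconds) ? null : seconds);
}
```

Callers can then catch `RateLimitError` specifically and schedule a retry after `retryAfterSeconds`.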

Best Practices

1. Monitor Rate Limit Headers

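Check the headers on each response so you can slow down before you hit the limit:

const response = await fetch(url, options);

// Header values are strings (or null if absent), so convert before comparing.
const header = response.headers.get('X-RateLimit-Remaining');
const remaining = header === null ? NaN : Number.parseInt(header, 10);

if (remaining < 10) {
  console.warn('Approaching rate limit:', remaining, 'requests remaining');
}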

2. Implement Exponential Backoff

When you receive a 429 response, wait before retrying:

async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);

    if (response.status !== 429) {
      return response;
    }

    // Prefer the server's Retry-After (in seconds); otherwise back off 1s, 2s, 4s, ...
    const headerSeconds = Number.parseInt(response.headers.get('Retry-After'), 10);
    const delaySeconds = Number.isNaN(headerSeconds) ? Math.pow(2, attempt) : headerSeconds;
    await new Promise(resolve => setTimeout(resolve, delaySeconds * 1000));
  }

  throw new Error('Max retries exceeded');
}
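
If many clients retry on the same schedule, they can all hit the API again at the same moment. A common mitigation, not described by this API and offered here only as an optional sketch, is to add "full jitter": pick a random delay between zero and the capped exponential value. The function name and the `baseMs`, `maxMs`, and `random` options are hypothetical:

```javascript
// Optional sketch: exponential backoff with full jitter and an upper cap.
// `retryAfter` is the raw Retry-After header value (string, null, or undefined).
function backoffDelayMs(attempt, retryAfter, { baseMs = 1000, maxMs = 30000, random = Math.random } = {}) {
  const serverSeconds = Number.parseInt(retryAfter ?? '', 10);
  if (!Number.isNaN(serverSeconds) && serverSeconds >= 0) {
    return serverSeconds * 1000; // the server's hint takes priority
  }

  // Exponential ceiling: baseMs, 2x, 4x, ... capped at maxMs.
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);

  // Full jitter: uniform random delay in [0, ceiling).
  return Math.floor(random() * ceiling);
}
```

The `random` option exists so the delay can be made deterministic in tests.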

3. Cache Responses

Cache data that doesn't change frequently:

const cache = new Map();
const CACHE_TTL = 60000; // 1 minute

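async function getAgent(agentId) {
  const cacheKey = `assistant:${agentId}`;
  const cached = cache.get(cacheKey);

  // Serve from the cache while the entry is younger than CACHE_TTL.
  if (cached && Date.now() - cached.timestamp < CACHE_TTL) {
    return cached.data;
  }

  const response = await fetch(`/assistants/${agentId}`);
  if (!response.ok) {
    // Don't cache error responses such as 429.
    throw new Error(`Request failed with status ${response.status}`);
  }
  const data = await response.json();

  cache.set(cacheKey, { data, timestamp: Date.now() });
  return data;
}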

4. Batch Operations

Instead of multiple individual requests, use batch-friendly patterns:

// Instead of this:
for (const id of agentIds) {
  const assistant = await getAgent(id);  // N requests
}

// Do this:
const assistants = await listAgents();  // 1 request
const relevantAgents = assistants.filter(a => agentIds.includes(a.id));

5. Use Webhooks for Real-Time Data

Instead of polling for call status, use webhooks:

{
  "webhook_url": "https://your-domain.com/webhook",
  "webhook_events": ["status-update", "end-of-call-report"]
}
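
On the receiving side, a handler could route each delivery by event name. The payload format is not described on this page, so the sketch below makes a loud assumption: each delivery is a JSON body with a `type` field holding one of the subscribed event names. The function name and returned status codes are likewise illustrative:

```javascript
// ASSUMPTION: the webhook body is JSON with a `type` field naming the event.
// Check the webhook reference for the real payload shape before relying on this.
function createWebhookDispatcher(handlers) {
  return function dispatch(rawBody) {
    let event;
    try {
      event = JSON.parse(rawBody);
    } catch {
      return { status: 400 }; // body was not valid JSON
    }

    const type = event?.type;
    const known = typeof type === 'string' &&
      Object.prototype.hasOwnProperty.call(handlers, type);
    if (!known) {
      return { status: 204 }; // acknowledge events we don't handle
    }

    handlers[type](event);
    return { status: 200 };
  };
}
```

A dispatcher built with handlers for `status-update` and `end-of-call-report` can then be called from whatever HTTP framework serves your webhook URL.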

Call Control Limits

Call control commands have a separate limit of 10 commands per minute per active call. This prevents abuse while allowing normal interaction patterns.

Examples that count toward the limit:

inject-context
say
end-call
transfer
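
To avoid sending an eleventh command in the first place, you could track commands per call on the client. The sketch below keeps a sliding 60-second log of timestamps for each call. The class name and methods are hypothetical, and the page does not say whether the server uses a fixed or rolling window, so this takes the conservative rolling interpretation:

```javascript
// Hypothetical client-side guard: at most `limit` commands per `windowMs` per call.
class CallCommandLimiter {
  constructor(limit = 10, windowMs = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.sent = new Map(); // callId -> timestamps (ms) of recent commands
  }

  // Returns true and records the command if the call is under its limit.
  tryAcquire(callId, nowMs = Date.now()) {
    const cutoff = nowMs - this.windowMs;
    const recent = (this.sent.get(callId) ?? []).filter((t) => t > cutoff);
    this.sent.set(callId, recent);

    if (recent.length >= this.limit) {
      return false; // caller should delay or drop the command
    }
    recent.push(nowMs);
    return true;
  }

  // Forget a call's history once the call has ended.
  release(callId) {
    this.sent.delete(callId);
  }
}
```

Call `tryAcquire(callId)` before each `say`, `inject-context`, `transfer`, or `end-call` command, and `release(callId)` when the call finishes.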

Higher Limits

If you need higher rate limits for your use case, contact support@hmsovereign.com with:

Your organization ID
Expected request volume
Use case description
