Rate Limits
API rate limits, response headers, and best practices for staying within limits.
HMS Sovereign applies rate limits to ensure fair usage and maintain service quality for all users.
Current Limits
| Limit Type | Rate | Scope |
|---|---|---|
| API Requests | 100 requests/minute | Per API key |
| Call Control | 10 commands/minute | Per active call |
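As a rough illustration of what these rates mean for a client, 100 requests/minute averages one request every 600 ms, and 10 commands/minute averages one command every 6 seconds. The sketch below spaces requests evenly at that interval. `createPacer` is a hypothetical helper, not part of any HMS Sovereign SDK, and it assumes the limit is counted over a 60-second window.

```javascript
// Sketch: evenly space outgoing requests so that no more than `limit`
// are sent per window. createPacer is a hypothetical helper name.
function createPacer(limit, windowMs = 60_000) {
  const intervalMs = windowMs / limit; // e.g. 60000 / 100 = 600 ms
  let nextSlot = 0;                    // earliest time the next request may go out

  // Reserves the next free slot and returns how long to wait (ms) before sending.
  return function reserve(nowMs = Date.now()) {
    const sendAt = Math.max(nowMs, nextSlot);
    nextSlot = sendAt + intervalMs;
    return sendAt - nowMs;
  };
}
```

A caller would wait for the returned delay before each request, for example `await new Promise(r => setTimeout(r, reserve()))`. Even spacing is conservative: it gives up bursts in exchange for never running into the limit.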
Rate Limit Headers
Every API response includes headers to help you track your usage:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed |
| X-RateLimit-Remaining | Requests remaining in current window |
| X-RateLimit-Reset | Unix timestamp when the limit resets |
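The three headers can be collected into one object so the rest of a client does not have to deal with raw strings. This is a minimal sketch: `parseRateLimitHeaders` is a hypothetical helper, and it assumes `X-RateLimit-Reset` is expressed in Unix seconds, as in the example response on this page.

```javascript
// Sketch: read the rate limit headers from any object exposing get(name),
// such as the Headers object on a fetch Response.
function parseRateLimitHeaders(headers, nowMs = Date.now()) {
  // Missing or non-numeric headers become null instead of NaN.
  const toInt = (value) => {
    const n = Number.parseInt(value ?? '', 10);
    return Number.isNaN(n) ? null : n;
  };

  const limit = toInt(headers.get('X-RateLimit-Limit'));
  const remaining = toInt(headers.get('X-RateLimit-Remaining'));
  const resetAt = toInt(headers.get('X-RateLimit-Reset')); // assumed Unix seconds

  // Seconds until the window resets, never negative.
  const secondsUntilReset =
    resetAt === null ? null : Math.max(0, resetAt - Math.floor(nowMs / 1000));

  return { limit, remaining, resetAt, secondsUntilReset };
}
```

With a fetch response this would be called as `parseRateLimitHeaders(response.headers)`.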
Exceeding the Limit
When you exceed the rate limit, the API returns:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1702479600
Retry-After: 45

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Too many requests. Please retry after 45 seconds."
  }
}
Best Practices
1. Monitor Rate Limit Headers
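Check the headers on each response before sending further requests:
const response = await fetch(url, options);
// Header values are strings; parse before comparing.
// A missing header yields NaN, so the warning below is skipped.
const remaining = Number.parseInt(response.headers.get('X-RateLimit-Remaining'), 10);
if (remaining < 10) {
  console.warn('Approaching rate limit:', remaining, 'requests remaining');
}
2. Implement Exponential Backoff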
When you receive a 429 response, wait for the number of seconds given in Retry-After, falling back to exponential backoff if the header is missing, then retry:
async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);
    if (response.status !== 429) {
      return response;
    }
    // Prefer the server's Retry-After (seconds); otherwise back off 1s, 2s, 4s, ...
    const retryAfter = Number(response.headers.get('Retry-After')) || Math.pow(2, attempt);
    await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
  }
  throw new Error('Max retries exceeded');
}
3. Cache Responses
Cache data that doesn't change frequently:
const cache = new Map();
const CACHE_TTL = 60000; // 1 minute
async function getAgent(agentId) {
  const cacheKey = `assistant:${agentId}`;
  const cached = cache.get(cacheKey);
  // Serve from cache while the entry is younger than the TTL.
  if (cached && Date.now() - cached.timestamp < CACHE_TTL) {
    return cached.data;
  }
  const response = await fetch(`/assistants/${agentId}`);
  // Do not cache error responses such as 429.
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const data = await response.json();
  cache.set(cacheKey, { data, timestamp: Date.now() });
  return data;
}
4. Batch Operations
Instead of multiple individual requests, use batch-friendly patterns:
// Instead of this:
for (const id of agentIds) {
  const assistant = await getAgent(id); // N requests
}
// Do this:
const assistants = await listAgents(); // 1 request
const relevantAgents = assistants.filter(a => agentIds.includes(a.id));
5. Use Webhooks for Real-Time Data
Instead of polling for call status, use webhooks:
{
  "webhook_url": "https://your-domain.com/webhook",
  "webhook_events": ["status-update", "end-of-call-report"]
}
Call Control Limits
Call control commands have a separate limit of 10 commands per minute per active call. This prevents abuse while allowing normal interaction patterns.
Examples that count toward the limit:
- inject-context
- say
- end-call
- transfer
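Because this limit is counted per active call, a client can track recent commands for each call separately and hold back a command that would go over. The sketch below uses a sliding 60-second window. `CallCommandLimiter` and the call IDs are hypothetical, and the server may count its window differently (for example, as a fixed window), so treat this as a conservative client-side guard rather than a mirror of the server's accounting.

```javascript
// Sketch: sliding-window guard for call control commands, tracked per call.
// CallCommandLimiter is a hypothetical helper, not part of any official SDK.
class CallCommandLimiter {
  constructor({ limit = 10, windowMs = 60_000 } = {}) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.sent = new Map(); // callId -> timestamps (ms) of recent commands
  }

  // Returns true and records the command if it fits within the window;
  // returns false (recording nothing) if the call is already at its limit.
  tryRecord(callId, nowMs = Date.now()) {
    const cutoff = nowMs - this.windowMs;
    const recent = (this.sent.get(callId) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.sent.set(callId, recent);
    return allowed;
  }

  // Drop the history once a call has ended.
  forget(callId) {
    this.sent.delete(callId);
  }
}
```

A caller would check `limiter.tryRecord(callId)` before sending a command, and call `limiter.forget(callId)` when the call ends so the map does not grow without bound.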
Higher Limits
If you need higher rate limits for your use case, contact support@hmsovereign.com with:
- Your organization ID
- Expected request volume
- Use case description