Kaiko enforces rate limits to ensure fair usage and platform stability. This page explains the V2 limits, how to monitor usage, and best practices.
| Limit | Value | Notes |
|---|---|---|
| Requests per minute | 1,000 RPM | Per API key |
| Tokens per month | 100,000 | Upgradeable (contact sales) |
| Concurrent requests | 10 | Per API key |
| Request timeout | 30 seconds | For non-streaming requests |
| Batch size (batch-analysis) | 100 messages | Per batch request |
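
One way to stay within the batch-size and concurrency limits in the table above is to split work on the client before sending it. The following is a minimal sketch, not an official client: `chunkMessages` and `runWithConcurrency` are hypothetical helper names, and the constants simply mirror the table values.

```javascript
// Values copied from the limits table; adjust if your plan differs.
const MAX_BATCH_SIZE = 100;
const MAX_CONCURRENT = 10;

// Split an array of messages into batches of at most `size` items.
function chunkMessages(messages, size = MAX_BATCH_SIZE) {
  const batches = [];
  for (let i = 0; i < messages.length; i += size) {
    batches.push(messages.slice(i, i + size));
  }
  return batches;
}

// Run async task functions with at most `limit` in flight at once.
// Results come back in the same order as `tasks`.
async function runWithConcurrency(tasks, limit = MAX_CONCURRENT) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker repeatedly claims the next unstarted task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

A caller would typically wrap each batch in a function that performs the request (for example a hypothetical `sendBatch(batch)`) and pass the resulting array to `runWithConcurrency`.
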
All API responses include headers to help you track your usage:

    X-RateLimit-Limit: 1000
    X-RateLimit-Remaining: 950
    X-RateLimit-Reset: 1735862400
    Retry-After: 60

| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed per minute |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp when the limit resets |
| Retry-After | Seconds to wait before retrying (sent only with 429 responses) |
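
The headers above can be turned into a small status object on the client. This is a sketch under stated assumptions: `readRateLimit` is a hypothetical helper, `headers` is any object with a `get(name)` method (such as the `headers` of a `fetch` response), and `X-RateLimit-Reset` is assumed to be expressed in seconds, as the example value suggests.

```javascript
// Reads the rate-limit headers into numbers. Missing headers become null.
function readRateLimit(headers, nowMs = Date.now()) {
  const num = (name) => {
    const raw = headers.get(name);
    return raw === null || raw === undefined ? null : Number(raw);
  };

  const reset = num('X-RateLimit-Reset'); // assumed: Unix timestamp in seconds
  const retryAfter = num('Retry-After');  // seconds; only present on 429

  return {
    limit: num('X-RateLimit-Limit'),
    remaining: num('X-RateLimit-Remaining'),
    // Milliseconds until the window resets, never negative.
    msUntilReset: reset === null ? null : Math.max(0, reset * 1000 - nowMs),
    // Milliseconds the server asked us to wait, if it said so.
    retryAfterMs: retryAfter === null ? null : retryAfter * 1000,
  };
}
```

When `remaining` approaches zero, a client can pause for `msUntilReset` before sending more requests; on a 429, `retryAfterMs` is the wait the server requested.
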
When you exceed the rate limit, the API returns a 429 status code. Implement exponential backoff:
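
    // Retries fn on 429 and 5xx errors, doubling the wait each time (1s, 2s, 4s, ...).
    async function callWithRetry(fn, maxRetries = 3) {
      for (let i = 0; i < maxRetries; i++) {
        try {
          return await fn();
        } catch (error) {
          // Only rate-limit (429) and server (5xx) errors are retried.
          if (error.status === 429 || error.status >= 500) {
            const delay = Math.pow(2, i) * 1000; // backoff delay in milliseconds
            await new Promise(r => setTimeout(r, delay));
            continue;
          }
          // Any other error is not retryable, so surface it immediately.
          throw error;
        }
      }
      throw new Error('Max retries exceeded');
    }

Use the Retry-After header if present.

V2 API responses include token usage in the response body: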
| Field | Description |
|---|---|
| usage.prompt_tokens | Tokens in the prompt (Chat API) |
| usage.completion_tokens | Tokens in the LLM response (Chat API) |
| usage.analyse_tokens | Tokens used for emotion analysis |
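
To watch consumption against the monthly token limit, the usage fields above can be accumulated client-side. This is a rough sketch with loudly stated assumptions: `tokensUsed` and `createTokenTracker` are hypothetical helpers, any field missing from a response is treated as zero, and all three fields are assumed to count toward the 100,000-token monthly limit, which the source does not confirm.

```javascript
const MONTHLY_TOKEN_QUOTA = 100000; // value from the limits table

// Sum the token fields of one response's `usage` object.
// Assumption: every field counts toward the monthly quota.
function tokensUsed(usage = {}) {
  return (
    (usage.prompt_tokens || 0) +
    (usage.completion_tokens || 0) +
    (usage.analyse_tokens || 0)
  );
}

// Keeps a running total across responses.
function createTokenTracker(quota = MONTHLY_TOKEN_QUOTA) {
  let used = 0;
  return {
    // Add one response's usage and return the new running total.
    record(usage) {
      used += tokensUsed(usage);
      return used;
    },
    // Tokens left under the quota, never negative.
    remaining() {
      return Math.max(0, quota - used);
    },
  };
}
```

Calling `tracker.record(response.usage)` after each request gives a local estimate only; the tracker resets with the process and does not know about other clients sharing the same API key.
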
For higher limits, contact our sales team:
Email: sales@kaikostudios.xyz
Next: see Error Handling for all error codes and troubleshooting, or Authentication for security best practices.