Rate Limits

Kaiko enforces rate limits to ensure fair usage and platform stability. This page explains the V2 limits, how to monitor usage, and best practices.

V2 Rate Limits

| Limit | Value | Notes |
| --- | --- | --- |
| Requests per minute | 1,000 RPM | Per API key |
| Tokens per month | 100,000 | Upgradeable (contact sales) |
| Concurrent requests | 10 | Per API key |
| Request timeout | 30 seconds | For non-streaming requests |
| Batch size (batch-analysis) | 100 messages | Per batch request |

Rate Limit Headers

All API responses include headers to help you track your usage:

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 950
X-RateLimit-Reset: 1735862400
Retry-After: 60

| Header | Description |
| --- | --- |
| X-RateLimit-Limit | Your rate limit per minute |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp when the limit resets |
| Retry-After | Seconds to wait (only on 429 responses) |

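As an illustration, these headers could be read from a response roughly as follows. This is a minimal sketch that assumes a Fetch-style `Headers` object (global in Node 18+ and in browsers); `parseRateLimit` is a hypothetical helper name, not part of any Kaiko SDK.

```javascript
// Hypothetical helper: extracts rate limit info from a Fetch-style Headers object.
function parseRateLimit(headers) {
  const toInt = (name) => {
    const value = headers.get(name);
    return value === null ? null : Number.parseInt(value, 10);
  };
  const reset = toInt('X-RateLimit-Reset');
  return {
    limit: toInt('X-RateLimit-Limit'),
    remaining: toInt('X-RateLimit-Remaining'),
    // X-RateLimit-Reset is a Unix timestamp in seconds; Date expects milliseconds.
    resetAt: reset === null ? null : new Date(reset * 1000),
    // Retry-After is only sent with 429 responses, so it is usually null.
    retryAfterSeconds: toInt('Retry-After'),
  };
}

// Example using the sample header values from this page (without Retry-After).
const sample = new Headers({
  'X-RateLimit-Limit': '1000',
  'X-RateLimit-Remaining': '950',
  'X-RateLimit-Reset': '1735862400',
});
const info = parseRateLimit(sample);
console.log(info.remaining, info.resetAt.toISOString());
```

Checking `remaining` before sending a burst of requests lets a client slow down before it hits a 429, rather than after.
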
Handling 429 Errors

When you exceed the rate limit, the API returns a 429 status code. Implement exponential backoff:

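async function callWithRetry(fn, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      // Retry only on rate limiting (429) or server errors (5xx).
      if (error.status === 429 || error.status >= 500) {
        // No point waiting after the final attempt; give up right away.
        if (i === maxRetries - 1) break;
        // Exponential backoff: 1s, 2s, 4s, ...
        const delay = Math.pow(2, i) * 1000;
        await new Promise(r => setTimeout(r, delay));
        continue;
      }
      // Any other error is not retryable; surface it to the caller.
      throw error;
    }
  }
  throw new Error('Max retries exceeded');
}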
  • Start with a 1-second delay and double it on each retry.
  • Respect the Retry-After header if present.
  • Add jitter (a random extra delay) so that many clients do not retry at the same moment (the "thundering herd" problem).
  • Monitor your 429 rate to determine whether you need higher limits.
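The recommendations above can be combined into one helper. The following is a sketch under stated assumptions, not an official client: it assumes errors thrown by `fn` carry a `status` field (as in the example above) and, hypothetically, a `retryAfter` field holding the Retry-After value in seconds. How your HTTP client surfaces that header will differ. The jitter here is additive (up to one extra second on top of the exponential delay).

```javascript
// Computes the wait in milliseconds before retry number `attempt` (0-based).
// `retryAfterSeconds` is the Retry-After value if the server sent one, else null.
function computeDelayMs(attempt, retryAfterSeconds = null, random = Math.random) {
  if (retryAfterSeconds !== null) {
    // The server said how long to wait; honor it instead of guessing.
    return retryAfterSeconds * 1000;
  }
  const backoff = 1000 * 2 ** attempt; // 1s, 2s, 4s, ...
  const jitter = random() * 1000;      // up to 1s of random extra delay
  return backoff + jitter;
}

function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Assumption: `error.status` and `error.retryAfter` are illustrative field names.
async function callWithBackoff(fn, maxRetries = 3, sleep = defaultSleep) {
  let lastError;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const retryable = error.status === 429 || error.status >= 500;
      if (!retryable) throw error;
      lastError = error;
      // Only wait if another attempt will follow.
      if (attempt < maxRetries - 1) {
        await sleep(computeDelayMs(attempt, error.retryAfter ?? null));
      }
    }
  }
  // All attempts failed; rethrow the last error so the caller sees the cause.
  throw lastError;
}
```

Keeping the delay calculation in its own function makes it easy to test without real timers, and the injectable `sleep` parameter serves the same purpose for the retry loop.
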
Token Usage

V2 API responses include token usage in the response body:

| Field | Description |
| --- | --- |
| usage.prompt_tokens | Tokens in the prompt (Chat API) |
| usage.completion_tokens | Tokens in the LLM response (Chat API) |
| usage.analyse_tokens | Tokens used for emotion analysis |

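One way to keep a running total on the client side is sketched below. It assumes the response body exposes a `usage` object with the fields listed above, any of which may be absent depending on the endpoint. `TokenTracker` is a hypothetical name, and treating all three fields as counting against the 100,000-token monthly quota is an assumption to verify against your dashboard.

```javascript
// Hypothetical client-side tracker for monthly token consumption.
class TokenTracker {
  constructor(monthlyQuota = 100000) {
    this.monthlyQuota = monthlyQuota;
    this.used = 0;
  }

  // Adds the token counts from one response body; missing fields count as 0.
  record(responseBody) {
    const usage = responseBody.usage ?? {};
    this.used +=
      (usage.prompt_tokens ?? 0) +
      (usage.completion_tokens ?? 0) +
      (usage.analyse_tokens ?? 0);
    return this.remaining();
  }

  // Tokens left in the quota, never negative.
  remaining() {
    return Math.max(0, this.monthlyQuota - this.used);
  }
}

// Example with made-up token counts.
const tracker = new TokenTracker();
tracker.record({ usage: { prompt_tokens: 120, completion_tokens: 80, analyse_tokens: 40 } });
console.log(tracker.remaining()); // 99760
```
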
Best Practices
  • Batch when possible: Use /v2/emotions/batch-analysis for multiple texts instead of individual calls.
  • Cache results: For repeated queries, cache emotion analysis results.
  • Use the stateless API for one-off requests: Context-based APIs have slightly higher overhead.
  • Monitor usage: Track token consumption via the usage object and the dashboard.
  • Request a limit increase: Contact sales@kaikostudios.xyz for enterprise quotas.
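Batching has to respect the limit of 100 messages per batch request. The sketch below splits a list of texts into chunks of that size. `sendBatch` is a hypothetical caller-supplied function that posts one chunk to /v2/emotions/batch-analysis and resolves to an array of results; the request and response shapes are not documented on this page, so they are deliberately left to the caller.

```javascript
const MAX_BATCH_SIZE = 100; // batch-analysis limit from the V2 rate limits

// Splits an array into consecutive chunks of at most `size` items.
function chunk(items, size = MAX_BATCH_SIZE) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// `sendBatch(batch)` is hypothetical: it should send one chunk and
// resolve to an array with one result per input text.
async function analyseAll(texts, sendBatch) {
  const results = [];
  for (const batch of chunk(texts)) {
    // Sequential on purpose: one request in flight stays well under
    // the concurrent request limit.
    results.push(...(await sendBatch(batch)));
  }
  return results;
}
```

With 250 texts, this issues three requests (100, 100, and 50 messages) instead of 250 individual calls.
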
Enterprise Limits

For higher limits, contact our sales team:

  • Custom RPM and token quotas
  • Dedicated rate limit pools
  • Priority queue access
  • SLA guarantees

Email: sales@kaikostudios.xyz

Next: see Error Handling for all error codes and troubleshooting, or Authentication for security best practices.