
Rate Limits

Rate limits protect the API from abuse and ensure fair usage across all customers.

Rate Limits per Plan

Rate limits are applied per API key on a rolling 1-hour window. When you exceed your limit, the API returns a 429 Too Many Requests response until the window resets.

Plan         Requests / Hour   Requests / Minute   Concurrent Requests
Free         100               10                  2
Starter      1,000             50                  10
Pro          10,000            500                 50
Enterprise   Unlimited         Custom              Custom

Enterprise customers receive custom rate limits based on their contract. Contact sales to discuss your needs.

Rate Limit Headers

Every API response includes headers that tell you your current rate limit status. Use these headers to implement proactive throttling in your application.

Header                  Description
X-RateLimit-Limit       Maximum number of requests allowed in the current window.
X-RateLimit-Remaining   Number of requests remaining in the current window.
X-RateLimit-Reset       Unix timestamp (seconds) when the rate limit window resets.
Retry-After             Seconds to wait before retrying (only included in 429 responses).

Example Response Headers

HTTP Headers
HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1707933600
Content-Type: application/json
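
You can read these headers off any response with your HTTP client before deciding whether to send the next request. A minimal sketch using the Python requests library (the URL is a placeholder):

python
import requests

response = requests.get("https://api.example.com/v1/status")  # placeholder URL: any API call works

limit = int(response.headers["X-RateLimit-Limit"])
remaining = int(response.headers["X-RateLimit-Remaining"])
reset_at = int(response.headers["X-RateLimit-Reset"])  # Unix timestamp in seconds

print(f"{remaining}/{limit} requests left; window resets at {reset_at}")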

Handling 429 Responses

When you exceed your rate limit, the API returns a 429 status code. Your application should handle this gracefully using exponential backoff with jitter. Example implementations in Python and JavaScript/TypeScript are shown below.

429 Response Body

JSON
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please retry after 32 seconds.",
    "retry_after": 32
  }
}

Python — Exponential Backoff

python
import time
import random
import requests

def call_api_with_retry(url, headers, payload, max_retries=5):
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=payload)

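        # A 429 means the rate limit was hit: wait for the server-suggested
        # retry_after if present, otherwise fall back to exponential backoff.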
        if response.status_code == 429:
            retry_after = response.json().get("error", {}).get("retry_after", 2 ** attempt)
            jitter = random.uniform(0, 1)
            wait_time = retry_after + jitter
            print(f"Rate limited. Retrying in {wait_time:.1f}s (attempt {attempt + 1})")
            time.sleep(wait_time)
            continue

        response.raise_for_status()
        return response.json()

    raise Exception("Max retries exceeded")

JavaScript / TypeScript — Exponential Backoff

typescript
async function callApiWithRetry(
  url: string,
  headers: Record<string, string>,
  payload: object,
  maxRetries = 5
) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, {
      method: "POST",
      headers: { ...headers, "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    });

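    // A 429 means the rate limit was hit: wait for the server-suggested
    // retry_after if present, otherwise fall back to exponential backoff.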
    if (response.status === 429) {
      const data = await response.json();
      const retryAfter = data.error?.retry_after ?? 2 ** attempt;
      const jitter = Math.random();
      const waitTime = (retryAfter + jitter) * 1000;
      console.log(`Rate limited. Retrying in ${(waitTime / 1000).toFixed(1)}s`);
      await new Promise((r) => setTimeout(r, waitTime));
      continue;
    }

    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return response.json();
  }

  throw new Error("Max retries exceeded");
}

Best Practices

Monitor remaining requests

Track the X-RateLimit-Remaining header and start throttling proactively when you approach your limit, rather than waiting for 429 errors.
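
A minimal sketch of this idea, assuming you keep the most recent requests.Response around; the threshold of 10 remaining requests is an arbitrary example.

python
import time

import requests

def throttle_if_needed(response: requests.Response, threshold: int = 10) -> None:
    """Pause until the window resets when few requests remain."""
    remaining = int(response.headers.get("X-RateLimit-Remaining", threshold + 1))
    reset_at = int(response.headers.get("X-RateLimit-Reset", "0"))

    if remaining <= threshold:
        # Sleep until the reported reset time so the next request lands in a fresh window.
        wait = max(reset_at - time.time(), 0)
        print(f"{remaining} requests left in this window; pausing {wait:.0f}s")
        time.sleep(wait)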

Use request queuing

Implement a request queue with a token bucket or leaky bucket algorithm to smooth out request bursts and stay within your per-minute limits.
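
A minimal single-threaded token bucket sketch; the rate and capacity below match the Starter plan's 50 requests/minute and are assumptions to adjust for your plan. Concurrent workloads would need a lock or an async queue on top of this.

python
import time

class TokenBucket:
    """Refills at `rate` tokens per second, up to `capacity` tokens."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def acquire(self) -> None:
        """Block until a token is available, then consume it."""
        while True:
            now = time.monotonic()
            # Credit tokens for the time elapsed since the last refill, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
            self.last_refill = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep roughly until the next token accrues.
            time.sleep((1 - self.tokens) / self.rate)

bucket = TokenBucket(rate=50 / 60, capacity=50)  # assumed Starter plan: 50 requests/minute
# bucket.acquire()  # call before every API request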

Cache responses

Cache API responses for identical or similar queries. OrchestrAI responses include a cache_key field you can use for deduplication.
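
A minimal in-memory cache sketch keyed on a hash of the request payload, so repeated identical requests never reach the API; the function name is illustrative, and how you fold the response's cache_key field into your own store is up to you.

python
import hashlib
import json

import requests

_cache: dict[str, dict] = {}

def call_with_cache(url: str, headers: dict, payload: dict) -> dict:
    """Return a cached result for identical payloads instead of spending a request."""
    key = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    if key in _cache:
        return _cache[key]

    response = requests.post(url, headers=headers, json=payload)
    response.raise_for_status()
    data = response.json()

    _cache[key] = data
    return data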

Use batch endpoints

Where available, use batch endpoints to send multiple operations in a single request. Batch requests count as one request against your rate limit regardless of the number of items in the batch.
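
A minimal sketch of batching, assuming a hypothetical batch endpoint that accepts a list of operations under an "operations" key; check the API reference for the real endpoint path and request shape.

python
import requests

def submit_batch(batch_url: str, headers: dict, operations: list[dict]) -> dict:
    """Send many operations in one request; the whole batch counts as a single request."""
    # The {"operations": [...]} body is an assumed example shape, not the documented format.
    response = requests.post(batch_url, headers=headers, json={"operations": operations})
    response.raise_for_status()
    return response.json()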

Next Steps

  • Webhooks →

    Set up real-time notifications for rate limit events and other system alerts.

  • SDKs & Libraries →

    Our official SDKs handle rate limiting and retries automatically.