
Rate Limits

Rate limits protect the API from abuse and ensure fair usage across all customers.

Rate Limits per Plan

Rate limits are applied per API key on a rolling 1-hour window. When you exceed your limit, the API returns a 429 Too Many Requests response until the window resets.

Plan         Requests / Hour   Requests / Minute   Concurrent Requests
Free         100               10                  2
Starter      1,000             50                  10
Pro          10,000            500                 50
Enterprise   Unlimited         Custom              Custom

Enterprise customers receive custom rate limits based on their contract. Contact sales to discuss your needs.

Rate Limit Headers

Every API response includes headers that tell you your current rate limit status. Use these headers to implement proactive throttling in your application.

Header                  Description
X-RateLimit-Limit       Maximum number of requests allowed in the current window.
X-RateLimit-Remaining   Number of requests remaining in the current window.
X-RateLimit-Reset       Unix timestamp (seconds) when the rate limit window resets.
Retry-After             Seconds to wait before retrying (only included in 429 responses).

Example Response Headers

HTTP Headers
HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1707933600
Content-Type: application/json
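
You can read these headers off any response with your HTTP client before deciding whether to send the next request. A minimal sketch using the Python requests library (the URL is a placeholder):

python
import requests

response = requests.get("https://api.example.com/v1/status")  # placeholder URL: any API call works

limit = int(response.headers["X-RateLimit-Limit"])
remaining = int(response.headers["X-RateLimit-Remaining"])
reset_at = int(response.headers["X-RateLimit-Reset"])  # Unix timestamp in seconds

print(f"{remaining}/{limit} requests left; window resets at {reset_at}")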

Handling 429 Responses

When you exceed your rate limit, the API returns a 429 status code. Your application should handle this gracefully using exponential backoff with jitter. Example implementations in Python and JavaScript/TypeScript are shown below.

429 Response Body

JSON
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please retry after 32 seconds.",
    "retry_after": 32
  }
}

Python — Exponential Backoff

python
import time
import random
import requests

def call_api_with_retry(url, headers, payload, max_retries=5):
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=payload)

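        # A 429 means the rate limit was hit: wait for the server-suggested
        # retry_after if present, otherwise fall back to exponential backoff.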
        if response.status_code == 429:
            retry_after = response.json().get("error", {}).get("retry_after", 2 ** attempt)
            jitter = random.uniform(0, 1)
            wait_time = retry_after + jitter
            print(f"Rate limited. Retrying in {wait_time:.1f}s (attempt {attempt + 1})")
            time.sleep(wait_time)
            continue

        response.raise_for_status()
        return response.json()

    raise Exception("Max retries exceeded")

JavaScript / TypeScript — Exponential Backoff

typescript
async function callApiWithRetry(
  url: string,
  headers: Record<string, string>,
  payload: object,
  maxRetries = 5
) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, {
      method: "POST",
      headers: { ...headers, "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    });

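    // A 429 means the rate limit was hit: wait for the server-suggested
    // retry_after if present, otherwise fall back to exponential backoff.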
    if (response.status === 429) {
      const data = await response.json();
      const retryAfter = data.error?.retry_after ?? 2 ** attempt;
      const jitter = Math.random();
      const waitTime = (retryAfter + jitter) * 1000;
      console.log(`Rate limited. Retrying in ${(waitTime / 1000).toFixed(1)}s`);
      await new Promise((r) => setTimeout(r, waitTime));
      continue;
    }

    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return response.json();
  }

  throw new Error("Max retries exceeded");
}

Best Practices

Monitor remaining requests

Track the X-RateLimit-Remaining header and start throttling proactively when you approach your limit, rather than waiting for 429 errors.
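
A minimal sketch of this idea, assuming you keep the most recent requests.Response around; the threshold of 10 remaining requests is an arbitrary example.

python
import time

import requests

def throttle_if_needed(response: requests.Response, threshold: int = 10) -> None:
    """Pause until the window resets when few requests remain."""
    remaining = int(response.headers.get("X-RateLimit-Remaining", threshold + 1))
    reset_at = int(response.headers.get("X-RateLimit-Reset", "0"))

    if remaining <= threshold:
        # Sleep until the reported reset time so the next request lands in a fresh window.
        wait = max(reset_at - time.time(), 0)
        print(f"{remaining} requests left in this window; pausing {wait:.0f}s")
        time.sleep(wait)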

Use request queuing

Implement a request queue with a token bucket or leaky bucket algorithm to smooth out request bursts and stay within your per-minute limits.
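
A minimal single-threaded token bucket sketch; the rate and capacity below match the Starter plan's 50 requests/minute and are assumptions to adjust for your plan. Concurrent workloads would need a lock or an async queue on top of this.

python
import time

class TokenBucket:
    """Refills at `rate` tokens per second, up to `capacity` tokens."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def acquire(self) -> None:
        """Block until a token is available, then consume it."""
        while True:
            now = time.monotonic()
            # Credit tokens for the time elapsed since the last refill, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
            self.last_refill = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep roughly until the next token accrues.
            time.sleep((1 - self.tokens) / self.rate)

bucket = TokenBucket(rate=50 / 60, capacity=50)  # assumed Starter plan: 50 requests/minute
# bucket.acquire()  # call before every API request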

Cache responses

Cache API responses for identical or similar queries. OrchestrAI responses include a cache_key field you can use for deduplication.
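
A minimal in-memory cache sketch keyed on a hash of the request payload, so repeated identical requests never reach the API; the function name is illustrative, and how you fold the response's cache_key field into your own store is up to you.

python
import hashlib
import json

import requests

_cache: dict[str, dict] = {}

def call_with_cache(url: str, headers: dict, payload: dict) -> dict:
    """Return a cached result for identical payloads instead of spending a request."""
    key = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    if key in _cache:
        return _cache[key]

    response = requests.post(url, headers=headers, json=payload)
    response.raise_for_status()
    data = response.json()

    _cache[key] = data
    return data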

Use batch endpoints

Where available, use batch endpoints to send multiple operations in a single request. Batch requests count as one request against your rate limit regardless of the number of items in the batch.
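
A minimal sketch of batching, assuming a hypothetical batch endpoint that accepts a list of operations under an "operations" key; check the API reference for the real endpoint path and request shape.

python
import requests

def submit_batch(batch_url: str, headers: dict, operations: list[dict]) -> dict:
    """Send many operations in one request; the whole batch counts as a single request."""
    # The {"operations": [...]} body is an assumed example shape, not the documented format.
    response = requests.post(batch_url, headers=headers, json={"operations": operations})
    response.raise_for_status()
    return response.json()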

Next Steps

  • Webhooks →

    Set up real-time notifications for rate limit events and other system alerts.

  • SDKs & Libraries →

    Our official SDKs handle rate limiting and retries automatically.