Universal LLM Proxy

AI Gateway

One API for every major LLM provider. AI Gateway gives you an OpenAI-compatible endpoint that routes to OpenAI, Anthropic, Google Gemini, Mistral, Groq, Together AI, DeepSeek, and Ollama. Automatic fallback chains, real-time cost tracking, semantic response caching, and budget enforcement — all through a single API key.

Key Features

Everything you need to integrate AI Gateway into your production systems.

Universal Provider Access

Route to OpenAI, Anthropic, Google Gemini, Mistral, Groq, Together AI, DeepSeek, and Ollama through a single OpenAI-compatible API. Switch models by changing one parameter.

Automatic Fallback & Retry

Define fallback chains per model. If the primary provider fails, AI Gateway automatically retries with exponential backoff and falls back to the next provider. Zero downtime.
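The retry-then-fall-back behavior can be sketched as follows. This is a toy client-side illustration, not the gateway's actual implementation (which runs server-side); the provider functions and delay values are made up:

```python
import time

def call_with_fallback(providers, prompt, max_retries=3, base_delay=0.5):
    """Try each provider in the fallback chain in order; retry each
    with exponential backoff before moving on to the next one."""
    for provider in providers:
        for attempt in range(max_retries):
            try:
                return provider(prompt)
            except Exception:
                # back off: base_delay, 2x, 4x, ... then give up on this provider
                time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("all providers in the fallback chain failed")

# Illustrative chain: the primary is down, the fallback answers.
def flaky_primary(prompt):
    raise ConnectionError("provider outage")

def fallback_provider(prompt):
    return f"response to: {prompt}"
```

Calling `call_with_fallback([flaky_primary, fallback_provider], "hi")` exhausts the primary's retries, then transparently returns the fallback's answer.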

Real-Time Cost Tracking

Every API call is tracked with precise token counts and cost calculations per provider. Set monthly budget limits per user or team. View spend breakdowns by provider and model.
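The per-call cost math reduces to token counts times per-token prices. A minimal sketch, with illustrative per-million-token prices (real prices come from the gateway's model catalog and change over time):

```python
# Illustrative prices in USD per million tokens -- not authoritative.
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "claude-sonnet-4-5-20250929": {"input": 3.00, "output": 15.00},
}

def call_cost(model, input_tokens, output_tokens):
    """Cost of one call in USD, from token counts and per-token prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

def within_budget(spent_so_far, next_call_cost, monthly_limit):
    """Budget enforcement: reject the call if it would exceed the limit."""
    return spent_so_far + next_call_cost <= monthly_limit
```

For example, a `gpt-4o` call with 1,000 input and 500 output tokens under these prices costs $0.0075.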

Semantic Response Caching

Exact-match and semantic caching powered by pgvector. Similar prompts return cached responses instantly, cutting costs and latency. TTL-based expiry keeps the cache fresh.

API Reference

Production-ready REST API. All requests require a valid API key in the Authorization header.

POST
/api/v1/gateway/chat/completions

OpenAI-compatible chat completions. Send messages with any model name and get back a response in standard OpenAI format. Supports streaming via SSE.
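Streamed responses arrive as OpenAI-style SSE events, each a `data:` line carrying a JSON chunk with a content delta, terminated by `data: [DONE]`. A sketch of reassembling the text (the sample payload is illustrative):

```python
import json

def parse_sse_stream(raw):
    """Yield content deltas from an OpenAI-style SSE response body."""
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

# Illustrative raw SSE body as the client would receive it.
sample = (
    'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":"lo"}}]}\n\n'
    'data: [DONE]\n\n'
)
```

Joining the yielded deltas reconstructs the full completion text as it streams in.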

POST
/api/v1/gateway/embeddings

Generate embeddings using any supported provider. Returns vectors in OpenAI-compatible format.

GET
/api/v1/gateway/models

List all available models across all configured providers with pricing information, context window sizes, and availability status.

GET
/api/v1/gateway/usage

Get your LLM spend breakdown by provider, model, and time period. Includes total cost, token counts, cache hit rates, and average latency.

GET
/api/v1/gateway/health

Check the health and availability of all configured LLM providers. Returns status, latency, and error rates for each provider.

Terminal
curl -X POST https://api.bolorintelligence.com/api/v1/gateway/chat/completions \
  -H "Authorization: Bearer bolor_sk_..." \
  -H "Content-Type: application/json" \
  -d '{
  "model": "claude-sonnet-4-5-20250929",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in simple terms"}
  ],
  "temperature": 0.7
}'
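The gateway returns the standard OpenAI chat-completion shape regardless of which provider served the request. A sketch of extracting the text and token usage (the payload below is illustrative, not actual API output):

```python
import json

# Illustrative response body in standard OpenAI format.
payload = json.loads("""{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "claude-sonnet-4-5-20250929",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Quantum computing uses qubits..."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 180, "total_tokens": 205}
}""")

text = payload["choices"][0]["message"]["content"]
total_tokens = payload["usage"]["total_tokens"]
```

Because the shape is provider-independent, the same extraction code works whether the call was served by Anthropic, OpenAI, or any other backend.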

Use Cases

See how teams are using AI Gateway in production today.

01

Multi-Provider Resilience

Production applications use AI Gateway to eliminate single-provider dependency. When OpenAI has an outage, traffic automatically falls back to Anthropic or Google — users never notice.

02

Cost Optimization

Engineering teams route different workloads to the most cost-effective provider. Simple queries go to Groq for speed, complex reasoning to Claude, and embeddings to OpenAI. Teams save 40–60% on LLM costs.
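That kind of workload routing can be expressed as a simple policy table. A hypothetical sketch (the workload classes and model choices are illustrative, not a recommendation):

```python
# Hypothetical routing policy: map each workload class to the
# most cost-effective model for it.
ROUTES = {
    "simple": "llama-3.1-8b-instant",           # fast, cheap queries via Groq
    "reasoning": "claude-sonnet-4-5-20250929",  # complex reasoning via Claude
    "embedding": "text-embedding-3-small",      # embeddings via OpenAI
}

def pick_model(workload):
    """Choose a model for a workload class; default to the reasoning tier."""
    return ROUTES.get(workload, ROUTES["reasoning"])
```

Because every model is reachable through the same endpoint, switching a workload's tier means changing one string in the table.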

03

OpenAI SDK Drop-In

Teams using the OpenAI SDK point their base URL to AI Gateway and instantly gain access to every provider. No code changes. LangChain and LlamaIndex work out of the box.
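A minimal sketch of the drop-in pattern with the official `openai` Python SDK; the base URL path is assumed from the endpoints above, and the key is your gateway key rather than an OpenAI one:

```python
from openai import OpenAI  # pip install openai

# Only base_url and api_key change -- the rest of your code stays as-is.
client = OpenAI(
    base_url="https://api.bolorintelligence.com/api/v1/gateway",
    api_key="bolor_sk_...",  # your AI Gateway key
)

# Any supported provider's model works through the same client.
resp = client.chat.completions.create(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```

The same base-URL override works anywhere the OpenAI SDK is configurable, which is why LangChain and LlamaIndex pick it up without changes.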

Start Building with AI Gateway

Get your API key and make your first call in under 5 minutes. Free tier includes 100 API calls per month.