Universal LLM Proxy

AI Gateway

One API for every major LLM provider. AI Gateway gives you an OpenAI-compatible endpoint that routes to OpenAI, Anthropic, Google Gemini, Mistral, Groq, Together AI, DeepSeek, and Ollama. Automatic fallback chains, real-time cost tracking, semantic response caching, and budget enforcement — all through a single API key. Drop it into any existing OpenAI integration and instantly access every model.

Key Features

Everything you need to integrate AI Gateway into your production systems.

Universal Provider Access

Route to OpenAI, Anthropic, Google Gemini, Mistral, Groq, Together AI, DeepSeek, and Ollama through a single OpenAI-compatible API. Switch models by changing one parameter — no code changes required.
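
Below is a minimal Python sketch of that switch, calling the chat completions endpoint directly. The model identifiers are illustrative assumptions, not a confirmed catalog.

import requests

URL = "https://bolor-intelligence.com/api/v1/gateway/chat/completions"
HEADERS = {"Authorization": "Bearer sk-your-api-key"}

# The payload shape never changes; only the "model" field does.
for model in ["gpt-4o-mini", "claude-3-5-haiku", "gemini-1.5-flash"]:
    resp = requests.post(URL, headers=HEADERS, json={
        "model": model,  # the one parameter that selects the provider
        "messages": [{"role": "user", "content": "Say hello."}],
    }, timeout=30)
    print(model, "->", resp.json()["choices"][0]["message"]["content"])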

Automatic Fallback & Retry

Define fallback chains per model. If the primary provider fails or times out, AI Gateway automatically retries with exponential backoff and falls back to the next provider in the chain. Zero downtime for your application.
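
The fallback logic runs server-side in the gateway; the Python sketch below only illustrates the behavior described above, retrying with exponential backoff and then moving down an assumed chain. Chain contents and backoff parameters are assumptions.

import time
from openai import OpenAI, APIError, APITimeoutError

client = OpenAI(base_url="https://bolor-intelligence.com/api/v1/gateway",
                api_key="sk-your-api-key")

FALLBACK_CHAIN = ["gpt-4o", "claude-3-5-sonnet", "gemini-1.5-pro"]  # hypothetical chain

def complete_with_fallback(messages, retries=3):
    for model in FALLBACK_CHAIN:
        delay = 1.0
        for attempt in range(retries):
            try:
                return client.chat.completions.create(model=model, messages=messages)
            except (APIError, APITimeoutError):
                time.sleep(delay)  # exponential backoff before the next attempt
                delay *= 2
    raise RuntimeError("All providers in the fallback chain failed")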

Real-Time Cost Tracking

Every API call is tracked with precise token counts and cost calculations per provider. Set monthly budget limits per user or team. View spend breakdowns by provider, model, and time period.
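
As a sketch of client-side budget enforcement, the snippet below checks month-to-date spend via the documented /usage endpoint before allowing further calls. The response field name is an assumption.

import requests

GATEWAY = "https://bolor-intelligence.com/api/v1/gateway"
HEADERS = {"Authorization": "Bearer sk-your-api-key"}

def under_budget(limit_usd: float) -> bool:
    usage = requests.get(f"{GATEWAY}/usage", headers=HEADERS, timeout=10).json()
    # "total_cost_usd" is an assumed field name for month-to-date spend.
    return usage.get("total_cost_usd", 0.0) < limit_usd

if not under_budget(500.0):
    raise RuntimeError("Monthly LLM budget exhausted; blocking further calls")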

Semantic Response Caching

Exact-match and semantic caching powered by pgvector. Similar prompts return cached responses instantly, cutting costs and latency. TTL-based expiry keeps cache fresh. Cache hit/miss reported in response headers.
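
To observe the cache from a client, inspect the headers on a raw response. The exact header name below (X-Cache) is an assumption; only the presence of a hit/miss header is documented.

import requests

resp = requests.post(
    "https://bolor-intelligence.com/api/v1/gateway/chat/completions",
    headers={"Authorization": "Bearer sk-your-api-key"},
    json={"model": "gpt-4o-mini",
          "messages": [{"role": "user", "content": "What is pgvector?"}]},
    timeout=30,
)
print(resp.headers.get("X-Cache"))  # e.g. "HIT" or "MISS" (assumed values)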

API Endpoints

Production-ready REST API endpoints. All requests require a valid API key in the Authorization header.

POST
/api/v1/gateway/chat/completions

OpenAI-compatible chat completions. Send messages array with model name and get back a response in standard OpenAI format. Supports streaming via SSE. Works with any OpenAI SDK or library.
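
A short streaming sketch with the OpenAI Python SDK, assuming a gateway base URL of /api/v1/gateway and an illustrative model id:

from openai import OpenAI

client = OpenAI(base_url="https://bolor-intelligence.com/api/v1/gateway",
                api_key="sk-your-api-key")

# Tokens arrive incrementally over SSE, as with the upstream OpenAI API.
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain SSE in one paragraph."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)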

POST
/api/v1/gateway/embeddings

Generate embeddings using any supported provider. Returns vectors in OpenAI-compatible format. Supports text-embedding-3-small, text-embedding-3-large, and provider-specific embedding models.
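
For example, with the OpenAI Python SDK pointed at the gateway (base URL assumed):

from openai import OpenAI

client = OpenAI(base_url="https://bolor-intelligence.com/api/v1/gateway",
                api_key="sk-your-api-key")

emb = client.embeddings.create(
    model="text-embedding-3-small",
    input=["first document", "second document"],
)
print(len(emb.data), "vectors of dimension", len(emb.data[0].embedding))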

GET
/api/v1/gateway/models

List all available models across all configured providers with pricing information, context window sizes, and current availability status.
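
A quick way to inspect the catalog; field names beyond the standard OpenAI "data" list are assumptions:

import requests

resp = requests.get(
    "https://bolor-intelligence.com/api/v1/gateway/models",
    headers={"Authorization": "Bearer sk-your-api-key"},
    timeout=10,
)
for m in resp.json().get("data", []):
    # "context_window" and "pricing" are assumed field names.
    print(m.get("id"), m.get("context_window"), m.get("pricing"))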

GET
/api/v1/gateway/usage

Get your LLM spend breakdown by provider, model, and time period. Includes total cost, token counts, cache hit rates, and average latency per provider.
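
For instance, printing a per-provider breakdown; the grouping keys shown are assumptions based on the description above:

import requests

usage = requests.get(
    "https://bolor-intelligence.com/api/v1/gateway/usage",
    headers={"Authorization": "Bearer sk-your-api-key"},
    timeout=10,
).json()
for row in usage.get("by_provider", []):  # assumed field name
    print(row.get("provider"), row.get("total_cost"), row.get("cache_hit_rate"))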

GET
/api/v1/gateway/health

Check the health and availability of all configured LLM providers. Returns status, latency, and error rates for each provider.
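
A simple poll of the health endpoint; the per-provider field names are assumptions:

import requests

health = requests.get(
    "https://bolor-intelligence.com/api/v1/gateway/health",
    headers={"Authorization": "Bearer sk-your-api-key"},
    timeout=10,
).json()
for p in health.get("providers", []):  # assumed field name
    print(p.get("name"), p.get("status"), p.get("latency_ms"), p.get("error_rate"))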

Example Request

curl
curl -X POST \
  https://bolor-intelligence.com/api/v1/gateway/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "gpt-4o-mini",
  "messages": [
    {"role": "user", "content": "Your input here"}
  ]
}'

Use Cases

See how teams are using AI Gateway in production today.

01

Multi-Provider Resilience

Production applications use AI Gateway to eliminate single-provider dependency. When OpenAI has an outage, traffic automatically falls back to Anthropic or Google — users never notice. Teams report 99.99% effective uptime across providers.

02

Cost Optimization

Engineering teams route different workloads to the most cost-effective provider. Simple queries go to Groq for speed, complex reasoning to Claude, and embeddings to OpenAI. Teams typically save 40-60% on LLM costs compared to single-provider usage.
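
A sketch of such a routing policy in application code; tier names and model ids are assumptions:

from openai import OpenAI

client = OpenAI(base_url="https://bolor-intelligence.com/api/v1/gateway",
                api_key="sk-your-api-key")

# Hypothetical tiers mapped to cost-appropriate models (ids are assumptions).
MODEL_BY_TIER = {
    "simple": "llama-3.1-8b-instant",  # fast, cheap (e.g. served by Groq)
    "complex": "claude-3-5-sonnet",    # deeper reasoning
}

def ask(tier: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL_BY_TIER[tier],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content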

03

OpenAI SDK Drop-In Replacement

Teams using the OpenAI Python or JS SDK point their base URL to AI Gateway and instantly gain access to every provider. No code changes, no new SDKs to learn. Existing tools like LangChain and LlamaIndex work out of the box.
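
A minimal before/after sketch with the OpenAI Python SDK; the only assumed details are the gateway base URL and the model id:

from openai import OpenAI

# Before: client = OpenAI()  (pointed at api.openai.com)
# After: only the base_url and API key change; every call site stays the same.
client = OpenAI(
    base_url="https://bolor-intelligence.com/api/v1/gateway",  # assumed base URL
    api_key="sk-your-api-key",
)

resp = client.chat.completions.create(
    model="claude-3-5-sonnet",  # a non-OpenAI model through the same SDK (assumed id)
    messages=[{"role": "user", "content": "Hello from AI Gateway"}],
)
print(resp.choices[0].message.content)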

Start Building with AI Gateway

Get your API key and make your first call in under 5 minutes. Free tier includes 100 requests per hour.