LiteLLM Proxy

The universal interface for Large Language Models.

Overview

LiteLLM is a high-performance, FastAPI-based proxy that lets you call 100+ LLM APIs using the OpenAI format. It simplifies integrating multiple AI providers and gives you a single place for model management and cost tracking.

Key Features

  • FastAPI Powered: Built on FastAPI for high performance and easy extensibility.
  • Unified API: Call OpenAI, Anthropic, Azure, Google, and more using a single, consistent API.
  • Model Fallbacks: Automatically switch to a backup model if the primary provider is down.
  • Load Balancing: Distribute requests across multiple API keys or regions.
  • Cost Tracking: Monitor usage and spend across all your LLM providers.
  • Caching: Improve performance and reduce costs by caching common AI responses.

How it Works

  1. Request: The LibreApps Desktop frontend or a backend service sends a request to the LiteLLM proxy using the OpenAI SDK format.
  2. Translation: LiteLLM translates the request into the specific format required by the target provider (e.g., Anthropic's Claude API).
  3. Execution: The provider processes the request and returns a response.
  4. Normalization: LiteLLM normalizes the response back into the OpenAI format and returns it to the client.
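For example, a client can talk to the proxy with the standard OpenAI SDK. The sketch below illustrates steps 1 and 4; it assumes the proxy is reachable at http://localhost:4000 and that the placeholder key is accepted, so adjust both to match your deployment.

# Minimal sketch: the client speaks plain OpenAI format to the LiteLLM proxy.
# Assumptions: proxy at http://localhost:4000, placeholder proxy key below.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",   # LiteLLM proxy endpoint (assumed)
    api_key="sk-litellm-placeholder",   # proxy key, not a provider key (assumed)
)

# "claude-3" is resolved via the proxy's model_list; the request and response
# stay in OpenAI format even though Anthropic serves the completion (steps 2-4).
response = client.chat.completions.create(
    model="claude-3",
    messages=[{"role": "user", "content": "Summarize LiteLLM in one sentence."}],
)
print(response.choices[0].message.content)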

Configuration

LiteLLM is typically configured using a config.yaml file:

model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-3
    litellm_params:
      model: anthropic/claude-3-opus-20240229
      api_key: os.environ/ANTHROPIC_API_KEY
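
With this configuration in place, the proxy is typically started with the litellm CLI (for example, litellm --config config.yaml), and clients select a backend by its model_name alias. A minimal sketch, again assuming a local proxy at http://localhost:4000 and a placeholder key:

# Sketch: the two model_name aliases defined in config.yaml are interchangeable
# from the client's point of view; only the model string changes.
# Assumptions: proxy at http://localhost:4000, placeholder proxy key below.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-litellm-placeholder")

for alias in ("gpt-4", "claude-3"):
    reply = client.chat.completions.create(
        model=alias,
        messages=[{"role": "user", "content": "Say hello."}],
    )
    print(alias, "->", reply.choices[0].message.content)

Because every alias returns an OpenAI-format response, switching providers is just a change of string on the client side.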

Best Practices

  • Do this: Use LiteLLM to avoid vendor lock-in and easily experiment with new models.
  • Do this: Enable caching for frequently asked questions to save time and money.
  • Don't do this: Expose the LiteLLM proxy directly to the public internet; keep it behind the AI Chat Server.