LiteLLM Proxy

The universal interface for Large Language Models.

Overview

LiteLLM is a high-performance, FastAPI-based proxy that lets you call 100+ LLM APIs using the OpenAI format. It simplifies integrating multiple AI providers and gives you a single place for model management and cost tracking.

Key Features

  • FastAPI Powered: Built on FastAPI for high performance and easy extensibility.
  • Unified API: Call OpenAI, Anthropic, Azure, Google, and more using a single, consistent API.
  • Model Fallbacks: Automatically switch to a backup model if the primary provider is down.
  • Load Balancing: Distribute requests across multiple API keys or regions.
  • Cost Tracking: Monitor usage and spend across all your LLM providers.
  • Caching: Improve performance and reduce costs by caching common AI responses.

How it Works

  1. Request: The LibreApps Desktop frontend or a backend service sends a request to the LiteLLM proxy using the OpenAI SDK format.
  2. Translation: LiteLLM translates the request into the specific format required by the target provider (e.g., Anthropic's Claude API).
  3. Execution: The provider processes the request and returns a response.
  4. Normalization: LiteLLM normalizes the response back into the OpenAI format and returns it to the client.
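For example, a client can talk to the proxy with the standard OpenAI SDK. The sketch below illustrates steps 1 and 4; it assumes the proxy is reachable at http://localhost:4000 and that the placeholder key is accepted, so adjust both to match your deployment.

# Minimal sketch: the client speaks plain OpenAI format to the LiteLLM proxy.
# Assumptions: proxy at http://localhost:4000, placeholder proxy key below.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",   # LiteLLM proxy endpoint (assumed)
    api_key="sk-litellm-placeholder",   # proxy key, not a provider key (assumed)
)

# "claude-3" is resolved via the proxy's model_list; the request and response
# stay in OpenAI format even though Anthropic serves the completion (steps 2-4).
response = client.chat.completions.create(
    model="claude-3",
    messages=[{"role": "user", "content": "Summarize LiteLLM in one sentence."}],
)
print(response.choices[0].message.content)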

Configuration

LiteLLM is typically configured using a config.yaml file:

model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-3
    litellm_params:
      model: anthropic/claude-3-opus-20240229
      api_key: os.environ/ANTHROPIC_API_KEY
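
With this configuration in place, the proxy is typically started with the litellm CLI (for example, litellm --config config.yaml), and clients select a backend by its model_name alias. A minimal sketch, again assuming a local proxy at http://localhost:4000 and a placeholder key:

# Sketch: the two model_name aliases defined in config.yaml are interchangeable
# from the client's point of view; only the model string changes.
# Assumptions: proxy at http://localhost:4000, placeholder proxy key below.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-litellm-placeholder")

for alias in ("gpt-4", "claude-3"):
    reply = client.chat.completions.create(
        model=alias,
        messages=[{"role": "user", "content": "Say hello."}],
    )
    print(alias, "->", reply.choices[0].message.content)

Because every alias returns an OpenAI-format response, switching providers is just a change of string on the client side.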

Best Practices

  • Do this: Use LiteLLM to avoid vendor lock-in and easily experiment with new models.
  • Do this: Enable caching for frequently asked questions to save time and money.
  • Don't do this: Expose the LiteLLM proxy directly to the public internet; keep it behind the AI Chat Server.