LLM-Gateway

LiteLLM Gateway

Universal LLM-Bridge für 600+ Modelle

Eine API. Sechshundert Modelle. Failover, Retry, Cost-Routing.

Beschreibung

LiteLLM ist die zentrale Inferenz-Schicht. Single API-Surface für Anthropic, OpenAI, OpenRouter, NVIDIA NIM, Ollama, Replicate, Together, Groq, Cerebras, Mistral, Cohere, Google, Bedrock, Azure, Fireworks, DeepSeek, Qwen. Cost-Routing: lokal-first, Failover-Chain, Per-User-Limits, Per-Team-Budgets. Audit-Logs in audit_events. Rate-Limiting, Caching, Streaming.

Features & Capabilities · 8 Punkte

F01

600+ Modelle über 30+ Provider via OpenAI-kompatibler API

F02

Failover-Chain: gemma3:27b-local -> qwen2.5:32b -> llama3.3:70b -> sonnet-4-6 -> haiku-4-5 -> opus-4-7 -> openrouter/auto

F03

Per-User-Token-Budgets, Per-Team-Limits, Spend-Reports

F04

Caching via Redis (semantic + exact)

F05

Streaming + Function-Calling + Vision-Inputs

F06

Guardrails-Hooks: Presidio PII, NeMo, LLM-Guard pre/post

F07

OpenTelemetry-Tracing zu Langfuse

F08

Auto-Model-Update: pullt taeglich neue HF/OpenRouter-Releases

API & Endpoints

https://llm.ben-e-fit.ai/ui

API

https://llm.ben-e-fit.ai/v1/chat/completions

Models

https://llm.ben-e-fit.ai/v1/models

Use-Cases · 4 Beispiele

Wie das Tool real genutzt wird.

Cowork-AI nutzt sonnet-4-6 als Default, faellt zu opus-4-7 bei reasoning-heavy

NemoClaw nutzt gemma3:27b lokal -- gratis Inferenz auf eigener GPU

OpenClaw nutzt openrouter/anthropic/claude-opus-4 für Plan-Updates

Engineer testet via curl https://litellm.{domain}/v1/chat/completions