LiteLLM Proxy API
The godotz.ai LiteLLM Proxy runs at http://localhost:4000 by default and exposes an OpenAI-compatible REST API. All agent traffic passes through this gateway, which handles model routing, budget enforcement, and observability.
Base URL
http://localhost:4000
For fleet deployments the proxy is reachable via Tailscale at http://<node-hostname>:4000.
Authentication
All requests require a virtual key passed as a Bearer token.
Authorization: Bearer sk-omp-<your-virtual-key>
Virtual keys are created per team or agent role. The master key (LITELLM_MASTER_KEY env var) bypasses all budget limits and is for admin operations only.
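A minimal Python sketch of building the header described above. The helper name `auth_headers` and the environment variable `GODOTZ_VIRTUAL_KEY` are hypothetical and chosen only for illustration; the Bearer scheme is the only part taken from this document.

```python
import os


def auth_headers(virtual_key=None):
    """Return HTTP headers carrying the virtual key as a Bearer token."""
    # GODOTZ_VIRTUAL_KEY is a hypothetical variable name, not one the proxy defines.
    key = virtual_key or os.environ.get("GODOTZ_VIRTUAL_KEY", "")
    if not key:
        raise ValueError("no virtual key supplied")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control; the master key should not be used this way in agent code, since it bypasses budget limits.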
Endpoints
POST /chat/completions
OpenAI-compatible chat completion. Accepts the full OpenAI request schema.
curl -X POST http://localhost:4000/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-omp-executor-key" \
  -d '{
    "model": "glm-5.1",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Summarize the last build report."}
    ],
    "temperature": 0.3,
    "max_tokens": 1024
  }'
Response:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1717800000,
  "model": "glm-5.1",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 128,
    "total_tokens": 170
  }
}
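The same call can be sketched in Python using only the standard library. The function names (`build_chat_request`, `first_reply`, `chat`) are illustrative, not part of any godotz.ai client; the URL, headers, and field names follow the curl example and response above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:4000"  # default base URL from this document


def build_chat_request(api_key, model, messages, temperature=0.3, max_tokens=1024):
    """Build (but do not send) a POST /chat/completions request."""
    body = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def first_reply(response_body):
    """Extract the assistant text and total token count from a response dict."""
    choice = response_body["choices"][0]
    return choice["message"]["content"], response_body["usage"]["total_tokens"]


def chat(api_key, model, messages):
    """Send the request and return (content, total_tokens). Needs a running proxy."""
    req = build_chat_request(api_key, model, messages)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return first_reply(json.load(resp))
```

Splitting request construction from sending makes the payload easy to inspect or unit-test without a live proxy.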
godotz.ai model aliases:
| Alias | Routes to | Use case |
|---|---|---|
| glm-5.1 | z.ai GLM-5.1-flash | Default executor |
| glm-4.5-air | z.ai GLM-4.5-Air | Routing, tagging |
| claude-opus | Anthropic claude-opus-4-6 | Orchestrator |
| claude-sonnet | Anthropic claude-sonnet-4-5 | Standard tasks |
GET /models
List all available models registered with the proxy.
curl http://localhost:4000/models \
  -H "Authorization: Bearer sk-omp-executor-key"
Response:
{
  "object": "list",
  "data": [
    {"id": "glm-5.1", "object": "model", "created": 1717000000, "owned_by": "z.ai"},
    {"id": "glm-4.5-air", "object": "model", "created": 1717000000, "owned_by": "z.ai"},
    {"id": "claude-opus", "object": "model", "created": 1717000000, "owned_by": "anthropic"}
  ]
}
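As a small illustration of consuming this response, the sketch below groups model ids by provider. The helper `models_by_owner` is a hypothetical name; the sample payload is copied from the response above.

```python
from collections import defaultdict


def models_by_owner(models_response):
    """Group model ids from a GET /models response by their owned_by field."""
    grouped = defaultdict(list)
    for entry in models_response.get("data", []):
        grouped[entry["owned_by"]].append(entry["id"])
    return dict(grouped)


# Sample payload copied from the documented response.
sample = {
    "object": "list",
    "data": [
        {"id": "glm-5.1", "object": "model", "created": 1717000000, "owned_by": "z.ai"},
        {"id": "glm-4.5-air", "object": "model", "created": 1717000000, "owned_by": "z.ai"},
        {"id": "claude-opus", "object": "model", "created": 1717000000, "owned_by": "anthropic"},
    ],
}

print(models_by_owner(sample))
# {'z.ai': ['glm-5.1', 'glm-4.5-air'], 'anthropic': ['claude-opus']}
```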
GET /health
Liveness probe. Returns 200 when the proxy is running.
curl http://localhost:4000/health
Response:
{"status": "healthy"}
GET /health/readiness
Readiness probe. Returns 200 only when all configured model providers are reachable.
curl http://localhost:4000/health/readiness
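A startup script might poll the readiness probe before sending traffic. The following is a sketch under assumptions: the helper names, attempt count, and delay are illustrative, and the probe and sleep functions are injectable so the loop can be exercised without a running proxy.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=2.0):
    """Return the HTTP status of a GET, or None if the proxy is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the proxy answered with a non-2xx status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, or timeout


def wait_until_ready(base_url="http://localhost:4000", attempts=10, delay=1.0,
                     probe_fn=probe, sleep_fn=time.sleep):
    """Poll /health/readiness until it returns 200 or attempts run out."""
    for attempt in range(attempts):
        if probe_fn(f"{base_url}/health/readiness") == 200:
            return True
        if attempt < attempts - 1:
            sleep_fn(delay)  # no sleep after the final failed attempt
    return False
```

Calling `wait_until_ready()` with the defaults checks the local proxy up to ten times, one second apart, and returns `False` if it never reports ready.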
Team Budgets
Budgets are defined per team in config.yaml and enforced on requests made with that team's virtual keys.
# litellm-config/config.yaml
litellm_settings:
  budget_duration: "30d"
teams:
  - team_id: executor-pool
    budget_limit: 50.00
    models: ["glm-5.1", "glm-4.5-air"]
  - team_id: orchestrators
    budget_limit: 200.00
    models: ["claude-opus", "claude-sonnet"]
When a team exceeds its budget, subsequent requests return 429 Budget exceeded.
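To make the rule concrete, here is a toy Python model of the two team entries above. It is not the proxy's implementation: the function names are hypothetical, the spend figures are made up, and it assumes "exceeds" means strictly greater than `budget_limit`. This document does not say how the proxy responds to a model outside a team's `models` list, so that check only returns a boolean.

```python
# Mirrors the team entries from config.yaml above.
TEAMS = {
    "executor-pool": {"budget_limit": 50.00, "models": ["glm-5.1", "glm-4.5-air"]},
    "orchestrators": {"budget_limit": 200.00, "models": ["claude-opus", "claude-sonnet"]},
}


def model_allowed(team_id, model):
    """True if the model alias appears in the team's models list."""
    return model in TEAMS[team_id]["models"]


def status_for_spend(team_id, spend_so_far):
    """Return 429 once spend exceeds the team's budget_limit, else 200.

    Assumes the limit itself is still within budget (strict comparison).
    """
    return 429 if spend_so_far > TEAMS[team_id]["budget_limit"] else 200
```

With these values, `status_for_spend("executor-pool", 50.01)` yields 429 while the same spend on `orchestrators` yields 200.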
Error Codes
| Status | Meaning |
|---|---|
| 401 | Missing or invalid API key |
| 429 | Rate limit or budget exceeded |
| 502 | Upstream provider unreachable |
| 503 | Proxy overloaded; retry with backoff |
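One way a client might implement the backoff suggested for 503 is sketched below. The function name, attempt count, base delay, and jitter are assumptions. Only 503 is retried by default because that is the only status this table recommends retrying; a caller could add 429 or 502 to `retryable`, keeping in mind that a 429 caused by an exhausted budget will not clear by waiting a few seconds.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=0.5,
                      retryable=(503,), sleep_fn=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable returning a (status, body) tuple.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in retryable or attempt == max_attempts - 1:
            return status, body
        # Exponential backoff (0.5s, 1s, 2s, ...) plus up to 10% random jitter.
        delay = base_delay * (2 ** attempt)
        sleep_fn(delay + random.uniform(0, delay * 0.1))
```

The `send` callable keeps the retry loop independent of any particular HTTP library; it can wrap `urllib`, `requests`, or a test double.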