LiteLLM Proxy API

The godotz.ai LiteLLM Proxy runs at http://localhost:4000 by default and exposes an OpenAI-compatible REST API. All agent traffic passes through this gateway, which handles model routing, budget enforcement, and observability.

Base URL

http://localhost:4000

For fleet deployments, the proxy is reachable over Tailscale at http://<node-hostname>:4000.

Authentication

All requests require a virtual key passed as a Bearer token.

Authorization: Bearer sk-omp-<your-virtual-key>

Virtual keys are created per team or agent role. The master key (LITELLM_MASTER_KEY env var) bypasses all budget limits and is for admin operations only.
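A client usually reads its virtual key from the environment instead of hard-coding it. The following is a minimal sketch of building the Authorization header that way. The environment variable name PROXY_VIRTUAL_KEY and both helper names are hypothetical and not defined by the proxy.

```python
import os


def auth_headers(virtual_key: str) -> dict:
    """Return the Authorization header for a virtual key."""
    if not virtual_key:
        raise ValueError("a virtual key is required")
    return {"Authorization": f"Bearer {virtual_key}"}


def headers_from_env(var_name: str = "PROXY_VIRTUAL_KEY", environ=os.environ) -> dict:
    """Read the key from an environment variable.

    PROXY_VIRTUAL_KEY is a hypothetical name chosen for this sketch;
    use whatever variable your deployment actually sets.
    """
    return auth_headers(environ.get(var_name, ""))
```

Keeping the master key out of this code path is deliberate: agent clients should only ever hold a virtual key.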

Endpoints

POST /chat/completions

OpenAI-compatible chat completion. Accepts the full OpenAI request schema.

curl -X POST http://localhost:4000/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-omp-executor-key" \
  -d '{
    "model": "glm-5.1",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Summarize the last build report."}
    ],
    "temperature": 0.3,
    "max_tokens": 1024
  }'

Response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1717800000,
  "model": "glm-5.1",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 128,
    "total_tokens": 170
  }
}
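The same call can be made from Python using only the standard library. This is a sketch, not an official client: the helper names build_chat_request and extract_reply are illustrative, and the default parameter values simply mirror the curl example above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:4000"  # default proxy address from this page


def build_chat_request(virtual_key, model, messages,
                       temperature=0.3, max_tokens=1024, base_url=BASE_URL):
    """Build (but do not send) a POST /chat/completions request."""
    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {virtual_key}",
        },
        method="POST",
    )


def extract_reply(body: dict):
    """Return (assistant text, total tokens) from a decoded response body."""
    first_choice = body["choices"][0]
    return first_choice["message"]["content"], body["usage"]["total_tokens"]
```

To send the request, pass it to urllib.request.urlopen and decode the JSON body with json.load before calling extract_reply.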

godotz.ai model aliases:

Alias	Routes to	Use case
`glm-5.1`	z.ai GLM-5.1-flash	Default executor
`glm-4.5-air`	z.ai GLM-4.5-Air	Routing, tagging
`claude-opus`	Anthropic claude-opus-4-6	Orchestrator
`claude-sonnet`	Anthropic claude-sonnet-4-5	Standard tasks

GET /models

List all available models registered with the proxy.

curl http://localhost:4000/models \
  -H "Authorization: Bearer sk-omp-executor-key"

Response:

{
  "object": "list",
  "data": [
    {"id": "glm-5.1", "object": "model", "created": 1717000000, "owned_by": "z.ai"},
    {"id": "glm-4.5-air", "object": "model", "created": 1717000000, "owned_by": "z.ai"},
    {"id": "claude-opus", "object": "model", "created": 1717000000, "owned_by": "anthropic"}
  ]
}
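A client may want to check which aliases its key can see before choosing a model. The sketch below groups the ids in a decoded /models response by provider. The function name models_by_owner is hypothetical, and the input is assumed to have the shape shown in the response above.

```python
from collections import defaultdict


def models_by_owner(listing: dict) -> dict:
    """Group model ids from a decoded GET /models body by their owned_by field."""
    grouped = defaultdict(list)
    for entry in listing.get("data", []):
        grouped[entry["owned_by"]].append(entry["id"])
    return dict(grouped)
```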

GET /health

Liveness probe. Returns 200 when the proxy is running.

curl http://localhost:4000/health

Response:

{"status": "healthy"}

GET /health/readiness

Readiness probe. Returns 200 only when all configured model providers are reachable.

curl http://localhost:4000/health/readiness
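Startup scripts often wait for the readiness probe before dispatching agents. The sketch below polls the endpoint a fixed number of times. The function names, attempt count, and delay are illustrative choices, and any non-200 answer or connection failure is treated as "not ready".

```python
import time
import urllib.request


def probe_ready(base_url: str = "http://localhost:4000", timeout: float = 2.0) -> bool:
    """Return True only if GET /health/readiness answers with status 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health/readiness", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # urllib raises URLError/HTTPError (both OSError subclasses) on
        # connection failures, timeouts, and non-2xx responses.
        return False


def wait_until_ready(probe=probe_ready, attempts: int = 10,
                     delay: float = 1.0, sleep=time.sleep) -> bool:
    """Poll the probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

The probe and sleep functions are injectable so the polling logic can be exercised without a running proxy.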

Team Budgets

Budgets are defined in config.yaml and enforced per virtual key.

# litellm-config/config.yaml
litellm_settings:
  budget_duration: "30d"

teams:
  - team_id: executor-pool
    budget_limit: 50.00
    models: ["glm-5.1", "glm-4.5-air"]
  - team_id: orchestrators
    budget_limit: 200.00
    models: ["claude-opus", "claude-sonnet"]

When a team exceeds its budget, subsequent requests return 429 Budget exceeded.
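The rule described above can be illustrated with a small client-side pre-check. This is not the proxy's implementation, only a sketch that mirrors the two teams from the config. It assumes that the models list acts as an allowlist and that a budget counts as exceeded once spend is strictly greater than budget_limit. The names TeamBudget and precheck are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamBudget:
    team_id: str
    budget_limit: float  # same unit as budget_limit in config.yaml
    models: tuple        # assumed to be an allowlist of model aliases


# Values copied from the config.yaml example on this page.
TEAMS = {
    "executor-pool": TeamBudget("executor-pool", 50.00, ("glm-5.1", "glm-4.5-air")),
    "orchestrators": TeamBudget("orchestrators", 200.00, ("claude-opus", "claude-sonnet")),
}


def precheck(team_id: str, model: str, spend_so_far: float) -> str:
    """Return "ok", "model_not_allowed", or "budget_exceeded" for a planned request."""
    team = TEAMS[team_id]
    if model not in team.models:
        return "model_not_allowed"
    if spend_so_far > team.budget_limit:
        return "budget_exceeded"  # the proxy would answer 429 in this case
    return "ok"
```

A pre-check like this only saves a round trip; the proxy remains the source of truth for actual spend.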

Error Codes

Status	Meaning
`401`	Missing or invalid API key
`429`	Rate limit or budget exceeded
`502`	Upstream provider unreachable
`503`	Proxy overloaded — retry with backoff
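The table suggests retrying 503 responses with backoff. A minimal sketch of exponential backoff follows. The send callable, the attempt limit, and the base delay are assumptions made for illustration. Only 503 is retried here, because a 429 caused by an exhausted budget will not clear on its own.

```python
import time

RETRYABLE_STATUSES = {503}  # per the table above; extend only if your deployment warrants it


def call_with_backoff(send, max_attempts: int = 4,
                      base_delay: float = 0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable returning (status_code, body).
    The wait doubles after each retry: base_delay, 2*base_delay, 4*base_delay, ...
    """
    status, body = send()
    for retry in range(1, max_attempts):
        if status not in RETRYABLE_STATUSES:
            break
        sleep(base_delay * 2 ** (retry - 1))
        status, body = send()
    return status, body
```

Production clients commonly add random jitter to each delay so that many agents do not retry in lockstep.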
