Model Routing

godotz.ai routes API calls to different model families based on role. The rule is simple: Antigravity for conversation and orchestration; GLM for swarm workers and bulk tasks. This separation keeps costs predictable, prevents echo chambers, and respects rate limits.


1. The Two Model Families

Antigravity (Conversation / Orchestration)

Anthropic and Google Gemini models used for high-stakes dialogue, planning, and orchestration. These models are capable but expensive — use them for tasks that need reasoning depth.

ModelRole
claude-opus-4-6Orchestrator, Arbiter, complex planning
claude-sonnet-4-6Conversational assistant, code review
gemini-3.1-proMulti-modal input, cross-family Critic

GLM (Swarm Workers)

z.ai models used for high-throughput worker tasks: summarization, extraction, formatting, and bulk processing. Cheap and fast.

ModelConcurrencyBest For
glm-5.110Complex worker tasks, Actor role
glm-5-turbo1Rapid single-shot tasks
glm-4.72Balanced quality/cost
glm-4.5-air5High-volume lightweight tasks

2. LiteLLM Proxy Configuration

All routing is centralized in LiteLLM Proxy. Workers never hold provider API keys — they call the proxy with a virtual key.

# config/litellm-config.yaml
model_list:
  # === Antigravity ===
  - model_name: claude-opus-4-6
    litellm_params:
      model: anthropic/claude-opus-4-6
      api_key: os.environ/ANTHROPIC_API_KEY
      max_tokens: 16384

  - model_name: claude-sonnet-4-6
    litellm_params:
      model: anthropic/claude-sonnet-4-6
      api_key: os.environ/ANTHROPIC_API_KEY
      max_tokens: 8192

  - model_name: gemini-3.1-pro
    litellm_params:
      model: gemini/gemini-3.1-pro
      api_key: os.environ/GEMINI_API_KEY
      max_tokens: 8192

  # === GLM Workers ===
  - model_name: glm-5.1
    litellm_params:
      model: openai/glm-5.1
      api_base: https://open.bigmodel.cn/api/paas/v4/
      api_key: os.environ/ZAI_API_KEY

  - model_name: glm-5-turbo
    litellm_params:
      model: openai/glm-5-turbo
      api_base: https://open.bigmodel.cn/api/paas/v4/
      api_key: os.environ/ZAI_API_KEY

  - model_name: glm-4.7
    litellm_params:
      model: openai/glm-4.7
      api_base: https://open.bigmodel.cn/api/paas/v4/
      api_key: os.environ/ZAI_API_KEY

  - model_name: glm-4.5-air
    litellm_params:
      model: openai/glm-4.5-air
      api_base: https://open.bigmodel.cn/api/paas/v4/
      api_key: os.environ/ZAI_API_KEY

3. Concurrency Limits

Set per-model concurrency on the virtual key, not globally. This prevents any single worker from saturating a model’s rate limit.

# config/litellm-config.yaml (continued)
router_settings:
  routing_strategy: least-busy
  num_retries: 3
  timeout: 120

  model_group_alias:
    glm-workers: [glm-5.1, glm-5-turbo, glm-4.7, glm-4.5-air]
    antigravity: [claude-opus-4-6, claude-sonnet-4-6, gemini-3.1-pro]

  # Per-model concurrency caps
  allowed_fails: 2
  cooldown_time: 60

tpm_limit_policy:
  glm-5.1:
    max_parallel_requests: 10
  glm-5-turbo:
    max_parallel_requests: 1
  glm-4.7:
    max_parallel_requests: 2
  glm-4.5-air:
    max_parallel_requests: 5
  claude-opus-4-6:
    max_parallel_requests: 3
  claude-sonnet-4-6:
    max_parallel_requests: 5
  gemini-3.1-pro:
    max_parallel_requests: 4

4. Virtual Keys per Role

Each agent role gets its own virtual key with model access scoped appropriately:

# Create a virtual key for GLM workers
litellm key generate \
  --alias "glm-worker-key" \
  --models "glm-5.1,glm-5-turbo,glm-4.7,glm-4.5-air" \
  --max_budget 50 \
  --budget_duration "30d"

# Create a virtual key for orchestrators
litellm key generate \
  --alias "orchestrator-key" \
  --models "claude-opus-4-6,claude-sonnet-4-6,gemini-3.1-pro" \
  --max_budget 200 \
  --budget_duration "30d"

Workers are given only the glm-worker-key — they cannot call Antigravity models even if they try.


5. Context Promotion Rules

When a GLM worker encounters a task too complex for its context window or capability, it can request promotion to an Antigravity model. Promotion is gated by beads.

# config/routing.yml
context_promotion:
  enabled: true
  triggers:
    - type: context_overflow
      threshold_tokens: 120000    # GLM-5.1 context limit
      promote_to: claude-sonnet-4-6
    - type: repeated_failure
      failure_count: 3
      promote_to: claude-opus-4-6
    - type: explicit_request
      keyword: "PROMOTE_TO_ORCHESTRATOR"
      promote_to: claude-opus-4-6
  budget_check: true              # block promotion if budget exceeded
  log_promotions: true            # track in Langfuse

Promotion telemetry is visible in Langfuse under the context_promotion tag:

# View recent promotions
langfuse traces --filter tag:context_promotion --last 24h

6. Echo Chamber Prevention

Never use the same model family for both Actor and Critic roles in a pipeline. The canonical godotz.ai pattern:

# Actor (GLM) + Critic (Antigravity) = heterogeneous panel
pipeline = ActorCriticPipeline(
    actor_model="glm-5.1",
    critic_model="claude-sonnet-4-6",  # different family
    arbiter_model="claude-opus-4-6",
)

Using glm-5.1 as both Actor and Critic is an anti-pattern (see Echo Chamber).


Next Steps