Model Routing
godotz.ai routes API calls to different model families based on role. The rule is simple: Antigravity for conversation and orchestration; GLM for swarm workers and bulk tasks. This separation keeps costs predictable, prevents echo chambers, and respects rate limits.
1. The Two Model Families
Antigravity (Conversation / Orchestration)
Anthropic and Google Gemini models used for high-stakes dialogue, planning, and orchestration. These models are capable but expensive — use them for tasks that need reasoning depth.
| Model | Role |
|---|---|
claude-opus-4-6 | Orchestrator, Arbiter, complex planning |
claude-sonnet-4-6 | Conversational assistant, code review |
gemini-3.1-pro | Multi-modal input, cross-family Critic |
GLM (Swarm Workers)
z.ai models used for high-throughput worker tasks: summarization, extraction, formatting, and bulk processing. Cheap and fast.
| Model | Concurrency | Best For |
|---|---|---|
glm-5.1 | 10 | Complex worker tasks, Actor role |
glm-5-turbo | 1 | Rapid single-shot tasks |
glm-4.7 | 2 | Balanced quality/cost |
glm-4.5-air | 5 | High-volume lightweight tasks |
2. LiteLLM Proxy Configuration
All routing is centralized in LiteLLM Proxy. Workers never hold provider API keys — they call the proxy with a virtual key.
# config/litellm-config.yaml
model_list:
# === Antigravity ===
- model_name: claude-opus-4-6
litellm_params:
model: anthropic/claude-opus-4-6
api_key: os.environ/ANTHROPIC_API_KEY
max_tokens: 16384
- model_name: claude-sonnet-4-6
litellm_params:
model: anthropic/claude-sonnet-4-6
api_key: os.environ/ANTHROPIC_API_KEY
max_tokens: 8192
- model_name: gemini-3.1-pro
litellm_params:
model: gemini/gemini-3.1-pro
api_key: os.environ/GEMINI_API_KEY
max_tokens: 8192
# === GLM Workers ===
- model_name: glm-5.1
litellm_params:
model: openai/glm-5.1
api_base: https://open.bigmodel.cn/api/paas/v4/
api_key: os.environ/ZAI_API_KEY
- model_name: glm-5-turbo
litellm_params:
model: openai/glm-5-turbo
api_base: https://open.bigmodel.cn/api/paas/v4/
api_key: os.environ/ZAI_API_KEY
- model_name: glm-4.7
litellm_params:
model: openai/glm-4.7
api_base: https://open.bigmodel.cn/api/paas/v4/
api_key: os.environ/ZAI_API_KEY
- model_name: glm-4.5-air
litellm_params:
model: openai/glm-4.5-air
api_base: https://open.bigmodel.cn/api/paas/v4/
api_key: os.environ/ZAI_API_KEY
3. Concurrency Limits
Set per-model concurrency on the virtual key, not globally. This prevents any single worker from saturating a model’s rate limit.
# config/litellm-config.yaml (continued)
router_settings:
routing_strategy: least-busy
num_retries: 3
timeout: 120
model_group_alias:
glm-workers: [glm-5.1, glm-5-turbo, glm-4.7, glm-4.5-air]
antigravity: [claude-opus-4-6, claude-sonnet-4-6, gemini-3.1-pro]
# Per-model concurrency caps
allowed_fails: 2
cooldown_time: 60
tpm_limit_policy:
glm-5.1:
max_parallel_requests: 10
glm-5-turbo:
max_parallel_requests: 1
glm-4.7:
max_parallel_requests: 2
glm-4.5-air:
max_parallel_requests: 5
claude-opus-4-6:
max_parallel_requests: 3
claude-sonnet-4-6:
max_parallel_requests: 5
gemini-3.1-pro:
max_parallel_requests: 4
4. Virtual Keys per Role
Each agent role gets its own virtual key with model access scoped appropriately:
# Create a virtual key for GLM workers
litellm key generate \
--alias "glm-worker-key" \
--models "glm-5.1,glm-5-turbo,glm-4.7,glm-4.5-air" \
--max_budget 50 \
--budget_duration "30d"
# Create a virtual key for orchestrators
litellm key generate \
--alias "orchestrator-key" \
--models "claude-opus-4-6,claude-sonnet-4-6,gemini-3.1-pro" \
--max_budget 200 \
--budget_duration "30d"
Workers are given only the glm-worker-key — they cannot call Antigravity models even if they try.
5. Context Promotion Rules
When a GLM worker encounters a task too complex for its context window or capability, it can request promotion to an Antigravity model. Promotion is gated by beads.
# config/routing.yml
context_promotion:
enabled: true
triggers:
- type: context_overflow
threshold_tokens: 120000 # GLM-5.1 context limit
promote_to: claude-sonnet-4-6
- type: repeated_failure
failure_count: 3
promote_to: claude-opus-4-6
- type: explicit_request
keyword: "PROMOTE_TO_ORCHESTRATOR"
promote_to: claude-opus-4-6
budget_check: true # block promotion if budget exceeded
log_promotions: true # track in Langfuse
Promotion telemetry is visible in Langfuse under the context_promotion tag:
# View recent promotions
langfuse traces --filter tag:context_promotion --last 24h
6. Echo Chamber Prevention
Never use the same model family for both Actor and Critic roles in a pipeline. The canonical godotz.ai pattern:
# Actor (GLM) + Critic (Antigravity) = heterogeneous panel
pipeline = ActorCriticPipeline(
actor_model="glm-5.1",
critic_model="claude-sonnet-4-6", # different family
arbiter_model="claude-opus-4-6",
)
Using glm-5.1 as both Actor and Critic is an anti-pattern (see Echo Chamber).
Next Steps
- Fleet Setup — Wire virtual keys into
.env.secrets - Security Gates — Budget enforcement and API key handling
- Hardcore Mode — Disable fallbacks, enforce strict routing