Presentations and Talks

Primary deck: godotz.ai Architecture — From Single Agent to Heterogeneous Fleet
Source material: HARNESS-SPEC v3.1, godotz.ai Technical Whitepaper, PRD v1.2

Overview

The godotz.ai architecture presentation deck translates the HARNESS-SPEC and whitepaper into a talk-ready format. It is structured for a technical audience familiar with LLM tooling but unfamiliar with multi-agent fleet orchestration. The deck progresses from problem framing through architecture layers to live benchmarks, ending with the roadmap.

Total slide count: 28 slides across 6 sections.

Deck Structure

Section	Slides	Title
1	1–3	The Problem: Single-Agent Limits
2	4–7	godotz.ai Architecture: 13 Components
3	8–13	L0–L6 Layer Model
4	14–19	Model Matrix and Routing
5	20–24	Fleet Topology
6	25–28	Benchmarks and Roadmap

Key Slides

Section 1 — The Problem (Slides 1–3)

Slide 1: Four Failure Modes
Single-model echo chambers, no fleet orchestration primitive, opaque cost governance, and unsafe self-modification loops. Each gets a one-sentence diagnosis and a data point (e.g., ReConcile paper: heterogeneous panels reduce bias by ~31%).

Slide 2: What Existing Tools Miss
Side-by-side comparison: Claude Code, Cursor, and Codex each have millions of users, but none offers cross-fleet coordination. LiteLLM provides the gateway layer but lacks orchestration. Temporal provides durable execution but is rarely applied to agent DAGs.

Slide 3: godotz.ai’s Answer
One sentence per failure mode, mapping to the harness layer that addresses it.

Section 2 — 13-Component Architecture (Slides 4–7)

The 13 core components of godotz.ai, grouped into four columns:

Column	Components
Conversation	Claude Code (Claude, Gemini), godotz.ai TUI
Orchestration	OMC multi-agent layer, Swarm executor, Beads DAG scheduler
Gateway	LiteLLM proxy, Budget enforcer, Semantic cache
Infrastructure	Mnemopi memory, Knowledge Gardener vault, Graphify KG, Temporal durable execution, Redis

Slide 4: Component map diagram — boxes with arrows showing data flow from user prompt through conversation layer to worker models and back through memory.

Slide 5: Zoom on the Orchestration column — OMC hook lifecycle (SessionStart → UserPromptSubmit → PreToolUse → PostToolUse → Stop), agent roster (32 agents, 8 roles).

Slide 6: Zoom on the Gateway column — LiteLLM routing with per-key budget enforcement, semantic cache hit/miss path, plan cache bypass.
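
The per-key budget check on this slide can be pictured with a minimal sketch. It does not use LiteLLM's own API: the `KeyBudget` class, its method names, and the dollar amounts are hypothetical and only illustrate rejecting a request before it reaches a provider.

```python
class BudgetExceeded(Exception):
    """Raised when a request would push an API key past its limit."""


class KeyBudget:
    """Hypothetical per-key spend tracker (not LiteLLM's implementation)."""

    def __init__(self, limits_usd):
        self._limits = dict(limits_usd)
        self._spent = {key: 0.0 for key in self._limits}

    def charge(self, key, cost_usd):
        # Reject before spending, so an over-budget request never goes upstream.
        if self._spent[key] + cost_usd > self._limits[key]:
            raise BudgetExceeded(f"{key} would exceed {self._limits[key]:.2f} USD")
        self._spent[key] += cost_usd
        # Return the remaining budget for this key.
        return self._limits[key] - self._spent[key]
```

Checking the budget before the upstream call, rather than after, is the design choice that keeps an over-limit request from incurring any cost.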

Slide 7: Zoom on the Infrastructure column — Mnemopi (SQLite + WAL, per-project-tagged), Graphify (239 nodes, 284 edges, 22 communities), Knowledge Gardener (vault + auto-recap).

Section 3 — L0–L6 Layer Model (Slides 8–13)

godotz.ai’s architecture is formally described as a seven-layer stack (L0–L6), distinct from the 12-layer optimization stack. The layer model describes what each level handles; the optimization stack describes how it is tuned.

Layer	Name	Responsibility
L0	Substrate	OS provisioning, kernel tuning, hardware profile (i9-12900K, RTX 3080, 30 GiB RAM)
L1	Gateway	LiteLLM proxy, API key management, budget enforcement, provider routing
L2	Orchestration	OMC hook system, Claude Code plugin layer, skill injection, keyword detection
L3	Execution	Beads DAG scheduler, Temporal durable tasks, swarm worker dispatch
L4	Intelligence	Model routing (cascade + pheromone), semantic cache, plan cache, EMA tracker
L5	Memory	Mnemopi session memory, Graphify knowledge graph, Knowledge Gardener vault
L6	Knowledge	Context hydration (omp-hydrate), context engineering scoring, session recap

Slide 8: Layer stack diagram — vertical stack with L0 at bottom, L6 at top. Arrows show upward data flow (substrate → knowledge) and downward control flow (knowledge → substrate via routing decisions).

Slides 9–13: One slide per layer or layer pair (L0+L1, L2+L3, L4, L5, L6). Each slide shows the layer name, a one-line responsibility, key components, and one measurable property (e.g., L4: pheromone routing 16.8ms, semantic cache 6.1ms).

Section 4 — Model Matrix (Slides 14–19)

Slide 14: Role-to-Model Matrix

Role	Model	Provider	Use case
slow	claude-opus-4-6-thinking:high	Antigravity	Architecture, deep analysis
default	claude-sonnet-4-6-thinking:high	Antigravity	General conversation
smol	gemini-3.1-pro-high:high	Antigravity	Fast lookups
plan	claude-opus-4-6-thinking:high	Antigravity	Strategic planning
task	glm-5.1:xhigh	z.ai	Subagent execution (10 slots)
commit	glm-4.7:xhigh	z.ai	Commit messages (3 slots)
designer	glm-4.5-air:xhigh	z.ai	UI/design tasks (1 slot)

Slide 15: Concurrency Limits
Total concurrent slots: 18. GLM concurrency caps: glm-5.1 (10), glm-4.7 (2), glm-4.5-air (5), glm-5-turbo (1). Why 18: z.ai account-level concurrent request limit.
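
The slot arithmetic on this slide can be checked with a short sketch. The cap values are the ones listed above; the `SlotPool` class and its method names are hypothetical and are not part of godotz.ai.

```python
import threading

# Per-model concurrency caps as listed on Slide 15.
GLM_CAPS = {
    "glm-5.1": 10,
    "glm-4.7": 2,
    "glm-4.5-air": 5,
    "glm-5-turbo": 1,
}

# The caps add up to the account-level limit quoted on the slide.
TOTAL_SLOTS = sum(GLM_CAPS.values())


class SlotPool:
    """Hypothetical per-model slot limiter built on bounded semaphores."""

    def __init__(self, caps):
        self._sems = {m: threading.BoundedSemaphore(n) for m, n in caps.items()}

    def try_acquire(self, model):
        # Non-blocking: returns False when the model's cap is exhausted.
        return self._sems[model].acquire(blocking=False)

    def release(self, model):
        self._sems[model].release()
```

A dispatcher built this way would call `try_acquire` before sending a request and `release` when the response arrives, so no model ever exceeds its cap.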

Slide 16: Cascade Escalation Chain
Diagram: glm-4.5-air → glm-4.7 → glm-5.1 → Antigravity fallback. Threshold: confidence < 0.7 triggers escalation. Cost gradient: cheapest first.
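
The escalation chain can be sketched as a loop over tiers, cheapest first. This is an illustration under assumptions: `call_model` is a hypothetical callable that returns an answer with a confidence score, `"antigravity-fallback"` is a placeholder label for the final tier, and how godotz.ai actually computes confidence is not described here.

```python
from typing import Callable, Tuple

# Escalation order from Slide 16, cheapest first.
CASCADE = ["glm-4.5-air", "glm-4.7", "glm-5.1", "antigravity-fallback"]
CONFIDENCE_THRESHOLD = 0.7  # below this, escalate to the next tier


def run_cascade(prompt: str, call_model: Callable[[str, str], Tuple[str, float]]):
    """Try each tier in order and return (model, answer, confidence).

    `call_model(model, prompt)` is a hypothetical callable returning
    (answer, confidence). The last tier's answer is returned even if its
    confidence is still below the threshold.
    """
    for model in CASCADE:
        answer, confidence = call_model(model, prompt)
        if confidence >= CONFIDENCE_THRESHOLD:
            break
    return model, answer, confidence
```

Because the loop stops at the first tier that clears the threshold, the more expensive models are only paid for when the cheaper ones are not confident enough.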

Slides 17–19: Pheromone routing mechanics (ant-colony metaphor, evaporation rate 0.05, slot rebalancing), EMA feedback loop (alpha=0.3, routing signal thresholds), and the haiku/sonnet/opus semantic mapping to GLM tiers.
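
The two constants on these slides map onto textbook update rules, sketched below. The formulas are the standard exponential moving average and the standard ant-colony evaporation step; whether godotz.ai uses exactly these forms is an assumption, and the function names are hypothetical.

```python
EMA_ALPHA = 0.3          # smoothing factor from the slide
EVAPORATION_RATE = 0.05  # pheromone evaporation rate from the slide


def ema_update(previous: float, observation: float, alpha: float = EMA_ALPHA) -> float:
    """Standard exponential moving average: new = alpha*x + (1 - alpha)*old."""
    return alpha * observation + (1.0 - alpha) * previous


def evaporate_and_deposit(pheromone: dict, rewards: dict,
                          rho: float = EVAPORATION_RATE) -> dict:
    """Textbook ant-colony update: decay every trail by rho, then add reward."""
    return {
        model: (1.0 - rho) * level + rewards.get(model, 0.0)
        for model, level in pheromone.items()
    }


def slot_shares(pheromone: dict) -> dict:
    """Normalize pheromone levels into routing proportions."""
    total = sum(pheromone.values())
    if total == 0:
        # No signal yet: split evenly across models.
        return {model: 1.0 / len(pheromone) for model in pheromone}
    return {model: level / total for model, level in pheromone.items()}
```

With alpha=0.3 each new observation contributes 30% of the updated score, and with an evaporation rate of 0.05 a trail that receives no reward loses 5% of its strength per update.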

Section 5 — Fleet Topology (Slides 20–24)

Slide 20: Single-Node Reference Topology
The current deployment is a single Linux workstation acting as a full fleet node. Diagram shows the node with all components co-located: Claude Code TUI, OMC plugins, godotz.ai daemon, LiteLLM proxy, Redis, Temporal worker, Graphify daemon.

Slide 21: Multi-Node Fleet Extension
How the topology scales: each node carries its own HARNESS-SPEC baseline. The daemon roster (~/.claude/daemon/roster.json) and dispatch directory enable cross-node job routing. Herdr integration (herdr-agent-state.sh) reports agent state to a central coordinator via Unix socket.

Slide 22: Swarm Workload Topology
Two workload configurations from swarm.yaml and swarm-advisor-executor.yaml:

Parallel audit (3 agents, all concurrent): analyzer + tester + security → reports
Advisor-executor DAG (5 agents, phased): advisor → [simple/medium/complex workers] → verifier
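
The phased structure of the advisor-executor workload can be sketched with a topological sort. The agent names below are illustrative labels based on the slide; the actual schema of swarm.yaml and swarm-advisor-executor.yaml is not shown in this document, so the dictionary layout is an assumption.

```python
from graphlib import TopologicalSorter

# Each key depends on the agents in its set (hypothetical agent names).
ADVISOR_EXECUTOR_DAG = {
    "advisor": set(),
    "simple-worker": {"advisor"},
    "medium-worker": {"advisor"},
    "complex-worker": {"advisor"},
    "verifier": {"simple-worker", "medium-worker", "complex-worker"},
}

# The parallel audit has no dependencies, so all three agents start at once.
PARALLEL_AUDIT = {"analyzer": set(), "tester": set(), "security": set()}


def phases(dag):
    """Group agents into phases; agents in one phase can run concurrently."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorter.get_ready()
        result.append(sorted(ready))
        sorter.done(*ready)
    return result
```

Calling `phases(ADVISOR_EXECUTOR_DAG)` yields three phases (advisor, then the three workers, then the verifier), while `phases(PARALLEL_AUDIT)` yields a single phase containing all three agents.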

Slide 23: Model Provider Topology
Two provider tiers in modelProviderOrder: [google-antigravity, zai]. Priority order with fallback chains. Independent swarm CLI restricted to z.ai (Antigravity requires TUI IPC auth broker).
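
The priority order and the CLI restriction can be sketched as a filter over the provider list. The provider identifiers come from the slide; the function names, the `running_in_tui` flag, and the `is_up` health check are hypothetical.

```python
MODEL_PROVIDER_ORDER = ["google-antigravity", "zai"]  # priority order from Slide 23

# Antigravity needs the TUI's IPC auth broker, so the standalone swarm CLI
# can only reach z.ai.
REQUIRES_TUI = {"google-antigravity"}


def eligible_providers(running_in_tui: bool):
    """Return providers in priority order, filtered by what the caller can reach."""
    return [
        p for p in MODEL_PROVIDER_ORDER
        if running_in_tui or p not in REQUIRES_TUI
    ]


def first_available(running_in_tui: bool, is_up):
    """Pick the highest-priority reachable provider, or None if all are down.

    `is_up(provider)` is a hypothetical health check supplied by the caller.
    """
    for provider in eligible_providers(running_in_tui):
        if is_up(provider):
            return provider
    return None
```

Inside the TUI both providers are candidates and Antigravity is tried first; from the independent swarm CLI the list collapses to z.ai alone.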

Slide 24: GPU Node Topology
RTX 3080 hosts lfm2-700m:gpu — 4-bit quantized ONNX, offline thinking model. Excluded from subagent dispatch (preservation mode). Role in topology: local reasoning fallback when API latency is unacceptable.

Section 6 — Benchmarks and Roadmap (Slides 25–28)

Slide 25: S-grade benchmark table (8 metrics) — hydration token reduction (95.2%), tool start latency (12ms), pheromone dispatch (16.8ms), semantic cache hit (6.1ms), regression suite (9ms), cascade verification, budget enforcement, plan cache (46% reduction).

Slide 26: A-grade metrics and what improves them — context promotion routing (needs more scoring data), EMA accuracy (needs n≥50 for stable convergence).

Slide 27: Known limitations — 8 items from HARNESS-SPEC Section 25. Presented as honest engineering tradeoffs, not blockers.

Slide 28: Roadmap — 7 items with status. Highlighted: Adaptive Router (partial, cascade + pheromone in place, needs task dispatch integration) and Cross-Session Intelligence (partial, Mnemopi + Graphify in place, needs hindsight server).

Customization

Swapping the project example: Slides 20–22 use omp-playground (18-file TypeScript project). Replace with any project that has a graphify-out/ directory — update the node count, edge count, community count, and swarm report filenames.

Adjusting benchmark claims: All values in Section 6 come directly from HARNESS-SPEC Section 23. When running on different hardware or at a different codebase scale, re-run the optimization scripts and update that source table, since the deck takes its values from it.

Audience depth tuning:

Executive (30 min): Sections 1, 2 (overview only), 6 (benchmarks only)
Engineering (60 min): Full deck, pause on Sections 3–4 for Q&A
Workshop (90 min): Full deck + live omp-hydrate and omp-ema-tracker suggest demos

Export Formats

Format	Use case	Notes
PDF (16:9)	Async sharing, archival	Export at 1920×1080; embed fonts
Keynote / PowerPoint	Live delivery	Keep ASCII diagrams as code blocks rendered in monospace
HTML slides (reveal.js)	Web publishing	Use dark theme to match godotz.ai HUD aesthetic
Markdown (this site)	Documentation integration	Deck sections map 1:1 to wiki sections

Code block diagrams (the L0–L6 stack, the swarm DAG, the routing flow) are rendered in pre/code blocks and survive export to all formats without an SVG dependency. Do not convert them to images unless the presentation tool cannot render monospace correctly.