godotz.ai: Product Requirements Document

Version: 1.2 | Date: 2026-06-08 | Status: Active
Authors: godotz.ai Architecture Team
Classification: Internal / Open Source Reference

1. Problem Statement

1.1 The Fragmentation Problem

The AI agent tooling landscape in 2026 is deeply fragmented. Teams running autonomous agent workflows face four compounding failure modes:

Single-model echo chambers — Most frameworks default to one model family for all roles (orchestrator, critic, executor). When the orchestrator and the reviewer share the same weights, systematic biases amplify rather than cancel.
No fleet orchestration primitive — Existing tools assume a single machine. Running agents across heterogeneous OS nodes (Linux workstations, ARM edge devices, macOS laptops) requires bespoke glue code that breaks on every framework update.
Opaque cost and budget enforcement — API costs spiral because there is no standard gateway that can enforce per-task, per-team, or per-model budgets before a runaway agent burns the monthly allowance.
Unsafe self-improvement loops — Several projects attempt Darwin-Gödel-style self-modification but lack sandboxed empirical verification gates. Without fail-closed governance, a single bad mutation can corrupt the agent’s own tooling.

1.2 Market Context

By Q2 2026:

Claude Code, Cursor, and Codex each have millions of users but no cross-fleet coordination layer.
LiteLLM has become the de facto API gateway for AI teams but lacks a higher-level orchestration harness.
Temporal’s durable execution pattern is adopted by tech companies but rarely applied to AI agent DAGs.
The ReConcile paper (2024) showed that heterogeneous model panels reduce systematic error by 31% vs single-model panels, yet most frameworks ignore this finding.

1.3 Opportunity

godotz.ai fills the gap between raw model APIs and production-grade autonomous agent fleets. It provides the orchestration harness, not the models — a neutral layer any team can deploy on existing infrastructure.

2. Product Vision

godotz.ai is the autonomous multi-agent harness for heterogeneous OS fleets. It treats agent compute like infrastructure: declarative, reproducible, observable, and self-healing.

Core promise: Define your swarm in YAML. godotz.ai handles routing, scheduling, memory, cost, and verification — across every node in your fleet.

3. Target Users

3.1 Primary Audience

Persona	Description	Key Pain Point
AI Engineer	Builds autonomous pipelines, LLM applications	Framework churn, echo chambers, no fleet primitive
ML Ops Engineer	Operates model inference infrastructure	Cost overruns, no unified gateway, observability gaps
DevOps / Platform Engineer	Manages heterogeneous OS fleets	No agent-native scheduler, bespoke glue code
System Architect	Designs multi-agent architectures	Lack of durability, no standard memory layer, security gaps

3.2 Secondary Audience

Persona	Description
Research Engineer	Runs long-horizon autonomous research agents
Security Engineer	Audits AI supply chains, plugin provenance
Product Manager	Needs PRD + dashboard to track agent throughput and cost
Open Source Contributor	Extends godotz.ai with new agents, skills, or MCP servers

4. User Stories

4.1 Core Infrastructure (AI Engineer)

ID	As a…	I want to…	So that…
US-001	AI Engineer	Define a swarm in a single YAML file	I can version-control my agent topology
US-002	AI Engineer	Route tasks to different model families based on role	My orchestrator and critic never share the same weights
US-003	AI Engineer	Retry failed agent tasks with exponential backoff	Transient API errors don’t abort long-running workflows
US-004	AI Engineer	Inspect the full DAG of agent tasks in real time	I can debug which node is blocking a swarm
US-005	AI Engineer	Persist agent memory across sessions	Long-horizon tasks don’t lose context between runs
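
US-001 and US-002 together imply a swarm definition that can be checked for role diversity before launch. The sketch below is illustrative only: the dictionary layout, the `family` field, and the `check_role_diversity` helper are hypothetical names, not the godotz.ai schema, and a plain Python dict stands in for the parsed YAML file.

```python
# Hypothetical parsed swarm definition (stand-in for the YAML file in US-001).
SWARM = {
    "name": "review-swarm",
    "agents": [
        {"role": "orchestrator", "model": "glm-5.1", "family": "glm"},
        {"role": "critic", "model": "gemini-3.1-pro-low", "family": "gemini"},
        {"role": "executor", "model": "glm-4.7", "family": "glm"},
    ],
}


def check_role_diversity(swarm, role_a="orchestrator", role_b="critic"):
    """Return True when no agent in role_a shares a model family with role_b.

    Note: if either role is absent, the check passes trivially.
    """
    families_a = {a["family"] for a in swarm["agents"] if a["role"] == role_a}
    families_b = {a["family"] for a in swarm["agents"] if a["role"] == role_b}
    return families_a.isdisjoint(families_b)
```

A check of this kind could run at swarm load time, so a definition that pairs an orchestrator and a critic from the same family is rejected before any task is scheduled.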

4.2 Fleet Operations (ML Ops / DevOps)

ID	As a…	I want to…	So that…
US-006	ML Ops Engineer	Set a monthly budget cap per model family	Runaway agents can’t burn the monthly quota
US-007	ML Ops Engineer	See per-agent, per-task token consumption	I can attribute costs to specific workflows
US-008	DevOps Engineer	Deploy godotz.ai across Linux + ARM + macOS nodes	I don’t have to manage three tool stacks for a heterogeneous fleet
US-009	DevOps Engineer	Pin agent versions with Nix flakes	Every node runs identical, reproducible environments
US-010	ML Ops Engineer	Route to a vision fallback model automatically	Agents that receive image inputs don’t fail silently

4.3 Security (Security Engineer)

ID	As a…	I want to…	So that…
US-011	Security Engineer	Scan every plugin before it executes	No supply-chain attack propagates to production
US-012	Security Engineer	Run agent code in a sandboxed environment	A compromised agent can’t exfiltrate secrets
US-013	Security Engineer	Block MCP servers flagged by a CVE database	Known vulnerabilities are caught at the gate
US-014	Security Engineer	Audit the full provenance of every tool call	I can reconstruct any incident from Langfuse traces

4.4 Self-Evolution (Advanced AI Engineer)

ID	As a…	I want to…	So that…
US-015	AI Engineer	Allow agents to propose patches to their own prompts	The system improves without manual iteration
US-016	AI Engineer	Gate every self-modification with empirical verification	A bad self-edit is rejected before it reaches production
US-017	AI Engineer	Require human approval for significant architecture changes	I stay in the loop on non-trivial mutations
US-018	AI Engineer	Roll back any self-modification atomically	A bad change is reversible in under 60 seconds

4.5 Knowledge and Memory

ID	As a…	I want to…	So that…
US-019	AI Engineer	Store structured knowledge in a vault with automatic recaps	Agents don’t re-derive the same facts on every run
US-020	AI Engineer	Query a knowledge graph built from my codebase	Agents navigate large repos without exhausting context windows
US-021	AI Engineer	Have agent memories expire based on relevance scoring	The memory system stays lean over time

4.6 Edge Cases

ID	As a…	I want to…	So that…
US-022	AI Engineer	Handle agent node failures gracefully	A single node going offline doesn’t abort the swarm
US-023	ML Ops Engineer	Enforce strict rate limits per model	One agent can’t starve all others on the same gateway
US-024	Security Engineer	Toggle “hardcore mode” to disable permissive fallbacks	High-security environments get fail-closed behavior end-to-end
US-025	DevOps Engineer	Monitor fleet health with a live status dashboard	I know the health of every node without SSH-ing in

5. Functional Requirements

5.1 L0 — Substrate Layer (Nix + Tailscale)

Req ID	Requirement
FR-L0-001	System SHALL provide Nix flake definitions for all supported platforms (x86_64-linux, aarch64-linux, aarch64-darwin)
FR-L0-002	System SHALL establish encrypted mesh connectivity between all fleet nodes via Tailscale
FR-L0-003	System SHALL support Komodo-based deployment orchestration for fleet node lifecycle management
FR-L0-004	System SHALL ensure reproducible builds; two nodes built from the same flake SHALL produce bit-identical environments

5.2 L1 — Transport Layer (NATS)

Req ID	Requirement
FR-L1-001	System SHALL use NATS JetStream for all inter-agent message passing
FR-L1-002	System SHALL support at-least-once delivery semantics with configurable acknowledgement timeouts
FR-L1-003	System SHALL provide NATS subject namespacing per fleet node and per agent role
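
FR-L1-003 can be pictured as a small subject-building helper. NATS subjects are dot-separated tokens, with `*` and `>` reserved as wildcards. The `fleet.<node>.<role>.<event>` layout and the `task_subject` name below are assumptions made for illustration, not a scheme defined by this document.

```python
import re

# One subject token: no dots, spaces, or NATS wildcard characters.
_TOKEN = re.compile(r"^[A-Za-z0-9_-]+$")


def task_subject(node, role, event):
    """Build a subject of the (assumed) form fleet.<node>.<role>.<event>."""
    for token in (node, role, event):
        if not _TOKEN.match(token):
            raise ValueError(f"invalid subject token: {token!r}")
    return f"fleet.{node}.{role}.{event}"
```

Under this assumed layout, a subscriber interested in every critic event across the fleet could subscribe to `fleet.*.critic.>`.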

5.3 L2 — Model Gateway (LiteLLM Proxy)

Req ID	Requirement
FR-L2-001	System SHALL expose a unified OpenAI-compatible API endpoint routing to all configured model providers
FR-L2-002	System SHALL enforce per-key budget limits; requests exceeding budget SHALL be rejected with HTTP 429
FR-L2-003	System SHALL cache identical prompts in Redis with configurable TTL
FR-L2-004	System SHALL persist all request/response pairs to Postgres for audit and cost attribution
FR-L2-005	System SHALL support virtual keys mapped to model families (Antigravity keys, GLM keys)
FR-L2-006	System SHALL provide an EMA (Exponential Moving Average) fallback chain when a primary model fails
FR-L2-007	System SHALL route vision-capable requests to gemini-3.1-pro-low when the primary model lacks vision
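
The budget rule in FR-L2-002 amounts to a pre-request admission check. The following is a minimal sketch under stated assumptions: `VirtualKey` and `admit` are hypothetical names, the cost is an estimate supplied by the caller, and this is not how LiteLLM itself implements budgets.

```python
from dataclasses import dataclass


@dataclass
class VirtualKey:
    name: str
    budget_usd: float
    spent_usd: float = 0.0


def admit(key, estimated_cost_usd):
    """Return an HTTP-style status: 200 if the request fits the budget, else 429."""
    if key.spent_usd + estimated_cost_usd > key.budget_usd:
        return 429  # rejected before any model call is made
    key.spent_usd += estimated_cost_usd
    return 200
```

Because the check runs before the call, a rejected request costs nothing. A real gateway would also need to reconcile the estimate with the actual billed cost after the response arrives.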

5.4 L3 — Orchestration Layer (Temporal + LangGraph)

Req ID	Requirement
FR-L3-001	System SHALL use Temporal for durable workflow execution with automatic retry on transient failures
FR-L3-002	System SHALL implement the supervisor-worker LangGraph pattern as the standard 2-tier orchestration model
FR-L3-003	System SHALL support nested subgraph composition for complex multi-agent workflows
FR-L3-004	System SHALL provide workflow versioning so in-flight workflows can complete on the version they started
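
The retry behaviour in FR-L3-001 and US-003 is typically governed by an exponential backoff schedule. The sketch below only computes the wait times such a schedule would produce. The parameter names loosely mirror common retry-policy fields (initial interval, backoff coefficient, maximum interval, maximum attempts) and the default values are assumptions, not settings taken from this document.

```python
def backoff_schedule(initial=1.0, coefficient=2.0, maximum=60.0, attempts=6):
    """Return the wait (in seconds) before each retry, capped at `maximum`."""
    delays = []
    delay = initial
    for _ in range(attempts - 1):  # no wait is needed after the final attempt
        delays.append(min(delay, maximum))
        delay *= coefficient
    return delays
```

With the defaults, six attempts wait 1, 2, 4, 8, and 16 seconds between tries. Lowering `maximum` flattens the tail of the schedule so a long outage does not push waits out indefinitely.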

5.5 L4 — Agent Runtime (godotz.ai/hermes)

Req ID	Requirement
FR-L4-001	System SHALL provide a declarative YAML swarm definition format
FR-L4-002	System SHALL support 128+ skills (pre-built agent capabilities) loadable per swarm
FR-L4-003	System SHALL support 32+ agent types with configurable roles
FR-L4-004	System SHALL support 7 MCP servers exposing 79 tools to agents
FR-L4-005	System SHALL enforce concurrency limits per model family (GLM-5.1: 10, GLM-5-Turbo: 1, GLM-4.7: 2, GLM-4.5-Air: 5)
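
The per-family limits in FR-L4-005 can be enforced with one semaphore per model family. This is a sketch, not the hermes runtime: `run_calls` is a hypothetical helper and `asyncio.sleep(0)` stands in for a real model call.

```python
import asyncio

# Concurrency limits copied from FR-L4-005.
LIMITS = {"GLM-5.1": 10, "GLM-5-Turbo": 1, "GLM-4.7": 2, "GLM-4.5-Air": 5}


async def run_calls(model, n_calls, limits=LIMITS):
    """Run n_calls placeholder model calls; return the peak number in flight."""
    sem = asyncio.Semaphore(limits[model])
    in_flight = 0
    peak = 0

    async def call():
        nonlocal in_flight, peak
        async with sem:  # blocks once the family limit is reached
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0)  # stand-in for a real model call
            in_flight -= 1

    await asyncio.gather(*(call() for _ in range(n_calls)))
    return peak
```

The four limits sum to 18 concurrent calls, which matches the per-node baseline quoted in NFR-SC-002.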

5.6 L5 — Task Tracking (beads + taskdog)

Req ID	Requirement
FR-L5-001	System SHALL represent agent workflows as DAGs using the beads task primitive
FR-L5-002	System SHALL provide a `bd` CLI for task creation, status, retry, and cancellation
FR-L5-003	System SHALL provide a Gantt/ETA view via taskdog for human operators
FR-L5-004	System SHALL guarantee idempotent task execution; re-running a completed task SHALL be a no-op
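
FR-L5-001 and FR-L5-004 combine two ideas: tasks run in dependency order, and a task that has already completed is skipped on re-run. The sketch below uses the standard-library `graphlib` module. `run_dag` and its arguments are hypothetical and do not reflect the beads data model or the `bd` CLI.

```python
from graphlib import TopologicalSorter


def run_dag(deps, actions, done):
    """Execute tasks in dependency order, skipping any already in `done`.

    deps:    task id -> set of prerequisite task ids
    actions: task id -> zero-argument callable
    done:    set of completed task ids (updated in place)
    """
    executed = []
    for task in TopologicalSorter(deps).static_order():
        if task in done:
            continue  # idempotent: a completed task is a no-op
        actions[task]()
        done.add(task)
        executed.append(task)
    return executed
```

Running the same graph twice with a shared `done` set executes every task once on the first pass and nothing on the second, which is the behaviour FR-L5-004 asks for.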

5.7 L6 — Memory Layer (Mnemopi + KG + Graphify)

Req ID	Requirement
FR-L6-001	System SHALL provide session-scoped and persistent long-term memory via Mnemopi
FR-L6-002	System SHALL maintain a Knowledge Gardener vault with automatic daily recaps
FR-L6-003	System SHALL build and query a code knowledge graph via Graphify
FR-L6-004	System SHALL score memory relevance and expire low-relevance entries automatically
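
FR-L6-004 does not specify how relevance is scored. One plausible approach, shown here purely as an assumption, is to decay a base score exponentially with age and drop entries that fall below a threshold. The half-life, threshold, and field names are illustrative and are not Mnemopi parameters.

```python
def relevance(base_score, age_days, half_life_days=30.0):
    """Halve the base score for every `half_life_days` of age."""
    return base_score * 0.5 ** (age_days / half_life_days)


def expire(entries, threshold=0.2, half_life_days=30.0):
    """Keep only entries whose decayed relevance is at or above the threshold."""
    return [
        e for e in entries
        if relevance(e["score"], e["age_days"], half_life_days) >= threshold
    ]
```

Under these assumed values, an entry with base score 0.4 that is 60 days old decays to 0.1 and is expired, while a fresh entry with score 1.0 is kept.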

5.8 Security Requirements

Req ID	Requirement
FR-SEC-001	System SHALL scan all plugins via plugin-eval before execution
FR-SEC-002	System SHALL scan all MCP servers via mcp-scan and block servers flagged by a CVE database
FR-SEC-003	System SHALL execute agent code in a fail-closed sandbox
FR-SEC-004	System SHALL trace all tool calls via Langfuse for full audit provenance
FR-SEC-005	System SHALL provide a “hardcore mode” toggle that disables all permissive fallbacks
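
The difference between default and hardcore behaviour (FR-SEC-005) shows up most clearly when a scanner is unavailable. The sketch below assumes one possible interpretation: a flagged plugin is always rejected, while a scanner error is tolerated with a warning in default mode and treated as a rejection in hardcore mode. `admit_plugin` and the `scan` callable are hypothetical and are not the plugin-eval interface.

```python
def admit_plugin(name, scan, hardcore=False):
    """Return (allowed, reason) for a plugin.

    scan: callable taking the plugin name and returning True if clean,
          False if flagged; it may raise if the scanner is unavailable.
    """
    try:
        clean = scan(name)
    except Exception as exc:
        if hardcore:
            # Fail closed: no verdict means no execution.
            return (False, f"scanner error: {exc}")
        # Permissive fallback (assumed default behaviour).
        return (True, f"scanner unavailable, admitted with warning: {exc}")
    return (True, "clean") if clean else (False, "flagged")
```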

5.9 Self-Evolution Requirements

Req ID	Requirement
FR-DGM-001	System SHALL implement the Darwin Gödel Machine pattern for self-modification proposals
FR-DGM-002	Every self-modification SHALL be tested in a sandboxed environment before promotion
FR-DGM-003	Significant architecture mutations SHALL require human approval before taking effect
FR-DGM-004	All self-modifications SHALL be atomically reversible within 60 seconds
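
The gates in FR-DGM-002 through FR-DGM-004 can be read as a promotion function that never mutates live state. This is a minimal sketch under stated assumptions: `apply_mutation`, the dict-based state, and the `verify` callable are hypothetical, and a real sandbox would run the candidate in isolation rather than call a function in-process.

```python
def apply_mutation(state, patch, verify, significant=False, approved=False):
    """Return (state, status); the input state is never modified.

    verify: callable that receives the candidate state and returns True
            only if the empirical checks pass.
    """
    if significant and not approved:
        return state, "awaiting human approval"
    candidate = {**state, **patch}  # build a copy; live state stays untouched
    if not verify(candidate):
        return state, "rejected by sandbox verification"
    return candidate, "promoted"
```

Because the previous state object survives unchanged, rolling back is a matter of pointing the system at it again, which is one way to make reversal atomic.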

6. Non-Functional Requirements

6.1 Performance

NFR ID	Requirement
NFR-P-001	Model gateway routing latency SHALL be < 100ms (p99) excluding model inference time
NFR-P-002	Task graph scheduling latency SHALL be < 50ms per node
NFR-P-003	Memory retrieval (Mnemopi) SHALL respond in < 200ms for datasets up to 10,000 entries
NFR-P-004	Knowledge graph queries (Graphify) SHALL return results in < 500ms for repos up to 1,000 files

6.2 Reliability

NFR ID	Requirement
NFR-R-001	godotz.ai core services SHALL maintain 99.9% uptime (< 8.76h downtime/year)
NFR-R-002	A single node failure SHALL NOT abort in-flight workflows on other nodes
NFR-R-003	All durable workflows SHALL survive process restarts via Temporal persistence
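
The downtime allowance implied by an availability target is simple arithmetic. The helper below is an illustrative calculation, with a hypothetical function name, and assumes a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def downtime_budget_hours(availability_pct, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted by an availability target over a period."""
    return period_hours * (1 - availability_pct / 100)
```

A 99.9% target over a 365-day year allows about 8.76 hours of downtime, or roughly 43.8 minutes in a 30-day month.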

6.3 Security

NFR ID	Requirement
NFR-S-001	All inter-node communication SHALL be encrypted (Tailscale WireGuard)
NFR-S-002	All model API keys SHALL be stored in a secrets vault (never in plaintext in config files)
NFR-S-003	The sandbox SHALL be fail-closed: any sandbox breach attempt SHALL terminate the agent

6.4 Observability

NFR ID	Requirement
NFR-O-001	Every model API call SHALL be traced in Langfuse with full prompt/response
NFR-O-002	Every task SHALL emit lifecycle events (created, started, completed, failed) to NATS
NFR-O-003	Budget consumption SHALL be queryable per virtual key, per agent, per day
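
NFR-O-003 asks for spend to be queryable along several dimensions. Assuming each request is persisted as a record with `key`, `agent`, `day`, and `cost_usd` fields (hypothetical names, not the actual Postgres schema), the query reduces to a grouped sum:

```python
from collections import defaultdict


def spend_by(records, *fields):
    """Sum cost_usd grouped by the given record fields."""
    totals = defaultdict(float)
    for record in records:
        group = tuple(record[f] for f in fields)
        totals[group] += record["cost_usd"]
    return dict(totals)
```

Calling `spend_by(records, "key", "day")` gives daily spend per virtual key, and `spend_by(records, "agent")` gives the per-agent total from the same data.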

6.5 Scalability

NFR ID	Requirement
NFR-SC-001	godotz.ai SHALL support fleet sizes from 1 to 50+ heterogeneous nodes
NFR-SC-002	Total concurrent model calls SHALL scale with fleet size (18 per node baseline)
NFR-SC-003	Knowledge graph SHALL support repositories up to 10,000 files

7. Success Metrics

7.1 Adoption Metrics

Metric	Target (M3)	Target (M6)
Active fleet nodes	3	20+
Swarms defined	10	100+
Daily agent tasks	500	5,000+

7.2 Quality Metrics

Metric	Target
Self-evolution verification pass rate	> 80% of proposals survive sandbox
Echo chamber reduction	> 25% error reduction vs single-model panel (ReConcile baseline)
Budget overrun incidents	0 per month

7.3 Throughput Metrics

Metric	Baseline	Target
Tasks per hour per node	100	500
Memory retrieval accuracy	—	> 90% relevance score
Knowledge graph query hit rate	—	> 85%

7.4 Cost Metrics

Metric	Target
Cost per 1,000 agent tasks	< $2.00 (GLM family)
Cache hit rate (Redis)	> 40% for repeated prompts
Budget enforcement accuracy	100% (zero overruns)

8. Roadmap Phases

Phase M0 — Foundation (Complete)

IA document + scaffold
Core architecture design
Model gateway selection (LiteLLM)
Initial Nix flake definition

Phase M1 — Core Gateway (Complete)

LiteLLM Proxy deployment
Virtual key management
Redis cache integration
Postgres persistence
Basic budget enforcement

Phase M2 — Orchestration (In Progress)

Temporal workflow engine
LangGraph supervisor-worker pattern
beads task graph CLI
NATS JetStream transport

Phase M3 — Memory + Knowledge

Mnemopi session/persistent memory
Knowledge Gardener v0.21.0 vault
Graphify knowledge graph
Automatic daily recaps

Phase M4 — Security + Self-Evolution

plugin-eval gate
mcp-scan integration
Sandboxed execution
Darwin Gödel Machine self-modification loop
Hardcore mode toggle

Phase M5 — Fleet + Observability

Tailscale mesh configuration
Multi-node deployment
Langfuse full tracing
taskdog Gantt dashboard
Status dashboard

Phase M6 — Scale + Polish

1,000+ file knowledge graph support
Mobile/edge node support
Performance tuning (< 100ms routing)
Full documentation suite
Public release

9. Open Questions & Risks

9.1 Open Questions

Q#	Question	Owner	Status
Q-001	Which Temporal SaaS tier or self-hosted version for fleet deployment?	Architecture	Open
Q-002	Should Knowledge Gardener recaps run on a schedule or be triggered by events?	ML Ops	Open
Q-003	How to handle model API key rotation without downtime?	Security	Open
Q-004	GLM model availability via z.ai API on ARM nodes?	ML Ops	Investigating

9.2 Risks

Risk ID	Risk	Likelihood	Impact	Mitigation
R-001	z.ai API deprecates GLM-5.1	Medium	High	Abstract model IDs behind virtual keys; swap backend without code changes
R-002	Temporal license change	Low	High	Temporal server is MIT-licensed today; monitor for changes; keep LangGraph as standalone fallback
R-003	Nix flake complexity deters contributors	Medium	Medium	Provide Docker fallback for quick start; Nix is optional for dev, required for prod fleet
R-004	Sandbox escape via novel agent exploit	Low	Critical	Defense in depth: plugin-eval + sandbox + Tailscale ACL + human approval gates
R-005	LiteLLM API surface changes break gateway	Medium	Medium	Pin LiteLLM version; integration tests on upgrade

10. Appendix: Component Inventory

Layer	Component	Technology	Status
L0	Nix Flake	Nix 2.x	Active
L0	Tailscale Mesh	Tailscale OSS	Active
L0	Komodo Deploy	Komodo	Active
L1	Message Bus	NATS JetStream	Active
L2	Model Gateway	LiteLLM Proxy	Active
L2	Cache	Redis	Active
L2	Persistence	Postgres	Active
L2	Observability	Langfuse	Active
L3	Durable Exec	Temporal	Active
L3	Agent Graph	LangGraph	Active
L4	Harness	godotz.ai/hermes	Active
L5	Task DAG	beads	Active
L5	Human Gantt	taskdog	Active
L6	Session Memory	Mnemopi	Active
L6	Knowledge Vault	Knowledge Gardener v0.21.0	Active
L6	Code KG	Graphify	Active