Fleet Setup
This guide walks through bootstrapping a production-ready godotz.ai fleet: a Docker Compose control plane on a dedicated host, worker nodes provisioned from Nix flakes, beads initialized for DAG task tracking, and ntfy wired for push alerts.
Prerequisites
- One control-plane host (≥ 4 vCPU, 8 GB RAM, 40 GB disk)
- One or more worker nodes (can be VMs, bare metal, or Raspberry Pi 5)
- Tailscale installed and authenticated on every node (see Multi-Node Networking)
- Docker + Docker Compose v2 on the control-plane host
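A quick preflight can catch a missing tool before the first boot. The sketch below is one possible check, not part of omp-fleet: the helper name missing_cmds and the command list are assumptions.

```shell
#!/bin/sh
# Preflight sketch: report which prerequisite commands are missing.
# missing_cmds is a hypothetical helper, not shipped with omp-fleet.

# Print every command from the argument list that is not on PATH.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Example invocation on the control-plane host:
#   missing_cmds docker tailscale git
#   docker compose version   # confirms the Compose v2 plugin is installed
#   tailscale status         # shows whether this node has joined the tailnet
```

An empty result from missing_cmds means every listed command was found.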
1. Control Plane — Docker Compose
The control plane runs LiteLLM Proxy, Postgres, Redis, Temporal, and Langfuse as a single compose stack.
git clone https://github.com/omp-team/omp-fleet.git
cd omp-fleet/control-plane
cp .env.example .env.secrets
Edit .env.secrets before first boot (see Secret Management):
# .env.secrets — never commit this file
POSTGRES_PASSWORD=change-me-strong
REDIS_PASSWORD=change-me-strong
LITELLM_MASTER_KEY=sk-master-change-me
LANGFUSE_SECRET_KEY=lf-secret-change-me
ANTHROPIC_API_KEY=sk-ant-...
ZAI_API_KEY=...
NTFY_TOPIC=omp-fleet-alerts
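The change-me placeholders above should be replaced with random values. As a minimal sketch, assuming only standard Unix tools, a helper like the following (gen_secret is a hypothetical name) prints random hex strings that can be pasted into .env.secrets:

```shell
#!/bin/sh
# Sketch: generate random hex strings for the placeholder values.
# gen_secret is a hypothetical helper, not part of omp-fleet.

# Print N random bytes (default 24) as lowercase hex, followed by a newline.
gen_secret() {
    head -c "${1:-24}" /dev/urandom | od -An -tx1 | tr -d ' \n'
    printf '\n'
}

# Example: print candidate lines for .env.secrets
#   printf 'POSTGRES_PASSWORD=%s\n' "$(gen_secret)"
#   printf 'REDIS_PASSWORD=%s\n' "$(gen_secret)"
#   printf 'LITELLM_MASTER_KEY=sk-master-%s\n' "$(gen_secret)"
```

The sk-master- prefix in the last example simply mirrors the placeholder shown in this guide.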
Start the stack:
docker compose --env-file .env.secrets up -d
Verify all services are healthy:
docker compose ps
# litellm running (healthy)
# postgres running (healthy)
# redis running (healthy)
# temporal running (healthy)
# langfuse running (healthy)
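Services can take a while to pass their health checks, so a script that continues straight after "up -d" may start too early. The following is a sketch of a polling loop, assuming each service defines a Docker healthcheck; the function names and the 120-second default timeout are illustrative.

```shell
#!/bin/sh
# Sketch: block until every container in the compose project is "healthy".
# all_healthy, compose_health, and wait_for_stack are hypothetical helpers.

# Read one health status per line on stdin; succeed only if there is at
# least one line and every line is exactly "healthy".
all_healthy() {
    awk 'BEGIN { n = 0; bad = 0 }
         { n++; if ($0 != "healthy") bad++ }
         END { exit (n > 0 && bad == 0) ? 0 : 1 }'
}

# Print the health status of each container in the current compose project.
# Containers without a healthcheck are reported as "none".
compose_health() {
    for id in $(docker compose ps -q); do
        docker inspect \
            --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
            "$id"
    done
}

# Poll every 5 seconds until healthy or until the timeout (seconds) expires.
wait_for_stack() {
    timeout="${1:-120}"
    elapsed=0
    until compose_health | all_healthy; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 5
        elapsed=$((elapsed + 5))
    done
}

# Example (run from omp-fleet/control-plane):
#   wait_for_stack 180 || echo "stack did not become healthy" >&2
```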
2. Baseline Snapshots
Before connecting worker nodes, capture a clean baseline snapshot of the control-plane state. This lets you roll back without reprovisioning.
# Logical dump of the Postgres database (pg_dump stays consistent without stopping writes)
mkdir -p backups
docker compose exec -T postgres pg_dump -U omp omp_db > backups/baseline-$(date +%F).sql
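# Snapshot Redis (optional; mostly cache). If Redis enforces REDIS_PASSWORD,
# run redis-cli with REDISCLI_AUTH set, otherwise BGSAVE returns NOAUTH.
docker compose exec redis redis-cli BGSAVE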
# Tag the compose stack state
git tag baseline-$(date +%F)
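Rolling back means feeding a dump back into Postgres. The sketch below assumes the same user (omp) and database (omp_db) as the pg_dump command; latest_baseline and restore_baseline are hypothetical helper names. A plain SQL dump generally restores cleanly only into an empty database, so the target may need to be recreated first, or the dump taken with pg_dump --clean.

```shell
#!/bin/sh
# Sketch: roll the database back to the newest baseline dump.
# latest_baseline and restore_baseline are hypothetical helpers.

# Print the path of the newest baseline-YYYY-MM-DD.sql in a directory.
# ISO dates sort lexicographically, so a plain sort is enough.
latest_baseline() {
    ls -1 "${1:-backups}"/baseline-*.sql 2>/dev/null | sort | tail -n 1
}

# Feed a dump back into Postgres. -T disables the pseudo-TTY so stdin
# redirection works; ON_ERROR_STOP aborts on the first failed statement.
restore_baseline() {
    docker compose exec -T postgres \
        psql -U omp -d omp_db -v ON_ERROR_STOP=1 < "$1"
}

# Example (run from omp-fleet/control-plane):
#   restore_baseline "$(latest_baseline backups)"
```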
Schedule nightly snapshots with cron:
# /etc/cron.d/omp-snapshots
0 3 * * * root /opt/omp-fleet/scripts/snapshot.sh >> /var/log/omp-snapshot.log 2>&1
snapshot.sh should run the same pg_dump command as the baseline step and delete backups older than 14 days.
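One possible shape for snapshot.sh is sketched here. The backup directory, the nightly- file prefix, and the BACKUP_DIR and RETENTION_DAYS variables are assumptions rather than omp-fleet conventions; the user and database names follow the pg_dump command in this guide.

```shell
#!/bin/sh
# Sketch of scripts/snapshot.sh. Paths, function names, and variables are
# assumptions; adjust them to the actual fleet layout.

BACKUP_DIR="${BACKUP_DIR:-/opt/omp-fleet/control-plane/backups}"
RETENTION_DAYS="${RETENTION_DAYS:-14}"

# Dump the database to a dated file inside BACKUP_DIR.
# Must be run from the directory that holds the compose file.
dump_db() {
    mkdir -p "$BACKUP_DIR"
    docker compose exec -T postgres pg_dump -U omp omp_db \
        > "$BACKUP_DIR/nightly-$(date +%F).sql"
}

# Delete nightly dumps in directory $1 that are older than $2 days.
# Only nightly-*.sql is matched, so baseline-*.sql files are left alone.
prune_backups() {
    find "$1" -type f -name 'nightly-*.sql' -mtime +"$2" -delete
}

# Nightly entry point, for example as the last line of the script:
#   cd /opt/omp-fleet/control-plane && dump_db && prune_backups "$BACKUP_DIR" "$RETENTION_DAYS"
```

Matching only the nightly- prefix is a deliberate choice in this sketch: it keeps the baseline dump from being pruned along with routine snapshots.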
3. .env.secrets
.env.secrets is the single source of truth for all credentials. It is loaded by Docker Compose and must never be committed to version control.
# .gitignore entry (enforced by fleet template)
.env.secrets
*.secrets
Recommended structure:
# === Provider API Keys ===
ANTHROPIC_API_KEY=sk-ant-...
ZAI_API_KEY=...
GEMINI_API_KEY=...
# === Infrastructure ===
POSTGRES_PASSWORD=
REDIS_PASSWORD=
LITELLM_MASTER_KEY=sk-master-...
TEMPORAL_AUTH_TOKEN=
# === Observability ===
LANGFUSE_SECRET_KEY=
LANGFUSE_PUBLIC_KEY=
NTFY_TOPIC=omp-fleet-alerts
NTFY_URL=https://ntfy.sh
# === Node Identity ===
NODE_NAME=control-01
TAILSCALE_AUTHKEY=tskey-auth-...
For teams graduating to full secret management, see Security Gates — sops-nix prep.
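Several values in the recommended structure are intentionally left blank, so it is easy to boot the stack with an empty password. As a sketch, a small check like the following (missing_keys is a hypothetical helper) lists required keys that are absent or empty:

```shell
#!/bin/sh
# Sketch: list required keys that are absent or empty in an env file.
# missing_keys is a hypothetical helper; choose the key list to match
# the services you actually run.

# Usage: missing_keys FILE KEY...
missing_keys() {
    file="$1"
    shift
    for key in "$@"; do
        # A key counts as set only if "KEY=" is followed by at least one character.
        grep -Eq "^${key}=.+" "$file" || printf '%s\n' "$key"
    done
}

# Example:
#   chmod 600 .env.secrets                                # owner-only access
#   git check-ignore -q .env.secrets && echo "ignored by git"
#   missing_keys .env.secrets POSTGRES_PASSWORD REDIS_PASSWORD LITELLM_MASTER_KEY
```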
4. beads Init
beads is godotz.ai’s DAG task-tracking system. Initialize it once the control plane is healthy.
# Enter the omp-fleet dev shell, which provides the beads CLI (bd)
nix develop github:omp-team/omp-fleet#tools
# Point beads at the Temporal endpoint
export TEMPORAL_HOST=localhost:7233
export TEMPORAL_NAMESPACE=omp-default
# Initialize namespace and task schemas
bd init --namespace omp-default --schema omp-fleet/schemas/beads-schema.yaml
# Verify connectivity
bd status
# ✓ Temporal: connected
# ✓ Namespace: omp-default
# ✓ Schema: v2.4 loaded
Register the first worker node:
bd node register \
  --name worker-01 \
  --tailscale-ip 100.64.0.2 \
  --roles "glm-worker,kg-writer"
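A mistyped address is an easy error when registering nodes by hand. Tailscale assigns IPv4 addresses from the 100.64.0.0/10 range by default, so a wrapper can reject anything outside it before calling bd. This is a sketch: is_tailscale_ip and register_worker are hypothetical helpers, and the bd flags are the ones shown in the command above.

```shell
#!/bin/sh
# Sketch: validate a Tailscale IPv4 address before registering a worker.
# is_tailscale_ip and register_worker are hypothetical helpers.

# Succeed if $1 is a dotted quad inside 100.64.0.0/10
# (100.64.0.0 through 100.127.255.255).
is_tailscale_ip() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1
    for octet in "$@"; do
        case "$octet" in ''|*[!0-9]*) return 1 ;; esac
        [ "$octet" -le 255 ] || return 1
    done
    [ "$1" -eq 100 ] && [ "$2" -ge 64 ] && [ "$2" -le 127 ]
}

# Usage: register_worker NAME TAILSCALE_IP ROLES
register_worker() {
    if ! is_tailscale_ip "$2"; then
        echo "not a Tailscale address: $2" >&2
        return 1
    fi
    bd node register --name "$1" --tailscale-ip "$2" --roles "$3"
}

# Example:
#   register_worker worker-01 100.64.0.2 "glm-worker,kg-writer"
```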
5. ntfy Notifications
ntfy delivers real-time push alerts for fleet events: node failures, budget overruns, task completions, and security gate blocks.
# config/notifications.yml
ntfy:
  url: "${NTFY_URL}"
  topic: "${NTFY_TOPIC}"
  events:
    node_down:
      priority: urgent
      tags: [warning, computer]
    budget_exceeded:
      priority: high
      tags: [money_with_wings]
    task_failed:
      priority: default
      tags: [x]
    task_completed:
      priority: min
      tags: [white_check_mark]
    security_gate_blocked:
      priority: urgent
      tags: [lock, warning]
Subscribe to the topic in the ntfy app on your phone or desktop, or test from a terminal:
# Terminal subscription (for testing)
curl -s https://ntfy.sh/omp-fleet-alerts/json | jq .
# Send a test alert
curl -d "Fleet control plane started" \
  -H "Title: OMP Fleet" \
  -H "Priority: default" \
  https://ntfy.sh/omp-fleet-alerts
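For scripts that send more than one kind of alert, the curl call can be wrapped in a helper that reads NTFY_URL and NTFY_TOPIC from the environment instead of hardcoding the topic. This is a sketch: ntfy_endpoint and notify are hypothetical names, while Title, Priority, and Tags are standard ntfy publish headers.

```shell
#!/bin/sh
# Sketch: a small wrapper around the ntfy HTTP publish API.
# ntfy_endpoint and notify are hypothetical helpers. NTFY_URL and
# NTFY_TOPIC are the variables defined in .env.secrets.

# Build the publish URL, tolerating a trailing slash on NTFY_URL.
ntfy_endpoint() {
    printf '%s/%s\n' "${NTFY_URL%/}" "$NTFY_TOPIC"
}

# Usage: notify TITLE PRIORITY TAGS MESSAGE
# PRIORITY is one of min, low, default, high, urgent; TAGS is comma-separated.
notify() {
    curl -fsS \
        -H "Title: $1" \
        -H "Priority: $2" \
        -H "Tags: $3" \
        -d "$4" \
        "$(ntfy_endpoint)"
}

# Example, mirroring the node_down entry in config/notifications.yml:
#   notify "OMP Fleet" urgent "warning,computer" "worker-01 is down"
```

The -f flag makes curl exit non-zero on an HTTP error, so a failed publish is visible to the calling script.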
6. Health Check
Run the fleet health check after initial setup:
bd fleet health
# Node        Status   Last Seen  Roles
# control-01  healthy  0s ago     control-plane
# worker-01   healthy  3s ago     glm-worker, kg-writer
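To use the health check in automation, its output can be reduced to a pass or fail result. The sketch below assumes the real output has the whitespace-separated columns shown in the sample above (without the leading "#") and that any status other than "healthy" indicates a problem; unhealthy_nodes is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: turn `bd fleet health` output into a list of problem nodes.
# Assumes the column layout shown in this guide: Node, Status, ...

# Read the table on stdin and print the name of every node whose Status
# column is not "healthy". Skips blank lines and the header row.
unhealthy_nodes() {
    awk 'NF >= 2 && $1 != "Node" && $2 != "healthy" { print $1 }'
}

# Example:
#   bad=$(bd fleet health | unhealthy_nodes)
#   [ -z "$bad" ] || { echo "unhealthy nodes: $bad" >&2; exit 1; }
```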
Next Steps
- Multi-Node Networking — Tailscale mesh and ACL policy
- Model Routing — Assign models to roles
- Security Gates — Harden secrets and supply chain