A COO-layer assessment of the Novus Forward Agent Spec — built after the COMMAND + TAC architecture it closely mirrors.
The spec divides into a Dev division (engineering: build, deploy, review) and an Ops division (marketing, publishing, comms). Dev ≈ TAC. Ops ≈ COMMAND. The difference is topology: they run both identities in one session via prompt folders. We run separate sessions with a structured handoff protocol (dispatch-tac.sh).
Their approach is simpler to install. Ours has less context bleed on complex missions.
| Pattern | Their Implementation | Our Implementation |
|---|---|---|
| Council of 5 | Pass / Flag / Block voting. One block halts. Any flag triggers revision before proceeding. | Same seats. Flag surfaces and notes dissent — doesn't hard-stop reversible ops. |
| 10-Attempt Rule | Different method class each retry. Opus diagnosis at attempt 9. | Identical. Called "Intelligence Surge" internally. |
| EYES Protocol | 5 types: HTTP 200, /health, file, cron, API 2xx. All curl-based. | Curl + Wizard (Chrome DevTools MCP) for SPAs. SPA gate is mandatory. |
| Hook Enforcement | 5 rules including "no external AI without permission." | Similar set. External AI restriction removed — too conservative for subagent dispatch. |
| Memory | Flat files committed to GitHub. Cross-session via git pull on boot. | Flat files + ruvector.db. Semantic search via cmd-search.sh. |
| Parallel Dispatch | Not addressed. No CAMPAIGN law. | Explicit Iron Law. Sequential dispatch on independent targets = command failure. |
| Memory Confidence | Tagged: confirmed / observed / inferred. | Flat — all observations treated equally. Gap worth closing. |
| Employee Onboarding | First-session interview writes personalized prompt file. | Not implemented. Worth adding for multi-operator scenarios. |
The shared patterns are all solid: Council of 5, the 10-attempt rule with Opus at attempt 9, hook-enforced guardrails, skills loaded on demand (names at startup, bodies on invoke), and a session lifecycle with pull-on-boot. That is independent validation that these patterns hold under real use.
They deliberately avoid custom servers and databases until a concrete bottleneck demands them. Good discipline, and worth encoding as a rule when scoping future COMMAND infrastructure work.
Their memory entries carry a tag: confirmed / observed / inferred. Our observation log treats all entries equally. Whether a fact was directly verified or only inferred from context matters to the agent reading it, especially when contradictory entries appear across sessions.
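A minimal sketch of how tagged entries could settle contradictions. The three tags come from their spec; the ranking (confirmed over observed over inferred), the tie-break by recency, and every field name here are assumptions for illustration, not something either system is known to implement.

```python
from dataclasses import dataclass
from enum import IntEnum


class Confidence(IntEnum):
    # Assumed ordering: a higher value means the entry is more trustworthy.
    INFERRED = 1
    OBSERVED = 2
    CONFIRMED = 3


@dataclass(frozen=True)
class MemoryEntry:
    key: str                # what the fact is about (hypothetical field)
    value: str              # the recorded fact
    confidence: Confidence  # tag from the spec
    session: int            # hypothetical increasing session counter


def resolve(entries: list[MemoryEntry]) -> dict[str, MemoryEntry]:
    """For each key keep the highest-confidence entry, newest session on ties."""
    best: dict[str, MemoryEntry] = {}
    for entry in entries:
        current = best.get(entry.key)
        if current is None or (entry.confidence, entry.session) > (
            current.confidence,
            current.session,
        ):
            best[entry.key] = entry
    return best
```

Under this rule an older confirmed entry beats a newer inferred one, which is the behavior a flat log cannot express.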
Their first session runs a structured interview: directness preference, decision authority level, interaction style. The output is a personalized prompt file that loads alongside the division prompt each session. Useful if COMMAND is ever opened to additional operators beyond Toby.
All five of their verification types rely on curl. A React, Next.js, or Vite app returns HTTP 200 on an HTML shell containing one script tag. curl never executes that script, so the actual application never runs during the check. curl 200 ≠ EYES.
The fix: run an SPA gate before any verification. `curl -s <url> | grep -c 'id="root"\|id="__next"'` counts the lines that contain a framework mount point. If the count is ≥ 1, Wizard is mandatory: navigate, screenshot, read the PNG. Without this gate, their spec will silently pass broken deploys of any client-rendered frontend.
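The same gate, sketched in Python so it can sit inside a verification script. The two mount-point ids are the ones from the grep pattern; the function names and the returned labels are hypothetical, and the marker list would likely need extending for frameworks that mount elsewhere.

```python
import re

# Mount-point markers copied from the grep pattern; extend as needed.
SPA_MARKER = re.compile(r'id="(?:root|__next)"')


def needs_browser_check(html: str) -> bool:
    """True if the HTML looks like an SPA shell that curl alone cannot verify."""
    return SPA_MARKER.search(html) is not None


def choose_verifier(status: int, html: str) -> str:
    """Pick a verification path (labels are illustrative, not from either spec)."""
    if status != 200:
        return "fail"  # the plain HTTP check already failed
    if needs_browser_check(html):
        return "browser"  # render, screenshot, inspect
    return "curl"  # static page: the HTTP check is sufficient
```

The point of the split is that a 200 on a shell page routes to the browser path instead of counting as a pass.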
When a mission has N independent targets — build 10 sites, audit 20 pages, send 15 drafts — processing them sequentially is a performance failure, not a feature. We encoded this as an Iron Law: count all targets, run independence check, fan out in parallel before item 1 starts. Sequential dispatch on independent work is a command failure. They'll hit this.
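A rough sketch of the count, check, fan-out sequence using Python's standard thread pool. The dependency map, the worker signature, and the pool size are assumptions; the real dispatch runs through separate agent sessions, not threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Sequence


def independent(targets: Sequence[str], deps: dict[str, set[str]]) -> bool:
    """True if no target depends on another target in the same batch."""
    batch = set(targets)
    return all(not (deps.get(t, set()) & batch) for t in targets)


def dispatch(
    targets: Sequence[str],
    worker: Callable[[str], str],
    deps: Optional[dict[str, set[str]]] = None,
    max_workers: int = 8,
) -> list[str]:
    """Fan out in parallel when targets are independent, else run in order."""
    deps = deps or {}
    if len(targets) > 1 and independent(targets, deps):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # map() returns results in the order of the input targets.
            return list(pool.map(worker, targets))
    return [worker(t) for t in targets]
```

The independence check runs once, before the first item starts, so the sequential path is only taken when the targets genuinely depend on each other.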
Running Dev and Ops in the same session via prompt folders works at low complexity. Under sustained parallel workloads (debugging a deploy failure while managing a content calendar, for example), context from one division bleeds into the other. Separate sessions with a structured handoff keep each clean. The overhead is worth it at scale.
Blocking subagent dispatch via a hook is too conservative for a dev agent. Council of 5 votes, Intelligence Surge, parallel exploration — all require spawning subagents autonomously. A permission gate on every external AI call kills the operational speed these systems are built to provide. The right guardrail is spend ceiling + destructive-op confirmation, not a blanket external-AI block.
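One way the replacement guardrail might look, as a sketch only. The class, the dollar-based budget, and the list of destructive operations are all hypothetical; neither spec defines them.

```python
from dataclasses import dataclass

# Illustrative set of operations that need explicit confirmation (assumed).
DESTRUCTIVE_OPS = {"send", "publish", "delete"}


@dataclass
class Guard:
    spend_ceiling_usd: float
    spent_usd: float = 0.0

    def authorize(self, op: str, est_cost_usd: float, confirmed: bool = False) -> bool:
        """Allow an op unless it breaks the budget or is destructive and unconfirmed."""
        if self.spent_usd + est_cost_usd > self.spend_ceiling_usd:
            return False  # spend ceiling reached
        if op in DESTRUCTIVE_OPS and not confirmed:
            return False  # destructive op needs operator confirmation
        self.spent_usd += est_cost_usd
        return True
```

Subagent spawns pass without a prompt until the budget runs out, while destructive operations still stop for confirmation.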
Halting on every flag is correct for irreversible actions: send, publish, delete. It is overkill for reversible ops: branch, draft, scaffold. A hard-stop on every single flag vote slows routine work for no gain. Gate by reversibility, not vote type.
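Gating by reversibility can be sketched as a small decision function. The vote names follow their Pass / Flag / Block scheme and the irreversible set uses the examples above; the outcome labels, and the assumption that a single block always halts, are illustrative.

```python
from enum import Enum


class Vote(Enum):
    PASS = "pass"
    FLAG = "flag"
    BLOCK = "block"


# Irreversible actions taken from the examples in the text.
IRREVERSIBLE = {"send", "publish", "delete"}


def council_decision(action: str, votes: list[Vote]) -> str:
    """Resolve a council vote, treating flags differently by reversibility."""
    if Vote.BLOCK in votes:
        return "halt"  # assumed: one block halts regardless of action
    if Vote.FLAG in votes:
        if action in IRREVERSIBLE:
            return "revise"  # flag forces revision before proceeding
        return "proceed_with_dissent"  # note the flag, keep moving
    return "proceed"
```

A flag on `draft` is recorded and work continues, while the same flag on `publish` still forces a revision.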
The bones are solid — same architecture we independently built and battle-tested. Council, 10-attempt rule, hook enforcement, session lifecycle. All hold up. The memory confidence tagging and per-employee onboarding are worth stealing back.
The SPA gate gap is the one production defect. An EYES protocol that can't distinguish a raw HTML response from a rendered React application will silently pass broken deployments. That gap surfaces in production, not in testing.
Ship the SPA gate fix and the CAMPAIGN law, cut the external-AI block and the flag-always-halts rule, and it's close to production-grade.