# Nyx — Autonomous Multi-Agent Orchestrator

> Context file for LLMs. Paste this (or link it) into a chat so the model has full context on what Nyx is, how it works, where the code lives, and what is being built.

## What Nyx is

Nyx is a self-hosted, autonomous coding/agent orchestrator that wraps Claude Code. You give it a goal; it decomposes the goal, fans out parallel Claude workers to do the work, integrates their output, verifies it, and commits, all unattended. It runs on the operator's own machine under their Claude Code subscription (no per-token API billing). It is meant to run on its own from up-front instructions, with as little human involvement as possible.

- Repo: github.com/ethanashi/nyx (private).
- Built in ~2 weeks; ~127k lines of code, a majority authored autonomously by Nyx itself (commits under "Nyx Worker").
- Stack: Node + TypeScript orchestrator (run via tsx), a Tauri (Rust + SolidJS) desktop app, a web dashboard, and a Go TUI.

## Architecture (the layers)

- Concierge: the chat face (TUI / desktop). Receives operator messages, holds conversation + memory, classifies and routes work. Light model.
- Planner: decomposes a goal into a JSON plan of subtasks (each: title, prompt, owns, returns, writes, mcps, dependsOn). Runs the strongest model. `/plan` and `/poc` (plan-on-crack) hit this.
- Fanout workers: parallel Claude Code subprocess sessions, one per subtask, that edit the working tree. Capped (default 8 per plan, 16 global).
- Synthesizer: integrates worker outputs into one commit. Single-subtask plans skip it and commit in code.
- Verify: end-to-end verification before a plan is "complete" (browser/eyes for UI; build+test code gate for code).
- Moderator layer (new, Phase 6): supervises workers in flight, detects drift (stall / livelock / blocked), and resolves it autonomously. Never blocks the operator.
- Queue + Autopilot: a durable work queue; the autopilot dispatches pending items one at a time (FIFO, sleep-friendly).

Mental model: concierge holds the task ledger (wide, low detail); a moderator owns one objective (medium detail); a worker does one task (deep). Shared memory holds durable facts. No single layer holds everything.

## Control plane

The moderator runs an HTTP server on localhost:8787:
- POST /concierge/send {text, cwd} — talk to Nyx / dispatch.
- POST /plan {goal, cwd, role?, verify?, retry_on_api_error?, mcps?} — start a plan; returns {id}. Plans queue FIFO (1 concurrent by default).
- GET /plan/:id — status + subtask counts. POST /plan/:id/cancel.
- GET/POST /sleep, /sleep/on, /sleep/off — sleep mode.
- GET /health.

Remote use: Claude Code's built-in Remote Control (/remote-control or /rc) runs locally and can reach localhost:8787, so the operator drives Nyx from another device with no tunnel.

## Repo / file system layout

- apps/moderator/ — the orchestrator (Node + TS, tsx):
  - src/index.ts — boot, daemons, the onWorkerExit hook.
  - src/planner/orchestrator.ts — plan execution (fanout, synth, verify, retry).
  - src/planner/planner.ts — the planner prompt.
  - src/planner/moderator.ts — the moderator supervisor layer (Phase 6).
  - src/planner/verify.ts — the end-to-end verify engine.
  - src/queue/verify-gate.ts — per-queue-item verify gate (Phase 5).
  - src/queue/code-gate.ts — build + test gate (the keystone).
  - src/queue/autopilot.ts, outcome.ts — the queue + outcome recording.
  - src/worker-manager.ts — spawns Claude workers, builds the prompt prefix, sessionIdleMs.
  - src/model-routing.ts — model + thinking-tier per role.
  - src/mcp/ — MCP catalog (installer.ts), the eyes MCP (eyes-mcp.ts), the nyx MCP (nyx-mcp.ts).
  - src/sleep-mode.ts, sleep-schedule.ts, watchdog.ts, self-iterator.ts.
  - src/context-contract.ts — structured Brief + live context store (Phase 7 seed).
- apps/desktop/ — Tauri (Rust + Solid) desktop app: Deep Research tab, vault (src-tauri/src/vault.rs).
- apps/dashboard/ — web dashboard.
- apps/tui-go/ — Go TUI (Bubble Tea / Lipgloss).
- packages/shared/.

## Model routing (tiered by role)

- Planner / plan-on-crack: Fable 5 (strongest; ~2x Opus cost; high/max thinking).
- Self-iter, visual-worker, architect: Opus 4.8.
- Feature, critic, watchdog, research: Sonnet 4.6.
- Concierge: Haiku 4.5 / Sonnet 4.6.
- Principle: strongest model for planning + supervision (judgment), cheap models for scoped execution.

## The eyes MCP (visual testing)

Workers opt into MCPs per subtask (mcps: ["eyes"]). eyes is a purpose-built, headless visual-testing MCP (Playwright under the hood) exposing high-level tools:
- eyes_screenshot, eyes_responsive (phone/tablet/desktop in one call), eyes_flow (click-throughs), eyes_audit (every button/link), eyes_layout (mobile overflow / tiny tap targets).
It returns real screenshots the model can see and settles deterministically (networkidle + stability loop) instead of flaking on sleeps. It replaced the stock @playwright/mcp server, which was flaky; the planner no longer recommends playwright.

## The gates (verification)

- Code gate (code-gate.ts): before an item is "done", run the repo's build (typecheck) + test scripts in the worker's cwd; re-pend if red. First line of defense, catches compile errors a browser cannot. Off switch: NYX_CODE_GATE_DISABLED.
- Verify gate (verify-gate.ts): runs the eyes engine on the deliverable; pass closes the item, fail re-pends for an informed retry. Off switch: NYX_VERIFY_DISABLED.
Both default on. The code gate runs first (cheapest signal).

## Autonomy + safety model

- Full autonomy, no escalation: Nyx never blocks waiting on the operator. On a decision point it makes the best reasonable call and logs it (non-blocking). The safety net is async: the gates catch wrong technical decisions (re-pend + retry), and a decision log lets the operator review/override after the fact. No stop-and-ask.
- Sleep mode: an auto-schedule (default 23:00-07:00) pauses the autopilot overnight. Disable with the file ~/.nyx/sleep-auto-disabled or NYX_SLEEP_AUTO_DISABLED=1.
- Self-iterator: on a worker failure it can diagnose and patch Nyx's own code, gated by a self-update lock. Safe mode (recommended on other people's machines): NYX_SELF_UPDATE=off.
- Watchdog: flags stalled (idle) and livelocked (repeating tool results) sessions.
- Cost guardrails: daily caps, inert in subscription mode (tracking only).

## Slash commands (TUI)

/plan [brief], /plan-on-crack (/poc), /fix [problem], /dispatch <queue-id>, /status, /sleep, /help.

## Branch topology

- Active dev line: nyx-feat/desktop-app-wave-2 (all features).
- main is a separate, older line.
- Verified reliability fixes: branch overhaul/p5-p6-rebuild (off the Phase 4 commit).
- Dozens of nyx-feat/* feature branches with an auto-merge + janitor system.

## Current overhaul status (the reliability rebuild)

A multi-phase overhaul to make Nyx reliable enough to safely build itself. On branch overhaul/p5-p6-rebuild, verified (typecheck + tests green) and committed:
- P1 Deep Research open-report fix: done.
- P2 Efficiency (prompt caching, kill per-turn auto-memory worker, right-size planner off Fable default, skip synth on single-subtask plans, fan-out cache stagger): done.
- P3 Automatic concierge triage (answer / 1 worker / plan / plan-on-crack): done.
- P4 The eyes visual-testing MCP: done.
- P5 Verify gate (mandatory, per queue item): done + wired.
- P6 Moderator layer: module done + tested, NOT yet wired into the orchestrator.
- P7 Context contract: Brief + live-store module seeded; wiring undone.
- P10 Keychain fix (0600 ~/.nyx/vault.json, no macOS prompts): done.
- Code gate (build + test before done): done + wired. The keystone.
- Planner stopped recommending the flaky playwright MCP; uses eyes: done.

Root-cause lesson: Nyx committed code that did not compile (a worker imported a function it never created) because nothing ran tsc. The code gate fixes that class of bug. Process problems found: sprawling multi-concern commits, no per-plan isolation (a failed plan left a dirty tree the next plan built on), leaky cancel/cleanup.

## What we want to add next (roadmap)

1. Per-plan worktree isolation: each plan on a clean base, merge only when green. Stops contamination.
2. Wire P6 (moderator) into the orchestrator fanout.
3. Finish P7: structured briefs into the worker prefix, replace the boot-frozen memory snapshot, lift the 8000-char project-memory cap.
4. Company mode: parallel moderators, role specialization (builder / marketer / support / analyst), an objective decomposer (one goal to sub-objectives), a metrics + feedback loop, goal-driven autonomy.
5. Branch hygiene: keep the auto-merge / janitor ahead of branch sprawl.
6. (Gated) always-on, headless, multi-machine operation: blocked because the Claude Code subscription auth needs interactive login (no headless path yet).

## Key constraints

- Runs under a Claude Code SUBSCRIPTION, not an API key. There is no headless/cloud path yet (auth needs interactive login), so a fully cloud-hosted Nyx is deferred.
- All work happens on the operator's machine; it must stay awake (caffeinate) for unattended overnight runs.

## How a request flows (end to end)

operator message to concierge, concierge classifies effort, for a plan it goes to the planner, planner emits subtasks, autopilot/orchestrator fans out workers, each worker edits the tree, the code gate runs build+test, the verify gate runs eyes for UI, the synthesizer (or single-subtask fast path) commits, the plan settles complete or re-pends for an informed retry. A moderator (once wired) watches each worker in flight and resolves drift autonomously.