Releases: obra/superpowers
v6.0.2
v6.0.0
v6.0.0 (2026-06-16)
Superpowers 6.0 is a big release. The headline is a rewrite of how subagent-driven-development reviews each task — cheaper, stricter, and harder to game.
These numbers won't hold on every harness or for every workload, but in our evals Claude Code and Codex produce results of similar quality roughly twice as fast while spending almost 50% fewer tokens.
It also adds three new harnesses (Kimi Code, Pi, and Antigravity), gives the brainstorming visual companion a better security model, and rewrites a number of skills' tool calls to be significantly more vendor-neutral.
Visible Changes
- The two per-task reviewer prompts became one. `spec-reviewer-prompt.md` and `code-quality-reviewer-prompt.md` are gone, replaced by a single `task-reviewer-prompt.md`. If you dispatch the old files directly, switch to the new one.
- The legacy global worktree directory is gone. `using-git-worktrees` and `finishing-a-development-branch` no longer use `~/.config/superpowers/worktrees/`. Worktrees now land in the project (an existing `.worktrees/` or `worktrees/` if you have one, otherwise a fresh `.worktrees/`) unless you say otherwise.
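The directory rule above can be sketched as a small lookup. This is only an illustration in Python, not the skill's actual logic: the function name `pick_worktree_dir` and its `override` parameter are hypothetical, and checking `.worktrees/` before `worktrees/` when both exist is an assumption (the notes list them in that order but don't state a precedence).

```python
from pathlib import Path
from typing import Optional


def pick_worktree_dir(project_root: Path, override: Optional[Path] = None) -> Path:
    """Return the directory new worktrees should land in (hypothetical helper)."""
    # An explicit user choice always wins ("unless you say otherwise").
    if override is not None:
        return override
    # Reuse an existing project-local directory if there is one.
    # Assumption: .worktrees/ is checked before worktrees/.
    for name in (".worktrees", "worktrees"):
        candidate = project_root / name
        if candidate.is_dir():
            return candidate
    # Otherwise fall back to a fresh .worktrees/ inside the project.
    return project_root / ".worktrees"
```

Note that nothing in this sketch ever points at the old `~/.config/superpowers/worktrees/` location.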
New Harness Support
Superpowers now runs on three more harnesses. Each ships its own bootstrap, a tool-mapping reference, and tests, and each gets its own install section in the README.
- Kimi Code: a plugin manifest, install docs, and manifest tests; install from Kimi's marketplace or straight from the repo. (initial manifest by @qer)
- Pi: a session-start extension that registers the skills and injects the `using-superpowers` bootstrap. Pi has native skills, so it needs no compatibility shim.
- Antigravity (`agy`): installs the plugin directly and bootstraps from the first message; verified end-to-end against the standard "make a react todo list" acceptance test.
Subagent-Driven Development
A long run of cost-and-quality experiments on real projects reshaped how the controller reviews each task. The old flow ran two reviewers per task and leaned on the controller's judgment for model choice and severity, and both turned out to be expensive and easy to game. The new flow runs one reviewer per task, hands work off as files instead of pasted text, and takes several judgment calls away from the controller.
- One reviewer per task, two verdicts. A single `task-reviewer-prompt.md` reads the task's diff once and returns both a spec-compliance verdict and a quality verdict, so one fix pass clears both. A new "can't verify from the diff" verdict flags requirements that live in untouched code, for the controller to check itself. (#1538, #1543)
- One broad review at the end. The run finishes with a single whole-branch review on the most capable model, instead of re-reviewing everything task by task.
- Plans get a pre-flight read. Before the first task, the controller checks the plan for internal conflicts — and for anything the plan asks for that a reviewer would flag as a defect — and raises it all at once, rather than stumbling into it mid-run.
- Diffs and task text move as files. A pasted diff parks itself permanently in the most expensive context, and a reviewer without one rebuilds it by hand, which is the single biggest reviewer cost. Two new scripts, `task-brief` and `review-package`, write the task text and the review diff to files for the subagent to read.
- Every dispatch states its model. Left to choose, controllers stopped naming a model at all, and an unnamed model quietly inherits the session's most expensive one, so one run put all 26 of its reviewers on the top tier. The templates now require a model, with guidance that reaches for cheaper tiers when the work allows.
- The controller can't tell a reviewer what to ignore. Real runs caught controllers coaching reviewers to skip a finding or call it "Minor at most," and the flaw shipped. Suppressing findings and pre-rating severity are now banned outright, and a defect the plan itself mandates gets reported for you to decide on rather than waved through.
- Reviewers are read-only and skeptical of rationales. Review no longer touches the working tree or branch (a reviewer running `git checkout` had been orphaning later commits), and an implementer's "I left this unabstracted on purpose" no longer talks a reviewer out of a real finding.
- Stronger evidence and reporting. Reviewers back each answer with a file and line, the implementer's report moves to a file and carries red/green evidence when TDD applies, and a progress ledger lets a controller that loses its context resume instead of redoing finished work. (#994)
Writing Plans
Plans now carry the structure the controller and reviewers used to re-derive on every dispatch.
- A Global Constraints block lists the rules that bind every task — version floors, dependency limits, naming and copy, exact values — copied in verbatim, so they actually reach the implementers and reviewers downstream.
- A per-task Interfaces block names exactly what each task consumes and produces, so an implementer who sees only its own task still knows its neighbors' contracts.
- Right-sizing guidance keeps a task at the size that earns its own test cycle and a reviewer's pass, folding setup, config, and docs into the task that needs them. In testing, a plan written this way needed one round of fixes where the control needed two to four — and the control shipped a real bug.
Brainstorming Visual Companion
The visual companion is a small web server the agent opens alongside the conversation. It had no authentication at all, so on a shared or remote machine anyone who could reach the port could read your brainstorm — or inject events the agent treats as your input. This release gives it a real security model and makes it survive restarts and dropped connections.
- A per-session key now guards everything. The agent's URL carries a one-time key, the browser tucks it into a tab-scoped cookie, and every request and WebSocket connection has to present it. This closes the door to stray local tabs and routable remote hosts alike, including the DNS-rebinding case an origin allowlist can't catch. (Closes #1014)
- The file server stays in its sandbox. It refuses symlinks, dotfiles, and any path that climbs out of the content directory, ignores macOS resource-fork files, and sends the usual no-store and deny-framing headers. Files that hold the session key are written owner-only.
- The companion is offered only when it helps. The skill raises it the first time a question would read better shown than told, as its own message, and lets a decline stand. Accepting opens your browser to the first screen. (Closes #755)
- It survives restarts and flaky connections. Given a project directory, the server keeps the same port and key across restarts, so an open tab simply reconnects. The page reconnects on its own, shows a live status pill, and raises a "paused" overlay while the server is down.
- Longer idle life, safer shutdown. The idle timeout went from 30 minutes to 4 hours, and `stop-server.sh` now confirms it owns the right process before signaling, so it never kills an unrelated `node` after a reboot. (#1703)
- Windows launch hardening. Shell detection is consolidated, and Windows now relies on the idle timeout for shutdown, since Node can't track POSIX process ownership across MSYS2.
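The key check and the sandbox rules above can be sketched as two small functions. The companion itself is a Node server, so this Python version is only an illustration under stated assumptions: the names `key_matches` and `resolve_served_file` are hypothetical, the constant-time comparison is a choice made here rather than something the notes describe, and the cookie and WebSocket plumbing is left out.

```python
import hmac
from pathlib import Path
from typing import Optional


def key_matches(presented: Optional[str], session_key: str) -> bool:
    """Compare the presented key to the session key in constant time."""
    if not presented:
        return False
    return hmac.compare_digest(presented.encode(), session_key.encode())


def resolve_served_file(content_dir: Path, request_path: str) -> Optional[Path]:
    """Map a request path to a file inside content_dir, or None to refuse."""
    parts = [p for p in request_path.split("/") if p]
    # Refuse dotfiles and ".." segments; both start with ".".
    # This also covers macOS resource-fork files, which are named "._*".
    if not parts or any(p.startswith(".") for p in parts):
        return None
    root = content_dir.resolve()
    candidate = root
    for part in parts:
        candidate = candidate / part
        # Refuse symlinks anywhere along the requested path.
        if candidate.is_symlink():
            return None
    resolved = candidate.resolve()
    # Final check: the resolved path must still sit under the content dir.
    if root not in resolved.parents:
        return None
    return resolved if resolved.is_file() else None
```

A request handler would call `key_matches` first and only then `resolve_served_file`, returning an error response whenever either one refuses.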
Existing Harness Updates
- Codex now bootstraps through its own SessionStart hook rather than shared wiring, and the Codex App gained an install section and fuller tool docs (web search, `AGENTS.md`, personal skills). (#1540)
- OpenCode got an action-based tool mapping across its plugin, install doc, and README, plus a bootstrap-caching test.
- Cursor's manifest dropped its `agents` and `commands` entries, since those directories no longer exist.
One Set of Skills, Every Harness
The skills used to speak Claude Code's dialect — "use the Task tool," "put it in CLAUDE.md." This release rewrites that vocabulary in terms of what you're actually doing ("dispatch a subagent," "your instructions file") and adds a per-harness reference that maps each action to the right tool, checked against each runtime. Prose that named "Claude" now says "your agent."
- A tool reference per harness at `skills/using-superpowers/references/`, covering Claude Code, Codex, Copilot, Gemini, Pi, and Antigravity.
- `finishing-a-development-branch` went forge-neutral: it no longer hardcodes `gh pr create`, so agents push with whatever forge tooling they have. (#1609)
- One rename: "Claude Search Optimization" is now "Skill Discovery Optimization," since the technique isn't Claude-specific.
Writing Skills
Two additions for skill authors.
- Match the Form to the Failure — a short table for picking the right kind of guidance. A flat "don't do X" works for discipline slips but backfires when the problem is the shape of an output, where a worked example does better. The table, and a tighter scope on the existing rationalization section, steer authors to the form that actually helps.
- Micro-Test Wording — a cheap way to check a phrasing before committing to it: sample it a handful of times against a no-guidance control and read every result by hand, treating run-to-run variance as a warning sign.
Testing
Skill-behavior testing moved out of tests/ into a new evals/ submodule built on "drill," which runs real Claude Code, Codex, and Gemini sessions and judges them with an LLM. Several in-tree bash suites retired once a stricter drill scenario covered them; the few with no equivalent stayed. From here on, tests/ holds plugin-code tests and evals/ holds skill-behavior tests, and docs/testing.md explains the split. New backends reach Antigravity, Pi, and more models, and new shell-lint and pre-commit checks guard the harness. (#1541)
Bug Fixes
- **systematic-debuggi...
v5.1.0
Removals
- Legacy slash commands removed: `/brainstorm`, `/execute-plan`, and `/write-plan` are gone. They were deprecated stubs that did nothing but tell the user to invoke the corresponding skill. Invoke `superpowers:brainstorming`, `superpowers:executing-plans`, and `superpowers:writing-plans` directly instead. (#1188)
- `superpowers:code-reviewer` named agent removed: the agent was the plugin's only named agent and was used by exactly two skills, while every other reviewer/implementer subagent in the repo dispatches `general-purpose` with a prompt template alongside its skill. The agent's persona and checklist have been merged into `skills/requesting-code-review/code-reviewer.md` as a self-contained Task-dispatch template. Anyone dispatching `Task (superpowers:code-reviewer)` should switch to `Task (general-purpose)` with the prompt template instead. (PR #1299)
- Integration sections removed from skills: these were a legacy of the time before agents had native skills systems and didn't help with steering.
Worktree Skills Rewrite
using-git-worktrees and finishing-a-development-branch now detect when the agent is already running inside an isolated worktree and prefer the harness's native worktree controls before falling back to git worktree. Behavior was TDD-validated and cross-platform-checked across five harnesses. (PRI-974, PR #1121)
- Environment detection: both skills check `GIT_DIR != GIT_COMMON` before doing anything; if already in a linked worktree, creation is skipped entirely. A submodule guard prevents false detection.
- Consent before creating worktrees: `using-git-worktrees` no longer creates worktrees implicitly; the skill asks the user first. Fixes #991 (subagent-driven-development was auto-creating worktrees without consent).
- Native tool preference (Step 1a): when the harness exposes its own worktree tool (e.g. Codex), the skill defers to it. The user's stated preference is respected when expressed.
- Provenance-based cleanup: `finishing-a-development-branch` only cleans up worktrees inside `.worktrees/` (created by superpowers); anything outside is left alone. Fixes #940 (Option 2 was incorrectly cleaning up worktrees), #999 (merge-then-remove ordering), and #238 (`cd` to repo root before `git worktree remove`).
- Detached HEAD handling: the finishing menu collapses to two options when there is no branch to merge from.
- Hardcoded `/Users/jesse` paths in skill examples replaced with generic placeholders. (#858, PR #1122)
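The detection and cleanup rules above can be sketched as two comparisons. This is a rough Python illustration, not the skills' own code. It assumes `GIT_DIR` and `GIT_COMMON` correspond to the output of `git rev-parse --git-dir` and `git rev-parse --git-common-dir`; the function names are hypothetical, and the submodule guard is not modeled.

```python
import subprocess
from pathlib import Path


def is_linked_worktree(git_dir: Path, git_common_dir: Path) -> bool:
    """True when the per-worktree git dir differs from the shared one."""
    return git_dir.resolve() != git_common_dir.resolve()


def created_by_superpowers(worktree: Path, repo_root: Path) -> bool:
    """True only for worktrees that live under <repo_root>/.worktrees/."""
    managed = (repo_root / ".worktrees").resolve()
    return managed in worktree.resolve().parents


def detect(cwd: Path) -> bool:
    """Ask git for both directories and compare them (needs git on PATH)."""
    def rev_parse(flag: str) -> Path:
        out = subprocess.run(
            ["git", "rev-parse", flag],
            cwd=cwd, check=True, capture_output=True, text=True,
        ).stdout.strip()
        # git may print a path relative to cwd, so anchor it there;
        # joining with an absolute path simply yields that absolute path.
        return cwd / out
    return is_linked_worktree(rev_parse("--git-dir"), rev_parse("--git-common-dir"))
```

In a main checkout both paths resolve to the same `.git` directory, so creation can proceed after asking the user. In a linked worktree the git dir points somewhere under the common dir's `worktrees/` folder, so the two differ and creation is skipped.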
Contributor Guidelines for AI Agents
Two new sections at the top of CLAUDE.md (symlinked to AGENTS.md) speak directly to AI agents. An audit of the last 100 closed PRs against this repo showed a 94% rejection rate driven by AI-generated slop: agents that didn't read the PR template, opened duplicates, fabricated problem descriptions, or pushed fork- or domain-specific changes upstream.
- Pre-submission checklist: read the PR template, search for existing PRs, verify a real problem exists, confirm the change belongs in core, and show the human partner the complete diff before submitting.
- What we will not accept: third-party dependencies, "compliance" rewrites of skill content, project-specific configuration, bulk PRs, speculative fixes, domain-specific skills, fork-specific changes, fabricated content, and bundled unrelated changes.
- New harness PRs require a session transcript: most past new-harness integrations copied skill files or wrapped with `npx skills` instead of loading the `using-superpowers` bootstrap at session start. The acceptance test ("Let's make a react todo list" must auto-trigger `brainstorming` in a clean session) and a complete transcript are now required.
Codex Plugin Mirror Tooling
New sync-to-codex-plugin script mirrors superpowers into the OpenAI Codex plugin marketplace as prime-radiant-inc/openai-codex-plugins. Path/user-agnostic so any team member can run it. (PR #1165)
- Clones the fork fresh into a temp directory per run, regenerates overlays inline, and opens a PR; auto-detects upstream from the script's own location and preflights `rsync`/`git`/`gh auth`/`python3`.
- `--bootstrap` flag for first-time setup; `EXCLUDES` patterns anchored to source root; `assets/` excluded.
- Mirrors `CODE_OF_CONDUCT.md`; drops the `agents/openai.yaml` overlay.
- Seeds `interface.defaultPrompt` in the mirrored `plugin.json`. (PR #1180 by @arittr)
- Codex plugin files are committed to the source repo so the sync script uses canonical versions; Codex marketplace metadata is preserved.
OpenCode
- Bootstrap content cached at module level: `getBootstrapContent()` was calling `fs.existsSync` + `fs.readFileSync` + frontmatter regex on every agent step (the `experimental.chat.messages.transform` hook fires on every step in OpenCode's agent loop). Now read once, cached for the session lifetime, with a null sentinel for the missing-file case. 15 regression tests cover cache behavior, fs call counts, the injection guard, the missing-file sentinel, and cache reset. (Fixes #1202)
- Integration tests modernized.
- Install caveats clarified in the README.
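The caching pattern described above (read once, remember a miss) can be sketched as follows. The real plugin is JavaScript; this Python version only illustrates the idea, and the names `get_bootstrap_content` and `reset_cache` are stand-ins rather than the plugin's API. Frontmatter stripping is omitted.

```python
from pathlib import Path
from typing import Optional

_UNSET = object()        # marker: the cache has not been filled yet
_cache: object = _UNSET  # afterwards: the file text, or None if it was missing


def get_bootstrap_content(path: Path) -> Optional[str]:
    """Read the bootstrap file at most once per process."""
    global _cache
    if _cache is _UNSET:
        try:
            _cache = path.read_text(encoding="utf-8")
        except FileNotFoundError:
            # Null sentinel: remember the miss so later calls skip the disk.
            _cache = None
    return _cache  # type: ignore[return-value]


def reset_cache() -> None:
    """Forget the cached value (useful in tests)."""
    global _cache
    _cache = _UNSET
```

Using a separate "unset" marker is what lets `None` mean "looked, and the file was not there" instead of "not looked yet", so a missing file does not trigger a filesystem call on every step.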
Code Review Consolidation
requesting-code-review is now self-contained: the persona, checklist, and dispatch template live in skills/requesting-code-review/code-reviewer.md and the skill dispatches Task (general-purpose) directly. (PR #1299)
- Single source of truth: the persona/checklist that previously lived in both `agents/code-reviewer.md` and the skill's placeholder template (and drifted independently) is now one file.
- `subagent-driven-development` follows suit: its `code-quality-reviewer-prompt.md` now dispatches `Task (general-purpose)` instead of the named agent.
- Behavioral test added: `tests/claude-code/test-requesting-code-review.sh` plants real bugs (SQL injection, plaintext password handling, credential logging) into a tiny project and asserts the dispatched reviewer flags every planted issue at Critical/Important severity and refuses to approve the diff.
- Codex and Copilot workaround docs trimmed: the "Named agent dispatch" sections in `references/codex-tools.md` and `references/copilot-tools.md` documented how to flatten a named agent into a generic dispatch. With no named agents shipping, the workaround is unnecessary; both sections were dropped.
Subagent-Driven Development
- No more pause every 3 tasks. The "review after each batch (3 tasks)" cadence in `requesting-code-review` (originally for `executing-plans`) was leaking into `subagent-driven-development`. Replaced with "each task or at natural checkpoints" plus an explicit continuous-execution directive.
- SDD integration test now runs its assertions. Three independent bugs caused the test to bail silently before printing any verification results: an unresolved `..` segment in the working-dir path, a `set -euo pipefail` interaction with `find | sort | head -1` (SIGPIPE on the producer killed the script), and a missing `--plugin-dir` on the `claude -p` invocation that caused the test to load the installed plugin instead of the working tree. All three are fixed; six verification tests now actually run against a real end-to-end SDD run.
Cursor
- Windows SessionStart hook routed through `run-hook.cmd` instead of invoking the extensionless `session-start` script directly. Fixes Windows opening the file in an editor instead of running it. Also removed an accidental UTF-8 BOM from `hooks-cursor.json`.
Gemini CLI
- Subagent dispatch mapping: Gemini's `Task` dispatch now maps to `@agent-name`/`@generalist`, with parallel subagent dispatch documented for independent tasks.
Skills
- Terminology cleanups across skill content.
Documentation & Install
- Factory Droid installation instructions added to README.
- Quickstart install links in README. (PR #1293 by @arittr)
- Codex plugin install guidance updated. (PR #1288 by @arittr)
- Codex `wait` mapping corrected to `wait_agent` in the tools reference.
- Install order reorganized; Codex install instructions cleaned up.
- Removed vestigial `CHANGELOG.md` in favor of `RELEASE-NOTES.md` as the single source. (PR #1163 by @shaanmajid)
- Discord invite link fixed; release announcements link and a detailed Discord description added to the Community section.
Community
v5.0.7
GitHub Copilot CLI Support
- SessionStart context injection: Copilot CLI v1.0.11 added support for `additionalContext` in sessionStart hook output. The session-start hook now detects the `COPILOT_CLI` environment variable and emits the SDK-standard `{ "additionalContext": "..." }` format, giving Copilot CLI users the full superpowers bootstrap at session start.
- Tool mapping: added `references/copilot-tools.md` with the full Claude Code to Copilot CLI tool equivalence table.
- Skill and README updates: added Copilot CLI to the `using-superpowers` skill's platform instructions and README installation section.
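The Copilot branch of the hook can be sketched roughly like this. The function name is hypothetical, and treating any non-empty `COPILOT_CLI` value as "running under Copilot CLI" is an assumption; the notes only say the variable is detected. Output for other harnesses is deliberately left out, since its shape isn't described here.

```python
import json
from typing import Mapping, Optional


def copilot_session_start_output(env: Mapping[str, str], bootstrap: str) -> Optional[str]:
    """Return the JSON to print for Copilot CLI, or None on other harnesses."""
    # Assumption: any non-empty COPILOT_CLI value means Copilot CLI is the host.
    if not env.get("COPILOT_CLI"):
        return None
    # json.dumps handles the quoting and newline escaping in the bootstrap text.
    return json.dumps({"additionalContext": bootstrap})
```

A real hook would pass `os.environ` as `env` and write the returned string to standard output.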
OpenCode Fixes
- Skills path consistency: the bootstrap text no longer advertises a misleading `configDir/skills/superpowers/` path that didn't match the runtime path. The agent should use the native `skill` tool, not navigate to files by path. Tests now use consistent paths derived from a single source of truth. (#847, #916)
- Bootstrap as user message: moved bootstrap injection from `experimental.chat.system.transform` to `experimental.chat.messages.transform`, prepending to the first user message instead of adding a system message. Avoids token bloat from system messages repeated every turn (#750) and fixes compatibility with Qwen and other models that break on multiple system messages (#894).
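The user-message injection above can be sketched as a pure transform over a message list. The `{"role", "content"}` message shape and the function name are hypothetical simplifications, not OpenCode's real message type, and the `startswith` guard is just one possible way to avoid injecting twice.

```python
from typing import Dict, List

Message = Dict[str, str]  # hypothetical shape: {"role": ..., "content": ...}


def inject_bootstrap(messages: List[Message], bootstrap: str) -> List[Message]:
    """Prepend the bootstrap to the first user message; add no system message."""
    result = [dict(m) for m in messages]  # copy so the caller's list is untouched
    for message in result:
        if message["role"] == "user":
            # Guard: the transform may run on every step, so inject only once.
            if not message["content"].startswith(bootstrap):
                message["content"] = bootstrap + "\n\n" + message["content"]
            break  # only the first user message is modified
    return result
```

Because the number of system messages never changes, models that reject more than one system message see the same structure as before.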
v5.0.6
Inline Self-Review Replaces Subagent Review Loops
The subagent review loop (dispatching a fresh agent to review plans/specs) doubled execution time (~25 min overhead) without measurably improving plan quality. Regression testing across 5 versions with 5 trials each showed identical quality scores regardless of whether the review loop ran.
- brainstorming — replaced Spec Review Loop (subagent dispatch + 3-iteration cap) with inline Spec Self-Review checklist: placeholder scan, internal consistency, scope check, ambiguity check
- writing-plans — replaced Plan Review Loop (subagent dispatch + 3-iteration cap) with inline Self-Review checklist: spec coverage, placeholder scan, type consistency
- writing-plans — added explicit "No Placeholders" section defining plan failures (TBD, vague descriptions, undefined references, "similar to Task N")
- Self-review catches 3-5 real bugs per run in ~30s instead of ~25 min, with comparable defect rates to the subagent approach
Brainstorm Server
- Session directory restructured: the brainstorm server session directory now contains two peer subdirectories, `content/` (HTML files served to the browser) and `state/` (events, server-info, pid, log). Previously, server state and user interaction data were stored alongside served content, making them accessible over HTTP. The `screen_dir` and `state_dir` paths are both included in the server-started JSON. (Reported by 吉田仁)
Bug Fixes
- Owner-PID lifecycle fixes: the brainstorm server's owner-PID monitoring had two bugs that caused false shutdowns within 60 seconds. (1) EPERM from cross-user PIDs (Tailscale SSH, etc.) was treated as "process dead", and (2) on WSL the grandparent PID resolves to a short-lived subprocess that exits before the first lifecycle check. Fixed by treating EPERM as "alive" and validating the owner PID at startup; if it's already dead, monitoring is disabled and the server relies on the 30-minute idle timeout. This also removes the Windows/MSYS2-specific carve-out from `start-server.sh` since the server now handles it generically. (#879)
- writing-skills: corrected false claim that SKILL.md frontmatter supports "only two fields"; now says "two required fields" and links to the agentskills.io specification for all supported fields. (PR #882 by @arittr)
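The EPERM fix comes down to how the result of a signal-0 probe is interpreted. A minimal POSIX-only sketch in Python (the server itself is Node, and the name `owner_is_alive` and the injectable `kill` parameter are illustrative assumptions):

```python
import os
from typing import Callable


def owner_is_alive(pid: int, kill: Callable[[int, int], None] = os.kill) -> bool:
    """Probe a PID with signal 0 and classify the result (POSIX only)."""
    try:
        kill(pid, 0)  # signal 0 checks for existence without delivering anything
    except ProcessLookupError:
        return False  # ESRCH: no such process, so the owner is gone
    except PermissionError:
        return True   # EPERM: the process exists but belongs to another user
    return True
```

Validating at startup then amounts to calling this once before monitoring begins and disabling the monitor if it returns `False`.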
Codex App Compatibility
- codex-tools: added named agent dispatch mapping documenting how to translate Claude Code's named agent types to Codex's `spawn_agent` with worker roles (PR #647 by @arittr)
- codex-tools: added environment detection and Codex App finishing sections for worktree-aware skills (by @arittr)
- Design spec: added Codex App compatibility design spec (PRI-823) covering read-only environment detection, worktree-safe skill behavior, and sandbox fallback patterns (by @arittr)
v5.0.5
Bug Fixes
- Brainstorm server ESM fix: renamed `server.js` → `server.cjs` so the brainstorming server starts correctly on Node.js 22+ where the root `package.json` setting `"type": "module"` caused `require()` to fail. (PR #784 by @sarbojitrana, fixes #774, #780, #783)
- Brainstorm owner-PID on Windows: skip PID lifecycle monitoring on Windows/MSYS2 where the PID namespace is invisible to Node.js, preventing the server from self-terminating after 60 seconds. (#770, docs from PR #768 by @lucasyhzlu-debug)
- stop-server.sh reliability: verify the server process actually died before reporting success. SIGTERM + 2s wait + SIGKILL fallback. (#723)
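The "SIGTERM + 2s wait + SIGKILL fallback" sequence can be sketched as below. The real implementation is the `stop-server.sh` shell script; this Python version is only an illustration for POSIX systems. The function name, the 0.1-second poll interval, and the injectable `send`, `is_alive`, and `sleep` parameters are assumptions made here so the logic can be exercised without a real process.

```python
import os
import signal
import time
from typing import Callable


def _is_alive(pid: int) -> bool:
    """Signal-0 probe; a permission error still means the process exists."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True
    return True


def stop_process(
    pid: int,
    grace_seconds: float = 2.0,
    send: Callable[[int, int], None] = os.kill,
    is_alive: Callable[[int], bool] = _is_alive,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only once the process is confirmed dead."""
    def wait_for_exit(timeout: float) -> bool:
        waited = 0.0
        while waited < timeout:
            if not is_alive(pid):
                return True
            sleep(0.1)  # assumed poll interval
            waited += 0.1
        return not is_alive(pid)

    if not is_alive(pid):
        return True
    # Polite request first, then the forced kill as a fallback.
    for sig in (signal.SIGTERM, signal.SIGKILL):
        try:
            send(pid, sig)
        except ProcessLookupError:
            return True  # it exited between the check and the signal
        if wait_for_exit(grace_seconds):
            return True
    return False  # survived SIGKILL: report failure rather than success
```

Success is reported only after a liveness check comes back negative, never just because a signal was sent.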
Changed
- Execution handoff — restore user choice between subagent-driven and inline execution after plan writing. Subagent-driven is recommended but no longer mandatory.
v5.0.4
Review Loop Refinements
Dramatically reduces token usage and speeds up spec and plan reviews by eliminating unnecessary review passes and tightening reviewer focus.
- Single whole-plan review: plan reviewer now reviews the complete plan in one pass instead of chunk-by-chunk. Removed all chunk-related concepts (`## Chunk N:` headings, 1000-line chunk limits, per-chunk dispatch).
- Raised the bar for blocking issues: both spec and plan reviewer prompts now include a "Calibration" section that tells reviewers to flag only issues that would cause real problems during implementation. Minor wording, stylistic preferences, and formatting quibbles should not block approval.
- Reduced max review iterations: from 5 to 3 for both spec and plan review loops. If the reviewer is calibrated correctly, 3 rounds is plenty.
- Streamlined reviewer checklists: spec reviewer trimmed from 7 categories to 5; plan reviewer from 7 to 4. Removed formatting-focused checks (task syntax, chunk size) in favor of substance (buildability, spec alignment).
OpenCode
- One-line plugin install: OpenCode plugin now auto-registers the skills directory via a `config` hook. No symlinks or `skills.paths` config needed. Install is just adding one line to `opencode.json`. (PR #753)
- Added `package.json` so OpenCode can install superpowers as an npm package from git.
Bug Fixes
- Verify server actually stopped: `stop-server.sh` now confirms the process is dead before reporting success. SIGTERM + 2s wait + SIGKILL fallback. Reports failure if the process survives. (PR #751)
- Generic agent language: brainstorm companion waiting page now says "the agent" instead of "Claude".