How to Stop AI From Being a Yes Man With an LLM Council
Revamp. May 21, 2026. Original post on May 18 was the theory. This version has the production data.
"You're absolutely right."
Shut up, Claude :)
I say that with love. But if you have used AI for more than 5 minutes, you know what I mean.
You can give it a half-baked product idea, two contradictory constraints, and just enough confidence to sound dangerous, and it still puts on the little customer success smile.
"Strong direction."
No. Sometimes the direction is a wall.
That is the annoying part about using LLMs for real decisions. Helpful often turns into agreeable. Agreeable is fine when you need a regex or a first draft. It is bad when you are choosing architecture, product direction, pricing, security, or agent behavior that gets baked into the repo.
At that point, you need a few voices willing to make the idea uncomfortable before it becomes expensive.
That is where an LLM Council is useful.
The first version of this post was theory. I had the pattern working in Codex but not in Claude Code, and the whole thing was unverified at scale. Three days later the skill shipped, 34 test assertions enforce its rules, and I ran five arcs in one day where the council ran internally on every reasoned fork. None of them bounced a decision back to me. The pattern works. Here is the version with the receipts.
My Spin On The LLM Council
Karpathy's LLM Council is the provider-based version. Send one prompt to several LLMs, anonymize the answers, rank them, then synthesize the result.
My spin is different. Instead of filling the council with providers, fill it with the personalities you wish the providers would reliably bring to the room.
The question I care about is less "which model is smartest?" and more:
"Will this thing tell me when my idea is bad?"
Most of the time, I do not want 5 versions of the same helpful answer. I want 5 jobs: Contrarian, First Principles Thinker, Expansionist, Outsider, and Executor.
You can run those roles on one model or several models. The important part is that each role gets the same brief and returns a focused critique.
One prompt that says "be skeptical, practical, strategic, simple, visionary, and concise" usually turns into soup. Keeping the roles separate is the difference between a critique and a paragraph that politely avoids the main problem.
The 5 Voices I Want In The Room
| Role | What I want from it |
|---|---|
| Contrarian | Failure modes, weak assumptions, abuse cases, misleading claims |
| First Principles Thinker | The actual problem, minimum coherent workflow, missing proof |
| Expansionist | The stronger version of the idea, upside, adjacent capability |
| Outsider | What a normal person would find confusing, vague, or overcomplicated |
| Executor | The next action, verification check, owner, and defer list |
The point is to take what people notice when they compare models, that each one has a different temperament, and make those temperaments explicit.
Do not hope the model happens to be skeptical. Assign skepticism.
Do not ask one assistant to agree, disagree, expand, reduce, plan, and judge all at once. That is how you get a smooth paragraph that somehow avoids the main problem.
The council should be short:
1. Write one decision brief.
2. Run the 5 roles against the brief.
3. Collect their findings.
4. Synthesize conflicts and agreements.
5. Choose one implementation move.
That is it. No tiny parliament living in your terminal. Just enough friction to stop AI from politely escorting a bad idea into production.
Mode 1 And Mode 2 (The Part I Got Wrong The First Time)
The original post made it sound like the council is always five parallel subagents. That is one valid mode. It is not the default, and it should not be.
In practice there are two modes, and treating them as the same thing is how this pattern degrades into theater.
Mode 1: single-model, structured pass. One model. The model runs the question through all 5 lenses inside one reasoning trace, weighs the pressure from each, picks the move that best survives the critique. This is the default. It is what you should reach for when you catch yourself agreeing too quickly, when a planner hits an ambiguous fork, or when you are about to bounce a decision back to the operator because you do not feel like making it.
Mode 2: spawn five parallel subagents. Five Task subagents, each with isolated context and a single persona prompt. A synthesis step merges them. This is for high-stakes decisions where context contamination across personas would actually matter. It costs five times as much. You should treat it as a rare tool, not the default.
The honest framing matters more than the mechanism. If you ran Mode 1 and reported it as "the council voted," you have lied about your process. One model running a template is not a vote. The output of Mode 1 should say something like "after running the lenses, the move that survived the critique is X." The output of Mode 2 can say "five subagents returned, here is the synthesis." Different mechanism, different language.
I shipped the skill with this distinction enforced as a hard ban. "The council voted" and "consensus says" are banned phrases. If you write them, the test fails. More on that below.
The Brief Does Most Of The Work
The brief matters more than the role names. If the brief is vague, 5 lenses just give you 5 flavors of fog. Very well formatted fog.
Use something like this:
Decision:
Context:
Constraints:
What success means:
What would make this decision wrong:
What should not be touched yet:
That last line is not decorative. AI agents love expanding scope. You ask whether a workflow should become a skill, and 4 minutes later it is proposing a platform and a new operating philosophy.
Relax. The council should make the next move clearer, not turn every decision into a migration plan with feelings.
What Actually Changed When I Used It (For Real, At Scale)
The original post said "I used this workflow while shaping this article." That was true. It was also a single data point.
Here is the production data three days later.
Five arcs in one day, zero bounces on reasoned forks. On May 20 I ran five back-to-back engineering arcs through a single dispatch directive: planner runs the council internally for every reasoned-engineering decision and locks the answer. Bounce to the operator only for five enumerated categories (new top-level concept, new spend, destructive operations, sudo outside the relaxed set, irrecoverable choices). Every other fork stays council-internal. The result: five arcs shipped clean, all at $0 spend, none of them surfacing a reasoned-engineering decision back to me. The same planners that used to ask "should I do X or Y" now ran the council, picked, documented the tradeoff inline, and continued.
The recursive recursion-safe moment. The arc that shipped the council skill itself used the council to make its own internal decisions. Planner hit a fork on em-dash handling inside persona prefixes. Council mode 1 ran internally, picked period as the separator, documented the tradeoff. The pattern was used to build the pattern. That is the loop closing.
The honest-framing catch. Closeout summary said "v1 shipped." Council ran on the summary itself, not the code. Contrarian flagged the framing as overconfident. Outsider pointed out the knowledge layer was not actually wired. Three different premature v1 claims got caught the same way inside twelve hours. None of those would have surfaced from a normal "review this" pass.
The council did not decide for me. It made the weak parts harder to ignore. The difference is that now it does this without my involvement on the easy ones, which means I see only the decisions I should actually be making.
Honest By Gate
Here is the part of the shipping artifact that matters most.
The skill is enforced by 34 test assertions. 22 are static lints over the skill files and the slash command definition. 12 are live-SMOKE assertions over a real council invocation against an OAuth-token-caching test question.
The lints check that the five persona descriptions match the canonical wording byte-for-byte. They check that the banned-phrase list contains "the council voted," "consensus says," and two other phrases that drift the framing toward fake-vote language. They check that Mode 2 is never auto-invoked. They check that the slash command exposes both modes and defaults to Mode 1.
The live-SMOKE assertions check that a real invocation produces the expected sections, that no banned phrase appears in the output, that the synthesis paragraph references the surviving move rather than a vote, and that the brief format is followed.
This is what I mean by honest by gate. The framing rules are not advice in the skill description. They are invariants that fail the next test run if they drift. Future me does not get to relax them in a quiet edit. The test runs and tells me I broke the contract.
If you build this pattern in your own setup, this is the part I would not skip. The pattern is only useful if the language stays honest. Banned-phrase enforcement is what keeps Mode 1 from drifting into Mode-2-cosplay.
Add It To Codex
Codex supports skills for reusable workflows and subagents for parallel specialized work. Mode 1 lives in the skill body. Mode 2 spawns the subagents.
Create a personal skill:
mkdir -p ~/.codex/skills/decision-council
$EDITOR ~/.codex/skills/decision-council/SKILL.md
Starting point:
---
name: decision-council
description: Use when an AI assistant is likely to agree too quickly with a product, architecture, UX, security, or implementation decision.
---
Default to Mode 1.
Mode 1: run the 5 lenses inside one reasoning pass.
1. Contrarian: find the flaw, weak assumption, abuse case, or misleading claim.
2. First Principles Thinker: restate the actual problem and minimum coherent workflow.
3. Expansionist: find the stronger version, upside, or adjacent capability.
4. Outsider: explain what is confusing to someone without project context.
5. Executor: define the next action, verification check, and defer list.
Return:
1. Per-lens position (1-3 sentences each).
2. Strongest disagreement.
3. Surviving move.
4. Deferred risks.
Frame as "the surviving move is X" or "after running the lenses, X holds up best."
Do NOT frame as "the council voted" or "consensus says." This is one model running a template, not a vote.
Mode 2: spawn 5 subagents in parallel, one per role, only when the user explicitly says "spawn the council" or "use agents." Synthesize their outputs into the same return format.
Then invoke it with a decision brief:
Use the decision-council skill.
Decision: Should we replace this hand-rolled workflow with a skill and 5 subagents?
Context: ...
Constraints: ...
What success means: ...
What would make this decision wrong: ...
What should not be touched yet: ...
This is the part people skip, then they blame the model. Do not type "review this idea." Give it the decision, constraints, and what would prove you wrong.
Add It To Claude Code
In Claude Code, define five subagents for Mode 2 and one skill or slash command that runs Mode 1 inline and routes to the subagents on the explicit trigger.
mkdir -p .claude/agents .claude/skills/decision-council
$EDITOR .claude/agents/contrarian.md
$EDITOR .claude/skills/decision-council/SKILL.md
Each subagent stays small:
---
name: contrarian
description: Reviews decisions for failure modes and weak assumptions.
tools: Read, Grep, Glob
model: inherit
---
Return the top 3 risks and the assumption most likely to be false.
Repeat for first-principles-thinker, expansionist, outsider, and executor.
The skill:
---
description: Run a 5-lens decision council. Mode 1 default, Mode 2 on explicit trigger.
disable-model-invocation: true
---
Decision brief:
$ARGUMENTS
Default: Mode 1. Run the five lenses inside one reasoning pass and return the surviving move.
If the brief says "spawn the council" or "use agents," switch to Mode 2 and dispatch these subagents in parallel:
- contrarian
- first-principles-thinker
- expansionist
- outsider
- executor
Return: per-lens position, strongest disagreement, surviving move, deferred risks.
Banned phrases: "the council voted," "consensus says," "the council agrees," "vote." One model running a template is not a vote. Frame as "the surviving move is X."
Now the command becomes:
/decision-council "Decision: ... Context: ... Constraints: ..."
For DNA-level auto-invocation, drop five triggers into your CLAUDE.md so the model reaches for Mode 1 without being asked:
- Before any AskUserQuestion call where the question is not deeply personal.
- Inside the planner phase when a fork has an ambiguous tiebreaker.
- During code review when architectural tradeoffs exist.
- As an agree-too-quickly self-check when the answer feels suspiciously easy.
- As a pre-mortem before dispatching a kickoff brief.
Mode 2 stays off the auto-invocation list. It only fires on the explicit trigger phrase.
When Not To Use It
Do not run a council for tiny choices. Naming a variable does not need 5 lenses and a committee chair. Please just name the variable and move on with your life.
Do not use it when the real bottleneck is missing data. If nobody knows the customer requirement, 5 lens prompts will only produce cleaner speculation. It is still guessing in a nicer outfit.
Use it when the cost of being wrong is real: architecture, security, product direction, pricing, data model changes, public claims, and agent behavior other people will depend on.
The goal is a better next move.
The Receipt
Three days from theory post to enforced shipping artifact. Five arcs in one day with zero bounces on reasoned forks. 34 test assertions keeping the framing honest. One real catch where the council was run against its own closeout summary and found the v1 claim false.
That is the whole pitch. The pattern works. The language stays honest because the tests will not let it drift. The model stops being a yes man because you have given it a job that is harder to fake.
Now go assign some skepticism.
If you want the sister post on how this stacks with a Planner / Executor / Verifier triad on the build side, see The Council Picks The Move. The Triad Ships It.
Tagged



