Karpathy's CLAUDE.md Rules Are a Starting Point, Not a Rulebook
A recent post connected Karpathy's Claude Code advice to a small CLAUDE.md template from Forrest Chang. The useful part is not the template itself. It is the failure pattern underneath it.
The four rules are useful because they hit very familiar failure modes:
- Think before coding.
- Prefer simple code.
- Make focused changes.
- Define success before running off to implement.
I would keep those four. I just would not worship them.
But I do not think the real lesson is "copy these four rules" or "add eight more." The deeper lesson is that a good CLAUDE.md should behave more like a bug report than a manifesto.
Every rule should exist because the agent has made that mistake before. One scar, one rule.
Why CLAUDE.md Matters
CLAUDE.md is memory, not law. Claude Code loads it as context, but the model still has to choose well.
That matters. A CLAUDE.md file does not turn a coding agent into a deterministic system. It changes the odds. It makes the right behavior easier for the model to find.
Short, specific rules beat long preference dumps. "Run npm test before saying the change is complete" is better than "be careful with tests." One can be checked. The other is mood lighting.
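The difference shows up directly in the file. For example (my wording, not the source's):

```markdown
<!-- Vague: cannot be checked -->
- Be careful with tests.

<!-- Checkable: pass or fail is observable in the final answer -->
- Run `npm test` before saying the change is complete, and paste the result.
```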
Karpathy's four rules work because they are concrete. They do not ask the model to "act senior." They describe behaviors: surface assumptions, avoid speculative abstractions, do not touch unrelated code, and define success criteria.
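As a CLAUDE.md baseline, that might look something like the sketch below. The wording is my paraphrase, not a quote from Karpathy or the template:

```markdown
## Baseline
- Before writing code, state your assumptions and ask if key context is missing.
- Prefer the simplest implementation that satisfies the request. No speculative abstractions.
- Only change lines you can tie to the request. No drive-by refactors.
- Restate what "done" means before starting, and verify against it before finishing.
```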
The Useful Mental Model
I would think about CLAUDE.md like this:

| Agent failure | Better instruction | Proof it worked |
|---|---|---|
| The agent guesses instead of asking | State assumptions before coding | The final answer names the assumption or asks for missing context |
| The agent edits too much | Tie every changed line to the request | The diff contains no unrelated cleanup |
| The agent says "done" too early | Show verification before claiming success | The final answer includes the test, command, or limitation |
| The agent repeats an old mistake | Add one narrow rule for that failure | The same mistake stops showing up in later sessions |
That table is the reason I like the "bug report" framing. It keeps the file grounded in behavior you can observe.
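Written into CLAUDE.md, a single row of that table becomes one narrow entry, added right after the failure happens. A hypothetical example for the "says done too early" row:

```markdown
<!-- Added after a session that claimed a fix without running the suite -->
- Do not report a change as complete until the test command has run and its output is shown.
  If it could not run, say so explicitly and explain why.
```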
The Part I Would Be Careful With
The source post makes a strong benchmark-style claim about mistake rates dropping after adding more rules. I would treat numbers like that as interesting, but not universal.
Agent performance depends on the codebase, task type, model version, tools, test suite, and how strict the reviewer is. A rule set that looks amazing on feature work may do very little for debugging infrastructure. A rule that saves a monorepo might slow down a small prototype.
There is also early research suggesting that agent rules help, but not always in the way we expect. One April 2026 paper found that rules improved coding-agent performance, and that negative constraints were the clearest individually useful rule type. That makes me even more cautious about treating any one template as magic. The safest rules tend to be guardrails: do not guess silently, do not claim done without verification, do not edit unrelated code.
So I would not copy a 12-rule template and call it done. I would start with Karpathy's baseline, then add rules only when they map to real mistakes in my own workflow.
The useful question is not "how many rules should be in CLAUDE.md?" It is "what mistakes am I tired of correcting twice?"
Four Rules I Would Add
Read Before Writing
Focused changes are good, but they can be misunderstood. "Do not touch unrelated code" does not mean "ignore surrounding code." Before adding a function, the agent should read the exports, the caller, and any nearby utility that looks relevant. A lot of bad AI code is not wildly wrong. It is redundant. It recreates something that already existed 40 lines away.
- Failure: agent adds a second version of logic that already exists.
- Rule: read exports, callers, and nearby utilities before adding code.
- Check: name the existing pattern before changing the file.
Keep Deterministic Decisions Out Of The Model
Use the model for judgment-heavy work: summarizing, classifying messy text, drafting, explaining, extracting from loose input. Do not use it for retry logic, status-code routing, pure transformations, or anything a normal function can decide. If a 503 status code already tells you the retry policy, code should handle that. The model should not be invited into every branch of the system just because it can produce an answer.
- Failure: agent makes a probabilistic call where code should decide.
- Rule: do not use the model for routing, retries, status handling, or pure transforms.
- Check: if a deterministic input answers the question, code owns the branch.
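To make the boundary concrete, here is a minimal TypeScript sketch. `fetchWithRetry`, `classifyTicket`, and `callModel` are illustrative names, not part of any particular SDK:

```typescript
// Stand-in for whatever model client you actually use.
declare function callModel(prompt: string): Promise<string>;

// Deterministic: a 503 already tells you the retry policy. Code owns this branch.
async function fetchWithRetry(url: string, attempts = 3): Promise<Response> {
  for (let i = 0; i < attempts; i++) {
    const res = await fetch(url);
    if (res.status !== 503) return res;                     // only retry on 503
    await new Promise((r) => setTimeout(r, 2 ** i * 1000)); // backoff: 1s, 2s, 4s
  }
  throw new Error(`gave up after ${attempts} attempts: ${url}`);
}

// Judgment-heavy: messy human text is where the model is worth the call.
async function classifyTicket(text: string): Promise<string> {
  return callModel(
    `Classify this support ticket as bug, billing, or question:\n${text}`
  );
}
```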
Do Not Claim Done Without Proof
AI agents are very good at producing tests that pass. That is not the same as producing tests that matter. A useful test should fail when the actual business rule is broken. If the test only proves that a function returns something, it may be giving the model a false sense of completion.
The same goes for finished work. "Done" should come with the command that ran, the test result, the screenshot, or the reason verification was not possible.
- Failure: agent says done because the code looks plausible.
- Rule: do not claim completion without verification evidence.
- Check: name the command, test, screenshot, or limitation in the final answer.
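A quick illustration of the gap between a test that passes and a test that matters, assuming a hypothetical `applyDiscount` function and a Vitest-style runner:

```typescript
import { describe, it, expect } from "vitest";
import { applyDiscount } from "./pricing"; // hypothetical module

describe("applyDiscount", () => {
  // Weak: passes as long as the function returns *something*.
  it("returns a value", () => {
    expect(applyDiscount(100, "SAVE10")).toBeDefined();
  });

  // Useful: fails if the actual business rule (10% off, never below zero) breaks.
  it("applies 10% off and never goes negative", () => {
    expect(applyDiscount(100, "SAVE10")).toBe(90);
    expect(applyDiscount(5, "SAVE10")).toBeGreaterThanOrEqual(0);
  });
});
```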
Fail Loudly
This is the one I care about most. The expensive failures are not the obvious ones. They are the failures that look like success. A migration that skipped records. A test suite that passed with skipped tests. A feature that works for the happy path but never checked the edge case you asked about.
When the agent is not sure, it should say that. When it skipped something, it should name it. When it could not verify the result, it should not imply that the work is done.
- Failure: agent hides uncertainty behind a confident summary.
- Rule: do not bury skipped work, failed checks, or unverified edge cases.
- Check: the final answer says exactly what was not verified.
The Maintenance Rule
One meta-rule matters most: do not let CLAUDE.md become a junk drawer.
If a rule does not prevent a mistake, remove it. If a formatter, type checker, test, hook, or CI job can enforce it, move the rule down into tooling instead of spending model attention on it. If a rule only applies to one part of the codebase, move it closer to that part of the codebase. Claude Code supports project rules under .claude/rules/, including path-scoped rules, which is a cleaner fit for large projects than forcing every instruction into one file.
This is where old agent rules start to rot: stale test commands, folder rules for code that moved, preferences nobody remembers writing. A small repo can get away with one short CLAUDE.md. A larger system needs layers: project-level instructions, path-specific rules, and sometimes skills for repeatable workflows.
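A rough picture of that layering, with illustrative file names; check the Claude Code memory docs for the exact scoping syntax your version supports:

```text
CLAUDE.md                 # short, project-wide: baseline rules only
.claude/rules/
  api.md                  # rules that only matter for src/api/
  frontend.md             # rules that only matter for src/web/
  migrations.md           # "fail loudly" rules for data migrations
```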
The goal is not to make the instruction file impressive. The goal is to make the next agent session less annoying.
The Practical Takeaway
Karpathy's rules are valuable because they describe the failures almost everyone sees when they first use AI coding agents seriously. The model guesses. It overbuilds. It edits too much. It loses track of what success means.
That makes the four-rule baseline worth keeping.
But the best CLAUDE.md file is not the longest one or the most viral one. It is the one that reflects your actual scars from working with agents in your own code.
Start with the baseline. Watch the mistakes. Add a rule only when the same failure shows up again.
That is how CLAUDE.md becomes engineering documentation instead of prompt cargo culting.
Sources: the post that sparked this, Forrest Chang's Karpathy-inspired CLAUDE.md, Claude Code's memory documentation, and research on agent rules.