Karpathy's CLAUDE.md Rules Are a Starting Point, Not a Rulebook
A recent post connected Karpathy's Claude Code advice to a small CLAUDE.md template from Forrest Chang. The useful part is not the template itself. It is the failure pattern underneath it.
The four rules are useful because they hit very familiar failure modes:
- Think before coding.
- Prefer simple code.
- Make focused changes.
- Define success before running off to implement.
I would keep those four. I just would not worship them.
But I do not think the real lesson is "copy these four rules" or "add eight more." The deeper lesson is that a good CLAUDE.md should behave more like a bug report than a manifesto.
Every rule should exist because the agent has made that mistake before. One scar, one rule.
Why CLAUDE.md Matters
CLAUDE.md is memory, not law. Claude Code loads it as context, but the model still has to choose well.
That matters. A CLAUDE.md file does not turn a coding agent into a deterministic system. It changes the odds. It makes the right behavior easier for the model to find.
Short, specific rules beat long preference dumps. "Run npm test before saying the change is complete" is better than "be careful with tests." One can be checked. The other is mood lighting.
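The difference shows up directly in the file. For example (my wording, not the source's):

```markdown
<!-- Vague: cannot be checked -->
- Be careful with tests.

<!-- Checkable: pass or fail is observable in the final answer -->
- Run `npm test` before saying the change is complete, and paste the result.
```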
Karpathy's four rules work because they are concrete. They do not ask the model to "act senior." They describe behaviors: surface assumptions, avoid speculative abstractions, do not touch unrelated code, and define success criteria.
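As a CLAUDE.md baseline, that might look something like the sketch below. The wording is my paraphrase, not a quote from Karpathy or the template:

```markdown
## Baseline
- Before writing code, state your assumptions and ask if key context is missing.
- Prefer the simplest implementation that satisfies the request. No speculative abstractions.
- Only change lines you can tie to the request. No drive-by refactors.
- Restate what "done" means before starting, and verify against it before finishing.
```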
The Useful Mental Model
I would think about CLAUDE.md like this:

| Agent failure | Better instruction | Proof it worked |
|---|---|---|
| The agent guesses instead of asking | State assumptions before coding | The final answer names the assumption or asks for missing context |
| The agent edits too much | Tie every changed line to the request | The diff contains no unrelated cleanup |
| The agent says "done" too early | Show verification before claiming success | The final answer includes the test, command, or limitation |
| The agent repeats an old mistake | Add one narrow rule for that failure | The same mistake stops showing up in later sessions |
That table is the reason I like the "bug report" framing. It keeps the file grounded in behavior you can observe.
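Written into CLAUDE.md, a single row of that table becomes one narrow entry, added right after the failure happens. A hypothetical example for the "says done too early" row:

```markdown
<!-- Added after a session that claimed a fix without running the suite -->
- Do not report a change as complete until the test command has run and its output is shown.
  If it could not run, say so explicitly and explain why.
```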
The Part I Would Be Careful With
The source post makes a strong benchmark-style claim about mistake rates dropping after adding more rules. I would treat numbers like that as interesting, but not universal.
Agent performance depends on the codebase, task type, model version, tools, test suite, and how strict the reviewer is. A rule set that looks amazing on feature work may do very little for debugging infrastructure. A rule that saves a monorepo might slow down a small prototype.
There is also early research suggesting that agent rules help, but not always in the way we expect. One April 2026 paper found that rules improved coding-agent performance, and that negative constraints were the clearest individually useful rule type. That makes me even more cautious about treating any one template as magic. The safest rules tend to be guardrails: do not guess silently, do not claim done without verification, do not edit unrelated code.
So I would not copy a 12-rule template and call it done. I would start with Karpathy's baseline, then add rules only when they map to real mistakes in my own workflow.
The useful question is not "how many rules should be in CLAUDE.md?" It is "what mistakes am I tired of correcting twice?"
Four Rules I Would Add
Read Before Writing
Focused changes are good, but they can be misunderstood. "Do not touch unrelated code" does not mean "ignore surrounding code." Before adding a function, the agent should read the exports, the caller, and any nearby utility that looks relevant. A lot of bad AI code is not wildly wrong. It is redundant. It recreates something that already existed 40 lines away.
- Failure: agent adds a second version of logic that already exists.
- Rule: read exports, callers, and nearby utilities before adding code.
- Check: name the existing pattern before changing the file.
Keep Deterministic Decisions Out Of The Model
Use the model for judgment-heavy work: summarizing, classifying messy text, drafting, explaining, extracting from loose input. Do not use it for retry logic, status-code routing, pure transformations, or anything a normal function can decide. If a 503 status code already tells you the retry policy, code should handle that. The model should not be invited into every branch of the system just because it can produce an answer.
- Failure: agent makes a probabilistic call where code should decide.
- Rule: do not use the model for routing, retries, status handling, or pure transforms.
- Check: if a deterministic input answers the question, code owns the branch.
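To make the boundary concrete, here is a minimal TypeScript sketch. `fetchWithRetry`, `classifyTicket`, and `callModel` are illustrative names, not part of any particular SDK:

```typescript
// Stand-in for whatever model client you actually use.
declare function callModel(prompt: string): Promise<string>;

// Deterministic: a 503 already tells you the retry policy. Code owns this branch.
async function fetchWithRetry(url: string, attempts = 3): Promise<Response> {
  for (let i = 0; i < attempts; i++) {
    const res = await fetch(url);
    if (res.status !== 503) return res;                     // only retry on 503
    await new Promise((r) => setTimeout(r, 2 ** i * 1000)); // backoff: 1s, 2s, 4s
  }
  throw new Error(`gave up after ${attempts} attempts: ${url}`);
}

// Judgment-heavy: messy human text is where the model is worth the call.
async function classifyTicket(text: string): Promise<string> {
  return callModel(
    `Classify this support ticket as bug, billing, or question:\n${text}`
  );
}
```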
Do Not Claim Done Without Proof
AI agents are very good at producing tests that pass. That is not the same as producing tests that matter. A useful test should fail when the actual business rule is broken. If the test only proves that a function returns something, it may be giving the model a false sense of completion.
The same goes for finished work. "Done" should come with the command that ran, the test result, the screenshot, or the reason verification was not possible.
- Failure: agent says done because the code looks plausible.
- Rule: do not claim completion without verification evidence.
- Check: name the command, test, screenshot, or limitation in the final answer.
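A quick illustration of the gap between a test that passes and a test that matters, assuming a hypothetical `applyDiscount` function and a Vitest-style runner:

```typescript
import { describe, it, expect } from "vitest";
import { applyDiscount } from "./pricing"; // hypothetical module

describe("applyDiscount", () => {
  // Weak: passes as long as the function returns *something*.
  it("returns a value", () => {
    expect(applyDiscount(100, "SAVE10")).toBeDefined();
  });

  // Useful: fails if the actual business rule (10% off, never below zero) breaks.
  it("applies 10% off and never goes negative", () => {
    expect(applyDiscount(100, "SAVE10")).toBe(90);
    expect(applyDiscount(5, "SAVE10")).toBeGreaterThanOrEqual(0);
  });
});
```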
Fail Loudly
This is the one I care about most. The expensive failures are not the obvious ones. They are the failures that look like success. A migration that skipped records. A test suite that passed with skipped tests. A feature that works for the happy path but never checked the edge case you asked about.
When the agent is not sure, it should say that. When it skipped something, it should name it. When it could not verify the result, it should not imply that the work is done.
- Failure: agent hides uncertainty behind a confident summary.
- Rule: do not bury skipped work, failed checks, or unverified edge cases.
- Check: the final answer says exactly what was not verified.
The Maintenance Rule
One meta-rule matters most: do not let CLAUDE.md become a junk drawer.
If a rule does not prevent a mistake, remove it. If a formatter, type checker, test, hook, or CI job can enforce it, move the rule down into tooling instead of spending model attention on it. If a rule only applies to one part of the codebase, move it closer to that part of the codebase. Claude Code supports project rules under .claude/rules/, including path-scoped rules, which is a cleaner fit for large projects than forcing every instruction into one file.
This is where old agent rules start to rot: stale test commands, folder rules for code that moved, preferences nobody remembers writing. A small repo can get away with one short CLAUDE.md. A larger system needs layers: project-level instructions, path-specific rules, and sometimes skills for repeatable workflows.
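A rough picture of that layering, with illustrative file names; check the Claude Code memory docs for the exact scoping syntax your version supports:

```text
CLAUDE.md                 # short, project-wide: baseline rules only
.claude/rules/
  api.md                  # rules that only matter for src/api/
  frontend.md             # rules that only matter for src/web/
  migrations.md           # "fail loudly" rules for data migrations
```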
The goal is not to make the instruction file impressive. The goal is to make the next agent session less annoying.
The Practical Takeaway
Karpathy's rules are valuable because they describe the failures almost everyone sees when they first use AI coding agents seriously. The model guesses. It overbuilds. It edits too much. It loses track of what success means.
That makes the four-rule baseline worth keeping.
But the best CLAUDE.md file is not the longest one or the most viral one. It is the one that reflects your actual scars from working with agents in your own code.
Start with the baseline. Watch the mistakes. Add a rule only when the same failure shows up again.
That is how CLAUDE.md becomes engineering documentation instead of prompt cargo culting.
Sources: the post that sparked this, Forrest Chang's Karpathy-inspired CLAUDE.md, Claude Code's memory documentation, and research on agent rules.