Opus 4.6’s Simple Reasoning Errors in the Wild
Mar 11, 2026 · 907 words · 5-minute read
I consider myself a sophisticated LLM user. I’m on the max plans of leading LLMs. I have suites of agents that use different skills and build on existing memory. I build verification agents to check the model’s work. I understand the implications of context compaction and make efforts to mitigate the risk. But over the past week I’ve noticed what appear to be one (or possibly two) recurring, serious reasoning errors in Claude Opus 4.6’s work, and I’d like to understand whether they are known issues and to flag them for the broader community.
From my quick literature review, it appears that reasoning errors are already well known in the community. Even so, I was surprised for two reasons.
- The errors are basic. A good high school student would spot them immediately and would not make them. The fact that leading models still make basic reasoning errors undercuts the notion that LLMs have achieved some dramatic intelligence breakthrough. These failures reminded me that the model is still an autoregressive machine, and its autoregressive nature is what most readily explains the pattern. It is one of many tools humans can use, and it introduces unique failure modes that humans must learn to mitigate.
- Until now I had only built verification agents to check for completeness and factual accuracy (e.g. whether a quotation it pulls exists). I had not checked for logical coherence. I’ll be doing more of that going forward, and I’d be interested to hear whether others have found verification agents effective at checking logical reasoning.
Please note that the following conversations took place in fresh sessions where no context was compacted.
Error 1: “Rhetorical/Semantic Drift”
I asked Claude Opus 4.6 to compare two payment structures between A and B, one where A receives a lump sum payment from B but nothing more after a certain point, and the other where A shares ongoing revenue with B. In comparing the two arrangements, Opus 4.6 wrote:
Without revenue sharing, A has no ongoing financial stake in the venture’s commercial success, reducing its incentive to provide continued cooperation. Additionally, A bears all the risk if sales disappoint.
The first sentence is right. The second is wrong: with no revenue share, it is B who bears the risk if sales disappoint.
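The asymmetry is easy to make concrete. Here is a toy payoff sketch; the lump-sum amount and the 20% share are hypothetical numbers chosen only for illustration, not figures from the original conversation:

```python
# Toy model of the two payment structures (hypothetical numbers).
# Lump sum: A is paid 100 up front and nothing after; B keeps all revenue.
# Revenue share: A takes 20% of revenue, B keeps the rest.

def lump_sum_payoffs(revenue, lump=100):
    return {"A": lump, "B": revenue - lump}

def revenue_share_payoffs(revenue, share=0.2):
    return {"A": revenue * share, "B": revenue * (1 - share)}

for revenue in (500, 50):  # strong sales vs. disappointing sales
    print(revenue, lump_sum_payoffs(revenue), revenue_share_payoffs(revenue))

# When revenue drops from 500 to 50, A's lump-sum payoff stays fixed at 100
# while B's payoff falls from 400 to -50: B, not A, bears the downside risk.
```

Under the revenue-share structure, by contrast, a sales shortfall hits both parties, which is exactly why it preserves A’s ongoing incentive.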
I have no way of knowing how Opus 4.6 made this mistake. Its own conjecture was that there was a rhetorical slide from “no incentive” to “and all risk,” which reads as a coherent escalation of disadvantages: “The underlying mechanism is local textual coherence overriding economic logic.”
My brief literature review points me to the Song et al. paper, which catalogs different kinds of LLM reasoning errors at a high level. I would classify the error I saw as the “order bias / anchoring in reasoning” the paper describes.
Mechanistically (according to Opus 4.6 itself):
Earlier rhetorical structure
↓
Implicit expectation of continuation
↓
Reasoning step follows narrative pattern
↓
Conclusion inconsistent with underlying logic
Error 2: “False Reliance on Heuristics / Shortcuts”
In a separate project, I asked Opus 4.6 to review a risk allocation arrangement between A and B, telling it to take A’s side. My existing draft said that a certain kind of liability would fall on A.
I then shared with Opus 4.6 other materials showing that the liability should fall on B. Opus 4.6 caught this discrepancy and suggested that liability should be modified to fall on B, but it then caveated with “fixing this would hurt A”.
Moving the risk away from A clearly helps A.
Again, I have no way of knowing how Opus 4.6 made this mistake. Its own initial conjecture was that it applied the heuristic “proposing a change to a draft = concession = costs our side.” That heuristic is usually correct, but it breaks when the current draft already contains an error that hurts our side: fixing it actually helps us.
Mechanistically (according to Opus 4.6 itself):
Statistically common pattern in training data (?) or post-training RLHF (?)
↓
"Proposing a fix = concession = costs our side"
↓
Heuristic applied without context awareness
↓
Conclusion correct in general, wrong here
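The failure mode in that chain — a shortcut applied without checking the direction of the change — can be sketched as a toy contrast between the heuristic and a context-aware evaluation. The function names and the A/B framing are mine, purely illustrative:

```python
# Toy illustration of the shortcut vs. a context-aware evaluation.
# A change to the draft helps our side iff it moves liability off of us,
# regardless of the fact that we are the ones proposing it.

def shortcut_verdict(proposed_change):
    # The heuristic: any proposed modification is a concession.
    return "hurts us"

def context_aware_verdict(liability_before, liability_after, our_side="A"):
    if liability_before == our_side and liability_after != our_side:
        return "helps us"   # draft had an error against us; fixing it helps
    if liability_before != our_side and liability_after == our_side:
        return "hurts us"
    return "neutral"

# The draft wrongly put liability on A; the proposed fix moves it to B.
print(shortcut_verdict("move liability from A to B"))  # hurts us (wrong)
print(context_aware_verdict("A", "B"))                 # helps us (right)
```

The shortcut ignores its only relevant input, which is precisely what “applied without context awareness” means in the chain above.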
Conclusion
Unfortunately I didn’t have verification agents checking for logical coherence in these sessions, so I don’t know whether a separate model instance would have caught these errors. I’ll be testing more.
Some larger points:
- Verify. Build verification into your workflow, including for logical reasoning, not just factual accuracy or completeness — though it remains to be seen how effective this is.
- Software engineering is particular. The industry norm is for things to work 95% of the time in 95% of cases. Many other domains have far lower tolerance for error. The adoption curve, the unit economics, and the way humans integrate AI tools in those fields will look different. We shouldn’t index too heavily on SWE benchmarks when projecting LLMs’ trajectory.

  Software engineering is also unique in that engineers can write tests to mechanically verify logical correctness, and good engineers can build very good tests. The combination of high error tolerance and easy verifiability makes software a uniquely forgiving domain for LLM integration.
- I don’t know whether the errors I’ve flagged here can be solved through more expert-labeled data in pre- or post-training. But that’s clearly the direction, and it’s an area worth watching.
- Heavy users learn things about LLM capabilities and limitations that no benchmark will tell you.
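For anyone experimenting with logical-coherence verification, here is a minimal sketch of what such a pass might look like. Everything here is an assumption on my part: the prompt wording is illustrative, and `call_model` is a hypothetical stand-in for whichever LLM API you use (the example below substitutes a stub so it runs standalone):

```python
# Sketch of a logical-coherence verification pass. `call_model` is a
# hypothetical stand-in for a real LLM API; the prompt is illustrative.

COHERENCE_PROMPT = """You are a logic checker. For each claim below, state
whether it follows from the stated facts. Answer CONSISTENT or INCONSISTENT
for each claim, with a one-line reason.

Facts:
{facts}

Claims:
{claims}
"""

def check_coherence(facts, claims, call_model):
    prompt = COHERENCE_PROMPT.format(
        facts="\n".join(f"- {f}" for f in facts),
        claims="\n".join(f"{i + 1}. {c}" for i, c in enumerate(claims)),
    )
    return call_model(prompt)

# Example using Error 1 from above, with a stub in place of a real model:
facts = ["A receives a lump sum from B and nothing after a certain point."]
claims = ["A bears all the risk if sales disappoint."]
stub = lambda p: "1. INCONSISTENT - A's payoff is fixed; B bears the downside."
print(check_coherence(facts, claims, stub))
```

A separate instance running a prompt like this is exactly the experiment I plan to run; I don’t yet know whether it would have caught either error above.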