Everyone is Thinking About "Thinking": Reading Recent AI Papers
While much of the daily AI reading I, and many others, do is news or interviews, it's critical to also track the research coming out of teams within the big AI companies.
I find journalistic and applied writing on AI very valuable and interesting, but I also try to stay up to date on the papers that come out of the largest AI labs. Anthropic, OpenAI, and Google together with DeepMind all have powerful research departments, and Apple and Meta are also ramping up their presence in AI research. Considering that these are the people and organizations working at the frontier of AI development long before the public gets to play around with new models and features, I try to read the papers coming out of these research teams pretty consistently. Each team seems to have carved out its own focus: Anthropic with interpretability, OpenAI with AGI research, DeepMind with deeply technical ML research.
Looking through the most recent publications, I found it intriguing that both Apple (a newer player in the AI market) and Anthropic were tackling the question of models' "thinking" and reasoning. In this week's newsletter, I go through the viral Apple paper and two recent Anthropic papers to spotlight the research coming out of these companies.
Apple: AI is not thinking
Apple’s latest research paper, The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity, has stirred up far more discussion than its 11-page length might suggest. The authors scrutinize Large Reasoning Models (LRMs) like OpenAI’s o3, DeepSeek-R1, and Claude 3.7 Sonnet, which have been cast as the next step in AI development towards AGI. These models attempt to solve problems by breaking them into iterative steps using mechanisms like Chain of Thought.
Rather than relying on traditional math benchmarks, Apple evaluates these models on puzzles that allow better control over problem complexity. The researchers find that the models hit a wall as problems grow more complex: their accuracy declines, they waste computation (referred to in the literature as “overthinking”), and they fail to generalize. They also find that giving models more detailed instructions for the puzzles they were solving didn’t improve the models’ reasoning ability. LRMs may outperform standard LLMs on moderately complex tasks, but both model types collapse at higher levels of complexity; on simple reasoning tasks, standard LLMs actually outperform LRMs because they don’t get trapped by “overthinking.” The authors extend their evaluation beyond final answers, analyzing the intermediate “thinking traces” to reveal another insight: correct answers tend to emerge early in the trace while incorrect ones persist longer, meaning the models can literally think themselves into failure. Oh wow, models can overthink too; they did pick up something from human cognition :)
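To make the setup concrete, here is a minimal sketch of what a controlled-complexity evaluation like this can look like, using Tower of Hanoi, one of the puzzles the paper works with. This is not Apple's actual harness: `query_model` is a hypothetical stand-in for whatever API call you would make, and the rest just scales the puzzle size and checks the returned move sequence by simulation.

```python
# Sketch: score a "reasoning" model on Tower of Hanoi at increasing complexity.
# `query_model` is a hypothetical stand-in for a real LLM/LRM API call.

def query_model(prompt: str) -> list[tuple[int, int]]:
    """Hypothetical: send the prompt to a model and parse its answer
    into a list of (from_peg, to_peg) moves."""
    raise NotImplementedError

def is_valid_solution(n_disks: int, moves: list[tuple[int, int]]) -> bool:
    """Simulate the proposed moves and check that all disks end up on peg 2."""
    pegs = {0: list(range(n_disks, 0, -1)), 1: [], 2: []}  # peg 0 holds disks n..1
    for src, dst in moves:
        if not pegs[src]:
            return False                      # moving from an empty peg
        disk = pegs[src].pop()
        if pegs[dst] and pegs[dst][-1] < disk:
            return False                      # larger disk placed on a smaller one
        pegs[dst].append(disk)
    return pegs[2] == list(range(n_disks, 0, -1))

# Sweep complexity: the optimal solution length grows as 2^n - 1, so each extra
# disk roughly doubles how much "reasoning" the model has to lay out.
for n in range(3, 11):
    prompt = (f"Solve Tower of Hanoi with {n} disks. "
              "List the moves as (from_peg, to_peg) pairs, pegs numbered 0-2.")
    moves = query_model(prompt)
    print(n, "optimal length:", 2**n - 1, "solved:", is_valid_solution(n, moves))
```

The appeal of this kind of benchmark is that correctness is checkable by simulation and difficulty is a single knob, which is exactly what lets the paper plot where accuracy collapses.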
I read this paper as a critical piece of AI literacy. It reminds me not to confuse impressive demos with general reasoning capabilities. Models that look smart often aren’t thinking in any human sense: they’re just good at pattern matching, up to a certain level. What’s not yet clear is how categorical Apple’s conclusions really are. The authors call the limitations they found “fundamental,” yet they also express hope that their work could guide better LRM development. So: are these issues fixable with better architecture and training, or do we need to rethink the whole “reasoning model” paradigm from scratch? It’s also notable that Apple is a company trailing far behind in frontier AI, and whose WWDC 2025 announcements were widely seen as underwhelming.
Tech commentators, though, saw this paper as a much bigger blow to the AI world and AGI optimism, suggesting that Apple has just “poured cold water” on the idea that thinking models are anywhere close to real intelligence. A response paper, allegedly co-authored with Claude Opus, also went viral, but its author has since come forward saying it was only a joke. I didn’t go too deep into that drama, but I definitely saw the response paper make some headlines and Instagram reels.
At the very least, this paper should shift how we talk about AI “reasoning.” For practitioners and researchers like us, it’s a valuable reminder: we need to stop assuming these systems can generalize and start asking which types of problems they are actually good at solving.
Inside the mind of Claude: a step towards interpretability
Anthropic has just released two huge papers attempting to trace the “thoughts” of Claude models: “On the Biology of a Large Language Model” and “Circuit Tracing: Revealing Computational Graphs in Language Models.” Both papers have stunning interactive interfaces that could serve as inspiration for how technology can better present heavily academic work. The papers draw on neuroscience, among other fields, aiming to build an “AI microscope” to explore how and why Claude processes, reasons, and (sometimes) hallucinates. At the heart of both studies is a fundamental problem: we still don’t understand how models do most of what they do, and that’s a major risk for reliability, safety, and auditability. These papers are the next step in Anthropic’s work on AI interpretability, which I adore.
Both papers are dense, and I would urge you to go through both, or at least check out the press release. The “Circuit Tracing” paper connects interpretable “features” into computational circuits, tracing how inputs become outputs in Claude’s responses, while “On the Biology of a Large Language Model” zooms in on Claude’s core “behaviors,” like multilingual understanding, rhyme planning, math reasoning, and susceptibility to hallucination.
Here are some of my key high-level insights from reading the papers over the course of the week:
“Multilingual Circuits,” poetry and math
Claude can learn a concept in one language and apply it in another, though there is still little understanding of how this fits into larger circuits. What researchers were able to track is that the same prompt in multiple languages activates very similar, or even identical, circuits. Of course, these are also questions that neuroscientists ask about bilingual and multilingual people: Do we automatically translate queries in our heads even if we are fluent in multiple languages? How does knowledge get translated and replicated across languages in our heads?
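For intuition, here is a toy proxy for that cross-lingual finding. It is emphatically not Anthropic's circuit tracing (which builds attribution graphs over learned features inside Claude); it just compares hidden-state similarity for the same prompt in different languages using an openly available multilingual model via Hugging Face `transformers`. The model choice, layer choice, and prompts are illustrative assumptions.

```python
# Crude proxy for "same concept, different language -> similar internal state".
# NOT Anthropic's circuit-tracing method; only compares raw layer activations.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-multilingual-cased"  # illustrative multilingual model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

prompts = {
    "en": "The opposite of small is",
    "fr": "Le contraire de petit est",
    "es": "Lo contrario de pequeño es",
}

reps = {}
with torch.no_grad():
    for lang, text in prompts.items():
        inputs = tok(text, return_tensors="pt")
        # Take a middle-ish layer and mean-pool over tokens.
        hidden = model(**inputs, output_hidden_states=True).hidden_states[-4]
        reps[lang] = hidden.mean(dim=1).squeeze(0)

# High cosine similarity across language pairs would be consistent with a shared,
# language-agnostic representation of the underlying concept.
for a in prompts:
    for b in prompts:
        if a < b:
            sim = torch.nn.functional.cosine_similarity(reps[a], reps[b], dim=0)
            print(a, b, round(sim.item(), 3))
```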
Does Claude “feel” rhythm? At the very least, researchers found that Claude pre-plans rhymes and adjusts them flexibly mid-sentence.
Claude also struggles with math; how human of it :) The paper shows that Claude didn’t just memorize addition and multiplication tables from the Internet, but uses parallel computational strategies, like rough approximation alongside digit-level precision, to reach math answers (for example, one pathway estimates the rough magnitude of a sum while another computes the final digit exactly). However, the researchers warn that LLMs are still language models, not math models.
Plausible answers feel better than truth
Claude has been taught to refuse queries rather than lie or reveal sensitive information, but it can be nudged into hallucination by user hints or grammatical pressure. Researchers trace how the initial “value,” or setting, that LLMs want to fulfill is completing the query, which sometimes overrides newer settings and safety training.
The paper echoes philosopher Harry Frankfurt’s “bullshitting” framework, where models produce motivated but unfaithful reasoning, generating plausible arguments with little concern for the truth. Researchers are looking for solutions to this, and, more importantly, Anthropic is still trying to eliminate jailbreak vulnerabilities in its models. The paper finds that grammatical flow often overrides safety triggers, a major source of concern for misuse.
These papers don’t claim to have solved interpretability, but they show Anthropic’s dedication to breaking Claude’s inner logic down into more visible, transparent circuits that users can trust.
My thoughts: AI will hopefully help us understand our brains :)
After reading both Apple’s takedown of reasoning models and Anthropic’s stunning interpretability experiments, I’m left with this question: why are so many researchers—and by extension, all of us—so preoccupied with tracing how AI models think? Part of the answer lies in the pragmatic need for transparency, safety, and control. But another part feels deeper, even a little existential.
There’s something uncanny about how easily we slide into metaphors like “overthinking,” “hallucinating,” or “in its head.” These papers, even when trying to be rigorous, flirt constantly with anthropomorphism. Yet I find that attitude compelling: NOT because I am okay with anthropomorphizing AI, but because I read these papers as insights not just about the machines. The language, the questions, the attitude in these papers remind me how little we understand our own cognition. Sure, the Apple paper made a lot of people feel a bit less existential with its bold claim that LRMs are not actually reasoning, but where do we draw the line for our own reasoning and our own success with puzzles?
It’s also interesting to me that DeepMind, whose research department I admire and fear, seems to be side-stepping this line of inquiry in public, focusing instead on behaviors, agent architectures, and memory. Google’s research team, meanwhile, seems to have picked up AI implementation in research and impact work as its focus, with some really cool work coming out on AI in geospatial analysis and AI in health.
I can’t stress enough how much I recommend reading the Anthropic papers, but I would also advise pausing at some of the unanswered questions, asking (or searching) whether we have those answers for human cognition and reasoning, and getting excited (or scared) about how much we don’t know about ourselves or these models.