
Rethinking Scale

What a 7-Million-Parameter Model Teaches Us About Intelligence


Samsung's research team released a model with 7 million parameters that beats systems with roughly 100,000 times more parameters on hard reasoning benchmarks.

We assumed more parameters meant more intelligence. Each generation grew larger and more capable. Size unlocked capability. So we kept scaling.

But the relationship isn't as simple as we thought.

Consider the numbers. Samsung's Tiny Recursive Model (TRM) achieves 45% accuracy on ARC-AGI-1 puzzles. These test genuine abstract reasoning, not memorization. DeepSeek-R1 with 671 billion parameters gets 15.8%. OpenAI's o3-mini reaches 34.5%. Gemini 2.5 Pro hits 37%. TRM solves 87.4% of extreme Sudoku puzzles and navigates 85.3% of complex 30×30 mazes. The gap is substantial.

What explains this? The architecture. Most language models generate responses token by token, committing to each choice as they go. TRM works differently. It drafts a complete solution, reasons about it in an internal scratchpad, then revises. It repeats this cycle up to 16 times.

The model maintains two things simultaneously: a latent reasoning state, its scratchpad, and a candidate solution that gets refined each cycle. It alternates between updating its understanding and improving its answer. Depth through recursion, not layers. Repetition, not size.
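
To make that alternation concrete, here is a minimal sketch in PyTorch. It assumes the puzzle is already embedded as a fixed-size vector, and the class name, layer widths, and loop counts are illustrative placeholders, not the released architecture.

# Minimal sketch of TRM-style recursive refinement, not the released code.
# TinyRecursiveSolver, d_model, and the loop counts are placeholder choices.
import torch
import torch.nn as nn

class TinyRecursiveSolver(nn.Module):
    def __init__(self, d_model=128, n_latent_updates=6, n_refine_cycles=16):
        super().__init__()
        self.n_latent_updates = n_latent_updates   # inner loop: update the scratchpad
        self.n_refine_cycles = n_refine_cycles     # outer loop: revise the answer
        # tiny shared blocks stand in for the model's two-layer network
        self.update_latent = nn.Sequential(
            nn.Linear(3 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, d_model)
        )
        self.update_answer = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, d_model)
        )

    def forward(self, x):
        # x: embedded puzzle, shape (batch, d_model)
        y = torch.zeros_like(x)   # current draft solution
        z = torch.zeros_like(x)   # latent reasoning state, the "scratchpad"
        for _ in range(self.n_refine_cycles):
            # think: refine the scratchpad given the puzzle and the current draft
            for _ in range(self.n_latent_updates):
                z = self.update_latent(torch.cat([x, y, z], dim=-1))
            # act: revise the draft solution using the updated scratchpad
            y = self.update_answer(torch.cat([y, z], dim=-1))
        return y

solver = TinyRecursiveSolver()
print(solver(torch.randn(4, 128)).shape)  # torch.Size([4, 128])

The point of the sketch is the shape of the computation: the same small network is applied again and again, so depth comes from the loops rather than from stacking layers.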

This is how difficult problems actually get solved. You draft something. You check if it works. You revise what's wrong. You iterate until you get it right. Not because TRM mimics human cognition, but because iterative refinement works when you can verify your reasoning against constraints.
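
Here is the same idea as a toy loop, with Sudoku validity standing in for the verifiable constraint. The propose callable is a hypothetical stand-in for whatever model drafts the grid; this is not how TRM is trained or run, it only shows why iteration pays off when a candidate answer can be checked.

# Toy "draft, check, revise" loop. violates_sudoku and propose are illustrative,
# not part of TRM.
def violates_sudoku(grid):
    """Return True if any filled row, column, or 3x3 box repeats a digit (0 = blank)."""
    def has_dupes(cells):
        digits = [c for c in cells if c != 0]
        return len(digits) != len(set(digits))
    rows = grid
    cols = [[grid[r][c] for r in range(9)] for c in range(9)]
    boxes = [[grid[3*br + r][3*bc + c] for r in range(3) for c in range(3)]
             for br in range(3) for bc in range(3)]
    return any(has_dupes(g) for g in rows + cols + boxes)

def solve(puzzle, propose, max_cycles=16):
    """propose(puzzle, last_draft) -> new candidate grid; revise until nothing is violated."""
    draft = None
    for _ in range(max_cycles):
        draft = propose(puzzle, draft)
        if not violates_sudoku(draft):
            break
    return draft
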

Which reveals something important. We've been optimizing for breadth when some problems demand depth. A model that knows everything superficially will struggle with tasks that require sustained reasoning. For structured problems with verifiable solutions, how you think matters as much as what you know.

TRM is a specialist. It won't write emails or debug code. It's designed for structured reasoning puzzles where rules are clear and solutions can be verified. Even at 45%, these tasks remain mostly unsolved: the field's target on the harder ARC-AGI-2 benchmark is 85%.

But specialization reveals where this approach excels. Quality control systems detecting manufacturing anomalies through pattern recognition. Scheduling systems wrestling with complex constraints. Mathematical proof verification. Resource allocation where multiple requirements must hold simultaneously. TRM fits problems with clear rules and verifiable solutions. Where you can draft an answer, check if it violates constraints, and refine until it works. Where thinking deeply about one problem beats knowing a little about everything.

This matters because the field has been running an arms race. More parameters. More data. More compute. The assumption was that size would eventually solve everything. But size has costs. Energy consumption. Training time. Accessibility. Only organizations with massive resources can train frontier models.

TRM shows another path exists for certain problem types. The researchers trained it from scratch on small datasets. They used just two layers. They adapted the design for different problem types. And they released the code publicly. Clever architecture over parameter count. Test-time compute over model size. Recursive refinement over single-pass inference.

So what else have we overlooked? For structured reasoning tasks, how many architectural innovations are we missing because we're too invested in scaling general-purpose models?

Seven million parameters. Two layers. Better results than systems orders of magnitude larger at structured reasoning tasks. That's not an argument against scale for general capability. Building multimodal systems that understand context across languages, handle real-time production loads, and maintain coherence across diverse tasks still benefits from scale when properly engineered. But for structured reasoning with clear rules and verifiable solutions, architecture matters more than size. Different problems need different approaches.

For certain types of reasoning, intelligence isn't about size. It's about how you think.