When the Answer Lies at the End of a Branch: The Complete Architecture of Inference-Time Search Methods for LLM Reasoning

The emergence of reasoning models like DeepSeek-R1, OpenAI’s o3, and Google’s Gemini thinking mode has fundamentally shifted how we think about LLM inference. These models don’t just generate—they search. The question is no longer “what should the model output?” but “how should the model search for the answer?” This shift from generation to search has spawned an entire taxonomy of inference-time algorithms, each with distinct trade-offs between computational cost and output quality. Understanding these methods—their mathematical foundations, implementation details, and practical performance—is essential for anyone deploying reasoning models in production. ...

5 min · 932 words

How DeepSeek-R1 Learned to Think: The GRPO Algorithm Behind Open-Source Reasoning Models

On January 20, 2025, DeepSeek released R1, a 671B-parameter Mixture-of-Experts model that achieved something remarkable: matching OpenAI’s o1 on reasoning benchmarks while being fully open-source. The breakthrough wasn’t just in scale or architecture, but in a fundamentally different approach to training reasoning capabilities: Group Relative Policy Optimization (GRPO), a reinforcement learning algorithm that eliminates the need for a separate critic (value) model while enabling sophisticated reasoning behaviors to emerge naturally.

The Problem with Traditional LLM Training

Standard large language models excel at pattern matching and next-token prediction, but struggle with tasks requiring multi-step logical deduction, self-correction, and complex problem decomposition. Chain-of-thought prompting helped, but it required extensive human-annotated demonstrations and still couldn’t match the systematic reasoning humans employ. ...

3 min · 472 words

When Seeing Is No Longer Believing: The Deepfake Arms Race Between Creation and Detection

In late 2017, a Reddit user with the handle “deepfakes” posted a video that would fundamentally change how we think about visual evidence. The clip showed a celebrity’s face seamlessly mapped onto another person’s body. It wasn’t the first time someone had manipulated video, but the quality was unprecedented—and the software to create it was soon released as open-source code. Within months, the term “deepfake” had entered the lexicon, representing a collision of deep learning and deception that continues to evolve at a startling pace. ...

8 min · 1685 words