When the Answer Lies at the End of a Branch: The Complete Architecture of Inference-Time Search Methods for LLM Reasoning

The emergence of reasoning models like DeepSeek-R1, OpenAI’s o3, and Google’s Gemini thinking mode has fundamentally shifted how we think about LLM inference. These models don’t just generate—they search. The question is no longer “what should the model output?” but “how should the model search for the answer?” This shift from generation to search has spawned an entire taxonomy of inference-time algorithms, each with distinct trade-offs between computational cost and output quality. Understanding these methods—their mathematical foundations, implementation details, and practical performance—is essential for anyone deploying reasoning models in production. ...

5 min · 932 words
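
A minimal sketch of the generation-to-search idea from the post above: instead of one greedy completion, spend inference compute on several candidates and keep the best one under a verifier. Here `generate` and `score` are hypothetical stand-ins for a sampling LLM call and a learned verifier; this is an illustration of the simplest search strategy (best-of-N), not any particular model's algorithm.

```python
import random

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Hypothetical stand-in for one stochastic LLM completion."""
    return f"candidate answer (t={temperature}, seed={random.random():.3f})"

def score(candidate: str) -> float:
    """Hypothetical stand-in for a verifier that rates a full candidate."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    """Generation as search: sample n candidates, return the top-scoring one."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

print(best_of_n("Integrate x * exp(x) dx"))
```

Every fancier method in the taxonomy (beam search over steps, tree search, and so on) refines the same trade: more samples and more scoring calls in exchange for a better chance that the returned branch is the right one.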

When the Path Matters More Than the Answer: How Process Reward Models Transform LLM Reasoning

A math student solves a complex integration problem. Her final answer is correct, but halfway through, she makes a sign error that accidentally cancels out in the next step. The teacher gives full marks; after all, the answer is right. But should it count? This scenario captures the fundamental flaw in how we’ve traditionally evaluated Large Language Model (LLM) reasoning: Outcome Reward Models (ORMs) only check the final destination, ignoring whether the path was sound. Process Reward Models (PRMs) represent a paradigm shift: they verify every step of reasoning, catch the hidden errors that coincidentally still yield correct answers, and enable the test-time scaling that powers reasoning models like OpenAI’s o1 and DeepSeek-R1. ...

7 min · 1473 words
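
A contrast sketch of the ORM/PRM distinction described above, mirroring the sign-error anecdote. The `orm_score` and `prm_step_score` functions are hypothetical stand-ins for trained reward models, and aggregating per-step scores with `min` is one common convention (any unsound step sinks the trace); this is an assumption-laden illustration, not either paper's implementation.

```python
def orm_score(final_answer: str) -> float:
    """Outcome reward: judge only the destination."""
    return 1.0 if "x*exp(x) - exp(x) + C" in final_answer else 0.0

def prm_step_score(step: str) -> float:
    """Process reward: how likely this single step is sound (stand-in)."""
    return 0.1 if "sign error" in step else 0.95

def prm_score(steps: list[str]) -> float:
    """Aggregate per-step scores; min means one bad step dooms the trace."""
    return min(prm_step_score(s) for s in steps)

trace = [
    "Use integration by parts with u = x, dv = exp(x) dx",
    "du = dx, v = exp(x)  (sign error introduced here)",
    "x*exp(x) - exp(x) + C  (error cancels, answer correct)",
]
print("ORM:", orm_score(trace[-1]))  # 1.0: full marks despite the flawed step
print("PRM:", prm_score(trace))      # 0.1: the unsound step is caught
```

The outcome model gives the student full marks; the process model flags the sign error even though the final answer checks out, which is exactly the signal test-time search needs to prune bad branches early.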