When Many Models Beat One: The Mathematics Behind Mixture-of-Agents and Collaborative LLM Intelligence

In June 2024, a paper landed on arXiv that challenged a fundamental assumption in AI development: that bigger, more expensive single models are always better. The Mixture-of-Agents (MoA) methodology showed that combining multiple open-source LLMs could outperform GPT-4 Omni (GPT-4o), scoring 65.1% on AlpacaEval 2.0 versus GPT-4o’s 57.5% while using only openly available models. But the story didn’t end there. By February 2025, researchers were questioning whether mixing different models was even necessary and proposed Self-MoA as a simpler alternative. Then came RMoA with residual connections, and in January 2026, Attention-MoA introduced inter-agent semantic attention mechanisms. The MoA paradigm has evolved rapidly, revealing deep insights about the nature of LLM collaboration, the quality-diversity trade-off, and when collective intelligence actually outperforms individual excellence. ...

10 min · 2034 words

How Mixture of Experts Scales to Trillions of Parameters: The Sparse Architecture Revolution Behind Modern LLMs

When DeepSeek-V3 was released in December 2024, it achieved something remarkable: a 671-billion-parameter model that activates only 37 billion parameters per token. This isn’t a magic trick; it’s the power of Mixture of Experts (MoE), an architectural paradigm that has quietly become the backbone of nearly every frontier large language model. The math is compelling. Using the common estimate of about 2 FLOPs per active parameter per token, a dense 671B model would require approximately 1,342 GFLOPs (about 1.3 TFLOPs) per token during inference. DeepSeek-V3 achieves comparable performance with roughly 74 GFLOPs per token, an 18x reduction in compute. This isn’t incremental optimization; it’s a fundamental rethinking of how neural networks scale. ...
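The per-token compute figures above follow from a simple back-of-the-envelope rule. The sketch below assumes the common approximation of roughly 2 FLOPs (one multiply, one add) per active parameter per token for a forward pass; it deliberately ignores attention cost that grows with context length and the small overhead of the MoE router, so treat the numbers as rough estimates rather than measured values. The function name `forward_flops_per_token` is hypothetical.

```python
def forward_flops_per_token(active_params: float) -> float:
    """Rough forward-pass cost: ~2 FLOPs per active parameter per token.

    Assumption: ignores attention FLOPs that scale with context length
    and the router overhead of an MoE layer.
    """
    return 2.0 * active_params


dense_params = 671e9  # hypothetical dense model with DeepSeek-V3's total size
moe_active = 37e9     # parameters DeepSeek-V3 activates per token

dense = forward_flops_per_token(dense_params)
moe = forward_flops_per_token(moe_active)

print(f"dense: {dense / 1e9:,.0f} GFLOPs/token")  # dense: 1,342 GFLOPs/token
print(f"MoE:   {moe / 1e9:,.0f} GFLOPs/token")    # MoE:   74 GFLOPs/token
print(f"reduction: {dense / moe:.1f}x")           # reduction: 18.1x
```

Because the cost depends only on *active* parameters, an MoE model's total size can grow (more experts) without raising per-token compute, as long as the number of experts routed per token stays fixed.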

9 min · 1822 words