When 10% Attention Beats 100%: The Mathematics Behind Sparse LLM Inference

The quadratic complexity of self-attention has haunted transformer architecture since its inception. As context windows expanded from 2K to 1M tokens, the O(N²) attention computation transformed from an annoyance into an existential bottleneck. Yet a counterintuitive discovery emerged in 2025-2026: computing only 5-20% of attention weights can match or exceed full attention performance. This isn’t compression with acceptable loss—it’s the revelation that transformers have been computing billions of unnecessary operations. The mathematics behind this phenomenon, and the engineering that exploits it, represents one of the most significant advances in LLM efficiency. ...

10 min · 2056 words
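The core phenomenon the excerpt describes can be sketched in a few lines. This is an illustrative toy, not any specific paper's algorithm: per-query top-k sparse attention keeps only the k highest-scoring keys for each query, masks the rest to negative infinity before the softmax, and compares the result against full attention. All names and parameter values here are assumptions for the sketch.

```python
import numpy as np

def full_attention(Q, K, V):
    # Standard scaled dot-product attention.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def topk_attention(Q, K, V, k):
    # Keep only each query's k highest-scoring keys; mask the rest.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    kth = np.partition(scores, -k, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
N, d = 256, 64
Q, K, V = rng.normal(size=(3, N, d))
out_full = full_attention(Q, K, V)
out_sparse = topk_attention(Q, K, V, k=N // 10)  # attend to only 10% of keys
err = np.linalg.norm(out_full - out_sparse) / np.linalg.norm(out_full)
print(f"relative error with 10% of keys: {err:.3f}")
```

On random Gaussian inputs the attention distribution is fairly diffuse, so the gap here overstates what happens in trained models, where attention mass concentrates on a few keys and top-k masking discards very little.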

When AI Learns to Remember: How Google's Titans Architecture Solved the Long-Term Memory Problem

The Transformer architecture revolutionized machine learning with its attention mechanism, enabling models to capture dependencies across entire sequences. Yet despite their dominance, Transformers suffer from a fundamental limitation: they have amnesia. Every token beyond the context window vanishes into oblivion, and even within that window, the quadratic complexity of attention makes scaling prohibitively expensive. In December 2024, Google Research introduced Titans, a new family of architectures that fundamentally rethinks how neural networks handle memory. The breakthrough isn’t just another efficiency trick—it’s a paradigm shift that treats memory itself as a learnable neural network, updated in real-time during inference through gradient descent on a surprise-based objective. ...

8 min · 1691 words
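The "memory updated by gradient descent on a surprise objective" idea from the excerpt above can be sketched under heavy simplifying assumptions: here the memory is a single linear map (the Titans paper uses a small MLP), surprise is the reconstruction error between what the memory predicts for a key and the observed value, and the learning rate, momentum, and decay values are purely illustrative.

```python
import numpy as np

class NeuralMemory:
    """Toy test-time memory: parameters updated online by gradient descent."""

    def __init__(self, dim, lr=0.01, decay=0.01, momentum=0.9):
        self.M = np.zeros((dim, dim))   # memory parameters (linear map)
        self.S = np.zeros((dim, dim))   # momentum buffer ("past surprise")
        self.lr, self.decay, self.momentum = lr, decay, momentum

    def surprise(self, k, v):
        # Squared reconstruction error: how surprising is (k, v)
        # given what the memory already stores?
        err = self.M @ k - v
        return float(err @ err)

    def write(self, k, v):
        # One gradient-descent step on the surprise objective,
        # with momentum; the decay term models forgetting.
        grad = np.outer(self.M @ k - v, k)           # d(loss)/dM
        self.S = self.momentum * self.S - self.lr * grad
        self.M = (1 - self.decay) * self.M + self.S

    def read(self, k):
        return self.M @ k

rng = np.random.default_rng(1)
dim = 16
mem = NeuralMemory(dim)
k, v = rng.normal(size=dim), rng.normal(size=dim)
before = mem.surprise(k, v)
for _ in range(50):
    mem.write(k, v)
after = mem.surprise(k, v)  # repeated writes make the pair unsurprising
print(f"surprise before: {before:.3f}, after: {after:.6f}")
```

The key property this illustrates is that the update happens at inference time, with no backpropagation through the outer model: each incoming (key, value) pair triggers a local gradient step, so strongly surprising inputs produce large memory updates while familiar ones barely change it.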