From 1% Parameters to Full Capacity: The Mathematics and Engineering Behind LoRA's Evolution

Fine-tuning a 7-billion-parameter model used to demand 100+ GB of VRAM, roughly the memory of four A100 GPUs. Today, the same task runs on a consumer RTX 4090 with 24 GB. This roughly 4× reduction didn’t come from better hardware; it came from a mathematical insight about the structure of neural network adaptations. Low-Rank Adaptation (LoRA), introduced by Microsoft researchers in 2021, fundamentally changed how we think about model fine-tuning. The core idea is deceptively simple: instead of updating all parameters, freeze the pretrained weights and inject small trainable matrices that modify the model’s behavior. But behind this simplicity lie deep connections to linear algebra, information theory, and the geometry of neural network weight spaces. ...

4 min · 1660 words

When 1+1>2: How Model Merging Creates Superhuman LLMs Without Training

The Open LLM Leaderboard tells a surprising story: many top-performing models aren’t trained at all. They’re merged. A 7B-parameter model, created by strategically blending weights from existing fine-tuned models, can outperform models 10× its size. This isn’t alchemy; it’s mathematics. Model merging represents a paradigm shift in how we think about model development. Instead of investing millions in GPU hours for training, practitioners are discovering that the collective intelligence embedded in existing open-source models can be combined to create something greater than the sum of its parts. The technique requires no gradients, no backward passes, and no training data. Just arithmetic operations on weight tensors. ...

10 min · 1940 words