Quantization

How 4 Bits Preserves 99% Quality: The Mathematics Behind LLM Quantization

A 70-billion parameter model requires 140 GB of GPU memory in FP16. A consumer RTX 4090 has 24 GB. This arithmetic gap defined the boundary between “enterprise AI” and “what you can run at home” until quantization mathematics cracked the code. The counterintuitive reality: reducing precision from 16 bits to 4 bits—a 75% compression—often preserves over 95% of model quality. Not through magic, but through a profound understanding of how neural networks encode information. ...

How JPEG Compression Actually Works: The Mathematics Behind Every Photo

In September 1992, a committee called the Joint Photographic Experts Group published a standard that would fundamentally change how humanity stores and shares images. The JPEG format, based on the discrete cosine transform (DCT), made digital photography practical by reducing file sizes by a factor of 10 while maintaining acceptable visual quality. Three decades later, JPEG remains the most widely used image format in the world, with billions of images created daily. ...