Why Unicode Has Three Encoding Schemes: The Engineering Trade-offs Behind UTF-8, UTF-16, and UTF-32
On September 2, 1992, Ken Thompson sat in a New Jersey diner with Rob Pike and sketched an encoding scheme on a placemat. That dinner napkin design became UTF-8—the encoding that now powers 99% of the web. But UTF-8 is just one of three encoding schemes for Unicode, alongside UTF-16 and UTF-32. Why does Unicode need three different ways to represent the same characters? The answer reveals fundamental trade-offs in computer systems design: space efficiency versus processing simplicity, backward compatibility versus clean architecture, and the messy reality of historical decisions that cannot be undone. ...