Chain Rule

In 1986, David Rumelhart, Geoffrey Hinton, and Ronald Williams published a paper in Nature that would transform artificial intelligence. The paper, “Learning representations by back-propagating errors,” demonstrated that a mathematical technique from the 1970s could train neural networks orders of magnitude faster than existing methods. The speedup wasn’t incremental—it was the difference between a model taking a week to train and taking 200,000 years. But backpropagation wasn’t invented in 1986. Its modern form was first published in 1970 by Finnish master’s student Seppo Linnainmaa, who described it as “reverse mode automatic differentiation.” Even earlier, Henry J. Kelley derived the foundational concepts in 1960 for optimal flight path calculations. What the 1986 paper achieved wasn’t invention—it was recognition. The authors demonstrated that this obscure numerical technique was exactly what neural networks needed. ...