Who invented deep residual learning?

Summary

Modern AI is based on deep artificial neural networks (NNs).[DLH] As of 2025, the most cited scientific article of the 21st century is an NN paper on deep residual learning with residual connections.[MOST25,25b] Who invented this? Here is the timeline of the evolution of deep residual learning:

★ 1991: recurrent residual connections (weight 1.0) solve the vanishing gradient problem
★ 1997 LSTM: plain recurrent residual connections (weight 1.0)
★ 1999 LSTM: gated recurrent residual connections (gates initially open: 1.0)
★ 2005: unfolding LSTM, from recurrent to feedforward residual NNs
★ May 2015: very deep Highway Net, with gated feedforward residual connections (initially 1.0)
★ Dec 2015: ResNet, like an open-gated Highway Net (or an unfolded 1997 LSTM)

1991: recurrent residual connections solve the vanishing gradient problem

Sepp Hochreiter introduced residual connections for recurrent NNs (RNNs) in a diploma thesis (June 1991)[VAN1] supervised by Jürgen Schmidhuber, at a time when compute was about 10 million times more expensive than today (2025). His recurrent residual connection was mathematically derived from first principles to overcome the fundamental deep learning problem of vanishing or exploding gradients, first identified and analyzed in the very same thesis.[VAN1][DLP][DLH]

Like most good things, the recurrent residual connection is very simple: a neural unit with the identity activation function has a connection to itself, and the weight of this connection is 1.0. That is, at every time step of information processing, this unit just adds its current input to its previous activation value. So it's just an incremental integrator.
This simple setup ensures constant error flow in deep gradient-based error-minimizing RNNs: error signals can be backpropagated [BP1-4][BPTT1-2] through such units for millions of steps without vanishing or exploding,[VAN1] since according to the 1676 chain rule[LEI07-21b][L84] by Leibniz, the relevant multiplicative first derivatives...
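
The integrator described above can be illustrated with a minimal sketch. It is not code from the thesis or the article: the function names `integrate` and `backprop_factor` and the comparison weights 0.9 and 1.1 are assumptions chosen for illustration. The sketch models a single linear unit with update a_t = w * a_{t-1} + x_t, for which the derivative of the final activation with respect to the initial one is w multiplied by itself once per time step.

```python
def integrate(inputs, w=1.0):
    """Run a linear self-connected unit: a_t = w * a_{t-1} + x_t.

    With w = 1.0 and the identity activation, the unit simply sums
    its inputs over time (an incremental integrator).
    """
    a = 0.0
    for x in inputs:
        a = w * a + x
    return a


def backprop_factor(steps, w=1.0):
    """Return d a_T / d a_0 for the linear unit above.

    By the chain rule this is a product of `steps` identical local
    derivatives, each equal to the self-connection weight w.
    """
    factor = 1.0
    for _ in range(steps):
        factor *= w
    return factor


if __name__ == "__main__":
    # The unit with weight 1.0 accumulates its inputs.
    print(integrate([1.0, 2.0, 3.0]))

    # Error signal scaling after 1000 steps for three hypothetical weights.
    for w in (0.9, 1.0, 1.1):
        print(w, backprop_factor(1000, w))
```

In this toy setting, only the weight 1.0 leaves the backpropagated factor unchanged over many steps; a weight slightly below 1.0 shrinks it towards zero and a weight slightly above 1.0 makes it grow without bound.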
