How does gradient descent work?

This is the companion website for the paper Understanding Optimization in Deep Learning with Central Flows, published at ICLR 2025.

Part I: how does gradient descent work?

The simplest optimization algorithm is deterministic gradient descent:

\[ w_{t+1} = w_t - \eta \, \nabla L(w_t) \]

Perhaps surprisingly, traditional analyses of gradient descent cannot capture the typical dynamics of gradient descent in deep learning. We'll first explain why, and then we'll present a new analysis of gradient descent that does apply in deep learning.

The dynamics of gradient descent

Let's start with the picture that everyone has likely seen before. Suppose that we run gradient descent on a quadratic function \( \frac{1}{2} S x^2 \), i.e., an upward-opening ("smiley-face") parabola. The parameter \(S\) controls the second derivative ("curvature") of the parabola: when \(S\) is larger, the parabola is steeper. If we run gradient descent on this function with learning rate \(\eta\), there are two possible outcomes. On the one hand, if \(S < 2/\eta\), then the parabola is "flat enough" for the learning rate \(\eta\), and gradient descent will converge. On the other hand, if \(S > 2/\eta\), then the parabola is "too sharp" for the learning rate \(\eta\), and gradient descent will oscillate back and forth with increasing magnitude.

[Video: Consider gradient descent with learning rate \(\eta\) on a 1d quadratic function with curvature \(S\). If \(S < 2/\eta\), as on the left, the optimizer will converge; if \(S > 2/\eta\), as on the right, the optimizer will diverge.]

The same is true for a quadratic function in multiple dimensions. On a multi-dimensional quadratic, the eigenvalues of the Hessian matrix quantify the curvature along the corresponding eigenvectors.
If any Hessian eigenvalue exceeds the threshold \(2/\eta\), the quadratic is "too sharp" in the corresponding eigenvector direction, and gradient descent will oscillate along tha...
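The 1d experiment can be reproduced in a few lines of plain Python. This is a minimal sketch, not the paper's code: the helper name `gradient_descent_1d`, the starting point \(x_0 = 1\), and the values \(\eta = 0.1\), \(S = 15\), and \(S = 25\) are illustrative assumptions. Since \(L'(x) = S x\), each step multiplies \(x\) by the constant factor \(1 - \eta S\), whose magnitude is below 1 exactly when \(0 < S < 2/\eta\).

```python
def gradient_descent_1d(S, eta, x0=1.0, steps=20):
    """Run gradient descent on L(x) = 0.5 * S * x**2 and return all iterates.

    Hypothetical helper for illustration; not taken from the paper.
    """
    xs = [x0]
    x = x0
    for _ in range(steps):
        grad = S * x          # L'(x) = S * x
        x = x - eta * grad    # w_{t+1} = w_t - eta * grad L(w_t)
        xs.append(x)
    return xs


eta = 0.1                                     # threshold 2 / eta = 20
flat = gradient_descent_1d(S=15.0, eta=eta)   # S < 2/eta: factor 1 - eta*S = -0.5
sharp = gradient_descent_1d(S=25.0, eta=eta)  # S > 2/eta: factor 1 - eta*S = -1.5

print("S = 15, final |x|:", abs(flat[-1]))   # shrinks toward 0 (converges)
print("S = 25, final |x|:", abs(sharp[-1]))  # grows every step (diverges)
print("sign flips every step:", all(a * b < 0 for a, b in zip(sharp, sharp[1:])))
```

Both factors are negative here, so the iterates flip sign on every step. When \(S > 2/\eta\) the factor has magnitude greater than 1, so that back-and-forth motion grows, which is the "oscillate with increasing magnitude" behavior described above.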

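The multi-dimensional version of the \(2/\eta\) threshold can be checked the same way. This sketch assumes a quadratic \(L(w) = \frac{1}{2} w^\top H w\) with a diagonal Hessian \(H\), a simplification chosen so the example needs no linear-algebra library: the eigenvalues are then the diagonal entries and the eigenvectors are the coordinate axes. (For a general symmetric \(H\), gradient descent decouples in the same way once written in \(H\)'s eigenbasis.) The helper name `gradient_descent_quadratic` and all numbers are hypothetical.

```python
def gradient_descent_quadratic(hessian_diag, eta, w0, steps=30):
    """Gradient descent on L(w) = 0.5 * sum_i h_i * w_i**2 (diagonal Hessian).

    Hypothetical helper for illustration. Each coordinate evolves
    independently as w_i <- (1 - eta * h_i) * w_i.
    """
    w = list(w0)
    for _ in range(steps):
        grad = [h * wi for h, wi in zip(hessian_diag, w)]  # grad L(w) = H w
        w = [wi - eta * g for wi, g in zip(w, grad)]
    return w


eta = 0.1                         # threshold 2 / eta = 20
hessian_diag = [1.0, 15.0, 25.0]  # only the last eigenvalue exceeds 2/eta
w_final = gradient_descent_quadratic(hessian_diag, eta, w0=[1.0, 1.0, 1.0])

for h, wi in zip(hessian_diag, w_final):
    status = "too sharp" if h > 2 / eta else "stable"
    print(f"eigenvalue {h:5.1f} ({status}): |w_i| after 30 steps = {abs(wi):.3g}")
```

A single eigenvalue above \(2/\eta\) is enough to make the iterates blow up along that one direction, even while the other coordinates converge.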
First seen: 2025-10-07 16:10

Last seen: 2025-10-08 14:14