CANCELLED Britton Lecture/Colloquium – Peter Bartlett – Topics in Deep Learning Theory – Optimization in Deep Networks: convergence of Sharpness Aware Minimization and the edge of stability
Dr. Peter Bartlett – Professor, Department of Statistics at University of California, Berkeley
Title: Optimization in Deep Networks: convergence of Sharpness Aware Minimization and the edge of stability
Abstract: We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization method for deep networks that has exhibited
performance improvements on image and language prediction problems. We show that SAM applied to a convex quadratic
objective converges to a cycle about the minimum in the direction with the largest curvature. In the non-quadratic case, these
oscillations encourage drift toward wider minima, by performing gradient descent on the spectral norm of the Hessian. We relate
this behavior to an “edge of stability” phenomenon that has been empirically observed in neural networks trained by gradient
descent, where curvature increases to the point of instability. Based on joint work with Phil Long and Olivier Bousquet.
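
The quadratic-case claim in the abstract can be illustrated numerically. The sketch below is not taken from the talk: it assumes the commonly used normalized form of SAM, in which each step evaluates the gradient at the point w + rho * g / ||g|| (with g the gradient at w), and it uses a hypothetical two-dimensional diagonal quadratic with illustrative values for the step size eta and radius rho. The function name sam_on_quadratic and the small constant added to the norm are choices made here for the example. Under this assumed update, solving the one-dimensional two-cycle condition w_next = -w gives an amplitude of eta * rho * lam / (2 - eta * lam), which the code compares against the simulated iterates.

```python
import numpy as np

def sam_on_quadratic(eigs, w0, eta, rho, steps):
    """Run SAM on L(w) = 0.5 * sum(eigs * w**2) and return all iterates.

    Assumes the normalized SAM update; the talk's exact variant may differ.
    """
    eigs = np.asarray(eigs, dtype=float)
    w = np.asarray(w0, dtype=float)
    path = [w.copy()]
    for _ in range(steps):
        g = eigs * w                                     # gradient at w
        ascent = rho * g / (np.linalg.norm(g) + 1e-12)   # normalized ascent step (guarded against g = 0)
        w = w - eta * eigs * (w + ascent)                # descent step using the gradient at the perturbed point
        path.append(w.copy())
    return np.array(path)

# Hypothetical problem: curvatures 1.0 and 0.2 along the coordinate axes.
eigs = [1.0, 0.2]
eta, rho = 0.5, 0.1
path = sam_on_quadratic(eigs, w0=[1.0, 1.0], eta=eta, rho=rho, steps=5000)

# Amplitude of the two-cycle along the largest-curvature direction,
# derived for this assumed update rule (not quoted from the talk).
lam = max(eigs)
predicted = eta * rho * lam / (2.0 - eta * lam)

print("last iterates:")
print(path[-4:])
print("predicted amplitude:", predicted)
```

In this setup the low-curvature coordinate shrinks toward zero, while the high-curvature coordinate keeps flipping sign at a fixed magnitude instead of settling at the minimum. This matches the cycle described in the abstract for this one example only; it says nothing about the non-quadratic drift toward wider minima.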