Understanding and Accelerating the Optimization of Modern Machine Learning
Author: Chaoyue Liu (Ph.D. in Computer Science)
Publisher:
Total Pages: 0
Release: 2021
Genre: Deep learning (Machine learning)
ISBN:
Over the last decade, we have seen impressive progress of deep learning on a variety of intelligence tasks. The success of deep learning is due, to a great extent, to the remarkable effectiveness of gradient-based optimization methods applied to large neural networks, which are often over-parameterized, i.e., the number of parameters greatly exceeds the number of training samples. Theoretically, however, it remains far from clear why gradient descent algorithms can efficiently optimize these seemingly highly non-convex loss functions (a.k.a. objective functions). In this dissertation, we aim to close this gap between theory and practice. We first show that certain sufficiently wide neural networks, as typical examples of large non-linear models, exhibit a remarkable, yet somewhat counter-intuitive, phenomenon: transition to linearity. Specifically, these networks can be well approximated by linear models; moreover, they become linear models in the infinite-width limit. Building on this phenomenon, we provide an optimization theory that describes the loss landscape of over-parameterized machine learning models and explains the convergence of gradient descent methods on these models. Notably, this theory covers both models that exhibit the transition to linearity and those that may not, e.g., wide networks with a non-linear output layer. Finally, we prove that, in the stochastic setting, the widely used Nesterov's momentum does not accelerate stochastic gradient descent (SGD), even for quadratic optimization problems. Furthermore, we propose a new method, MaSS, that provably accelerates SGD in the over-parameterized setting.
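To make the transition-to-linearity claim concrete, the sketch below (not taken from the dissertation; the two-layer ReLU architecture, the NTK-style 1/sqrt(m) output scaling, and the unit-norm perturbation are illustrative assumptions) compares the output of a wide network after a small parameter perturbation with its first-order Taylor prediction around initialization. As the width m grows, the gap between the two shrinks, which is the sense in which sufficiently wide networks behave like linear models in their parameters.

```python
import numpy as np

# Minimal sketch (illustrative, not from the dissertation): a two-layer ReLU
# network f(x; W, v) = (1/sqrt(m)) * v^T relu(W x) with m hidden units.
# We compare f at a perturbed parameter point against its first-order Taylor
# expansion around initialization; the gap shrinks as the width m grows.

rng = np.random.default_rng(0)

def network_and_grad(x, W, v, m):
    """Return f(x) and the gradients of f with respect to W and v."""
    pre = W @ x                                        # pre-activations, shape (m,)
    h = np.maximum(pre, 0.0)                           # ReLU activations
    f = v @ h / np.sqrt(m)                             # scalar output, 1/sqrt(m) scaling
    grad_v = h / np.sqrt(m)                            # df/dv
    grad_W = np.outer(v * (pre > 0), x) / np.sqrt(m)   # df/dW
    return f, grad_W, grad_v

d = 10                                                 # input dimension
x = rng.normal(size=d)
x /= np.linalg.norm(x)

for m in [10, 100, 1000, 10000]:
    W0 = rng.normal(size=(m, d))
    v0 = rng.normal(size=m)
    f0, gW, gv = network_and_grad(x, W0, v0, m)

    # Unit-norm random perturbation of the parameters.
    dW = rng.normal(size=(m, d)); dW /= np.linalg.norm(dW)
    dv = rng.normal(size=m);      dv /= np.linalg.norm(dv)

    f_true, _, _ = network_and_grad(x, W0 + dW, v0 + dv, m)
    f_lin = f0 + np.sum(gW * dW) + gv @ dv             # linear (first-order) prediction

    print(f"width {m:6d}: |f - f_lin| = {abs(f_true - f_lin):.2e}")
```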
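The momentum claim concerns plain SGD versus SGD with Nesterov-style momentum on stochastic (mini-batch) quadratic problems. The minimal sketch below sets up such a problem in an over-parameterized, realizable regime and runs both methods using the standard two-sequence Nesterov update; the data model, step size, and momentum constant are illustrative assumptions, and the toy run only illustrates the setting rather than establishing the non-acceleration result or implementing MaSS itself.

```python
import numpy as np

# Toy stochastic quadratic (least squares) in an over-parameterized regime:
# n samples, d > n parameters, mini-batch gradients. All constants and the
# data model are illustrative choices, not taken from the dissertation.

rng = np.random.default_rng(1)
n, d, batch = 20, 100, 5
A = rng.normal(size=(n, d)) / np.sqrt(d)
w_star = rng.normal(size=d)
b = A @ w_star                          # realizable: zero loss is attainable

def loss(w):
    return 0.5 * np.mean((A @ w - b) ** 2)

def stoch_grad(w):
    idx = rng.choice(n, size=batch, replace=False)
    Ab, bb = A[idx], b[idx]
    return Ab.T @ (Ab @ w - bb) / batch

eta, gamma, steps = 0.5, 0.9, 2000

# Plain SGD.
w = np.zeros(d)
for _ in range(steps):
    w -= eta * stoch_grad(w)
print("SGD final loss:            ", loss(w))

# SGD with Nesterov-style momentum (two-sequence form).
w_prev = np.zeros(d)
u = np.zeros(d)
for _ in range(steps):
    w_new = u - eta * stoch_grad(u)     # gradient step from the lookahead point
    u = w_new + gamma * (w_new - w_prev)
    w_prev = w_new
print("SGD + Nesterov final loss: ", loss(w_prev))
```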