Gradient-based Optimization and Implicit Regularization Over Non-convex Landscapes

Author: Xiaoxia (Shirley) Wu
Publisher:
Total Pages: 328
Release: 2020
Genre:
ISBN:



Large-scale machine learning problems reduce to non-convex optimization problems when state-of-the-art models such as deep neural networks are used. One of the most widely used algorithms is the first-order iterative gradient-based algorithm, i.e., the (stochastic) gradient descent method. Two main challenges arise in understanding gradient-based algorithms over non-convex landscapes: the convergence complexity and the nature of the solutions the algorithm finds. This thesis tackles both challenges by providing a theoretical framework and an empirical investigation for three popular gradient-based techniques, namely adaptive gradient methods [39], weight normalization [138] and curriculum learning [18].

For convergence, the stepsize or learning rate plays a pivotal role in the iteration complexity, yet the right choice depends crucially on the (generally unknown) Lipschitz smoothness constant and the noise level of the stochastic gradient. A popular way to auto-tune the stepsize is to use adaptive gradient methods such as AdaGrad, which update the learning rate on the fly according to the gradients received along the way; however, the theoretical guarantees to date for AdaGrad were for online and convex optimization. We bridge this gap by providing convergence guarantees for AdaGrad on smooth, non-convex functions: it converges to a stationary point at the O(log(N)/√N) rate in the stochastic setting and at the optimal O(1/N) rate in the batch (non-stochastic) setting. Extensive numerical experiments corroborate the theory.

For the solutions found by gradient-based algorithms, we study weight normalization (WN) in the setting of an over-parameterized linear regression problem, where WN decouples the weight vector into a scale and a unit (direction) vector. We show that this reparametrization has beneficial regularization effects compared with gradient descent on the original objective: WN adaptively regularizes the weights and converges close to the minimum l2-norm solution, even for initializations far from zero.

To further understand stochastic gradient-based training, we study a continuation method, curriculum learning (CL), inspired by the cognitive-science observation that humans learn in a simple-to-complex order. CL orders training examples by their difficulty, while anti-CL uses the opposite ordering; both have been suggested as improvements over standard i.i.d. training. We investigate the relative benefits of ordered learning in three settings: standard-time, short-time, and noisy-label training. We find that both orderings have only marginal benefits on standard benchmark datasets. However, with a limited training-time budget or noisy data, curriculum ordering, but not anti-curriculum ordering, can improve performance.
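To make the stepsize auto-tuning idea concrete, here is a minimal NumPy sketch of a scalar ("norm") AdaGrad update on a toy smooth non-convex objective. The objective, initialization, and hyperparameters (eta, b0, iteration count) are illustrative assumptions, not the experimental setup of the thesis.

```python
import numpy as np

# Scalar AdaGrad sketch on the toy non-convex objective
# f(x) = sum_i x_i^2 / (1 + x_i^2); all choices here are illustrative.

def grad(x):
    # gradient of f(x) = sum(x_i^2 / (1 + x_i^2))
    return 2 * x / (1 + x**2) ** 2

def adagrad_norm(x0, eta=1.0, b0=1e-2, iters=500):
    x, b2 = x0.astype(float), b0**2
    for _ in range(iters):
        g = grad(x)
        b2 += np.dot(g, g)           # accumulate squared gradient norms
        x -= eta / np.sqrt(b2) * g   # stepsize shrinks as gradients accumulate
    return x

x = adagrad_norm(np.array([2.0, -3.0, 0.5]))
print(x, np.linalg.norm(grad(x)))    # gradient norm should be small
```

The point of the sketch is that the effective stepsize eta/sqrt(b2) adapts to the gradients actually observed, so no Lipschitz constant or noise level needs to be supplied in advance.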

The Complexity of Optimization Beyond Convexity

Author: Yair Menachem Carmon
Publisher:
Total Pages:
Release: 2020
Genre:
ISBN:



Gradient descent variants are the workhorse of modern machine learning and large-scale optimization more broadly, where objective functions are often non-convex. Could there be better general-purpose optimization methods than gradient descent, or is it in some sense unimprovable? This thesis addresses this question from the perspective of the worst-case oracle complexity of finding near-stationary points (i.e., points with small gradient norm) of smooth and possibly non-convex functions. On the negative side, we prove a lower bound showing that gradient descent is unimprovable for a natural class of problems. We further prove the worst-case optimality of stochastic gradient descent, recursive variance reduction, cubic regularization of Newton's method, and high-order tensor methods, in each case under the set of assumptions for which the method was designed. To prove our lower bounds we extend the theory of information-based oracle complexity to the realm of non-convex optimization. On the positive side, we use classical techniques from optimization (namely Nesterov momentum and Krylov subspace methods) to accelerate gradient descent on a large subclass of non-convex problems with higher-order smoothness. Furthermore, we show how recently proposed variance reduction techniques can further improve stochastic gradient descent when stochastic Hessian-vector products are available.
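The following sketch illustrates the notion of a near-stationary point used above: run plain gradient descent and a simple Nesterov-momentum variant until the gradient norm drops below a tolerance, and count iterations. The test function, stepsize, and momentum coefficient are illustrative assumptions; the accelerated methods analyzed in the thesis are more involved than this constant-momentum scheme.

```python
import numpy as np

# Stop when the gradient norm falls below tol (a "near-stationary point").
# Objective, stepsize and momentum coefficient are illustrative assumptions.

def grad(x):
    # gradient of the smooth, non-convex f(x) = sum(x_i^2 / (1 + x_i^2))
    return 2 * x / (1 + x**2) ** 2

def steps_to_stationarity(momentum, x0, alpha=0.5, beta=0.9, tol=1e-6, max_iter=10_000):
    x_prev = x = x0.astype(float)
    for k in range(max_iter):
        if np.linalg.norm(grad(x)) <= tol:
            return k
        y = x + beta * (x - x_prev) if momentum else x   # Nesterov look-ahead
        x_prev, x = x, y - alpha * grad(y)
    return max_iter

x0 = np.array([3.0, -2.0, 1.5])
print("GD steps:      ", steps_to_stationarity(False, x0))
print("Nesterov steps:", steps_to_stationarity(True, x0))
```

Oracle complexity asks how many such gradient evaluations any method must make in the worst case before the stopping test can succeed.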

Beyond the Worst-Case Analysis of Algorithms

Author: Tim Roughgarden
Publisher: Cambridge University Press
Total Pages: 705
Release: 2021-01-14
Genre: Computers
ISBN: 1108494315



Introduces exciting new methods for assessing algorithms for problems ranging from clustering to linear programming to neural networks.

Provable Non-convex Optimization for Learning Parametric Models

Author: Kai Zhong (Ph. D.)
Publisher:
Total Pages: 866
Release: 2018
Genre:
ISBN:



Non-convex optimization plays an important role in recent advances of machine learning. A large number of machine learning tasks are performed by solving a non-convex optimization problem, which is generally NP-hard. Heuristics, such as stochastic gradient descent, are employed to solve non-convex problems and work decently well in practice despite the lack of general theoretical guarantees. In this thesis, we study a series of non-convex optimization strategies and prove that they lead to the globally optimal solution for several machine learning problems, including mixed linear regression, one-hidden-layer (convolutional) neural networks, non-linear inductive matrix completion, and low-rank matrix sensing. At a high level, we show that the non-convex objectives formulated in the above problems have a large basin of attraction around the global optima when the data has benign statistical properties. Therefore, local search heuristics, such as gradient descent or alternating minimization, are guaranteed to converge to the global optima if initialized properly. Furthermore, we show that spectral methods can efficiently initialize the parameters such that they fall into the basin of attraction. Experiments on synthetic datasets and real applications are carried out to justify our theoretical analyses and illustrate the superiority of our proposed methods.
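As a toy illustration of the "spectral initialization followed by local search" recipe, the sketch below recovers a rank-one matrix from Gaussian linear measurements: a spectral method places the iterate inside the basin of attraction, and gradient descent then converges to the global optimum (up to sign). The dimensions, sample size, stepsize, and iteration count are illustrative assumptions, not the settings used in the thesis.

```python
import numpy as np

# Rank-one matrix sensing: observe y_i = <A_i, x* x*^T>, recover x* up to sign.

rng = np.random.default_rng(0)
d, m = 20, 600
x_star = rng.normal(size=d)
x_star /= np.linalg.norm(x_star)
A = rng.normal(size=(m, d, d))
y = np.einsum('mij,i,j->m', A, x_star, x_star)    # y_i = x*^T A_i x*

# Spectral initialization: top eigenvector of (1/m) * sum_i y_i A_i,
# whose expectation is x* x*^T for Gaussian measurements.
M = np.einsum('m,mij->ij', y, A) / m
M = (M + M.T) / 2
vals, vecs = np.linalg.eigh(M)
x = np.sqrt(max(vals[-1], 0.0)) * vecs[:, -1]

# Local search: gradient descent on f(x) = (1/2m) * sum_i (x^T A_i x - y_i)^2
eta = 0.05
A_sym = (A + A.transpose(0, 2, 1)) / 2
for _ in range(300):
    r = np.einsum('mij,i,j->m', A, x, x) - y       # residuals
    g = np.einsum('m,mij,j->i', r, A_sym, x) * (2 / m)
    x -= eta * g

err = min(np.linalg.norm(x - x_star), np.linalg.norm(x + x_star))
print("recovery error:", err)
```

With enough measurements the spectral estimate lands close to the truth, and the subsequent gradient descent drives the recovery error to (numerically) zero, which is the qualitative behavior the guarantees describe.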

Introduction to Deep Learning: A Beginner’s Edition

Author: Harshitha Raghavan Devarajan
Publisher: INENCE PUBLICATIONS PVT LTD
Total Pages: 174
Release: 2024-08-10
Genre: Antiques & Collectibles
ISBN: 9395940204



"Introduction to Deep Learning: A Beginner’s Edition" is a comprehensive guide designed specifically for newcomers to the field of deep learning. This book provides an accessible introduction to the fundamental concepts, making it an ideal starting point for those who are curious about artificial intelligence and its rapidly expanding applications. The book begins with a clear explanation of what deep learning is and how it differs from traditional machine learning, covering the basics of neural networks and how they are used to recognize patterns and make decisions. One of the key strengths of this book is its practical, hands-on approach. Readers are guided through the process of building, training, and deploying neural networks using popular frameworks like TensorFlow and PyTorch. The step-by-step instructions, along with code snippets, allow even those with little to no programming experience to engage actively with the material. Visual aids, such as diagrams and flowcharts, are used throughout the book to simplify complex topics, making it easier for readers to grasp the inner workings of neural networks. The book also explores real-world applications of deep learning, highlighting its impact across various industries, including healthcare, autonomous vehicles, and natural language processing. By providing context and practical examples, the book demonstrates how deep learning is being used to solve complex problems and transform industries. In addition to the core content, the book includes a glossary of key terms, quizzes, and exercises to reinforce learning. "Introduction to Deep Learning: A Beginner’s Edition" is more than just a textbook; it is a complete learning experience designed to equip beginners with the knowledge and skills needed to embark on a successful journey into the world of deep learning.

Mathematical Aspects of Deep Learning

Author: Philipp Grohs
Publisher: Cambridge University Press
Total Pages: 493
Release: 2022-12-31
Genre: Computers
ISBN: 1316516784



A mathematical introduction to deep learning, written by a group of leading experts in the field.

Patterns, Predictions, and Actions: Foundations of Machine Learning

Author: Moritz Hardt
Publisher: Princeton University Press
Total Pages: 321
Release: 2022-08-23
Genre: Computers
ISBN: 0691233721



An authoritative, up-to-date graduate textbook on machine learning that highlights its historical context and societal impacts.

Patterns, Predictions, and Actions introduces graduate students to the essentials of machine learning while offering invaluable perspective on its history and social implications. Beginning with the foundations of decision making, Moritz Hardt and Benjamin Recht explain how representation, optimization, and generalization are the constituents of supervised learning. They go on to provide self-contained discussions of causality, the practice of causal inference, sequential decision making, and reinforcement learning, equipping readers with the concepts and tools they need to assess the consequences that may arise from acting on statistical decisions.

Provides a modern introduction to machine learning, showing how data patterns support predictions and consequential actions
Pays special attention to societal impacts and fairness in decision making
Traces the development of machine learning from its origins to today
Features a novel chapter on machine learning benchmarks and datasets
Invites readers from all backgrounds, requiring some experience with probability, calculus, and linear algebra
An essential textbook for students and a guide for researchers

Optimization for Machine Learning

Author: Suvrit Sra
Publisher: MIT Press
Total Pages: 509
Release: 2012
Genre: Computers
ISBN: 026201646X



An up-to-date account of the interplay between optimization and machine learning, accessible to students and researchers in both communities. The interplay between optimization and machine learning is one of the most important developments in modern computational science. Optimization formulations and methods are proving to be vital in designing algorithms to extract essential knowledge from huge volumes of data. Machine learning, however, is not simply a consumer of optimization technology but a rapidly evolving field that is itself generating new optimization ideas. This book captures the state of the art of the interaction between optimization and machine learning in a way that is accessible to researchers in both fields. Optimization approaches have enjoyed prominence in machine learning because of their wide applicability and attractive theoretical properties. The increasing complexity, size, and variety of today's machine learning models call for the reassessment of existing assumptions. This book starts the process of reassessment. It describes the resurgence in novel contexts of established frameworks such as first-order methods, stochastic approximations, convex relaxations, interior-point methods, and proximal methods. It also devotes attention to newer themes such as regularized optimization, robust optimization, gradient and subgradient methods, splitting techniques, and second-order methods. Many of these techniques draw inspiration from other fields, including operations research, theoretical computer science, and subfields of optimization. The book will enrich the ongoing cross-fertilization between the machine learning community and these other fields, and within the broader optimization community.