Adaptive Gradient Descent for Convex and Non-convex Stochastic Optimization

Author: Aleksandr Ogaltsov
Publisher:
Total Pages:
Release: 2019
Genre:
ISBN:



In this paper we propose several adaptive gradient methods for stochastic optimization. Our methods are based on an Armijo-type line search, and they simultaneously adapt to the unknown Lipschitz constant of the gradient and to the variance of the stochastic gradient approximation. We consider accelerated gradient descent for convex problems and gradient descent for non-convex problems. In experiments we demonstrate the superiority of our methods over existing adaptive methods such as AdaGrad and Adam.
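For intuition, here is a minimal sketch (not the authors' exact algorithm) of a gradient step with an Armijo-type backtracking line search, which adapts the step size without knowing the Lipschitz constant in advance; the function names, constants, and test problem are illustrative assumptions.

import numpy as np

def armijo_gradient_step(f, grad_f, x, step=1.0, beta=0.5, c=1e-4, max_backtracks=50):
    """One gradient step whose step size is chosen by Armijo backtracking.

    The step is shrunk until the sufficient-decrease condition
    f(x - step * g) <= f(x) - c * step * ||g||^2 holds, so no Lipschitz
    constant needs to be known in advance.
    """
    g = grad_f(x)
    fx = f(x)
    for _ in range(max_backtracks):
        x_new = x - step * g
        if f(x_new) <= fx - c * step * np.dot(g, g):
            return x_new, step
        step *= beta  # backtrack: try a smaller step
    return x_new, step

# Example: minimize the quadratic f(x) = 0.5 * ||x||^2.
f = lambda x: 0.5 * np.dot(x, x)
grad_f = lambda x: x
x = np.array([3.0, -4.0])
for _ in range(20):
    x, step = armijo_gradient_step(f, grad_f, x)
print(x)  # close to the minimizer [0, 0]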

Gradient-based Optimization and Implicit Regularization Over Non-convex Landscapes

Author: Xiaoxia (Shirley) Wu
Publisher:
Total Pages: 328
Release: 2020
Genre:
ISBN:



Large-scale machine learning problems reduce to non-convex optimization problems when state-of-the-art models such as deep neural networks are used. One of the most widely used algorithms is the first-order iterative gradient-based method, i.e., (stochastic) gradient descent. Two main challenges arise in understanding gradient-based algorithms over non-convex landscapes: the convergence complexity and the nature of the solutions they find. This thesis tackles both challenges by providing a theoretical framework and empirical investigation of three popular gradient-based algorithms, namely adaptive gradient methods [39], weight normalization [138], and curriculum learning [18]. For convergence, the stepsize or learning rate plays a pivotal role in the iteration complexity; however, it depends crucially on the (generally unknown) Lipschitz smoothness constant and on the noise level of the stochastic gradient. A popular stepsize auto-tuning approach is the family of adaptive gradient methods, such as AdaGrad, which update the learning rate on the fly according to the gradients received along the way; yet the theoretical guarantees to date for AdaGrad are for online and convex optimization. We bridge this gap by providing convergence guarantees for AdaGrad on smooth, non-convex functions; we show that it converges to a stationary point at the O(log(N)/√N) rate in the stochastic setting and at the optimal O(1/N) rate in the batch (non-stochastic) setting. Extensive numerical experiments corroborate our theory. For the solutions found by gradient-based algorithms, we study weight normalization (WN) methods in the setting of an over-parameterized linear regression problem, where WN decouples the weight vector into a scale and a unit direction. We show that this reparametrization has beneficial regularization effects compared to gradient descent on the original objective: WN adaptively regularizes the weights and converges close to the minimum-l2-norm solution, even for initializations far from zero. To further understand stochastic gradient-based algorithms, we study a continuation method, curriculum learning (CL), inspired by the cognitive-science observation that humans learn better when material is ordered from simple to complex. CL orders training examples by difficulty, from easy to hard, while anti-CL proposes the opposite ordering; both have been suggested as improvements over standard i.i.d. training. We investigate the relative benefits of ordered learning in three settings: standard-time, short-time, and noisy-label training. We find that both orderings have only marginal benefits on standard benchmark datasets. However, with a limited training-time budget or noisy data, curriculum ordering, but not anti-curriculum ordering, can improve performance.
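As a concrete reference point, here is a minimal sketch of the scalar ("norm") variant of the AdaGrad stochastic update, where a single step size shrinks as squared gradient norms accumulate; the least-squares problem, hyperparameters, and variable names below are illustrative assumptions, not taken from the thesis.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative least-squares problem: minimize (1/n) * ||A x - b||^2 with mini-batch gradients.
A = rng.normal(size=(100, 5))
x_true = rng.normal(size=5)
b = A @ x_true

def stochastic_grad(x, batch=10):
    idx = rng.integers(0, A.shape[0], size=batch)
    Ai, bi = A[idx], b[idx]
    return 2 * Ai.T @ (Ai @ x - bi) / batch

x = np.zeros(5)
eta = 1.0      # base step size
accum = 0.0    # running sum of squared gradient norms
for t in range(2000):
    g = stochastic_grad(x)
    accum += np.dot(g, g)
    x -= eta / np.sqrt(accum + 1e-12) * g  # effective step size shrinks automatically

print(np.linalg.norm(x - x_true))  # residual error shrinks toward zero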

Advances in Convex Analysis and Global Optimization

Author: Nicolas Hadjisavvas
Publisher: Springer Science & Business Media
Total Pages: 601
Release: 2013-12-01
Genre: Mathematics
ISBN: 146130279X



There has been much recent progress in global optimization algorithms for nonconvex continuous and discrete problems from both a theoretical and a practical perspective. Convex analysis plays a fundamental role in the analysis and development of global optimization algorithms. This is due essentially to the fact that virtually all nonconvex optimization problems can be described using differences of convex functions and differences of convex sets. A conference on Convex Analysis and Global Optimization was held during June 5-9, 2000 at Pythagorion, Samos, Greece. The conference honored the memory of C. Caratheodory (1873-1950) and was endorsed by the Mathematical Programming Society (MPS) and by the Society for Industrial and Applied Mathematics (SIAM) Activity Group in Optimization. The conference was sponsored by the European Union (through the EPEAEK program), the Department of Mathematics of the Aegean University and the Center for Applied Optimization of the University of Florida, by the General Secretariat of Research and Technology of Greece, by the Ministry of Education of Greece, and by several local Greek government agencies and companies. This volume contains a selective collection of refereed papers based on invited and contributing talks presented at this conference. The two themes of convexity and global optimization pervade this book. The conference provided a forum for researchers working on different aspects of convexity and global optimization to present their recent discoveries, and to interact with people working on complementary aspects of mathematical programming.

Efficient Stochastic Optimization Algorithms for Convex, Non-Convex Problems

Author: Aysegul Beyza Bumin
Publisher:
Total Pages: 0
Release: 2023
Genre:
ISBN:



The main research interest presented here is large-scale stochastic optimization, which is at the core of machine learning and data science. Most existing algorithms are based on stochastic gradient descent (SGD), a conceptually simple algorithm that works reasonably well in practice. However, it also suffers from a slow convergence rate and requires manual tuning for best performance. This research focuses on developing stochastic proximal algorithms and improving them further for specific applications in bioinformatics. The goals are to 1) make the per-iteration complexity as efficient as that of SGD and 2) provide much more robust convergence guarantees that rely on fewer assumptions, such as smoothness, Lipschitz continuity, or even convexity. If successful, this work will greatly benefit the fields of optimization and machine learning.
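To illustrate the kind of method involved, here is a minimal sketch of a proximal stochastic gradient step for an l1-regularized objective, where the smooth loss is handled by a noisy gradient and the non-smooth regularizer by its proximal (soft-thresholding) operator; the objective, noise model, and constants are illustrative assumptions.

import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_sgd_step(x, stochastic_grad, step, lam):
    """One proximal SGD step: a gradient step on the smooth loss,
    followed by the prox of the non-smooth l1 regularizer."""
    return soft_threshold(x - step * stochastic_grad(x), step * lam)

# Tiny usage: lasso-style objective 0.5 * ||x - y||^2 + lam * ||x||_1 with a noisy gradient.
rng = np.random.default_rng(0)
y = np.array([1.0, 0.05, -0.7])
x = np.zeros(3)
for t in range(500):
    x = prox_sgd_step(x, lambda x: (x - y) + 0.1 * rng.normal(size=3), step=0.1, lam=0.2)
print(x)  # the large entries of y survive; the small one is shrunk to (near) zero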

Non-convex Optimization for Machine Learning

Author: Prateek Jain
Publisher: Foundations and Trends in Machine Learning
Total Pages: 218
Release: 2017-12-04
Genre: Machine learning
ISBN: 9781680833683



Non-convex Optimization for Machine Learning takes an in-depth look at the basics of non-convex optimization with applications to machine learning. It introduces the rich literature in this area and equips the reader with the tools and techniques needed to apply and analyze simple but powerful procedures for non-convex problems. Non-convex Optimization for Machine Learning is as self-contained as possible while not losing focus on the main topic of non-convex optimization techniques. The monograph opens with entire chapters devoted to a tutorial-like treatment of basic concepts in convex analysis and optimization, as well as their non-convex counterparts. It concludes with a look at four interesting applications in the areas of machine learning and signal processing, exploring how the non-convex optimization techniques introduced earlier can be used to solve these problems. The monograph also contains, for each of the topics discussed, exercises and figures designed to engage the reader, as well as extensive bibliographic notes pointing towards classical works and recent advances. Non-convex Optimization for Machine Learning can be used for a semester-length course on the basics of non-convex optimization with applications to machine learning. On the other hand, it is also possible to cherry-pick individual portions, such as the chapter on sparse recovery or the EM algorithm, for inclusion in a broader course. Several courses, such as those in machine learning, optimization, and signal processing, may benefit from the inclusion of such topics.

Convex Optimization

Author: Sébastien Bubeck
Publisher: Foundations and Trends (R) in Machine Learning
Total Pages: 142
Release: 2015-11-12
Genre: Convex domains
ISBN: 9781601988607



This monograph presents the main complexity theorems in convex optimization and their corresponding algorithms. It begins with the fundamental theory of black-box optimization and proceeds to guide the reader through recent advances in structural optimization and stochastic optimization. The presentation of black-box optimization, strongly influenced by the seminal book by Nesterov, includes the analysis of cutting plane methods as well as (accelerated) gradient descent schemes. Special attention is also given to non-Euclidean settings (relevant algorithms include Frank-Wolfe, mirror descent, and dual averaging) and their relevance in machine learning. The text provides a gentle introduction to structural optimization with FISTA (to optimize a sum of a smooth and a simple non-smooth term), saddle-point mirror prox (Nemirovski's alternative to Nesterov's smoothing), and a concise description of interior point methods. In stochastic optimization it discusses stochastic gradient descent, mini-batches, random coordinate descent, and sublinear algorithms. It also briefly touches upon convex relaxation of combinatorial problems and the use of randomness to round solutions, as well as random-walk-based methods.
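As an illustration of one of the non-Euclidean methods mentioned above, here is a minimal sketch of entropic mirror descent (exponentiated gradient) on the probability simplex; the objective, step size, and iteration count are illustrative assumptions.

import numpy as np

def mirror_descent_simplex(grad, x0, steps=200, eta=0.1):
    """Entropic mirror descent on the probability simplex:
    a multiplicative update followed by renormalization."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x * np.exp(-eta * grad(x))
        x /= x.sum()
    return x

# Example: minimize the linear function f(x) = <c, x> over the simplex;
# the optimum puts all mass on the smallest coordinate of c.
c = np.array([0.8, 0.3, 0.5])
x = mirror_descent_simplex(lambda x: c, np.ones(3) / 3)
print(x)  # concentrates on index 1, the smallest entry of c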

Mathematical Optimization Theory and Operations Research

Author: Michael Khachay
Publisher: Springer
Total Pages: 716
Release: 2019-06-12
Genre: Computers
ISBN: 3030226298



This book constitutes the proceedings of the 18th International Conference on Mathematical Optimization Theory and Operations Research, MOTOR 2019, held in Ekaterinburg, Russia, in July 2019. The 48 full papers presented in this volume were carefully reviewed and selected from 170 submissions. MOTOR 2019 is a successor of the well-known international and all-Russian conference series that were organized in the Urals, Siberia, and the Russian Far East for many years. The selected papers are organized in the following topical sections: mathematical programming; bi-level optimization; integer programming; combinatorial optimization; optimal control and approximation; data mining and computational geometry; games and mathematical economics.

Distributed Stochastic Optimization in Non-Differentiable and Non-Convex Environments

Author: Stefan Vlaski
Publisher:
Total Pages: 284
Release: 2019
Genre:
ISBN:



The first part of this dissertation considers distributed learning problems over networked agents. The general objective of distributed adaptation and learning is the solution of global, stochastic optimization problems through localized interactions and without information about the statistical properties of the data. Regularization is a useful technique to encourage or enforce structural properties of the resulting solution, such as sparsity or constraints. A substantial number of regularizers are inherently non-smooth, while many cost functions are differentiable. We propose distributed and adaptive strategies that are able to minimize aggregate sums of objectives. In doing so, we exploit the structure of the individual objectives as sums of differentiable costs and non-differentiable regularizers. The resulting algorithms are adaptive in nature and able to continuously track drifts in the problem; their recursions, however, are subject to persistent perturbations arising from the stochastic nature of the gradient approximations and from disagreement across agents in the network. The presence of non-smooth, and potentially unbounded, regularizers enriches the dynamics of these recursions. We quantify the impact of this interplay, draw implications for steady-state performance as well as algorithm design, and present applications in distributed machine learning and image reconstruction.

There has also been increasing interest in understanding the behavior of gradient-descent algorithms in non-convex environments. In this work, we consider stochastic cost functions, where exact gradients are replaced by stochastic approximations and the resulting gradient noise persistently seeps into the dynamics of the algorithm. We establish that the diffusion learning algorithm continues to yield meaningful estimates in these more challenging, non-convex environments, in the sense that (a) despite the distributed implementation, individual agents cluster in a small region around the weighted network centroid in the mean-fourth sense, and (b) the network centroid inherits many properties of the centralized stochastic gradient descent recursion, including escape from strict saddle points in time inversely proportional to the step size and the return of approximately second-order stationary points in a polynomial number of iterations.

In the second part of the dissertation, we consider centralized learning problems over networked feature spaces. Rapidly growing capabilities to observe, collect, and process ever-increasing quantities of information necessitate methods for identifying and exploiting structure in high-dimensional feature spaces. Networks, frequently referred to as graphs in this context, have emerged as a useful tool for modeling interrelations among different parts of a data set. We consider graph signals that evolve dynamically according to a heat diffusion process and are subject to persistent perturbations. The model is not limited to heat diffusion and can also describe other processes, such as the evolution of interest over social networks and the movement of people in cities. We develop an online algorithm that is able to learn the underlying graph structure from observations of the signal evolution and derive expressions for its performance. The algorithm is adaptive in nature and able to respond to changes in the graph structure and the perturbation statistics.
Furthermore, in order to incorporate prior structural knowledge to improve classification performance, we propose a BRAIN strategy for learning, which enhances the performance of traditional algorithms, such as logistic regression and SVM learners, by incorporating a graphical layer that tracks and learns in real time the underlying correlation structure among feature subspaces. In this way, the algorithm is able to identify salient subspaces and their correlations, while simultaneously dampening the effect of irrelevant features.
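To make the distributed strategies of the first part more concrete, here is a minimal sketch of an adapt-then-combine diffusion recursion: each agent takes a local stochastic gradient step and then averages with its neighbors; the ring network, least-mean-squares cost, and step size are illustrative assumptions, not the dissertation's exact setup.

import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: 4 agents estimate a common parameter w_true from noisy local data.
w_true = np.array([1.0, -2.0])
n_agents, mu = 4, 0.05
# Doubly stochastic combination matrix over an assumed ring topology.
C = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])

W = np.zeros((n_agents, 2))  # each agent's current estimate
for _ in range(3000):
    # Adapt: local stochastic gradient step on a least-mean-squares cost.
    psi = np.empty_like(W)
    for k in range(n_agents):
        h = rng.normal(size=2)               # regressor
        d = h @ w_true + 0.1 * rng.normal()  # noisy measurement
        g = -2 * (d - h @ W[k]) * h          # stochastic gradient of (d - h.w)^2
        psi[k] = W[k] - mu * g
    # Combine: average intermediate estimates with neighbors.
    W = C @ psi

print(W)  # all agents cluster around w_true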

Optimization for Machine Learning

Author: Suvrit Sra
Publisher: MIT Press
Total Pages: 509
Release: 2012
Genre: Computers
ISBN: 026201646X



An up-to-date account of the interplay between optimization and machine learning, accessible to students and researchers in both communities. The interplay between optimization and machine learning is one of the most important developments in modern computational science. Optimization formulations and methods are proving to be vital in designing algorithms to extract essential knowledge from huge volumes of data. Machine learning, however, is not simply a consumer of optimization technology but a rapidly evolving field that is itself generating new optimization ideas. This book captures the state of the art of the interaction between optimization and machine learning in a way that is accessible to researchers in both fields. Optimization approaches have enjoyed prominence in machine learning because of their wide applicability and attractive theoretical properties. The increasing complexity, size, and variety of today's machine learning models call for the reassessment of existing assumptions. This book starts the process of reassessment. It describes the resurgence in novel contexts of established frameworks such as first-order methods, stochastic approximations, convex relaxations, interior-point methods, and proximal methods. It also devotes attention to newer themes such as regularized optimization, robust optimization, gradient and subgradient methods, splitting techniques, and second-order methods. Many of these techniques draw inspiration from other fields, including operations research, theoretical computer science, and subfields of optimization. The book will enrich the ongoing cross-fertilization between the machine learning community and these other fields, and within the broader optimization community.

Introduction to Online Convex Optimization, second edition

Author: Elad Hazan
Publisher: MIT Press
Total Pages: 249
Release: 2022-09-06
Genre: Computers
ISBN: 0262046989



New edition of a graduate-level textbook that focuses on online convex optimization, a machine learning framework that views optimization as a process. In many practical applications the environment is so complex that it is not feasible to lay out a comprehensive theoretical model and use classical algorithmic theory and/or mathematical optimization. Introduction to Online Convex Optimization presents a robust machine learning approach that contains elements of mathematical optimization, game theory, and learning theory: an optimization method that learns from experience as more aspects of the problem are observed. This view of optimization as a process has led to some spectacular successes in modeling and systems that have become part of our daily lives. Based on the “Theoretical Machine Learning” course taught by the author at Princeton University, the second edition of this widely used graduate-level text features: thoroughly updated material throughout; new chapters on boosting, adaptive regret, and approachability, along with expanded exposition on optimization; examples of applications offered throughout, including prediction from expert advice, portfolio selection, matrix completion and recommendation systems, and SVM training; and exercises that guide students in completing parts of proofs.
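To illustrate the "optimization as a process" viewpoint described above, here is a minimal sketch of online projected gradient descent with the standard 1/sqrt(t) step size on a toy sequence of quadratic losses; the losses, domain, and constants are illustrative assumptions.

import numpy as np

def online_gradient_descent(loss_grads, radius=1.0):
    """Online gradient descent over a Euclidean ball of the given radius.

    At round t the learner plays x_t, observes the gradient of the t-th
    loss at x_t, and takes a projected step with step size 1/sqrt(t)."""
    x = np.zeros(2)
    plays = []
    for t, grad in enumerate(loss_grads, start=1):
        plays.append(x.copy())
        x = x - grad(x) / np.sqrt(t)
        norm = np.linalg.norm(x)
        if norm > radius:          # project back onto the ball
            x *= radius / norm
    return plays

# Toy sequence: losses f_t(x) = ||x - z_t||^2, where each z_t is a noisy copy of a fixed target.
rng = np.random.default_rng(0)
z0 = np.array([0.3, -0.1])
grads = [lambda x, z=z0 + 0.05 * rng.normal(size=2): 2 * (x - z) for _ in range(500)]
plays = online_gradient_descent(grads)
print(plays[-1])  # close to the fixed target [0.3, -0.1]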