High Dimensional Data Analysis with Dependency and Under Limited Memory

Author: Yiming Sun
Publisher:
Total Pages: 213
Release: 2019
Genre:
ISBN:


Several methods for high-dimensional analysis are proposed in this thesis under data dependency and limited memory. The first part of the work proposes a model-free method for building networks from time series data when the observations are dependent and come from a weakly stationary time series. We develop thresholding-based methods to estimate the multivariate spectral density of high-dimensional time series under a weak sparsity assumption. Our theoretical analysis ensures that consistent estimation of the spectral density matrix of a $p$-dimensional time series using $n$ samples is possible in the high-dimensional regime $\log p/n \rightarrow 0$, as long as the true spectral density is approximately sparse. A key technical component of our analysis is a new concentration inequality for the averaged periodogram around its expectation, which is of independent interest. Our estimation consistency results complement existing results for shrinkage-based estimators of multivariate spectral density, which require no assumption on sparsity but only ensure consistent estimation in the regime $p^2/n \rightarrow 0$. In addition, our proposed thresholding-based estimators perform consistent and automatic edge selection when coherence networks among the components of a multivariate time series are learned. We demonstrate the advantages of our estimators using simulation studies and a real data application on functional connectivity analysis with fMRI data. We further show that, with a simple modification of the classic estimator, we can build a rigorous theory for adaptive thresholding in estimating the multivariate spectral density of Gaussian processes. This adaptive estimator can capture the heterogeneity across different positions in the spectral density matrix and achieves a better convergence rate than the hard thresholding estimator.
The second part addresses compressing and analyzing high-dimensional data with limited memory. We focus on developing a streaming algorithm for Tucker decomposition, a generalization of the singular value decomposition to tensors. The method applies a randomized linear map to the tensor to obtain a sketch that captures the important directions within each mode as well as the interactions among the modes. The sketch can be extracted from streaming or distributed data, or with a single pass over the tensor, and it uses storage proportional to the degrees of freedom in the output Tucker approximation. Although the algorithm can exploit another view to compute a superior approximation, it does not require a second pass over the tensor. We provide a rigorous theoretical guarantee on the approximation error. Extensive numerical experiments show that the algorithm produces useful results that improve on the state of the art for streaming Tucker decomposition. Alongside the development of one-pass Tucker decomposition, we propose a memory-efficient random mapping, which we call tensor random projection. We further study its theoretical properties in applications such as random projection and sketching algorithms for fast tensor regression.
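The sketching idea in the second part can be illustrated with a minimal NumPy example. This is only a sketch under stated assumptions, not the thesis's implementation: the function names, the dense Gaussian test matrices, and the choice `s = 2k + 1` are all illustrative. Each mode gets a "factor sketch" (important directions within the mode), and a single "core sketch" records the interactions among modes. Reconstruction uses the sketches alone, so the tensor is never read a second time.

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` unfolding: rows index the chosen mode, columns the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T along `mode` by a matrix M of shape (J, T.shape[mode])."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

def tucker_sketch(X, k, s, rng):
    """Linear sketches of X: one factor sketch per mode plus one core sketch.

    Dense Gaussian maps are an assumption made for clarity; they are not
    memory-efficient for large tensors.
    """
    omegas = [rng.standard_normal((X.size // d, k)) for d in X.shape]
    phis = [rng.standard_normal((s, d)) for d in X.shape]
    factor_sketches = [unfold(X, n) @ omegas[n] for n in range(X.ndim)]
    core_sketch = X
    for n, phi in enumerate(phis):
        core_sketch = mode_product(core_sketch, phi, n)
    return factor_sketches, core_sketch, phis

def tucker_from_sketch(factor_sketches, core_sketch, phis):
    """Recover orthonormal factors and a core tensor from the sketches only."""
    factors = [np.linalg.qr(V)[0] for V in factor_sketches]
    core = core_sketch
    for n, (phi, Q) in enumerate(zip(phis, factors)):
        # Undo the compression along mode n in the least-squares sense.
        core = mode_product(core, np.linalg.pinv(phi @ Q), n)
    return core, factors

def tucker_to_tensor(core, factors):
    """Expand a Tucker representation back into a full tensor."""
    out = core
    for n, Q in enumerate(factors):
        out = mode_product(out, Q, n)
    return out

# Synthetic test tensor with exact Tucker rank (3, 3, 3).
rng = np.random.default_rng(0)
true_core = rng.standard_normal((3, 3, 3))
true_factors = [rng.standard_normal((d, 3)) for d in (20, 25, 30)]
X = tucker_to_tensor(true_core, true_factors)

sketches, H, phis = tucker_sketch(X, k=3, s=7, rng=rng)
core, factors = tucker_from_sketch(sketches, H, phis)
X_hat = tucker_to_tensor(core, factors)
rel_err = np.linalg.norm(X_hat - X) / np.linalg.norm(X)
```

Because both sketches are linear in `X`, the sketch of a sum of updates equals the sum of the sketches, which is what makes the approach usable on streaming or distributed data. For a tensor that is exactly low rank, as in this synthetic case, the reconstruction error is at the level of floating-point round-off; for general tensors the error depends on how well a rank-`k` Tucker model fits.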

High-Dimensional Data Analysis with Low-Dimensional Models

Author: John Wright
Publisher: Cambridge University Press
Total Pages: 717
Release: 2022-01-13
Genre: Computers
ISBN: 1108489737


Connects fundamental mathematical theory with real-world problems, through efficient and scalable optimization algorithms.

Latent Factor Analysis for High-dimensional and Sparse Matrices

Author: Ye Yuan
Publisher: Springer Nature
Total Pages: 99
Release: 2022-11-15
Genre: Computers
ISBN: 9811967032


Latent factor analysis models are an effective type of machine learning model for addressing high-dimensional and sparse matrices, which are encountered in many big-data-related industrial applications. The performance of a latent factor analysis model relies heavily on appropriate hyper-parameters. However, most hyper-parameters are data-dependent, and using grid-search to tune these hyper-parameters is truly laborious and expensive in computational terms. Hence, how to achieve efficient hyper-parameter adaptation for latent factor analysis models has become a significant question. This is the first book to focus on how particle swarm optimization can be incorporated into latent factor analysis for efficient hyper-parameter adaptation, an approach that offers high scalability in real-world industrial applications. The book will help students, researchers and engineers fully understand the basic methodologies of hyper-parameter adaptation via particle swarm optimization in latent factor analysis models. Further, it will enable them to conduct extensive research and experiments on the real-world applications of the content discussed.

Introduction to High-Dimensional Statistics

Author: Christophe Giraud
Publisher: CRC Press
Total Pages: 410
Release: 2021-08-25
Genre: Computers
ISBN: 1000408353


Praise for the first edition: "[This book] succeeds singularly at providing a structured introduction to this active field of research. ... it is arguably the most accessible overview yet published of the mathematical ideas and principles that one needs to master to enter the field of high-dimensional statistics. ... recommended to anyone interested in the main results of current research in high-dimensional statistics as well as anyone interested in acquiring the core mathematical skills to enter this area of research." (Journal of the American Statistical Association)
Introduction to High-Dimensional Statistics, Second Edition preserves the philosophy of the first edition: to be a concise guide for students and researchers discovering the area and interested in the mathematics involved. The main concepts and ideas are presented in simple settings, thereby avoiding unessential technicalities. High-dimensional statistics is a fast-evolving field, and much progress has been made on a large variety of topics, providing new insights and methods. Offering a succinct presentation of the mathematical foundations of high-dimensional statistics, this new edition:
Offers revised chapters from the previous edition, with additional material on important topics including compressed sensing, estimation with convex constraints, the slope estimator, simultaneously low-rank and row-sparse linear regression, and aggregation of a continuous set of estimators.
Introduces three new chapters on iterative algorithms, clustering, and minimax lower bounds.
Provides enhanced appendices, mainly with the addition of the Davis-Kahan perturbation bound and of two simple versions of the Hanson-Wright concentration inequality.
Covers cutting-edge statistical methods including model selection, sparsity and the Lasso, iterative hard thresholding, aggregation, support vector machines, and learning theory.
Provides detailed exercises at the end of every chapter with collaborative solutions on a wiki site.
Illustrates concepts with simple but clear practical examples.

Handbook of Bayesian, Fiducial, and Frequentist Inference

Author: James Berger
Publisher: CRC Press
Total Pages: 421
Release: 2024-02-26
Genre: Mathematics
ISBN: 1003837646


The emergence of data science in recent decades has magnified the need for efficient methodology for analyzing data and highlighted the importance of statistical inference. Despite the tremendous progress that has been made, statistical science is still a young discipline and continues to have several different and competing paths in its approaches and its foundations. While the emergence of competing approaches is a natural progression of any scientific discipline, differences in the foundations of statistical inference can sometimes lead to different interpretations and conclusions from the same dataset. The increased interest in the foundations of statistical inference has led to many publications, and recent vibrant research activities in statistics, applied mathematics, philosophy and other fields of science reflect the importance of this development. The Bayesian, fiducial, and frequentist (BFF) approaches not only bridge foundations and scientific learning, but also facilitate objective and replicable scientific research, and provide scalable computing methodologies for the analysis of big data. Most of the published work focuses on a single topic or theme, and the body of work is scattered across different journals. This handbook provides a comprehensive introduction and broad overview of the key developments in the BFF schools of inference. It is intended for researchers and students who wish for an overview of foundations of inference from the BFF perspective and provides a general reference for BFF inference.
Key Features:
Provides a comprehensive introduction to the key developments in the BFF schools of inference.
Gives an overview of modern inferential methods, allowing scientists in other fields to expand their knowledge.
Is accessible for readers with different perspectives and backgrounds.

Frontiers in Massive Data Analysis

Author: National Research Council
Publisher: National Academies Press
Total Pages: 191
Release: 2013-09-03
Genre: Mathematics
ISBN: 0309287812


Data mining of massive data sets is transforming the way we think about crisis response, marketing, entertainment, cybersecurity and national intelligence. Collections of documents, images, videos, and networks are being thought of not merely as bit strings to be stored, indexed, and retrieved, but as potential sources of discovery and knowledge, requiring sophisticated analysis techniques that go far beyond classical indexing and keyword counting, aiming to find relational and semantic interpretations of the phenomena underlying the data. Frontiers in Massive Data Analysis examines the frontier of analyzing massive amounts of data, whether in a static database or streaming through a system. Data at that scale (terabytes and petabytes) is increasingly common in science (e.g., particle physics, remote sensing, genomics), Internet commerce, business analytics, national security, communications, and elsewhere. The tools that work to infer knowledge from data at smaller scales do not necessarily work, or work well, at such massive scale. New tools, skills, and approaches are necessary, and this report identifies many of them, plus promising research directions to explore. Frontiers in Massive Data Analysis discusses pitfalls in trying to infer knowledge from massive data, and it characterizes seven major classes of computation that are common in the analysis of massive data. Overall, this report illustrates the cross-disciplinary knowledge (from computer science, statistics, machine learning, and application disciplines) that must be brought to bear to make useful inferences from massive data.

Advanced Visual Interfaces. Supporting Big Data Applications

Author: Marco X. Bornschlegl
Publisher: Springer
Total Pages: 154
Release: 2016-12-15
Genre: Computers
ISBN: 3319500708


This book constitutes the thoroughly refereed post-workshop proceedings of the AVI 2016 Workshop on Road Mapping Infrastructures for Advanced Visual Interfaces Supporting Big Data Applications in Virtual Research Environments, AVI-BDA 2016, held in Bari, Italy, in June 2016. The 10 revised full papers in this volume present the elaborated outcome of the initial position papers, capturing the results of the roadmapping discussions in the workshop; comments from several external reviewers on these full publications were also integrated.

Large Sample Covariance Matrices and High-Dimensional Data Analysis

Author: Jianfeng Yao
Publisher: Cambridge University Press
Total Pages: 0
Release: 2015-03-26
Genre: Mathematics
ISBN: 9781107065178


High-dimensional data appear in many fields, and their analysis has become increasingly important in modern statistics. However, it has long been observed that several well-known methods in multivariate analysis become inefficient, or even misleading, when the data dimension $p$ is larger than, say, several tens. A seminal example is the well-known inefficiency of Hotelling's $T^2$-test in such cases. This example shows that classical large sample limits may no longer hold for high-dimensional data; statisticians must seek new limiting theorems in these instances. Thus, the theory of random matrices (RMT) serves as a much-needed and welcome alternative framework. Based on the authors' own research, this book provides a first-hand introduction to new high-dimensional statistical methods derived from RMT. The book begins with a detailed introduction to useful tools from RMT, and then presents a series of high-dimensional problems with solutions provided by RMT methods.
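For context, the one-sample Hotelling statistic referred to in the blurb is usually written as follows. The notation is standard textbook notation and is an assumption here, not taken from the book: $x_1, \dots, x_n \in \mathbb{R}^p$ are the observations, $\bar{x}$ is the sample mean, $S$ is the sample covariance matrix with divisor $n-1$, and $\mu_0$ is the hypothesized mean.

$$
T^2 = n\,(\bar{x} - \mu_0)^\top S^{-1} (\bar{x} - \mu_0),
\qquad
\frac{n-p}{p\,(n-1)}\,T^2 \sim F_{p,\,n-p} \quad \text{under } H_0 \text{ with Gaussian data}.
$$

The statistic requires $S$ to be invertible, which fails when $p \ge n$, and the second degrees-of-freedom parameter $n-p$ shrinks as $p$ approaches $n$. This is one common way to see why the test degrades as the dimension grows relative to the sample size.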