Foundations and Trends® in Machine Learning > Vol 3 > Issue 2

Randomized Algorithms for Matrices and Data

By Michael W. Mahoney, Department of Mathematics, Stanford University, USA, mmahoney@cs.stanford.edu

 
Suggested Citation
Michael W. Mahoney (2011), "Randomized Algorithms for Matrices and Data", Foundations and Trends® in Machine Learning: Vol. 3: No. 2, pp 123-224. http://dx.doi.org/10.1561/2200000035

Publication Date: 22 Nov 2011
© 2011 M. W. Mahoney
 
Subjects
Data mining
 

In this article:
1 Introduction 
2 Matrices in Large-scale Scientific Data Analysis 
3 Randomization Applied to Matrix Problems 
4 Randomized Algorithms for Least-squares Approximation 
5 Randomized Algorithms for Low-rank Matrix Approximation 
6 Empirical Observations 
7 A Few General Thoughts, and A Few Lessons Learned 
8 Conclusion 
Acknowledgments 
References 

Abstract

Randomized algorithms for very large matrix problems have received a great deal of attention in recent years. Much of this work was motivated by problems in large-scale data analysis, largely since matrices are popular structures with which to model data drawn from a wide range of application domains, and this work was performed by individuals from many different research communities. While the most obvious benefit of randomization is that it can lead to faster algorithms, whether in worst-case asymptotic theory or in numerical implementation, there are numerous other benefits that are at least as important. For example, the use of randomization can lead to simpler algorithms that are easier to analyze or reason about when applied in counterintuitive settings; it can lead to algorithms with more interpretable output, which is valuable in applications where analyst time, and not just computational time, matters; it can lead implicitly to regularization and thus to more robust output; and randomized algorithms can often be organized to exploit modern computational architectures better than classical numerical methods.
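To give a concrete flavor of the randomized least-squares algorithms the abstract refers to, the following is a minimal "sketch-and-solve" illustration in NumPy. It is a sketch under assumptions, not the monograph's own implementation: the Gaussian sketching matrix, the problem sizes, and the sketch size `s` are all illustrative choices made here for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tall, skinny least-squares problem: min_x ||A x - b||_2
n, d = 10_000, 20
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

# Exact solution, for reference.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

# Sketch-and-solve: compress the n rows down to s << n rows with a
# random Gaussian sketching matrix S, then solve the small problem.
s = 200  # sketch size: an illustrative (untuned) choice
S = rng.standard_normal((s, n)) / np.sqrt(s)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

# The sketched solution closely approximates the exact one, at the
# cost of one matrix multiply plus an s-by-d solve.
rel_err = np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact)
print(rel_err)
```

A dense Gaussian sketch is the simplest choice conceptually; much of the speedup discussed in this line of work comes from replacing it with structured or sparse random projections that can be applied in far less time than a dense matrix multiply.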

DOI:10.1561/2200000035
ISBN: 978-1-60198-506-4 (paperback), 110 pp.
ISBN: 978-1-60198-507-1 (e-book, PDF), 110 pp.

Randomized Algorithms for Matrices and Data

Randomized algorithms for very large matrix problems have received a great deal of attention in recent years. Much of this work was motivated by problems in large-scale data analysis, largely since matrices are popular structures with which to model data drawn from a wide range of application domains, and the success of this line of work opens the possibility of performing matrix-based computations with truly massive data sets. Originating within theoretical computer science, this work was subsequently extended and applied in important ways by researchers from numerical linear algebra, statistics, applied mathematics, data analysis, and machine learning, as well as by domain scientists. Randomized Algorithms for Matrices and Data provides a detailed overview, appropriate for both students and researchers from all of these areas, of recent work on the theory of randomized matrix algorithms as well as the application of those ideas to the solution of practical problems in large-scale data analysis. By focusing on ubiquitous and fundamental problems, such as least-squares approximation and low-rank matrix approximation, that have been at the center of recent developments, the monograph emphasizes a few simple core ideas that underlie not only recent theoretical advances but also the usefulness of these algorithmic tools in large-scale data applications.
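The second core problem named above, low-rank matrix approximation, can likewise be illustrated with a short randomized range-finder sketch in NumPy. Again this is an assumed, minimal illustration rather than the text's own algorithm: the target rank `k`, the oversampling parameter `p`, and the synthetic low-rank-plus-noise matrix are all choices made here for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# A matrix that is numerically low-rank: rank ~10 plus tiny noise.
m, n, k = 500, 300, 10
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
A += 1e-6 * rng.standard_normal((m, n))

# Randomized range finder: sample the column space of A by applying it
# to a random Gaussian test matrix, then orthonormalize the result.
p = 5                                   # small oversampling parameter
Omega = rng.standard_normal((n, k + p)) # random test matrix
Q, _ = np.linalg.qr(A @ Omega)          # orthonormal basis, shape (m, k+p)

# Rank-(k+p) approximation: A ~= Q (Q^T A).
A_approx = Q @ (Q.T @ A)
rel_err = np.linalg.norm(A - A_approx) / np.linalg.norm(A)
print(rel_err)
```

The small factored problem `Q.T @ A` can then be post-processed with a deterministic SVD to recover approximate singular vectors and values of `A`, which is where much of the practical appeal of these methods lies.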

 