By Ruidi Chen, Boston University, USA, rchen15@bu.edu | Ioannis Ch. Paschalidis, Boston University, USA, yannisp@bu.edu
This monograph develops a comprehensive statistical learning framework that is robust to (distributional) perturbations in the data using Distributionally Robust Optimization (DRO) under the Wasserstein metric. Beginning with fundamental properties of the Wasserstein metric and the DRO formulation, we explore duality to arrive at tractable formulations and develop finite-sample, as well as asymptotic, performance guarantees. We consider a series of learning problems, including (i) distributionally robust linear regression; (ii) distributionally robust regression with group structure in the predictors; (iii) distributionally robust multi-output regression and multiclass classification; (iv) optimal decision making that combines distributionally robust regression with nearest-neighbor estimation; (v) distributionally robust semi-supervised learning; and (vi) distributionally robust reinforcement learning. For each problem we derive a tractable DRO relaxation, establish a connection between robustness and regularization, and obtain bounds on the prediction and estimation errors of the solution. Beyond theory, we include numerical experiments and case studies using synthetic and real data. The real data experiments are all associated with various health informatics problems, an application area which provided the initial impetus for this work.
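To illustrate the robustness-regularization connection in the simplest setting, a well-known result in the Wasserstein DRO literature states that, for linear regression with an absolute loss, the worst-case expected loss over a Wasserstein ball of radius \(\epsilon\) around the empirical measure reduces to a norm-regularized least absolute deviations problem. The notation below is a sketch and not taken verbatim from the monograph:

```latex
% Wasserstein DRO for linear regression with absolute loss.
% \hat{P}_N: empirical measure on samples (x_i, y_i), i = 1, ..., N;
% W(\cdot,\cdot): Wasserstein distance induced by a norm \|\cdot\| on the data,
% with dual norm \|\cdot\|_*.
\inf_{\beta}\ \sup_{Q :\, W(Q, \hat{P}_N) \le \epsilon}
  \mathbb{E}_{Q}\bigl[\,|y - \beta^{\top} x|\,\bigr]
\;=\;
\inf_{\beta}\ \frac{1}{N}\sum_{i=1}^{N} \bigl|y_i - \beta^{\top} x_i\bigr|
  \;+\; \epsilon\,\bigl\|(-\beta, 1)\bigr\|_{*}
```

The regularizer on the right is induced directly by the ambiguity set, so the radius \(\epsilon\) plays the role of a regularization parameter with a distributional interpretation.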
Many modern techniques for solving supervised learning problems lack the interpretability and analyzability needed to support rigorous mathematical results. This monograph develops a comprehensive statistical learning framework that uses Distributionally Robust Optimization (DRO) under the Wasserstein metric to ensure robustness to perturbations in the data.
The authors introduce the reader to the fundamental properties of the Wasserstein metric and the DRO formulation, before explaining the theory and its applications in detail. They cover a series of learning problems, including (i) distributionally robust linear regression; (ii) distributionally robust regression with group structure in the predictors; (iii) distributionally robust multi-output regression and multiclass classification; (iv) optimal decision making that combines distributionally robust regression with nearest-neighbor estimation; (v) distributionally robust semi-supervised learning; and (vi) distributionally robust reinforcement learning. Throughout the monograph, the authors use applications in medicine and health care to illustrate the theoretical ideas in practice. They include numerical experiments and case studies using synthetic and real data.
Distributionally Robust Learning provides detailed insight into a technique that has attracted considerable recent interest for developing robust supervised learning solutions grounded in sound mathematical principles. It will be enlightening for researchers, practitioners, and students working on the optimization of machine learning systems.