This textbook considers statistical learning applications in which interest centers on the conditional distribution of a response variable, given a set of predictors, and in the absence of a credible model that can be specified before the data analysis begins. Consistent with modern data analytics, it emphasizes that a proper statistical learning data analysis depends in an integrated fashion on sound data collection, intelligent data management, appropriate statistical procedures, and an accessible interpretation of results. The unifying theme is that supervised learning can properly be seen as a form of regression analysis. Key concepts and procedures are illustrated with a large number of real applications and their associated code in R, with an eye toward practical implications. The growing integration of computer science and statistics is well represented, including the occasional, but salient, tensions that result. Throughout, there are links to the big picture.
The third edition considers significant advances in recent years, among which are:
the development of overarching, conceptual frameworks for statistical learning;
the impact of "big data" on statistical learning;
the nature and consequences of post-model selection statistical inference;
deep learning in various forms;
the special challenges to statistical inference posed by statistical learning;
the fundamental connections between data collection and data analysis;
interdisciplinary ethical and political issues surrounding the application of algorithmic methods in a wide variety of fields, each linked to concerns about transparency, fairness, and accuracy.
This edition features new sections on accuracy, transparency, and fairness, as well as a new chapter on deep learning. Precursors to deep learning get an expanded treatment. The connections between fitting and forecasting are considered in greater depth. Discussion of the estimation targets for algorithmic methods is revised and expanded throughout to reflect the latest research. Resampling procedures are emphasized. The material is written for upper-level undergraduate and graduate students in the social, psychological, and life sciences and for researchers who want to apply statistical learning procedures to scientific and policy problems.
Richard Berk is Distinguished Professor of Statistics Emeritus at UCLA and currently a Professor at the University of Pennsylvania in the Department of Statistics and in the Department of Criminology. He is an elected fellow of the American Statistical Association and the American Association for the Advancement of Science and has served in a professional capacity with a number of organizations such as the Committee on Applied and Theoretical Statistics for the National Research Council and the Board of Directors of the Social Science Research Council. His research has ranged across a variety of statistical applications in the social and natural sciences.
Preface
Preface To Second Edition
Preface To Third Edition
1 Statistical Learning as a Regression Problem
2 Splines, Smoothers, and Kernels
3 Classification and Regression Trees (CART)
4 Bagging
5 Random Forests
6 Boosting
7 Support Vector Machines
8 Neural Networks
9 Reinforcement Learning and Genetic Algorithms
10 Integration Themes and a Bit of Craft Lore
Index