Medical Risk Prediction Models: With Ties to Machine Learning is a hands-on book for clinicians, epidemiologists, and professional statisticians who need to make or evaluate a statistical prediction model based on data. The subject of the book is the patient's individualized probability of a medical event within a given time horizon. Gerds and Kattan describe the mathematical details of making and evaluating a statistical prediction model in a highly pedagogical manner while avoiding mathematical notation. Read this book when you are in doubt about whether a Cox regression model predicts better than a random survival forest.
Thomas A. Gerds is a professor at the Biostatistics Unit at the University of Copenhagen and is affiliated with the Danish Heart Foundation. He is the author of several R-packages on CRAN and has taught statistics courses to non-statisticians for many years.
Michael W. Kattan is a highly cited author and Chair of the Department of Quantitative Health Sciences at Cleveland Clinic. He is a Fellow of the American Statistical Association and has received two awards from the Society for Medical Decision Making: the Eugene L. Saenger Award for Distinguished Service, and the John M. Eisenberg Award for Practical Application of Medical Decision-Making Research.
Why should I care about statistical prediction models?
The many uses of prediction models in medicine
The unique messages of this book
Prognostic factor modeling philosophy
The rest of this book
Prediction model framework
Target population
The time origin
The event of interest
The prediction time horizon and follow-up
Landmarking
Risks and risk predictions
Classification of risk
Predictor variables
Checklist
Prediction performance
Proper scoring rules
Calibration
Discrimination
Explained variation
Variability and uncertainty
The interpretation is relative
Utility
Average versus subgroups
Study design
Study design and sources of information
Cohort
Multi-center study
Randomized clinical trial
Case-control
Given treatment and treatment options
Sample size calculation
Data
Purpose dataset
Data dictionary
Measurement error
Missing values
Censored data
Competing risks
Modeling
Risk prediction model
Risk classifier
How is prediction modeling different from statistical inference?
Linear predictor
Expert selects the candidate predictors
How to select variables for inclusion in the final model
All possible interactions
Checklist
Machine learning
Validation
The conventional model
Internal and external validation
Conditional versus expected performance
Cross-validation
Data splitting
Bootstrap
Model checking and goodness of fit
Reproducibility
Pitfalls
Age as time scale
Odds ratios and hazard ratios are not predictions of risks
Do not blame the metric
Censored data versus competing risks
Disease-specific survival
Overfitting
Data-dependent decisions
Balancing data
Independent predictor
Automated variable selection
Definition of subjects
Choice of time scale
Pre-selection of predictor variables
Preparation of predictor variables
Categorical variables
Continuous variables
Derived predictor variables
Repeated measurements
Measurement error
Missing values
Preparation of event time outcome
Illustration without competing risks
Illustration with competing risks
Artificial censoring at the prediction time horizon
Specifying the model type
Uncensored binary outcome
Right-censored time-to-event outcome (no competing risks)
Right-censored time-to-event outcome with competing risks
Benchmark model
Uncensored binary outcome
Right-censored time-to-event outcome (without competing risks)
Right-censored time-to-event with competing risks
Including predictor variables
Categorical predictor variables
Continuous predictor variables
Interaction effects
Modeling strategy
Variable selection
Conventional model strategy
Whether to use a standard regression model or something else
Advanced topics
How to prevent overfitting the data
How to deal with missing values
How to deal with non-converging models
What you should put in your manuscript
Baseline tables
Follow-up tables
Regression tables
Risk plots
Nomograms
Deployment
Risk charts
Internet calculator
Cost-benefit analysis (waiting lists)
Model assessment roadmap
Visualization of the predictions
Calculation of model performance
Visualization of model performance
Uncensored binary outcome
Distribution of the predicted risks
Brier score
AUC
Calibration curves
Right-censored time-to-event outcome (without competing risks)
Distribution of the predicted risks
Brier score with censored data
Time-dependent AUC for censored data
Calibration curve for censored data
Competing risks
Distribution of the predicted risks
Brier score with competing risks
Time-dependent AUC for competing risks
Calibration curve for competing risks
The Index of Prediction Accuracy (IPA)
Choice of prediction time horizon
Time-dependent prediction performance
Model comparison roadmap
Analysis of rival prediction models
Uncensored binary outcome
Right-censored time-to-event outcome (without competing risks)
Competing risks
Clinically relevant change of prediction
Does a new marker improve prediction?
Many new predictors
Updating a subject's prediction
What would make me an expert?
Multiple cohorts / Multi-center studies
The role of treatment for making a prediction model
Modeling treatment
Comparative effectiveness tables
Learning curve paradigm
Internal validation (data splitting)
Single split
Calendar split
Multiple splits (cross-validation)
Dilemma of internal validation
The apparent and the .632+ estimator
Tips and tricks
Missing values
Missing values in the learning data
Missing values in the validation data
Time-varying coefficient models
Time-varying predictor variables
Zero layers of cross-validation
What may happen if you do not look at the data
Unsupervised modeling steps
Final model
One layer of cross-validation
Penalized regression
Supervised spline selection
Machine learning (two levels of cross-validation)
Random forest
Deep learning and artificial neural networks
The super learner
Threshold selection for decision making
Number of events per variable
Confidence intervals for predicted probabilities
Models developed from case-control data
Hosmer-Lemeshow test
Backward elimination and stepwise selection
Rank correlation (c-index) for survival outcome
Integrated Brier score
Net reclassification index and the integrated discrimination improvement
Re-classification tables
Boxplots of rival models conditional on the outcome