Tidy Modeling with R
13 Grid search
13.1 Regular and non-regular grids
13.2 Evaluating the grid
13.3 Finalizing the model
13.4 Tools for efficient grid search
parallelism, sub-models, and racing