Chapter 4: Penalized Linear Regression

Why Penalized Linear Regression Methods Are So Useful (p.122)

  • Extremely Fast Coefficient Estimation
  • Variable Importance Information
  • Extremely Fast Evaluation When Deployed
  • Reliable Performance
  • Sparse Solutions
  • Problem May Require Linear Model
  • When to Use Ensemble Methods

Problem May Require Linear Model

  • a linear model might be a requirement of the solution, e.g.:
  • Calculations of insurance payouts
  • Drug testing [...] regulatory apparatus requires a linear form for statistical inference

When to Use Ensemble Methods

  • you might get better performance with another technique, such as an ensemble method
  • ensemble methods for measuring variable importance can yield more information about the relationship between attributes and predictions ... second-order (and higher) information about what pairs of variables are more important together

Penalized Linear Regression: Regulating Linear Regression for Optimum Performance (p.124)

Training Linear Models: Minimizing Errors and More

  • Adding a Coefficient Penalty to the OLS Formulation
  • Other Useful Coefficient Penalties—Manhattan and ElasticNet
  • Why Lasso Penalty Leads to Sparse Coefficient Vectors
  • ElasticNet Penalty Includes Both Lasso and Ridge
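A minimal sketch of how a coefficient penalty changes the OLS solution (my own toy example, not the book's code): ridge adds alpha times the identity to the normal equations, which shrinks the coefficient vector relative to plain least squares. The data and the alpha value are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

def ols(X, y):
    # beta = (X'X)^-1 X'y, the unpenalized least-squares solution
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, alpha):
    # beta = (X'X + alpha*I)^-1 X'y -- the sum-of-squares penalty
    # shows up as alpha*I and pulls coefficients toward zero
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

b_ols = ols(X, y)
b_ridge = ridge(X, y, alpha=10.0)
print(np.linalg.norm(b_ridge) < np.linalg.norm(b_ols))  # ridge shrinks
```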

Solving the Penalized Linear Regression Problem (p.132)

Understanding Least Angle Regression and Its Relationship to Forward Stepwise Regression

How LARS Generates Hundreds of Models of Varying Complexity
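The way LARS produces a whole family of models can be sketched with a small-step, forward-stagewise-style iteration: at every step, the attribute most correlated with the current residual gets a tiny coefficient increment, and each step yields one candidate model. Step size, step count, and the toy data below are illustrative choices, not the book's settings.

```python
import numpy as np

def stagewise(X, y, n_steps=350, step=0.004):
    n, p = X.shape
    beta = np.zeros(p)
    betas = []                       # one coefficient vector per step
    for _ in range(n_steps):
        resid = y - X @ beta         # current residuals
        corr = X.T @ resid           # association of each attribute w/ residual
        j = np.argmax(np.abs(corr))  # most correlated attribute
        beta[j] += step * np.sign(corr[j])
        betas.append(beta.copy())
    return np.array(betas)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
X = (X - X.mean(0)) / X.std(0)       # normalize attributes first
y = X @ np.array([1.5, 0.0, -0.7, 0.0]) + 0.05 * rng.normal(size=100)
path = stagewise(X, y)
print(path.shape)  # hundreds of coefficient vectors of varying complexity
```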

Choosing the Best Model from the Hundreds LARS Generates

  • Mechanizing Cross-Validation for Model Selection in Python Code
  • Accumulating Errors on Each Cross-Validation Fold and Evaluating Results
  • Practical Considerations with Model Selection and Training Sequence
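A hedged sketch of mechanized n-fold cross-validation with errors accumulated across folds (illustrative helper names, not Listing 4-2): every row lands in the holdout set exactly once, squared errors are summed over all folds, and the average gives one mean-square-error figure for the model being evaluated.

```python
import numpy as np

def cross_val_mse(X, y, fit, predict, n_folds=10):
    n = len(y)
    idx = np.arange(n)
    sq_err = 0.0
    for k in range(n_folds):
        test = idx[k::n_folds]              # every n_folds-th row held out
        train = np.setdiff1d(idx, test)
        model = fit(X[train], y[train])
        sq_err += np.sum((y[test] - predict(model, X[test])) ** 2)
    return sq_err / n                       # MSE accumulated over all rows

# usage with a plain least-squares fit as the model
fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict = lambda beta, X: X @ beta

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, 0.5, -1.0]) + 0.1 * rng.normal(size=60)
mse = cross_val_mse(X, y, fit, predict)
print(mse)
```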

CODE

  • Listing 4-1: LARS Algorithm for Predicting Wine Taste—larsWine2.py
    • Figure 4-3: Coefficient curves for LARS regression on wine data.
  • Listing 4-2: 10-Fold Cross-Validation to Determine Best Set of Coefficients—larsWineCV.py
    • Figure 4-4: Cross-validated mean square error for LARS on wine data.

Using Glmnet: Very Fast and Very General (p.144)

Comparison of the Mechanics of Glmnet and LARS Algorithms

Initializing and Iterating the Glmnet Algorithm
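The glmnet inner loop is cyclic coordinate descent with soft-thresholding. This sketch assumes normalized attributes and a single fixed lambda; the real algorithm also iterates over a decreasing lambda sequence, warm-starting each fit from the last. The data and lambda here are illustrative.

```python
import numpy as np

def soft_threshold(z, gamma):
    # shrink z toward zero by gamma; values inside [-gamma, gamma] become 0
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with attribute j's contribution removed
            r = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r / n          # works because columns are normalized
            beta[j] = soft_threshold(z, lam)
    return beta

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
X = (X - X.mean(0)) / X.std(0)           # normalize attributes
y = X @ np.array([2.0, 0.0, 0.0, -1.0, 0.0])
beta = lasso_cd(X, y, lam=0.5)
print(beta)  # the lasso penalty zeroes out irrelevant coefficients
```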

CODE

  • Listing 4-3: Glmnet Algorithm—glmnetWine.py
    • Figure 4-6: Coefficient curves for glmnet models for predicting wine taste.

Homework: review Bowles Chap. 4, pp. 122-150.

Extensions to Linear Regression with Numeric Input (p.151)

Solving Classification Problems with Penalized Regression

CODE

  • Listing 4-4: Converting a Classification Problem to an Ordinary Regression Problem by Assigning Numeric Values to Binary Labels
    • Figure 4-7: Coefficient curves for rocks versus mines classification problem solved by converting labels to numeric values.
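The label-coding trick in Listing 4-4 can be sketched on synthetic data (the rocks-versus-mines set itself is not reproduced here): binary labels become 0.0/1.0, ordinary least squares is run on the numeric targets, and thresholding the prediction at 0.5 recovers a classifier.

```python
import numpy as np

labels = np.array(['M', 'R'] * 20)
y = (labels == 'M').astype(float)          # 'M' -> 1.0, 'R' -> 0.0

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 2)) + 2.0 * y[:, None]   # separate the class means
X1 = np.column_stack([np.ones(40), X])            # intercept column

beta = np.linalg.lstsq(X1, y, rcond=None)[0]      # plain regression on labels
pred = (X1 @ beta > 0.5).astype(float)            # threshold the output
print(np.mean(pred == y))                         # training accuracy
```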

Working with Classification Problems Having More Than Two Outcomes (p.155)

Additions:

Understanding Basis Expansion: Using Linear Methods on Nonlinear Problems (p.156)

CODE

  • Listing 4-5: Basis Expansion for Wine Taste Prediction
    • Figure 4-8: Functions generated to expand wine attributes
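A minimal sketch of basis expansion, assuming a single attribute x and a quadratic target (my own toy example): appending x² as an extra column lets the purely linear solver fit the nonlinear relationship exactly.

```python
import numpy as np

x = np.linspace(-2, 2, 50)
y = 1.0 + 0.5 * x + 2.0 * x**2                     # nonlinear in x

# expanded basis: intercept, x, and the new derived attribute x^2
X = np.column_stack([np.ones_like(x), x, x**2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.round(beta, 6))  # recovers [1.0, 0.5, 2.0]
```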

Incorporating Non-Numeric Attributes into Linear Methods (p.158)

CODE

  • Listing 4-6: Coding Categorical Variable for Penalized Linear Regression - Abalone Data—larsAbalone.py
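A sketch of the categorical coding used for the abalone sex attribute (it takes the values M, F, and I): each category becomes a 0/1 indicator column, so the categorical variable can sit alongside numeric attributes in a penalized linear regression. The tiny example array is illustrative, not the abalone data.

```python
import numpy as np

sex = np.array(['M', 'F', 'I', 'M', 'I'])
categories = ['M', 'F', 'I']

# one indicator column per category: 1.0 where the row matches, else 0.0
coded = np.array([[1.0 if s == c else 0.0 for c in categories] for s in sex])
print(coded)
```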