Best subset regression and classification for tidyfit

Fits a best subset regression or classification model on a 'tidyFit' R6 class. The function can be used with regress and classify.

# S3 method for class 'subset'
.fit(self, data = NULL)

Arguments

self: a 'tidyFit' R6 class.
data: a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr).

Value

A fitted 'tidyFit' class model.

Details

Hyperparameters:

None. Cross-validation is not applicable.

Important method arguments (passed to m):

method (e.g. 'forward', 'backward')
IC (information criterion, e.g. 'AIC')

The best subset regression is estimated using bestglm::bestglm, which wraps leaps::regsubsets in the regression case and performs an exhaustive search in the classification case. See ?bestglm for more details.

Implementation

Forward or backward selection can be performed by passing method = "forward" or method = "backward" to m.
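
As a minimal sketch of how the method arguments listed above might be combined, the following passes both method and IC through m inside regress. It assumes that tidyfit and bestglm are installed and that IC is forwarded unchanged to bestglm::bestglm, as the argument list above suggests. The object names fit_aic and coefs are illustrative only.

```r
library(tidyfit)

# Example data shipped with tidyfit
data <- tidyfit::Factor_Industry_Returns

# Forward selection, with the final model chosen by AIC
# (assumption: IC is passed through to bestglm::bestglm)
fit_aic <- regress(
  data, Return ~ .,
  m("subset", method = "forward", IC = "AIC"),
  .mask = c("Date", "Industry")
)

# Coefficients of the selected subset
coefs <- coef(fit_aic)
```

Swapping IC = "AIC" for another criterion accepted by bestglm::bestglm (see ?bestglm) should change only the rule used to pick among the candidate subsets, not the search itself.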

References

A.I. McLeod, Changjiang Xu and Yuanhao Lai (2020). bestglm: Best Subset GLM and Regression Utilities. R package version 0.37.3. URL https://CRAN.R-project.org/package=bestglm.

Author

Johann Pfitzinger

Examples

# Load the package and data
library(tidyfit)
data <- tidyfit::Factor_Industry_Returns

# Stand-alone function
fit <- m("subset", Return ~ ., data, method = c("forward", "backward"))
tidyr::unnest(fit, settings)
#> # A tibble: 2 × 6
#>   estimator_fct    `size (MB)` grid_id  model_object method   warnings          
#>   <chr>                  <dbl> <chr>    <list>       <chr>    <chr>             
#> 1 bestglm::bestglm        2.74 #0010000 <tidyFit>    forward  NA                
#> 2 bestglm::bestglm        2.74 #0020000 <tidyFit>    backward model with initia…

# Within 'regress' function
fit <- regress(data, Return ~ ., m("subset", method = "forward"),
               .mask = c("Date", "Industry"))
coef(fit)
#> # A tibble: 6 × 4
#> # Groups:   model [1]
#>   model  term        estimate model_info      
#>   <chr>  <chr>          <dbl> <list>          
#> 1 subset (Intercept)  -0.0243 <tibble [1 × 3]>
#> 2 subset Mkt-RF        0.979  <tibble [1 × 3]>
#> 3 subset HML           0.0628 <tibble [1 × 3]>
#> 4 subset RMW           0.156  <tibble [1 × 3]>
#> 5 subset CMA           0.114  <tibble [1 × 3]>
#> 6 subset RF            0.997  <tibble [1 × 3]>