Fits a random forest on a 'tidyFit' R6 class. The function can be used with regress and classify.

Usage

.model.rf(self, data = NULL)

Arguments

self

a 'tidyFit' R6 class.

data

a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr).

Value

A fitted 'tidyFit' class model.

Details

Hyperparameters:

  • ntree (number of trees)

  • mtry (number of variables randomly sampled at each split)
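As a rough sketch of how these hyperparameters might be set, the call below assumes that additional arguments given to m() are forwarded to randomForest::randomForest, as the wrapper description suggests. The values 100 and 3 are arbitrary illustrations, not recommended defaults.

```r
library(tidyfit)

# Same data preparation as in the Examples section
data <- tidyfit::Factor_Industry_Returns
data <- dplyr::filter(data, Industry == "HiTec")
data <- dplyr::select(data, -Date, -Industry)

# Assumption: ntree and mtry are passed through m() to randomForest::randomForest
fit <- regress(data, Return ~ ., m("rf", ntree = 100, mtry = 3))

# Feature importance values for the fitted forest
coef(fit)
```

Fewer trees reduce fitting time and object size at the cost of noisier importance estimates, so the choice of ntree here is a trade-off rather than a fixed rule.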

Important method arguments (passed to m)

The function provides a wrapper for randomForest::randomForest. See ?randomForest for more details.

Implementation

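The random forest is always fit with importance = TRUE, so permutation-based and node-purity importance measures are computed for every fit. The feature importance values are extracted using coef(). In the regression example below, the estimate column is NA because a random forest has no regression coefficients; the importance measures are returned in the additional columns instead.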
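To illustrate what importance = TRUE provides, the following sketch calls randomForest::randomForest directly, outside of tidyfit, on the built-in mtcars data. The data set, seed, and hyperparameter values are assumptions made for illustration; this is not the package's internal code.

```r
library(randomForest)

set.seed(42)  # random forests are stochastic; fix the seed for reproducibility

# Regression forest with importance measures switched on
rf_fit <- randomForest(mpg ~ ., data = mtcars, ntree = 200, mtry = 3,
                       importance = TRUE)

# Matrix with one row per predictor and the columns %IncMSE and IncNodePurity
imp <- importance(rf_fit)
colnames(imp)

# Standard errors of the permutation-based importance measure
head(rf_fit$importanceSD)
```

The %IncMSE, IncNodePurity and importanceSD values correspond to the columns that appear in the coef() output in the Examples section.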

References

Liaw, A. and Wiener, M. (2002). Classification and Regression by randomForest. R News 2(3), 18–22.

Author

Johann Pfitzinger

Examples

# Load the package (provides m() and regress()) and the data
library(tidyfit)
data <- tidyfit::Factor_Industry_Returns
data <- dplyr::filter(data, Industry == "HiTec")
data <- dplyr::select(data, -Date, -Industry)

# Stand-alone function
fit <- m("rf", Return ~ ., data)
fit
#> # A tibble: 1 × 5
#>   estimator_fct              `size (MB)` grid_id  model_object settings
#>   <chr>                            <dbl> <chr>    <list>       <list>  
#> 1 randomForest::randomForest        8.48 #0010000 <tidyFit>    <tibble>

# Within 'regress' function
fit <- regress(data, Return ~ ., m("rf"))
tidyr::unnest(coef(fit), model_info)
#> # A tibble: 6 × 6
#> # Groups:   model [1]
#>   model term   estimate `%IncMSE` IncNodePurity importanceSD
#>   <chr> <chr>  <lgl>        <dbl>         <dbl>        <dbl>
#> 1 rf    Mkt-RF NA          38.6          14872.       0.420 
#> 2 rf    SMB    NA           1.07          1964.       0.0957
#> 3 rf    HML    NA           3.62          3176.       0.159 
#> 4 rf    RMW    NA           2.01          2435.       0.123 
#> 5 rf    CMA    NA           5.22          4286.       0.256 
#> 6 rf    RF     NA           0.574         1013.       0.0757