Arguments
- self: a 'tidyFit' R6 class.
- data: a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr).
Details
Hyperparameters:
- ntree (number of trees)
- mtry (number of variables randomly sampled at each split)
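As a rough guide to these two hyperparameters, the base-R sketch below reproduces the defaults that randomForest::randomForest documents when they are not supplied: ntree = 500, and mtry = max(floor(p/3), 1) for regression or floor(sqrt(p)) for classification, where p is the number of predictors. It then builds a small candidate grid over both. The helper default_mtry() and the grid values are hypothetical and for illustration only; check ?randomForest for the defaults of your installed version.

```r
# Hypothetical helper: default mtry as documented in ?randomForest
# (assumption: current randomForest defaults).
default_mtry <- function(p, type = c("regression", "classification")) {
  type <- match.arg(type)
  if (type == "regression") max(floor(p / 3), 1) else floor(sqrt(p))
}

# The example data has six predictors (Mkt-RF, SMB, HML, RMW, CMA, RF)
p <- 6
default_mtry(p, "regression")      # 2
default_mtry(p, "classification")  # 2

# Illustrative grid of candidate values (not tidyfit's internal grid)
grid <- expand.grid(ntree = c(100, 500), mtry = seq_len(p))
nrow(grid)  # 12
```

Note that mtry can never usefully exceed p, which is why the grid stops at seq_len(p).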
Important method arguments (passed to m()):
The function provides a wrapper for randomForest::randomForest. See ?randomForest for more details.
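To illustrate what "wrapper" means here, the sketch below forwards user arguments (such as ntree or mtry) unchanged to an estimator function while always forcing importance = TRUE. The names fit_with_importance() and echo_args() are hypothetical; this is a minimal sketch of the pattern, not tidyfit's actual source code. A stand-in estimator is used so the sketch runs without randomForest installed.

```r
# Hypothetical wrapper (illustration only, not tidyfit's implementation):
# forwards extra arguments to `estimator` but always sets importance = TRUE.
fit_with_importance <- function(estimator, formula, data, ...) {
  args <- list(...)
  args$importance <- TRUE  # overrides any user-supplied value
  do.call(estimator, c(list(formula, data = data), args))
}

# Stand-in estimator that simply returns the extra arguments it receives
echo_args <- function(formula, data, ...) list(...)

res <- fit_with_importance(echo_args, y ~ x, data.frame(x = 1:3, y = 1:3),
                           ntree = 50, importance = FALSE)
str(res)  # importance is TRUE although FALSE was passed
```

Under these assumptions, passing randomForest::randomForest as `estimator` would forward ntree and mtry to it unchanged.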
Implementation
The random forest is always fit with importance = TRUE. The feature importance values are extracted using coef().
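With importance = TRUE, randomForest reports a permutation-based measure (%IncMSE for regression, roughly the increase in prediction error when a predictor's values are shuffled), its standard error (importanceSD), and a node-purity measure (IncNodePurity). The base-R sketch below illustrates only the permutation idea and swaps in a plain lm() fit on simulated data for the forest. randomForest itself computes the measure per tree on out-of-bag samples and averages over trees, so these numbers are not comparable to the coef() output. perm_importance() is a hypothetical helper.

```r
# Permutation importance sketch (swapped-in model: lm, simulated data).
set.seed(1)
n  <- 200
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
df$y <- 3 * df$x1 + 0.5 * df$x2 + rnorm(n)   # x3 is pure noise

# Hypothetical helper: increase in MSE when each predictor is shuffled
perm_importance <- function(model, data, response) {
  mse <- function(d) mean((d[[response]] - predict(model, newdata = d))^2)
  base_mse <- mse(data)
  preds <- setdiff(names(data), response)
  sapply(preds, function(v) {
    shuffled <- data
    shuffled[[v]] <- sample(shuffled[[v]])  # break the link between v and y
    mse(shuffled) - base_mse
  })
}

fit_lm <- lm(y ~ ., data = df)
imp <- perm_importance(fit_lm, df, "y")
sort(imp, decreasing = TRUE)  # x1 should rank first
```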
References
Liaw, A. and Wiener, M. (2002). Classification and Regression by randomForest. R News 2(3), 18–22.
See also
.model.svm, .model.boost and m() methods
Examples
# Load data
data <- tidyfit::Factor_Industry_Returns
data <- dplyr::filter(data, Industry == "HiTec")
data <- dplyr::select(data, -Date, -Industry)
# Stand-alone function
fit <- m("rf", Return ~ ., data)
fit
#> # A tibble: 1 × 5
#>   estimator_fct              `size (MB)` grid_id  model_object settings
#>   <chr>                            <dbl> <chr>    <list>       <list>
#> 1 randomForest::randomForest        8.41 #0010000 <tidyFit>    <tibble>
# Within 'regress' function
fit <- regress(data, Return ~ ., m("rf"))
tidyr::unnest(coef(fit), model_info)
#> # A tibble: 6 × 6
#> # Groups: model [1]
#>   model term   estimate `%IncMSE` IncNodePurity importanceSD
#>   <chr> <chr>  <lgl>        <dbl>         <dbl>        <dbl>
#> 1 rf    Mkt-RF NA          38.9          14969.       0.406
#> 2 rf    SMB    NA           1.09          1989.       0.0945
#> 3 rf    HML    NA           3.71          3019.       0.166
#> 4 rf    RMW    NA           2.03          2427.       0.126
#> 5 rf    CMA    NA           5.30          4406.       0.226
#> 6 rf    RF     NA           0.498         1007.       0.0705