Arguments
- self
a 'tidyFit' R6 class.
- data
a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr).
Details
Hyperparameters:
- ntree (number of trees)
- mtry (number of variables randomly sampled at each split)
Important method arguments (passed to m)
The function provides a wrapper for randomForest::randomForest. See ?randomForest for more details.
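As an illustration of how the hyperparameters listed above might be supplied, the sketch below passes ntree and mtry as additional arguments to m. The specific values (100 trees, 3 variables per split) are arbitrary choices for demonstration, not recommended defaults, and the data preparation simply mirrors the Examples section.

```r
# Minimal sketch: set random forest hyperparameters through m().
# The values of ntree and mtry are illustrative assumptions only.
library(tidyfit)

# Prepare a single-industry data set, as in the Examples section
data <- tidyfit::Factor_Industry_Returns
data <- dplyr::filter(data, Industry == "HiTec")
data <- dplyr::select(data, -Date, -Industry)

# Extra named arguments to m() are treated as hyperparameters
# and handed on to randomForest::randomForest
fit <- m("rf", Return ~ ., data, ntree = 100, mtry = 3)
fit
```

With single values for each hyperparameter, the result should contain one fitted model; the chosen settings are stored alongside it in the settings column.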
Implementation
The random forest is always fit with importance = TRUE. The feature importance values are extracted using coef().
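To make the importance output more concrete, the following sketch calls randomForest::randomForest directly with importance = TRUE and collects the same three quantities that appear in the coef() output in the Examples section. It uses the built-in mtcars data and is only an approximation of what the wrapper might do; it is not the package's internal code, and the names rf, imp and imp_tbl are hypothetical.

```r
# Minimal sketch (not tidyfit internals): importance values obtained
# directly from the randomForest package on the built-in mtcars data.
library(randomForest)

set.seed(42)  # random forests are stochastic; fix the seed for repeatability

# importance = TRUE requests permutation-based importance in addition
# to the node-purity measure
rf <- randomForest(mpg ~ ., data = mtcars, ntree = 200, importance = TRUE)

# For regression, importance() returns a matrix with the columns
# "%IncMSE" and "IncNodePurity", one row per predictor
imp <- importance(rf)

# Collect the measures in one table, together with the standard errors
# of the permutation-based measure stored in the fitted object
imp_tbl <- data.frame(
  term = rownames(imp),
  imp,
  importanceSD = rf$importanceSD,
  check.names = FALSE  # keep the "%IncMSE" column name unchanged
)
imp_tbl
```

The resulting table has one row per predictor, which corresponds to the layout of the coef() output after unnesting model_info.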
References
Liaw, A. and Wiener, M. (2002). Classification and Regression by randomForest. R News 2(3), 18–22.
See also
.fit.svm, .fit.boost and m methods
Examples
# Load data
data <- tidyfit::Factor_Industry_Returns
data <- dplyr::filter(data, Industry == "HiTec")
data <- dplyr::select(data, -Date, -Industry)
# Stand-alone function
fit <- m("rf", Return ~ ., data)
fit
#> # A tibble: 1 × 5
#>   estimator_fct              `size (MB)` grid_id  model_object settings
#>   <chr>                            <dbl> <chr>    <list>       <list>
#> 1 randomForest::randomForest        8.41 #0010000 <tidyFit>    <tibble>
# Within 'regress' function
fit <- regress(data, Return ~ ., m("rf"))
tidyr::unnest(coef(fit), model_info)
#> # A tibble: 6 × 6
#> # Groups:   model [1]
#>   model term   estimate `%IncMSE` IncNodePurity importanceSD
#>   <chr> <chr>  <lgl>        <dbl>         <dbl>        <dbl>
#> 1 rf    Mkt-RF NA          38.8          14549.       0.429
#> 2 rf    SMB    NA           0.911         1966.       0.0903
#> 3 rf    HML    NA           3.76          3222.       0.162
#> 4 rf    RMW    NA           1.84          2531.       0.121
#> 5 rf    CMA    NA           5.46          4409.       0.236
#> 6 rf    RF     NA           0.584         1055.       0.0636