Set up hyperparameters for Ranger Random Forest training.
Usage
setup_Ranger(
num_trees = 500,
mtry = NULL,
importance = "impurity",
write_forest = TRUE,
probability = FALSE,
min_node_size = NULL,
min_bucket = NULL,
max_depth = NULL,
replace = TRUE,
sample_fraction = ifelse(replace, 1, 0.632),
case_weights = NULL,
class_weights = NULL,
splitrule = NULL,
num_random_splits = 1,
alpha = 0.5,
minprop = 0.1,
poisson_tau = 1,
split_select_weights = NULL,
always_split_variables = NULL,
respect_unordered_factors = NULL,
scale_permutation_importance = FALSE,
local_importance = FALSE,
regularization_factor = 1,
regularization_usedepth = FALSE,
keep_inbag = FALSE,
inbag = NULL,
holdout = FALSE,
quantreg = FALSE,
time_interest = NULL,
oob_error = TRUE,
save_memory = FALSE,
verbose = TRUE,
node_stats = FALSE,
seed = NULL,
na_action = "na.learn",
ifw = FALSE
)
Arguments
- num_trees
(Tunable) Positive integer: Number of trees.
- mtry
(Tunable) Positive integer: Number of features to consider at each split.
- importance
Character: Variable importance mode. "none", "impurity", "impurity_corrected", "permutation". The "impurity" measure is the Gini index for classification, the variance of the responses for regression.
- write_forest
Logical: Save the ranger.forest object, which is required for prediction. Set to FALSE to reduce memory usage if no prediction is intended.
- probability
Logical: Grow a probability forest as in Malley et al. (2012). For classification only.
- min_node_size
(Tunable) Positive integer: Minimal node size. Default 1 for classification, 5 for regression, 3 for survival, and 10 for probability.
- min_bucket
Positive integer: Minimal number of samples in a terminal node. Only for survival. Deprecated in favor of min_node_size.
- max_depth
(Tunable) Positive integer: Maximal tree depth. A value of NULL or 0 (the default) corresponds to unlimited depth, 1 to tree stumps (1 split per tree).
- replace
(Tunable) Logical: Sample with replacement.
- sample_fraction
(Tunable) Numeric: Fraction of observations to sample. Default is 1 for sampling with replacement and 0.632 for sampling without replacement.
- case_weights
Numeric vector: Weights for sampling of training observations. Observations with larger weights will be selected with higher probability in the bootstrap (or subsampled) samples for the trees.
- class_weights
Numeric vector: Weights for the outcome classes for classification. Vector of the same length as the number of classes, with names corresponding to the class labels.
- splitrule
(Tunable) Character: Splitting rule. For classification: "gini", "extratrees", "hellinger". For regression: "variance", "extratrees", "maxstat", "beta", "poisson". For survival: "logrank", "extratrees", "C", "maxstat".
- num_random_splits
(Tunable) Positive integer: For "extratrees" splitrule: Number of random splits to consider for each candidate splitting variable.
- alpha
(Tunable) Numeric: For "maxstat" splitrule: significance threshold to allow splitting.
- minprop
(Tunable) Numeric: For "maxstat" splitrule: lower quantile of covariate distribution to be considered for splitting.
- poisson_tau
Numeric: For "poisson" regression splitrule: tau parameter for Poisson regression.
- split_select_weights
Numeric vector or list: Weights between 0 and 1, representing the probability of selecting each variable for splitting. Alternatively, a list of size num_trees, with one weight vector per tree.
- always_split_variables
Character vector: Names of variables that are always selected, in addition to the mtry variables tried for splitting.
- respect_unordered_factors
Character or logical: Handling of unordered factor covariates. For "partition", all 2^(k-1)-1 possible partitions are considered for splitting, where k is the number of factor levels. For "ignore", all factors are treated as ordered. For "order", all factor levels are ordered by their average response. In ranger, TRUE corresponds to "order" and FALSE to "ignore".
- scale_permutation_importance
Logical: Scale permutation importance by standard error as in Breiman (2001). Only applicable if the permutation variable importance mode is selected.
- local_importance
Logical: For permutation variable importance, use local importance as in Breiman (2001) and Liaw & Wiener (2002).
- regularization_factor
(Tunable) Numeric: Regularization factor. Penalize variables with many split points. Requires splitrule = "variance".
- regularization_usedepth
Logical: Use regularization factor with node depth. Requires regularization_factor.
- keep_inbag
Logical: Save how often observations are in-bag in each tree. These will be used for (local) variable importance if inbag.counts in predict() is NULL.
- inbag
List: Manually set observations per tree. List of size num_trees, containing inbag counts for each observation. Can be used for stratified sampling.
- holdout
Logical: Hold-out mode. Hold-out all samples with case weight 0 and use these for variable importance and prediction error.
- quantreg
Logical: Prepare quantile prediction as in quantile regression forests (Meinshausen 2006). For regression only. Set keep_inbag = TRUE to prepare out-of-bag quantile prediction.
- time_interest
Numeric: For GWAS data: SNP with this number will be used as time variable. Only for survival. Deprecated, use time.var in formula instead.
- oob_error
Logical: Compute OOB prediction error. Set to FALSE to save computation time if only the forest is needed.
- save_memory
Logical: Use memory-saving (but slower) splitting mode. No effect for survival and GWAS data. Warning: This option slows down tree growing; use it only if you encounter memory problems.
- verbose
Logical: Show computation status and estimated runtime.
- node_stats
Logical: Save additional node statistics. Only terminal nodes for now.
- seed
Positive integer: Random seed. Default is NULL, which generates the seed from R. Set to 0 to ignore the R seed.
- na_action
Character: Action to take if the data contains missing values. "na.learn" uses observations with missing values in splitting, treating missing values as a separate category.
- ifw
(Tunable) Logical: Inverse Frequency Weighting for classification. If TRUE, class weights are set inversely proportional to the class frequencies.
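As a rough illustration of what inverse frequency weighting means, the base R sketch below derives a named weight vector from class frequencies. The toy outcome `y` and the rescaling so that the weights sum to 1 are assumptions made for this example; the exact normalization applied internally when ifw = TRUE may differ.

```r
# Toy imbalanced outcome (made-up data, for illustration only).
y <- factor(c(rep("a", 80), rep("b", 15), rep("c", 5)))

# Class frequencies as proportions of the sample.
class_freq <- table(y) / length(y)

# Weights inversely proportional to class frequency.
# Rescaling to sum to 1 is an assumption of this sketch.
inv_freq <- 1 / class_freq
ifw_weights <- setNames(
  as.numeric(inv_freq / sum(inv_freq)),
  names(class_freq)
)

# A named numeric vector with one weight per class, the shape described
# for class_weights above. Rarer classes receive larger weights.
print(round(ifw_weights, 3))
```

The resulting vector has one entry per class label, so the rarest class ("c" here) gets the largest weight.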
Details
See ranger::ranger for more information on each argument.
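For orientation, the snake_case arguments above appear to mirror ranger's dotted argument names (for example, num_trees and num.trees). That one-to-one mapping is an assumption here, not something this page states. Under that assumption, a direct call to ranger::ranger with roughly the same defaults might look like the following sketch, which uses the built-in iris data purely as a placeholder.

```r
library(ranger)

# Direct ranger call using dotted names that are assumed to correspond to
# num_trees, importance, probability, replace, sample_fraction, and seed.
fit <- ranger(
  Species ~ .,
  data = iris,              # placeholder dataset for illustration
  num.trees = 500,
  importance = "impurity",
  probability = FALSE,
  replace = TRUE,
  sample.fraction = 1,
  seed = 2024
)

# Out-of-bag prediction error (computed because oob.error defaults to TRUE).
fit$prediction.error

# Impurity-based variable importance, largest first.
sort(fit$variable.importance, decreasing = TRUE)
```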
Examples
ranger_hyperparams <- setup_Ranger(num_trees = 1000L, ifw = FALSE)
ranger_hyperparams
#> <RangerHyperparameters>
#> hyperparameters:
#> num_trees: <int> 1000
#> mtry: <NUL> NULL
#> importance: <chr> impurity
#> write_forest: <lgc> TRUE
#> probability: <lgc> FALSE
#> min_node_size: <NUL> NULL
#> min_bucket: <NUL> NULL
#> max_depth: <NUL> NULL
#> replace: <lgc> TRUE
#> sample_fraction: <nmr> 1.00
#> case_weights: <NUL> NULL
#> class_weights: <NUL> NULL
#> splitrule: <NUL> NULL
#> num_random_splits: <int> 1
#> alpha: <nmr> 0.50
#> minprop: <nmr> 0.10
#> poisson_tau: <nmr> 1.00
#> split_select_weights: <NUL> NULL
#> always_split_variables: <NUL> NULL
#> respect_unordered_factors: <NUL> NULL
#> scale_permutation_importance: <lgc> FALSE
#> local_importance: <lgc> FALSE
#> regularization_factor: <nmr> 1.00
#> regularization_usedepth: <lgc> FALSE
#> keep_inbag: <lgc> FALSE
#> inbag: <NUL> NULL
#> holdout: <lgc> FALSE
#> quantreg: <lgc> FALSE
#> time_interest: <NUL> NULL
#> oob_error: <lgc> TRUE
#> save_memory: <lgc> FALSE
#> verbose: <lgc> TRUE
#> node_stats: <lgc> FALSE
#> seed: <NUL> NULL
#> na_action: <chr> na.learn
#> ifw: <lgc> FALSE
#> tunable_hyperparameters: <chr> num_trees, mtry, min_node_size, max_depth, sample_fraction, replace, splitrule, num_random_splits, alpha, minprop, regularization_factor, ifw
#> fixed_hyperparameters: <chr> importance, write_forest, probability, min_bucket, case_weights, class_weights, poisson_tau, split_select_weights, always_split_variables, respect_unordered_factors, scale_permutation_importance, local_importance, regularization_usedepth, keep_inbag, inbag, holdout, quantreg, time_interest, oob_error, save_memory, verbose, node_stats, seed, na_action
#> tuned: <int> -1
#> resampled: <int> 0
#> n_workers: <int> 1
#>
#> No search values defined for tunable hyperparameters.
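A second, hedged usage sketch: the call below combines several of the arguments documented above for a classification setting (permutation importance scaled by its standard error, a depth limit, and inverse frequency weighting). It assumes the package providing setup_Ranger is rtemis and that it is installed; the particular values are illustrative, not recommendations.

```r
library(rtemis)  # assumed to be the package that exports setup_Ranger()

# Classification-oriented setup using only arguments documented above.
clf_hyperparams <- setup_Ranger(
  num_trees = 1000L,
  max_depth = 5L,
  importance = "permutation",
  scale_permutation_importance = TRUE,
  ifw = TRUE
)

# Printing shows every hyperparameter, as in the example output above.
clf_hyperparams
```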