Setup Ranger Hyperparameters — setup

Set up hyperparameters for Ranger Random Forest training.

Usage

setup_Ranger(
  num_trees = 500,
  mtry = NULL,
  importance = "impurity",
  write_forest = TRUE,
  probability = FALSE,
  min_node_size = NULL,
  min_bucket = NULL,
  max_depth = NULL,
  replace = TRUE,
  sample_fraction = ifelse(replace, 1, 0.632),
  case_weights = NULL,
  class_weights = NULL,
  splitrule = NULL,
  num_random_splits = 1,
  alpha = 0.5,
  minprop = 0.1,
  poisson_tau = 1,
  split_select_weights = NULL,
  always_split_variables = NULL,
  respect_unordered_factors = NULL,
  scale_permutation_importance = FALSE,
  local_importance = FALSE,
  regularization_factor = 1,
  regularization_usedepth = FALSE,
  keep_inbag = FALSE,
  inbag = NULL,
  holdout = FALSE,
  quantreg = FALSE,
  time_interest = NULL,
  oob_error = TRUE,
  save_memory = FALSE,
  verbose = TRUE,
  node_stats = FALSE,
  seed = NULL,
  na_action = "na.learn",
  ifw = FALSE
)
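
The default for sample_fraction in the signature above is an expression that refers to another argument, replace. R evaluates default arguments lazily, so the default follows whatever value of replace is passed. The base-R sketch below illustrates only that mechanism; resolve_sample_fraction is a hypothetical stand-in, not part of rtemis or ranger.

```r
# Hypothetical stand-in that copies only the two relevant arguments
# from the signature above.
resolve_sample_fraction <- function(replace = TRUE,
                                    sample_fraction = ifelse(replace, 1, 0.632)) {
  # The default expression is evaluated here, on first use,
  # using the value of `replace` supplied by the caller.
  sample_fraction
}

resolve_sample_fraction()                 # 1
resolve_sample_fraction(replace = FALSE)  # 0.632

# An explicit value always overrides the default expression.
resolve_sample_fraction(replace = FALSE, sample_fraction = 0.5)  # 0.5
```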

Arguments

num_trees: (Tunable) Positive integer: Number of trees.
mtry: (Tunable) Positive integer: Number of features to consider at each split.
importance: Character: Variable importance mode. One of "none", "impurity", "impurity_corrected", "permutation". The "impurity" measure is the Gini index for classification and the variance of the responses for regression.
write_forest: Logical: Save ranger.forest object, required for prediction. Set to FALSE to reduce memory usage if no prediction is intended.
probability: Logical: Grow a probability forest as in Malley et al. (2012). For classification only.
min_node_size: (Tunable) Positive integer: Minimal node size. Default 1 for classification, 5 for regression, 3 for survival, and 10 for probability.
min_bucket: Positive integer: Minimal number of samples in a terminal node. Only for survival. Deprecated in favor of min_node_size.
max_depth: (Tunable) Positive integer: Maximal tree depth. A value of NULL or 0 (the default) corresponds to unlimited depth, 1 to tree stumps (1 split per tree).
replace: (Tunable) Logical: Sample with replacement.
sample_fraction: (Tunable) Numeric: Fraction of observations to sample. Default is 1 for sampling with replacement and 0.632 for sampling without replacement.
case_weights: Numeric vector: Weights for sampling of training observations. Observations with larger weights will be selected with higher probability in the bootstrap (or subsampled) samples for the trees.
class_weights: Numeric vector: Weights for the outcome classes for classification. Vector of the same length as the number of classes, with names corresponding to the class labels.
splitrule: (Tunable) Character: Splitting rule. For classification: "gini", "extratrees", "hellinger". For regression: "variance", "extratrees", "maxstat", "beta", "poisson". For survival: "logrank", "extratrees", "C", "maxstat".
num_random_splits: (Tunable) Positive integer: For "extratrees" splitrule: Number of random splits to consider for each candidate splitting variable.
alpha: (Tunable) Numeric: For "maxstat" splitrule: significance threshold to allow splitting.
minprop: (Tunable) Numeric: For "maxstat" splitrule: lower quantile of covariate distribution to be considered for splitting.
poisson_tau: Numeric: For "poisson" regression splitrule: tau parameter for Poisson regression.
split_select_weights: Numeric vector: Weights between 0 and 1, representing the probability to select variables for splitting. Alternatively, a list of size num_trees, with one weight vector per tree.
always_split_variables: Character vector: Names of variables to be always selected in addition to the mtry variables tried for splitting.
respect_unordered_factors: Character or logical: Handling of unordered factor covariates. One of "ignore", "order", "partition". For "partition", all 2^(k-1)-1 possible partitions are considered for splitting, where k is the number of factor levels. For "ignore", all factor levels are ordered by their first occurrence in the data. For "order", all factor levels are ordered by their average response. In ranger::ranger, TRUE corresponds to "order" and FALSE to "ignore".
scale_permutation_importance: Logical: Scale permutation importance by standard error as in Breiman (2001). Only applicable if permutation variable importance mode is selected.
local_importance: Logical: For permutation variable importance, use local importance as in Breiman (2001) and Liaw & Wiener (2002).
regularization_factor: (Tunable) Numeric: Regularization factor. Penalize variables with many split points. Requires splitrule = "variance".
regularization_usedepth: Logical: Use regularization factor with node depth. Requires regularization_factor.
keep_inbag: Logical: Save how often observations are in-bag in each tree. These will be used for (local) variable importance if inbag.counts in predict() is NULL.
inbag: List: Manually set observations per tree. List of size num_trees, containing inbag counts for each observation. Can be used for stratified sampling.
holdout: Logical: Hold-out mode. Hold-out all samples with case weight 0 and use these for variable importance and prediction error.
quantreg: Logical: Prepare quantile prediction as in quantile regression forests (Meinshausen 2006). For regression only. Set keep_inbag = TRUE to prepare out-of-bag quantile prediction.
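time_interest: Numeric: Time points of interest. Only for survival. In ranger::ranger, NULL (the default) uses all observed time points, a vector gives the time points directly, and a single number requests that many time points on a grid over the observed times.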
oob_error: Logical: Compute OOB prediction error. Set to FALSE to save computation time if only the forest is needed.
save_memory: Logical: Use memory saving (but slower) splitting mode. No effect for survival and GWAS data. Warning: This option slows down tree growing; use it only if you encounter memory problems.
verbose: Logical: Show computation status and estimated runtime.
node_stats: Logical: Save additional node statistics. Only terminal nodes for now.
seed: Positive integer: Random seed. Default is NULL, which generates the seed from R. Set to 0 to ignore the R seed.
na_action: Character: Action to take if the data contains missing values. "na.learn" uses observations with missing values in splitting, treating missing values as a separate category.
ifw: (Tunable) Logical: Inverse Frequency Weighting for classification. If TRUE, class weights are set inversely proportional to the class frequencies.
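
To make the ifw and class_weights descriptions above concrete, here is a minimal base-R sketch of weights that are inversely proportional to class frequencies. The helper inverse_frequency_weights is hypothetical, and normalizing the weights to sum to 1 is an assumption; the exact formula used internally may differ.

```r
# Hypothetical helper (not part of rtemis or ranger): one weight per class,
# inversely proportional to that class's relative frequency.
inverse_frequency_weights <- function(y) {
  stopifnot(is.factor(y))
  y <- droplevels(y)             # ignore unused levels to avoid division by zero
  freq <- table(y) / length(y)   # relative class frequencies
  w <- 1 / as.numeric(freq)      # rarer classes get larger weights
  w <- w / sum(w)                # assumption: normalize weights to sum to 1
  names(w) <- names(freq)        # names correspond to the class labels
  w
}

y <- factor(c("a", "a", "a", "b"))
class_weights <- inverse_frequency_weights(y)
class_weights  # a = 0.25, b = 0.75
```

The result is a named numeric vector with one entry per class, which matches the shape that class_weights is documented to expect.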

Value

RangerHyperparameters object.

Details

See ranger::ranger for more information on each argument.
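
When looking an argument up in the ranger documentation, it may help to translate the name first. The sketch below assumes that each setup_Ranger argument corresponds to the ranger::ranger argument with underscores replaced by dots (for example, num_trees and num.trees). This mapping is inferred from the argument names, not confirmed by this page, and it does not cover rtemis-specific arguments such as ifw.

```r
# A few argument names from the Usage section above.
rtemis_args <- c("num_trees", "min_node_size", "sample_fraction",
                 "respect_unordered_factors")

# Assumption: ranger uses the same names with dots instead of underscores.
ranger_args <- gsub("_", ".", rtemis_args, fixed = TRUE)
ranger_args

# If ranger is installed, check the translated names against its formals.
if (requireNamespace("ranger", quietly = TRUE)) {
  print(ranger_args %in% names(formals(ranger::ranger)))
}
```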

Author

EDG

Examples

ranger_hyperparams <- setup_Ranger(num_trees = 1000L, ifw = FALSE)
ranger_hyperparams
#> <RangerHyperparameters>
#>         hyperparameters: 
#>                                             num_trees: <int> 1000
#>                                                  mtry: <NUL> NULL
#>                                            importance: <chr> impurity
#>                                          write_forest: <lgc> TRUE
#>                                           probability: <lgc> FALSE
#>                                         min_node_size: <NUL> NULL
#>                                            min_bucket: <NUL> NULL
#>                                             max_depth: <NUL> NULL
#>                                               replace: <lgc> TRUE
#>                                       sample_fraction: <nmr> 1.00
#>                                          case_weights: <NUL> NULL
#>                                         class_weights: <NUL> NULL
#>                                             splitrule: <NUL> NULL
#>                                     num_random_splits: <int> 1
#>                                                 alpha: <nmr> 0.50
#>                                               minprop: <nmr> 0.10
#>                                           poisson_tau: <nmr> 1.00
#>                                  split_select_weights: <NUL> NULL
#>                                always_split_variables: <NUL> NULL
#>                             respect_unordered_factors: <NUL> NULL
#>                          scale_permutation_importance: <lgc> FALSE
#>                                      local_importance: <lgc> FALSE
#>                                 regularization_factor: <nmr> 1.00
#>                               regularization_usedepth: <lgc> FALSE
#>                                            keep_inbag: <lgc> FALSE
#>                                                 inbag: <NUL> NULL
#>                                               holdout: <lgc> FALSE
#>                                              quantreg: <lgc> FALSE
#>                                         time_interest: <NUL> NULL
#>                                             oob_error: <lgc> TRUE
#>                                           save_memory: <lgc> FALSE
#>                                               verbose: <lgc> TRUE
#>                                            node_stats: <lgc> FALSE
#>                                                  seed: <NUL> NULL
#>                                             na_action: <chr> na.learn
#>                                                   ifw: <lgc> FALSE
#> tunable_hyperparameters: <chr> num_trees, mtry, min_node_size, max_depth, sample_fraction, replace, splitrule, num_random_splits, alpha, minprop, regularization_factor, ifw
#>   fixed_hyperparameters: <chr> importance, write_forest, probability, min_bucket, case_weights, class_weights, poisson_tau, split_select_weights, always_split_variables, respect_unordered_factors, scale_permutation_importance, local_importance, regularization_usedepth, keep_inbag, inbag, holdout, quantreg, time_interest, oob_error, save_memory, verbose, node_stats, seed, na_action
#>                   tuned: <int> -1
#>               resampled: <int> 0
#>               n_workers: <int> 1
#> 
#>   No search values defined for tunable hyperparameters.