Setup hyperparameters for Ranger Random Forest training.

Usage

setup_Ranger(
  num_trees = 500,
  mtry = NULL,
  importance = "impurity",
  write_forest = TRUE,
  probability = FALSE,
  min_node_size = NULL,
  min_bucket = NULL,
  max_depth = NULL,
  replace = TRUE,
  sample_fraction = ifelse(replace, 1, 0.632),
  case_weights = NULL,
  class_weights = NULL,
  splitrule = NULL,
  num_random_splits = 1,
  alpha = 0.5,
  minprop = 0.1,
  poisson_tau = 1,
  split_select_weights = NULL,
  always_split_variables = NULL,
  respect_unordered_factors = NULL,
  scale_permutation_importance = FALSE,
  local_importance = FALSE,
  regularization_factor = 1,
  regularization_usedepth = FALSE,
  keep_inbag = FALSE,
  inbag = NULL,
  holdout = FALSE,
  quantreg = FALSE,
  time_interest = NULL,
  oob_error = TRUE,
  save_memory = FALSE,
  verbose = TRUE,
  node_stats = FALSE,
  seed = NULL,
  na_action = "na.learn",
  ifw = FALSE
)

Arguments

num_trees

(Tunable) Positive integer: Number of trees.

mtry

(Tunable) Positive integer: Number of features to consider at each split.

importance

Character: Variable importance mode. One of "none", "impurity", "impurity_corrected", or "permutation". The "impurity" measure is the Gini index for classification and the variance of the responses for regression.

write_forest

Logical: Save ranger.forest object, required for prediction. Set to FALSE to reduce memory usage if no prediction intended.

probability

Logical: Grow a probability forest as in Malley et al. (2012). For classification only.

min_node_size

(Tunable) Positive integer: Minimal node size. Default 1 for classification, 5 for regression, 3 for survival, and 10 for probability.

min_bucket

Positive integer: Minimal number of samples in a terminal node. Only for survival. Deprecated in favor of min_node_size.

max_depth

(Tunable) Positive integer: Maximal tree depth. A value of NULL or 0 (the default) corresponds to unlimited depth, 1 to tree stumps (1 split per tree).

replace

(Tunable) Logical: Sample with replacement.

sample_fraction

(Tunable) Numeric: Fraction of observations to sample. Default is 1 for sampling with replacement and 0.632 for sampling without replacement.
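
The value 0.632 is approximately 1 - exp(-1), the expected fraction of distinct observations in a bootstrap sample of the same size as the data. The following is a minimal base R sketch of that relationship. It does not call ranger, and the variable names are illustrative only.

```r
set.seed(2024)
n <- 10000L

# With replacement, sample_fraction = 1: draw n indices from 1..n
boot_idx <- sample.int(n, size = n, replace = TRUE)
frac_unique <- length(unique(boot_idx)) / n

# Without replacement, sample_fraction = 0.632: every drawn index is distinct
sub_idx <- sample.int(n, size = round(0.632 * n), replace = FALSE)
frac_sub <- length(sub_idx) / n

# Expected fraction of distinct observations in a bootstrap sample
expected <- 1 - exp(-1)

print(c(bootstrap_unique = frac_unique, subsample = frac_sub, expected = expected))
```

Under either setting, each tree therefore sees a similar number of distinct observations, which leaves a comparable share out-of-bag.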

case_weights

Numeric vector: Weights for sampling of training observations. Observations with larger weights will be selected with higher probability in the bootstrap (or subsampled) samples for the trees.

class_weights

Numeric vector: Weights for the outcome classes for classification. Vector of the same length as the number of classes, with names corresponding to the class labels.

splitrule

(Tunable) Character: Splitting rule. For classification: "gini", "extratrees", "hellinger". For regression: "variance", "extratrees", "maxstat", "beta", "poisson". For survival: "logrank", "extratrees", "C", "maxstat".

num_random_splits

(Tunable) Positive integer: For "extratrees" splitrule: Number of random splits to consider for each candidate splitting variable.

alpha

(Tunable) Numeric: For "maxstat" splitrule: significance threshold to allow splitting.

minprop

(Tunable) Numeric: For "maxstat" splitrule: lower quantile of covariate distribution to be considered for splitting.

poisson_tau

Numeric: For "poisson" splitrule: tau parameter. The coefficient of variation of the (expected) frequency is 1/tau.

split_select_weights

Numeric vector: Weights between 0 and 1, representing the probability of selecting each variable for splitting. Alternatively, a list of size num_trees, with one weight vector per tree.

always_split_variables

Character vector: Names of variables that are always considered for splitting, in addition to the mtry variables tried at each split.

respect_unordered_factors

Character or logical: Handling of unordered factor covariates. For "partition", all 2^(k-1)-1 possible partitions are considered for splitting, where k is the number of factor levels. For "ignore", all factor levels are ordered by their first occurrence in the data. For "order", all factor levels are ordered by their average response. In ranger, TRUE corresponds to "order" and FALSE to "ignore".
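
To give a sense of how quickly the "partition" option grows with the number of levels, here is a small base R sketch. It only illustrates the 2^(k-1)-1 count and is not how ranger enumerates splits internally. The helper `n_partitions` and the level names are hypothetical.

```r
# Number of ways to split k unordered levels into two non-empty groups
n_partitions <- function(k) 2^(k - 1) - 1

# Enumerate the partitions explicitly for a small factor
levs <- c("a", "b", "c")
k <- length(levs)

# Keep the first level on the left side so mirror images are not counted twice;
# every other level goes either left (TRUE) or right (FALSE).
grid <- expand.grid(rep(list(c(TRUE, FALSE)), k - 1))

# Drop the assignment that puts every level on the left (empty right side)
grid <- grid[rowSums(grid) < k - 1, , drop = FALSE]

left_groups <- lapply(seq_len(nrow(grid)), function(i) {
  on_left <- unlist(grid[i, ], use.names = FALSE)
  c(levs[1], levs[-1][on_left])
})

print(left_groups)
print(n_partitions(c(3, 10, 20)))
```

The count roughly doubles with every additional level, which is why "order" is often preferred for factors with many levels.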

scale_permutation_importance

Logical: Scale permutation importance by standard error as in Breiman (2001). Only applicable if the permutation variable importance mode is selected.

local_importance

Logical: For permutation variable importance, use local importance as in Breiman (2001) and Liaw & Wiener (2002).

regularization_factor

(Tunable) Numeric: Regularization factor. Penalize variables with many split points. Requires splitrule = "variance".

regularization_usedepth

Logical: Use regularization factor with node depth. Requires regularization_factor.

keep_inbag

Logical: Save how often observations are in-bag in each tree. These will be used for (local) variable importance if inbag.counts in predict() is NULL.

inbag

List: Manually set observations per tree. List of size num_trees, containing inbag counts for each observation. Can be used for stratified sampling.
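
As an illustration of the expected shape, the base R sketch below builds such a list for stratified bootstrap sampling: one integer vector per tree, with one in-bag count per observation. The strata, the number of draws per stratum, and all variable names are hypothetical.

```r
set.seed(1)
num_trees <- 5L
strata <- factor(rep(c("a", "b"), times = c(8, 4)))  # hypothetical strata
n <- length(strata)
per_stratum <- 3L  # hypothetical: draws per stratum per tree

inbag <- lapply(seq_len(num_trees), function(tree) {
  counts <- integer(n)
  for (s in levels(strata)) {
    idx <- which(strata == s)
    # Draw with replacement within the stratum
    drawn <- idx[sample.int(length(idx), size = per_stratum, replace = TRUE)]
    # Add how many times each observation was drawn
    counts <- counts + tabulate(drawn, nbins = n)
  }
  counts
})

str(inbag)
```

A list built this way has length num_trees, and each element has one count per training observation, so every tree draws the same number of cases from each stratum.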

holdout

Logical: Hold-out mode. Hold-out all samples with case weight 0 and use these for variable importance and prediction error.

quantreg

Logical: Prepare quantile prediction as in quantile regression forests (Meinshausen 2006). For regression only. Set keep_inbag = TRUE to prepare out-of-bag quantile prediction.

time_interest

Numeric: Time points of interest. Only for survival. Can be NULL (the default, use all observed time points), a vector of time points, or a single number giving how many time points to use (as a grid over the observed time points).

oob_error

Logical: Compute OOB prediction error. Set to FALSE to save computation time if only the forest is needed.

save_memory

Logical: Use memory saving (but slower) splitting mode. No effect for survival and GWAS data. Warning: This option slows down tree growing; use it only if you encounter memory problems.

verbose

Logical: Show computation status and estimated runtime.

node_stats

Logical: Save additional node statistics. Only terminal nodes for now.

seed

Positive integer: Random seed. Default is NULL, which generates the seed from R. Set to 0 to ignore the R seed.

na_action

Character: Action to take if the data contains missing values. "na.learn" (the default) handles missing values internally during splitting, "na.omit" omits observations with missing values, and "na.fail" stops if missing values are found.

ifw

(Tunable) Logical: Inverse Frequency Weighting for classification. If TRUE, class weights are set inversely proportional to the class frequencies.
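
The idea behind inverse frequency weighting can be sketched in base R as follows. This is an illustration of the concept only: the outcome vector is hypothetical, and rescaling the weights to sum to 1 is an assumption made here, not necessarily what the package does internally.

```r
# Hypothetical imbalanced outcome: 90 "neg", 10 "pos"
y <- factor(rep(c("neg", "pos"), times = c(90, 10)))

# Relative class frequencies
class_freq <- prop.table(table(y))

# Weights inversely proportional to class frequency.
# Rescaling to sum to 1 is an assumption for illustration only.
raw <- 1 / class_freq
class_weights <- setNames(as.numeric(raw / sum(raw)), names(class_freq))

print(class_weights)
```

The result is a named numeric vector with one entry per class label, the same shape that class_weights expects, with the rarer class receiving the larger weight.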

Value

RangerHyperparameters object.

Details

See ranger::ranger for more information on each argument.

Author

EDG

Examples

ranger_hyperparams <- setup_Ranger(num_trees = 1000L, ifw = FALSE)
ranger_hyperparams
#> <RangerHyperparameters>
#>         hyperparameters: 
#>                                             num_trees: <int> 1000
#>                                                  mtry: <NUL> NULL
#>                                            importance: <chr> impurity
#>                                          write_forest: <lgc> TRUE
#>                                           probability: <lgc> FALSE
#>                                         min_node_size: <NUL> NULL
#>                                            min_bucket: <NUL> NULL
#>                                             max_depth: <NUL> NULL
#>                                               replace: <lgc> TRUE
#>                                       sample_fraction: <nmr> 1.00
#>                                          case_weights: <NUL> NULL
#>                                         class_weights: <NUL> NULL
#>                                             splitrule: <NUL> NULL
#>                                     num_random_splits: <int> 1
#>                                                 alpha: <nmr> 0.50
#>                                               minprop: <nmr> 0.10
#>                                           poisson_tau: <nmr> 1.00
#>                                  split_select_weights: <NUL> NULL
#>                                always_split_variables: <NUL> NULL
#>                             respect_unordered_factors: <NUL> NULL
#>                          scale_permutation_importance: <lgc> FALSE
#>                                      local_importance: <lgc> FALSE
#>                                 regularization_factor: <nmr> 1.00
#>                               regularization_usedepth: <lgc> FALSE
#>                                            keep_inbag: <lgc> FALSE
#>                                                 inbag: <NUL> NULL
#>                                               holdout: <lgc> FALSE
#>                                              quantreg: <lgc> FALSE
#>                                         time_interest: <NUL> NULL
#>                                             oob_error: <lgc> TRUE
#>                                           save_memory: <lgc> FALSE
#>                                               verbose: <lgc> TRUE
#>                                            node_stats: <lgc> FALSE
#>                                                  seed: <NUL> NULL
#>                                             na_action: <chr> na.learn
#>                                                   ifw: <lgc> FALSE
#> tunable_hyperparameters: <chr> num_trees, mtry, min_node_size, max_depth, sample_fraction, replace, splitrule, num_random_splits, alpha, minprop, regularization_factor, ifw
#>   fixed_hyperparameters: <chr> importance, write_forest, probability, min_bucket, case_weights, class_weights, poisson_tau, split_select_weights, always_split_variables, respect_unordered_factors, scale_permutation_importance, local_importance, regularization_usedepth, keep_inbag, inbag, holdout, quantreg, time_interest, oob_error, save_memory, verbose, node_stats, seed, na_action
#>                   tuned: <int> -1
#>               resampled: <int> 0
#>               n_workers: <int> 1
#> 
#>   No search values defined for tunable hyperparameters.