Preprocess, tune, train, and test supervised learning models with a single function using nested resampling
Usage
train(
x,
dat_validation = NULL,
dat_test = NULL,
algorithm = NULL,
preprocessor_config = NULL,
hyperparameters = NULL,
tuner_config = NULL,
outer_resampling_config = NULL,
weights = NULL,
question = NULL,
outdir = NULL,
parallel_type = c("future", "mirai", "none"),
future_plan = getOption("future.plan", "multicore"),
n_workers = max(future::availableCores() - 3L, 1L),
verbosity = 1L
)

Arguments
- x
data.frame or similar: Training set data.
- dat_validation
data.frame or similar: Validation set data.
- dat_test
data.frame or similar: Test set data.
- algorithm
Character: Algorithm to use. Can be left NULL if hyperparameters is defined.
- preprocessor_config
PreprocessorConfig object or NULL: Setup using setup_Preprocessor.
- hyperparameters
Hyperparameters object: Setup using one of the setup_* functions.
- tuner_config
TunerConfig object: Setup using setup_GridSearch.
- outer_resampling_config
ResamplerConfig object or NULL: Setup using setup_Resampler. This defines the outer resampling method, i.e. the splitting into training and test sets for the purpose of assessing model performance. If NULL, no outer resampling is performed, in which case you might want to use a dat_test dataset to assess model performance on a single test set.
- weights
Optional vector of case weights.
- question
Optional character string defining the question that the model is trying to answer.
- outdir
Character, optional: String defining the output directory.
- parallel_type
Character: "future", "mirai", or "none". Default is "future".
- future_plan
Character: Future plan to use for parallel processing.
- n_workers
Integer: Total number of workers to use for parallel processing. Parallelization may happen at three different levels, from innermost to outermost:
1. Algorithm training (e.g. a parallelized learner like LightGBM)
2. Tuning (inner resampling, where multiple resamples can be processed in parallel)
3. Outer resampling (where multiple outer resamples can be processed in parallel)
The train() function will assign the workers to the innermost available parallelization level. It is best to leave a few cores free for the OS and other processes, especially on shared systems or when working with large datasets, since parallelization increases memory usage.
- verbosity
Integer: Verbosity level.
Note: If hyperparameters is not defined, default hyperparameters are used. Avoid relying on this; instead, use the appropriate setup_* function with the hyperparameters argument.
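A minimal sketch of a typical call, assuming the package is loaded. The dataset name, the specific hyperparameter setup function (setup_LightGBM), and its availability are illustrative assumptions; consult the package's setup_* documentation for the actual functions.

```r
# Illustrative sketch only: setup_LightGBM() is a hypothetical setup_* call.
mod <- train(
  x = dat_training,                            # training set data.frame
  hyperparameters = setup_LightGBM(),          # assumed setup_* function
  tuner_config = setup_GridSearch(),           # inner resampling for tuning
  outer_resampling_config = setup_Resampler(), # outer resampling for assessment
  parallel_type = "future",
  n_workers = 4L,
  verbosity = 1L
)
```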
Value
Object of class Regression (Supervised), RegressionRes (SupervisedRes), Classification (Supervised), or ClassificationRes (SupervisedRes).
Details
Important: For binary classification, the outcome should be a factor where the 2nd level corresponds to the positive class.
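The level ordering required above can be set and checked with base R; the class labels here are illustrative:

```r
# Binary classification outcome: the positive class must be the 2nd level.
# Here "disease" is the positive class, so it is listed second in levels.
y <- factor(c("control", "disease", "control", "disease"),
            levels = c("control", "disease"))
levels(y)[2]  # the positive class, "disease"
```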
Note on resampling: You should never use an outer resampling method with replacement if you will also be using an inner resampling (for tuning). The duplicated cases from the outer resampling may appear both in the training and test sets of the inner resamples, leading to underestimated test error.
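A minimal base-R illustration of why this happens: an outer resample drawn with replacement (e.g. a bootstrap) duplicates cases, and an inner split for tuning can then place copies of the same case on both sides of the split. The indices below are hand-picked for clarity rather than drawn randomly.

```r
# Outer resample of 10 cases, drawn with replacement:
# case 7 appears twice (hand-picked for illustration).
outer_train <- c(1, 3, 5, 7, 8, 9, 2, 7, 10, 4)

# Inner resample for tuning: first 7 positions train, last 3 validate.
inner_train <- outer_train[1:7]   # contains one copy of case 7
inner_valid <- outer_train[8:10]  # contains the other copy of case 7

# The same original case sits in both inner sets -> leakage,
# so the inner (tuning) error is optimistically biased.
intersect(inner_train, inner_valid)  # includes case 7
```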