Preprocess, tune, train, and test supervised learning models with a single function using nested cross-validation.

Usage

train(
  x,
  dat_validation = NULL,
  dat_test = NULL,
  algorithm = NULL,
  preprocessor_parameters = NULL,
  hyperparameters = NULL,
  tuner_parameters = NULL,
  outer_resampling = NULL,
  weights = NULL,
  question = NULL,
  outdir = NULL,
  parallel_type = "future",
  verbosity = 1L
)

Arguments

x

data.frame or similar: Training set data.

dat_validation

data.frame or similar: Validation set data.

dat_test

data.frame or similar: Test set data.

algorithm

Character: Algorithm to use. Can be left NULL if hyperparameters is defined.

preprocessor_parameters

PreprocessorParameters object or NULL: Setup using setup_Preprocessor.

hyperparameters

Hyperparameters object: Setup using one of setup_* functions.

tuner_parameters

TunerParameters object: Setup using setup_GridSearch.

outer_resampling

ResamplerParameters object or NULL: Setup using setup_Resampler. This defines the outer resampling method, i.e. the splitting into training and test sets for the purpose of assessing model performance. If NULL, no outer resampling is performed, in which case you might want to use a dat_test dataset to assess model performance on a single test set.
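
To make the distinction between outer and inner resampling concrete, here is a minimal base R sketch of nested cross-validation. It does not use this package and is not a description of how train() works internally: the simulated data, the polynomial-degree "hyperparameter", and the helper fit_mse() are all assumptions made for illustration. The outer folds estimate performance; the inner folds, drawn only from each outer training set, choose the degree.

```r
set.seed(1)
n <- 120
dat <- data.frame(x = runif(n, -2, 2))
dat$y <- dat$x^2 + rnorm(n, sd = 0.3)

# Hypothetical helper: test-set mean squared error of a polynomial fit.
fit_mse <- function(train, test, degree) {
  mod <- lm(y ~ poly(x, degree), data = train)
  mean((test$y - predict(mod, newdata = test))^2)
}

degrees <- 1:4   # candidate values of the tuning parameter
k_outer <- 5     # outer folds: performance assessment
k_inner <- 5     # inner folds: tuning
outer_id <- sample(rep(seq_len(k_outer), length.out = n))

outer_res <- lapply(seq_len(k_outer), function(i) {
  outer_train <- dat[outer_id != i, ]
  outer_test <- dat[outer_id == i, ]

  # Inner resampling uses the outer training data only.
  inner_id <- sample(rep(seq_len(k_inner), length.out = nrow(outer_train)))
  inner_mse <- sapply(degrees, function(d) {
    mean(sapply(seq_len(k_inner), function(j) {
      fit_mse(outer_train[inner_id != j, ], outer_train[inner_id == j, ], d)
    }))
  })
  best <- degrees[which.min(inner_mse)]

  # Refit on the full outer training set and score on the held-out outer fold.
  data.frame(
    fold = i,
    degree = best,
    test_mse = fit_mse(outer_train, outer_test, best)
  )
})
outer_res <- do.call(rbind, outer_res)

# Average outer-fold error: the nested cross-validation performance estimate.
mean(outer_res$test_mse)
```

With outer_resampling = NULL there is no outer loop; the analogous step is a single evaluation on dat_test.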

weights

Optional vector of case weights.

question

Optional character string defining the question that the model is trying to answer.

outdir

Character, optional: String defining the output directory.

parallel_type

Character: Parallelization type, either "none" or "future".

verbosity

Integer: Verbosity level.

Note: Rather than leaving hyperparameters undefined, use the appropriate setup_* function and pass the result to the hyperparameters argument.

Value

Object of class Regression(Supervised), RegressionCV(SupervisedCV), Classification(Supervised), or ClassificationCV(SupervisedCV).

Details

Important: For binary classification, the outcome should be a factor where the 2nd level corresponds to the positive class.
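
A short base R sketch of how the level order can be checked and set before training. The outcome values "case" and "control" are hypothetical; the point is only that factor() orders levels alphabetically unless told otherwise, so the positive class may end up first.

```r
# Hypothetical outcome vector; "case" is the class we treat as positive.
outcome <- c("case", "control", "control", "case", "control")

# factor() sorts levels alphabetically by default, so "case" comes first here.
y_default <- factor(outcome)
levels(y_default)

# List the levels explicitly so that the positive class is the second level.
y <- factor(outcome, levels = c("control", "case"))
levels(y)

# For an existing factor, relevel() moves the reference level to position 1,
# which leaves the other class in position 2 for a binary outcome.
y_releveled <- relevel(y_default, ref = "control")
levels(y_releveled)
```

Reordering levels this way changes only the level order, not the observed values.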

Note on resampling: You should never use an outer resampling method with replacement if you will also be using an inner resampling (for tuning). The duplicated cases from the outer resampling may appear both in the training and test sets of the inner resamples, leading to underestimated test error.
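
The leakage described above can be seen in a small base R simulation. This is an illustrative sketch, not package code: case IDs stand in for rows of a dataset, a bootstrap stands in for an outer resampling with replacement, and a single 5-fold split stands in for the inner resampling.

```r
set.seed(2024)
n <- 100
case_id <- seq_len(n)

# Outer resample WITH replacement (bootstrap): some cases are drawn repeatedly.
outer_train <- sample(case_id, size = n, replace = TRUE)

# Inner resample: assign the outer training rows to 5 folds by row position.
inner_fold <- sample(rep(1:5, length.out = n))
inner_test <- outer_train[inner_fold == 1]
inner_train <- outer_train[inner_fold != 1]

# Original cases that appear on both sides of the inner split.
leaked <- intersect(inner_train, inner_test)
length(leaked)

# Same inner split after an outer resample WITHOUT replacement: no overlap.
outer_train_norep <- sample(case_id, size = 80, replace = FALSE)
inner_fold_norep <- sample(rep(1:5, length.out = 80))
leaked_norep <- intersect(
  outer_train_norep[inner_fold_norep != 1],
  outer_train_norep[inner_fold_norep == 1]
)
length(leaked_norep)
```

Any case counted in leaked is seen during inner training and then used again for inner testing, which is why the inner test error comes out too low.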

Author

EDG