Preprocess, tune, train, and test supervised learning models with a single function using nested cross-validation.
Usage
train(
  x,
  dat_validation = NULL,
  dat_test = NULL,
  algorithm = NULL,
  preprocessor_parameters = NULL,
  hyperparameters = NULL,
  tuner_parameters = NULL,
  outer_resampling = NULL,
  weights = NULL,
  question = NULL,
  outdir = NULL,
  parallel_type = "future",
  verbosity = 1L
)
Arguments
- x
data.frame or similar: Training set data.
- dat_validation
data.frame or similar: Validation set data.
- dat_test
data.frame or similar: Test set data.
- algorithm
Character: Algorithm to use. Can be left NULL if hyperparameters is defined.
- preprocessor_parameters
PreprocessorParameters object or NULL: Set up using setup_Preprocessor.
- hyperparameters
Hyperparameters object: Set up using one of the setup_* functions.
- tuner_parameters
TunerParameters object: Set up using setup_GridSearch.
- outer_resampling
ResamplerParameters object or NULL: Set up using setup_Resampler. This defines the outer resampling method, i.e., how the data are split into training and test sets to assess model performance. If NULL, no outer resampling is performed; in that case, you might want to use a dat_test dataset to assess model performance on a single test set.
- weights
Optional vector of case weights.
- question
Optional character string defining the question that the model is trying to answer.
- outdir
Character, optional: String defining the output directory.
- parallel_type
Character: "none" or "future".
- verbosity
Integer: Verbosity level.
hyperparameters is not defined. Avoid relying on this; instead, use the appropriate setup_* function with the hyperparameters argument.
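The outer_resampling argument and the inner resampling used for tuning together form nested cross-validation. The base R sketch below shows only the index bookkeeping of that scheme, assuming plain k-fold splits at both levels. make_folds() and nested_cv_indices() are hypothetical helpers written for illustration; they are not part of this package and do not reflect how setup_Resampler() actually generates its splits.

```r
# Hypothetical helper (not part of this package): plain k-fold assignment.
make_folds <- function(n, k, seed) {
  set.seed(seed)
  # Randomly assign each case index 1..n to one of k near-equal folds
  fold_id <- sample(rep_len(seq_len(k), n))
  lapply(seq_len(k), function(i) which(fold_id == i))
}

# Hypothetical helper: outer k-fold splits, each with its own inner k-fold splits.
nested_cv_indices <- function(n, k_outer = 5L, k_inner = 3L, seed = 2024L) {
  outer_test <- make_folds(n, k_outer, seed)
  lapply(seq_along(outer_test), function(i) {
    test_idx <- outer_test[[i]]
    train_idx <- setdiff(seq_len(n), test_idx)
    # Inner folds are built only from the outer training cases,
    # so tuning never sees the outer test set.
    inner <- make_folds(length(train_idx), k_inner, seed + i)
    inner_test <- lapply(inner, function(j) train_idx[j])
    list(train = train_idx, test = test_idx, inner_test = inner_test)
  })
}

splits <- nested_cv_indices(n = 100L)
```

Each element of splits holds one outer training set, its outer test set, and the inner test folds, which are drawn only from that outer training set. Keeping the inner splits inside the outer training data is what lets the outer test sets give an unbiased estimate of the tuned model's performance.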
Value
Object of class Regression(Supervised), RegressionCV(SupervisedCV), Classification(Supervised), or ClassificationCV(SupervisedCV).
Details
Important: For binary classification, the outcome should be a factor whose second level is the positive class.
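To illustrate the level-order requirement, the base R sketch below uses a hypothetical outcome with classes "case" (positive) and "control". Because factor() sorts levels alphabetically by default, "case" would end up as the first level, so the order has to be set explicitly. Only base R functions are used; nothing here is specific to this package.

```r
# Hypothetical outcome in which "case" is the positive class
y <- factor(c("case", "control", "control", "case", "control"))
levels(y)  # c("case", "control"): alphabetical order puts the positive class first

# Option 1: make the negative class the reference (first) level
y1 <- relevel(y, ref = "control")

# Option 2: spell out the full level order explicitly
y2 <- factor(y, levels = c("control", "case"))

levels(y1)  # "control" first, so "case" is now the second level
levels(y2)
```

Both options leave the values unchanged and only reorder the levels; the explicit levels argument has the advantage of also failing loudly (producing NA values) if a label is misspelled.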
Note on resampling: Never use an outer resampling method that samples with replacement (e.g., the bootstrap) if you also use inner resampling for tuning. Cases duplicated by the outer resampling may appear in both the training and test sets of the inner resamples, which leads to underestimated test error.
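The leakage described in the note can be reproduced with a few lines of base R. This is a simulation sketch, assuming a bootstrap outer resample followed by a single 50/50 inner split; the integers stand in for row indices of a hypothetical dataset, and no model is fit.

```r
set.seed(1)
n <- 100L  # rows in a hypothetical training set

# Outer resample WITH replacement (bootstrap): some original rows appear more than once
boot_idx <- sample(seq_len(n), size = n, replace = TRUE)

# One inner split of the bootstrap sample into two halves by position
pos <- sample(seq_len(n))
inner_train <- boot_idx[pos[1:50]]
inner_test  <- boot_idx[pos[51:100]]

# Original rows present on BOTH sides of the inner split: the inner test set
# contains exact copies of inner training cases, so inner error is optimistic
leaked <- intersect(inner_train, inner_test)
length(leaked)

# Contrast: outer subsample WITHOUT replacement (80% of rows), then an inner split
sub_idx <- sample(seq_len(n), size = 80L)
pos2 <- sample(seq_len(80L))
sub_train <- sub_idx[pos2[1:40]]
sub_test  <- sub_idx[pos2[41:80]]
length(intersect(sub_train, sub_test))  # always 0: no row is drawn twice
```

The exact number of leaked rows depends on the seed, but with a bootstrap it is essentially never zero, whereas sampling without replacement rules leakage out by construction.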