Preprocess data for analysis and visualization.

Usage

## S7 generic
preprocess(x, parameters, ...)

Arguments

x

data.frame or similar: Data to be preprocessed.

parameters

PreprocessorParameters or Preprocessor: Use a PreprocessorParameters object, created with setup_Preprocessor, when preprocessing training set data. Use a Preprocessor object when preprocessing validation and test set data.

...

Used to pass dat_validation and dat_test to the method for Preprocessor.

Value

Preprocessor object.

Details

Two methods are provided: one for preprocessing training set data, which accepts a PreprocessorParameters object, and one for preprocessing validation and test set data, which accepts a Preprocessor object.
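The split between the two methods follows a common pattern: values are learned from the training set and then reused unchanged on validation and test data. The base R sketch below illustrates that pattern for centering and scaling only. The helpers fit_scaler and apply_scaler are hypothetical names introduced for this illustration. They are not part of this package and do not show how preprocess is implemented.

```r
# Hypothetical helpers for illustration only; not part of the package API.

# Learn per-column mean and standard deviation from the training set.
fit_scaler <- function(dat) {
  num <- vapply(dat, is.numeric, logical(1))
  list(
    columns = names(dat)[num],
    center = vapply(dat[num], mean, numeric(1), na.rm = TRUE),
    scale = vapply(dat[num], sd, numeric(1), na.rm = TRUE)
  )
}

# Apply the stored training-set values to any data set with the same columns.
apply_scaler <- function(dat, scaler) {
  for (col in scaler$columns) {
    dat[[col]] <- (dat[[col]] - scaler$center[[col]]) / scaler$scale[[col]]
  }
  dat
}

dat_training <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))
dat_test <- data.frame(a = c(2.5, 5), b = c(25, 50))

# Fit on training data, then reuse the same values for the test set.
scaler <- fit_scaler(dat_training)
training_scaled <- apply_scaler(dat_training, scaler)
test_scaled <- apply_scaler(dat_test, scaler)
```

Because the test set is transformed with the training-set mean and standard deviation, no information from the test set leaks into the preprocessing step.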

Order of operations:

  • keep complete cases only

  • remove constants

  • remove duplicates

  • remove cases by missingness threshold

  • remove features by missingness threshold

  • integer to factor

  • integer to numeric

  • logical to factor

  • logical to numeric

  • numeric to factor

  • cut numeric to n bins

  • cut numeric to n quantiles

  • numeric with fewer than N unique values to factor

  • character to factor

  • factor NA to named level

  • add missingness column

  • impute

  • scale and/or center

  • one-hot encoding
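To show why the order matters, the following base R sketch runs a subset of the steps above in the listed order on a toy data frame. It is an illustration written for this page under simplifying assumptions (mean imputation, all character columns converted), not the package's implementation, and it does not call preprocess.

```r
# Toy data: a missing value, a character column, a constant column,
# and one duplicated row (rows 2 and 4).
dat <- data.frame(
  age = c(30, 45, NA, 45, 60),
  site = c("a", "b", "a", "b", "b"),
  constant = rep(1, 5),
  stringsAsFactors = FALSE
)

# Remove constants: drop columns with at most one distinct non-missing value.
is_constant <- vapply(
  dat,
  function(col) length(unique(col[!is.na(col)])) <= 1,
  logical(1)
)
dat <- dat[, !is_constant, drop = FALSE]

# Remove duplicates: keep the first occurrence of each row.
dat <- dat[!duplicated(dat), , drop = FALSE]

# Character to factor.
is_chr <- vapply(dat, is.character, logical(1))
dat[is_chr] <- lapply(dat[is_chr], factor)

# Impute: replace missing numeric values with the column mean (an assumption).
is_num <- vapply(dat, is.numeric, logical(1))
dat[is_num] <- lapply(dat[is_num], function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
})

# Scale and center numeric columns.
dat[is_num] <- lapply(dat[is_num], function(col) as.numeric(scale(col)))

# One-hot encoding: one 0/1 indicator column per factor level.
is_fct <- vapply(dat, is.factor, logical(1))
for (col in names(dat)[is_fct]) {
  for (lvl in levels(dat[[col]])) {
    dat[[paste0(col, "_", lvl)]] <- as.integer(dat[[col]] == lvl)
  }
  dat[[col]] <- NULL
}
```

Converting characters to factors before encoding gives the one-hot step a fixed set of levels to expand, and imputing before scaling ensures the mean and standard deviation are computed on complete columns.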

Author

EDG