4  Check data

After reading in a dataset, check it before doing any analysis. The check_data() function prints a summary of the dataset (its dimensions, feature types, and potential issues), along with recommendations when applicable:

check_data(iris)
  iris: A data.table with 150 rows and 5 columns.

  Data types
  * 4 numeric features
  * 0 integer features
  * 1 factor, which is not ordered
  * 0 character features
  * 0 date features

  Issues
  * 0 constant features
  * 1 duplicate case
  * 0 missing values

  Recommendations
  * Consider removing the duplicate case 

It turns out the popular iris dataset contains one duplicate row.
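The duplicate case can also be located with base R. The following is a minimal sketch that uses only `duplicated()` and row indexing; it is not necessarily how check_data() detects duplicates, and the names `dup`, `n_duplicates`, and `iris_unique` are chosen here for illustration.

```r
# Flag rows that exactly repeat an earlier row
# (the first occurrence of each row is not flagged)
dup <- duplicated(iris)

# Count the duplicate cases
n_duplicates <- sum(dup)
n_duplicates

# Inspect the duplicated case(s)
iris[dup, ]

# Keep only the first occurrence of each row
iris_unique <- iris[!dup, ]
dim(iris_unique)
```

Whether to drop a duplicate is a judgment call: two identical rows may be a data entry error, or two distinct cases that happen to share the same measurements.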

Important

Always ask for a data dictionary when you are given a dataset to analyze. One may not always be available; in that case, you need to investigate the data further to understand it and assign the correct type to each feature.
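As a rough illustration of that investigation, the sketch below inspects the types of a small data frame and then assigns more appropriate ones using base R. The data frame `dat`, its columns, and the severity labels are all hypothetical, invented for this example; in practice the right type for each feature depends on what the feature actually means.

```r
# Hypothetical dataset, invented for illustration only
dat <- data.frame(
  id = c("001", "002", "003"),
  sex = c("F", "M", "F"),
  severity = c(1, 3, 2),
  visit_date = c("2024-01-15", "2024-02-03", "2024-03-21"),
  stringsAsFactors = FALSE
)

# Inspect the types as read: severity is numeric, the rest are character
str(dat)

# Categorical feature stored as character: convert to factor
dat$sex <- factor(dat$sex)

# Numeric codes that represent ordered categories (assumed meaning):
# convert to an ordered factor with explicit labels
dat$severity <- factor(
  dat$severity,
  levels = 1:3,
  labels = c("mild", "moderate", "severe"),
  ordered = TRUE
)

# Dates stored as character in ISO 8601 format: convert to Date
dat$visit_date <- as.Date(dat$visit_date)

# Identifiers stay as character; they are labels, not quantities
sapply(dat, function(x) class(x)[1])
```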

© 2025 E.D. Gennatas