Read data and optionally clean column names, keep unique rows, and convert characters to factors
Usage
read(
  filename,
  datadir = NULL,
  make_unique = TRUE,
  character2factor = FALSE,
  clean_colnames = TRUE,
  delim_reader = c("data.table", "vroom", "duckdb", "arrow"),
  xlsx_sheet = 1,
  sep = NULL,
  quote = "\"",
  na_strings = c(""),
  output = c("data.table", "default"),
  attr = NULL,
  value = NULL,
  verbosity = 1L,
  fread_verbosity = 0L,
  timed = verbosity > 0L,
  ...
)
Arguments
- filename
Character: Filename, or full path if datadir = NULL
- datadir
Character: Optional path to the directory where filename is located. If not specified, filename must be the full path.
- make_unique
Logical: If TRUE, keep unique rows only
- character2factor
Logical: If TRUE, convert character variables to factors
- clean_colnames
Logical: If TRUE, clean column names using clean_colnames
- delim_reader
Character: Package to use for reading delimited data
- xlsx_sheet
Integer or character: Name or number of XLSX sheet to read
- sep
Single character: Field separator. If delim_reader = "data.table" (which reads with data.table::fread()) and sep = NULL, this defaults to "auto"; otherwise it defaults to ",".
- quote
Single character: quote character
- na_strings
Character vector: Strings to be interpreted as NA values. For delim_reader = "duckdb", this must be a single string.
- output
Character: "default" or "data.table". If "default", return the delim_reader's default data structure; otherwise convert to data.table.
- attr
Character: Attribute to set (Optional)
- value
Character: Value to set (if attr is not NULL)
- verbosity
Integer: Verbosity level.
- fread_verbosity
Integer: Verbosity level passed to data.table::fread()
- timed
Logical: If TRUE, time the process and print to console
- ...
Additional parameters to pass to data.table::fread(), arrow::read_delim_arrow(), vroom::vroom(), or readxl::read_excel()
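The argument descriptions above suggest a few simple steps around the actual file read: resolving the path from filename and datadir, dropping duplicate rows, and converting character columns to factors. The following is a minimal base R sketch of those steps, not the function's actual source. The helper names resolve_path_sketch and postprocess_sketch are hypothetical, and the sketch assumes that "keep unique rows" means whole-row deduplication.

```r
# Hypothetical helpers for illustration only; not the documented function's code.

# Build the path to read: `filename` alone if `datadir` is NULL,
# otherwise `datadir` joined with `filename`.
resolve_path_sketch <- function(filename, datadir = NULL) {
  if (is.null(datadir)) filename else file.path(datadir, filename)
}

# Apply the optional clean-up steps described by `make_unique`
# and `character2factor` to an already-loaded data.frame.
postprocess_sketch <- function(x, make_unique = TRUE, character2factor = FALSE) {
  if (make_unique) {
    # Assumption: uniqueness is judged on entire rows
    x <- unique(x)
  }
  if (character2factor) {
    is_chr <- vapply(x, is.character, logical(1))
    if (any(is_chr)) {
      x[is_chr] <- lapply(x[is_chr], factor)
    }
  }
  x
}

# Toy data with one duplicated row and one character column
dat <- data.frame(
  id = c(1L, 2L, 2L),
  group = c("a", "b", "b"),
  stringsAsFactors = FALSE
)

path <- resolve_path_sketch("a.csv", datadir = "data")
out <- postprocess_sketch(dat, make_unique = TRUE, character2factor = TRUE)
```

Keeping these steps separate from the reader itself means the same clean-up can be applied regardless of which delim_reader produced the data.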
Details
read is a convenience function to read:
- Delimited files using data.table::fread(), arrow::read_delim_arrow(), vroom::vroom(), or duckdb::duckdb_read_csv()
- ARFF files using farff::readARFF()
- Parquet files using arrow::read_parquet()
- XLSX files using readxl::read_excel()
- DTA files from Stata using haven::read_dta()
- FASTA files using seqinr::read.fasta()
- RDS files using readRDS()
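One way to picture the mapping above is as a lookup from file type to reader. The sketch below is an assumption-laden illustration: the source does not say how the file type is detected, so the use of the file extension, the specific extensions chosen, and the helper name reader_for_sketch are all hypothetical. It returns the reader's name as a string rather than calling it, so it runs without any of the listed packages installed.

```r
# Hypothetical helper: choose a reader name from the file extension.
# Assumption: file type is inferred from the extension; the documented
# function may detect it differently.
reader_for_sketch <- function(filename,
                              delim_reader = c("data.table", "vroom", "duckdb", "arrow")) {
  delim_reader <- match.arg(delim_reader)
  ext <- tolower(tools::file_ext(filename))

  # Readers for delimited files, keyed by the delim_reader choice
  delim <- c(
    data.table = "data.table::fread",
    vroom = "vroom::vroom",
    duckdb = "duckdb::duckdb_read_csv",
    arrow = "arrow::read_delim_arrow"
  )

  switch(ext,
    arff = "farff::readARFF",
    parquet = "arrow::read_parquet",
    xlsx = "readxl::read_excel",
    dta = "haven::read_dta",
    fasta = "seqinr::read.fasta",
    rds = "readRDS",
    # Anything else is treated as a delimited file in this sketch
    delim[[delim_reader]]
  )
}

reader_for_sketch("measurements.parquet")
reader_for_sketch("measurements.csv", delim_reader = "vroom")
```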