Lazily read a CSV file, optionally filter rows, remove duplicate rows, clean column names, convert character columns to factors, and collect the result.
Usage
ddb_data(
  filename,
  datadir = NULL,
  sep = ",",
  header = TRUE,
  quotechar = "",
  ignore_errors = TRUE,
  make_unique = TRUE,
  select_columns = NULL,
  filter_column = NULL,
  filter_vals = NULL,
  character2factor = FALSE,
  collect = TRUE,
  progress = TRUE,
  returnobj = c("data.table", "data.frame"),
  data.table.key = NULL,
  clean_colnames = TRUE,
  verbosity = 1L
)
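A call might look like the following minimal sketch. It assumes the package providing ddb_data is already attached (the call is skipped otherwise). The file, its columns, and the filter values are made up for illustration.

```r
# Write a small, hypothetical CSV to a temporary directory.
# quote = FALSE keeps the file free of quote characters.
csv_path <- file.path(tempdir(), "ddb_example.csv")
example_rows <- data.frame(
  ID = c(1L, 1L, 2L, 3L),
  group = c("a", "a", "b", "c")
)
write.csv(example_rows, csv_path, row.names = FALSE, quote = FALSE)

# Only run if ddb_data is available in the current session (assumption:
# the package that exports it has been attached with library()).
if (exists("ddb_data", mode = "function")) {
  dat <- ddb_data(
    filename = basename(csv_path),  # file name only ...
    datadir = dirname(csv_path),    # ... so the directory goes here
    filter_column = "ID",
    filter_vals = c(1, 2),          # values of ID to keep
    make_unique = TRUE,             # keep only unique rows
    returnobj = "data.table"
  )
  print(dat)
}
```

Passing the file name and directory separately mirrors the filename/datadir split in the signature; a full path in filename alone should work as well.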
Arguments
- filename
Character: File name; either the full path, or just the file name if datadir is also provided.
- datadir
Character: Optional directory path, used if filename is not a full path.
- sep
Character: Field delimiter/separator.
- header
Logical: If TRUE, first line will be read as column names.
- quotechar
Character: Quote character.
- ignore_errors
Logical: If TRUE, ignore parsing errors. Sometimes the only alternative is getting no data at all.
- make_unique
Logical: If TRUE, keep only unique rows.
- select_columns
Character vector: Column names to select.
- filter_column
Character: Name of column to filter on, e.g. "ID".
- filter_vals
Numeric or Character vector: Values in filter_column to keep.
- character2factor
Logical: If TRUE, convert character columns to factors.
- collect
Logical: If TRUE, collect the data and return an object of the class defined by returnobj.
- progress
Logical: If TRUE, print progress (there is no indication that this has any effect).
- returnobj
Character: "data.frame" or "data.table"; the object class to return. If "data.table", the data.frame returned by DBI::dbGetQuery is passed to data.table::setDT. This adds to execution time when the data are very large, but that is also when a data.table is most useful.
- data.table.key
Character: If set, the name of a column in the dataset. This column will be set as the key of the data.table output.
- clean_colnames
Logical: If TRUE, clean column names with clean_colnames().
- verbosity
Integer: Verbosity level.
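
The conversion described under returnobj and data.table.key can be sketched with data.table alone. This is an illustration under assumptions, not the function's internals: the data.frame below stands in for a DBI::dbGetQuery result, and setkeyv is used here to set the key, which may differ from what ddb_data actually calls.

```r
library(data.table)

# Stand-in for the data.frame that DBI::dbGetQuery would return
# (hypothetical columns and values).
query_result <- data.frame(
  ID = c(3L, 1L, 2L),
  group = c("c", "a", "b")
)

# setDT converts the data.frame to a data.table by reference (no copy).
setDT(query_result)

# Set the key column, as data.table.key = "ID" would request.
# Setting a key also sorts the rows by that column.
setkeyv(query_result, "ID")

key(query_result)  # "ID"
query_result$ID    # 1 2 3
```

Because setDT works by reference, the conversion itself avoids copying the data; the sort triggered by setting a key is an additional cost on large tables.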