Use available_decomposition() to get a listing of available decomposition / dimensionality reduction algorithms:
available_decomposition()
ICA: Independent Component Analysis
PCA: Principal Component Analysis
tSNE: t-distributed Stochastic Neighbor Embedding
UMAP: Uniform Manifold Approximation and Projection
We can further divide decomposition algorithms into linear methods (e.g. PCA, ICA) and nonlinear dimensionality reduction methods (also called manifold learning, e.g. tSNE and UMAP).
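To make "linear" concrete, here is a minimal base R sketch, independent of rtemis, that uses stats::prcomp() on the iris measurements. It is only an illustration and is not meant to show how decomp() computes PCA internally. The point is that the projected coordinates are the centered data multiplied by a fixed matrix; the variable names (X, pca, scores_manual) are ours.

```r
# Base R only: PCA is a linear method, so the component scores are
# the centered data multiplied by a fixed rotation (loading) matrix.
X <- as.matrix(iris[, 1:4])
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Center the data with the same column means prcomp() used
X_centered <- sweep(X, 2, pca$center)

# Project onto the first two principal axes by matrix multiplication
scores_manual <- X_centered %*% pca$rotation[, 1:2]

# Compare the manual projection with the scores returned by prcomp()
all.equal(unname(scores_manual), unname(pca$x[, 1:2]))
```

Nonlinear methods such as tSNE and UMAP have no such single projection matrix, which is why they are grouped separately.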
The rtemis decomposition function is called decomp() to avoid clashing with the built-in stats::decompose() function.
As a simple example, let’s look at the famous iris dataset. Note that we use it only to demonstrate usage; with just 4 variables, it is not a good example for assessing the effectiveness of decomposition algorithms.
First, we select all variables from the iris dataset, excluding the group names, i.e. the labels. Since the iris dataset includes one duplicate observation, we can remove it using preprocess(). This is required for t-SNE to work.
x <- preprocess(
  iris,
  setup_Preprocessor(remove_duplicates = TRUE)
)["preprocessed"]
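As a quick sanity check on the duplicate observation, the following base R sketch counts and drops exact duplicate rows without rtemis. It is an approximate stand-in for what remove_duplicates = TRUE achieves here, not a description of how preprocess() is implemented; n_dup and x_base are illustrative names.

```r
# Base R only: count exact duplicate rows in iris (all five columns)
n_dup <- sum(duplicated(iris))

# Keep the first occurrence of each row and drop the repeats
x_base <- iris[!duplicated(iris), ]

# Report the number of duplicates and the resulting dimensions
n_dup
dim(x_base)
```

The resulting row count agrees with the "149 cases" reported by decomp() in the examples that follow.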
Now, let’s try a few different algorithms, projecting to two dimensions and visualizing using draw_scatter(). Notice that we are using the real labels to color the points in these examples:
iris_PCA <- decomp(
  x[, 1:4],
  algorithm = "PCA",
  parameters = setup_PCA(k = 2L)
)
Input: 149 cases x 4 features.
draw_scatter(
  iris_PCA$transformed[, 1],
  iris_PCA$transformed[, 2],
  group = x$Species,
  main = "PCA on iris",
  xlab = "1st PCA component",
  ylab = "2nd PCA component"
)
iris_ICA <- decomp(
  x[, 1:4],
  algorithm = "ICA",
  parameters = setup_ICA(k = 2L)
)
Input: 149 cases x 4 features.
Centering
colstandard
Whitening
Symmetric FastICA using logcosh approx. to neg-entropy function
Iteration 1 tol=0.114263
Iteration 2 tol=0.000000
draw_scatter(
  iris_ICA$transformed[, 1],
  iris_ICA$transformed[, 2],
  group = x$Species,
  main = "ICA on iris",
  xlab = "1st ICA component",
  ylab = "2nd ICA component"
)
iris_tSNE <- decomp(
  x[, 1:4],
  algorithm = "tSNE",
  parameters = setup_tSNE(k = 2L)
)
Input: 149 cases x 4 features.
draw_scatter(
  iris_tSNE$transformed[, 1],
  iris_tSNE$transformed[, 2],
  group = x$Species,
  main = "tSNE on iris",
  xlab = "1st tSNE component",
  ylab = "2nd tSNE component"
)
iris_UMAP <- decomp(
  x[, 1:4],
  algorithm = "UMAP",
  parameters = setup_UMAP(k = 2L)
)
Input: 149 cases x 4 features.
draw_scatter(
  iris_UMAP$transformed[, 1],
  iris_UMAP$transformed[, 2],
  group = x$Species,
  main = "UMAP on iris",
  xlab = "1st UMAP component",
  ylab = "2nd UMAP component"
)