Use available_decomposition() to get a listing of available decomposition / dimensionality reduction algorithms:
available_decomposition()
ICA: Independent Component Analysis
NMF: Non-negative Matrix Factorization
PCA: Principal Component Analysis
tSNE: t-distributed Stochastic Neighbor Embedding
UMAP: Uniform Manifold Approximation and Projection
We can further divide decomposition algorithms into linear methods (e.g. PCA and ICA) and nonlinear dimensionality reduction methods, also called manifold learning (e.g. tSNE and UMAP).
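To make "linear" concrete, here is a minimal sketch using base R's stats::prcomp() rather than rtemis (an assumption made for illustration; it is not necessarily what decomp() does internally). It shows that the PCA projection is just the centered data multiplied by a fixed loading matrix:

```r
# Numeric columns of iris as a matrix
X <- as.matrix(iris[, 1:4])

# PCA via base R; rank. = 3 keeps the first three components
pca <- stats::prcomp(X, center = TRUE, scale. = FALSE, rank. = 3)

# Linear projection: centered data times the loading (rotation) matrix
X_centered <- sweep(X, 2, pca$center)
scores_manual <- X_centered %*% pca$rotation

# The manual projection should match the scores returned by prcomp()
all.equal(unname(scores_manual), unname(pca$x))
dim(scores_manual)
```

Nonlinear methods such as tSNE and UMAP have no such single matrix: the embedding is found by optimizing the low-dimensional coordinates directly.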
The rtemis decomposition function is called decomp() to avoid clashing with the built-in stats::decompose() function.
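For context, stats::decompose() is unrelated to dimensionality reduction: it performs classical seasonal decomposition of a time series. A short base R illustration, using the built-in AirPassengers series (chosen here only as an example):

```r
# stats::decompose() splits a time series into trend, seasonal and random parts
parts <- stats::decompose(AirPassengers)

# Inspect the class and the components of the result
class(parts)
names(parts)
```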
For the examples below, let’s set the default theme to “darkgraygrid”. See the Theme section for more information on available themes.
options(rtemis_theme = "darkgraygrid")
As a simple example, let’s look at the famous iris dataset. Note that we use it only to demonstrate usage; with just 4 variables, it is not a good example for assessing the effectiveness of decomposition algorithms.
First, we select all variables from the iris dataset, excluding the group names, i.e. the labels. Since the iris dataset includes one duplicate observation, we can remove it using preprocess(). This is required for t-SNE to work.
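The duplicate can be checked independently of rtemis. The following is a small base R sketch (not the rtemis workflow itself) that locates the duplicated row and drops it. Some t-SNE implementations, for example Rtsne with its default check_duplicates = TRUE, stop with an error when the input contains duplicate rows, which is why this cleanup matters.

```r
# Index of rows that repeat an earlier row of iris
dup_rows <- which(duplicated(iris))
n_dup <- length(dup_rows)

# Show the duplicated observation(s)
iris[dup_rows, ]

# Keep only the first occurrence of each row
iris_unique <- iris[!duplicated(iris), ]
nrow(iris_unique)
```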
x <- preprocess(
  iris,
  setup_Preprocessor(remove_duplicates = TRUE)
)["preprocessed"]
Now, let’s try a few different algorithms, projecting to three dimensions and visualizing using draw_3Dscatter.
iris_PCA <- decomp(
  x[, 1:4],
  algorithm = "PCA",
  parameters = setup_PCA(k = 3L)
)
draw_3Dscatter(
  iris_PCA$transformed,
  group = x$Species,
  main = "PCA on iris",
  xlab = "1st PCA component",
  ylab = "2nd PCA component",
  zlab = "3rd PCA component"
)
iris_ICA <- decomp(
  x[, 1:4],
  algorithm = "ICA",
  parameters = setup_ICA(k = 3L)
)
Centering
colstandard
Whitening
Symmetric FastICA using logcosh approx. to neg-entropy function
Iteration 1 tol=0.036595
Iteration 2 tol=0.002426
Iteration 3 tol=0.005013
Iteration 4 tol=0.009113
Iteration 5 tol=0.009062
Iteration 6 tol=0.003280
Iteration 7 tol=0.000738
Iteration 8 tol=0.000186
Iteration 9 tol=0.000053
draw_3Dscatter(
  iris_ICA$transformed,
  group = x$Species,
  main = "ICA on iris",
  xlab = "1st ICA component",
  ylab = "2nd ICA component",
  zlab = "3rd ICA component"
)
iris_NMF <- decomp(
  x[, 1:4],
  algorithm = "NMF",
  parameters = setup_NMF(k = 3L)
)
draw_3Dscatter(
  iris_NMF$transformed,
  group = x$Species,
  main = "NMF on iris",
  xlab = "1st NMF component",
  ylab = "2nd NMF component",
  zlab = "3rd NMF component"
)
iris_tSNE <- decomp(
  x[, 1:4],
  algorithm = "tSNE",
  parameters = setup_tSNE(k = 3L)
)
draw_3Dscatter(
  iris_tSNE$transformed,
  group = x$Species,
  main = "tSNE on iris",
  xlab = "1st tSNE component",
  ylab = "2nd tSNE component",
  zlab = "3rd tSNE component"
)
iris_UMAP <- decomp(
  x[, 1:4],
  algorithm = "UMAP",
  parameters = setup_UMAP(k = 3L)
)
draw_3Dscatter(
  iris_UMAP$transformed,
  group = x$Species,
  main = "UMAP on iris",
  xlab = "1st UMAP component",
  ylab = "2nd UMAP component",
  zlab = "3rd UMAP component"
)