13 Decompose

Use available_decomposition() to get a listing of available decomposition / dimensionality reduction algorithms:

available_decomposition()

   ICA: Independent Component Analysis
   NMF: Non-negative Matrix Factorization
   PCA: Principal Component Analysis
  tSNE: t-distributed Stochastic Neighbor Embedding
  UMAP: Uniform Manifold Approximation and Projection

We can further divide decomposition algorithms into linear dimensionality reduction (e.g. PCA, ICA, NMF) and nonlinear dimensionality reduction, also called manifold learning (e.g. t-SNE and UMAP).
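To make the "linear" part concrete, here is a minimal base R sketch showing that PCA scores are just the centered data multiplied by a rotation matrix. It uses stats::prcomp() rather than rtemis, so it illustrates the idea only and makes no claim about how decomp() computes PCA internally; the variable names are illustrative.

```r
# Linear decomposition: projected data = centered data %*% rotation matrix.
# Assumption: base R prcomp() stands in for a generic linear method.
x_num <- as.matrix(unique(iris)[, 1:4])

pca <- prcomp(x_num, center = TRUE, scale. = FALSE)

# Subtract the column means, then project onto the principal axes
x_centered <- sweep(x_num, 2, pca$center)
manual_scores <- x_centered %*% pca$rotation

# The manual projection matches the scores returned by prcomp()
is_same <- isTRUE(all.equal(unname(manual_scores), unname(pca$x)))
is_same
```

Nonlinear methods such as t-SNE and UMAP have no single matrix of this kind, because the embedding is not a fixed linear map of the input.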

Note

The rtemis decomposition function is called decomp() to avoid clashing with the stats::decompose() built-in function.
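For context, stats::decompose() is base R's classical seasonal decomposition of a time series, which is unrelated to dimensionality reduction. A short sketch using the built-in co2 dataset:

```r
# stats::decompose() splits a time series into seasonal, trend and
# random components; it does not perform dimensionality reduction.
parts <- stats::decompose(co2)
class(parts)
names(parts)
```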

For the examples below, let’s set the default theme to “darkgraygrid”. See the Theme section for more information on available themes.

options(rtemis_theme = "darkgraygrid")

13.1 Linear Dimensionality Reduction

As a simple example, let’s look at the famous iris dataset. Note that we use it only to demonstrate usage: with just 4 variables, it is not a good example for assessing the effectiveness of decomposition algorithms.

We will decompose the four numeric variables of the iris dataset, leaving out the group names, i.e. the labels. Since the iris dataset includes one duplicate observation, we first remove it using preprocess(). This is required for t-SNE to work.

x <- preprocess(
  iris,
  setup_Preprocessor(remove_duplicates = TRUE)
)["preprocessed"]

2025-06-13 22:01:21 Removing 1 duplicate case...
2025-06-13 22:01:21 Preprocessing completed.
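The single duplicate reported in the log can be confirmed independently with base R. This is only a cross-check and does not use rtemis; the variable names are illustrative.

```r
# Count and locate fully duplicated rows in iris
n_dup <- sum(duplicated(iris))
n_dup                      # 1
which(duplicated(iris))    # row index of the repeated observation

# Dropping duplicates leaves the 149 cases reported by decomp() below
x_unique <- unique(iris)
nrow(x_unique)             # 149
```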

Now, let’s try a few different algorithms, projecting to three dimensions and visualizing the result using draw_3Dscatter().

13.1.1 Principal Component Analysis (PCA)

iris_PCA <- decomp(
  x[, 1:4],
  algorithm = "PCA",
  parameters = setup_PCA(k = 3L)
)

2025-06-13 22:01:21 Hello. [decomp]
2025-06-13 22:01:21 Input: 149 cases x 4 features.
2025-06-13 22:01:21 Decomposing with PCA... [decomp]
2025-06-13 22:01:21 Checking unsupervised data... ✔ [check_unsupervised_data]
2025-06-13 22:01:21 Done in 3e-03 seconds. [decomp]

draw_3Dscatter(
  iris_PCA$transformed,
  group = x$Species,
  main = "PCA on iris",
  xlab = "1st PCA component",
  ylab = "2nd PCA component",
  zlab = "3rd PCA component"
)
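When reading a PCA projection, it helps to know how much variance each component captures. The sketch below computes this with base R's stats::prcomp() on the de-duplicated features. It assumes centering without scaling; the defaults used by decomp() and setup_PCA() may differ, so the numbers are illustrative rather than a reproduction of iris_PCA.

```r
# Base R PCA on the four iris features, duplicates removed.
# Assumption: center = TRUE, scale. = FALSE; rtemis defaults may differ.
x_num <- unique(iris)[, 1:4]
pca <- prcomp(x_num, center = TRUE, scale. = FALSE)

# Keep the first three components, as in the k = 3L example above
scores <- pca$x[, 1:3]
dim(scores)

# Proportion of total variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 3)
```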

13.1.2 Independent Component Analysis (ICA)

iris_ICA <- decomp(
  x[, 1:4],
  algorithm = "ICA",
  parameters = setup_ICA(k = 3L)
)

2025-06-13 22:01:21 Hello. [decomp]
2025-06-13 22:01:21 Input: 149 cases x 4 features.
2025-06-13 22:01:21 Decomposing with ICA... [decomp]
2025-06-13 22:01:21 Checking unsupervised data... ✔ [check_unsupervised_data]

Centering
colstandard
Whitening
Symmetric FastICA using logcosh approx. to neg-entropy function
Iteration 1 tol=0.036595
Iteration 2 tol=0.002426
Iteration 3 tol=0.005013
Iteration 4 tol=0.009113
Iteration 5 tol=0.009062
Iteration 6 tol=0.003280
Iteration 7 tol=0.000738
Iteration 8 tol=0.000186
Iteration 9 tol=0.000053

2025-06-13 22:01:21 Done in 0.01 seconds. [decomp]

draw_3Dscatter(
  iris_ICA$transformed,
  group = x$Species,
  main = "ICA on iris",
  xlab = "1st ICA component",
  ylab = "2nd ICA component",
  zlab = "3rd ICA component"
)

13.1.3 Non-negative Matrix Factorization (NMF)

iris_NMF <- decomp(
  x[, 1:4],
  algorithm = "NMF",
  parameters = setup_NMF(k = 3L)
)

2025-06-13 22:01:21 Hello. [decomp]
2025-06-13 22:01:21 Input: 149 cases x 4 features.
2025-06-13 22:01:21 Decomposing with NMF... [decomp]
2025-06-13 22:01:21 Checking unsupervised data... ✔ [check_unsupervised_data]
2025-06-13 22:01:22 Done in 1.12 seconds. [decomp]

draw_3Dscatter(
  iris_NMF$transformed,
  group = x$Species,
  main = "NMF on iris",
  xlab = "1st NMF component",
  ylab = "2nd NMF component",
  zlab = "3rd NMF component"
)

13.2 Nonlinear Dimensionality Reduction

13.2.1 t-distributed Stochastic Neighbor Embedding (t-SNE)

iris_tSNE <- decomp(
  x[, 1:4],
  algorithm = "tSNE",
  parameters = setup_tSNE(k = 3L)
)

2025-06-13 22:01:22 Hello. [decomp]
2025-06-13 22:01:22 Input: 149 cases x 4 features.
2025-06-13 22:01:22 Decomposing with tSNE... [decomp]
2025-06-13 22:01:22 Checking unsupervised data... ✔ [check_unsupervised_data]
2025-06-13 22:01:22 Done in 0.28 seconds. [decomp]

draw_3Dscatter(
  iris_tSNE$transformed,
  group = x$Species,
  main = "tSNE on iris",
  xlab = "1st tSNE component",
  ylab = "2nd tSNE component",
  zlab = "3rd tSNE component"
)

13.2.2 Uniform Manifold Approximation and Projection (UMAP)

iris_UMAP <- decomp(
  x[, 1:4],
  algorithm = "UMAP",
  parameters = setup_UMAP(k = 3L)
)

2025-06-13 22:01:23 Hello. [decomp]
2025-06-13 22:01:23 Input: 149 cases x 4 features.
2025-06-13 22:01:23 Decomposing with UMAP... [decomp]
2025-06-13 22:01:23 Checking unsupervised data... ✔ [check_unsupervised_data]
2025-06-13 22:01:23 Done in 0.90 seconds. [decomp]

draw_3Dscatter(
  iris_UMAP$transformed,
  group = x$Species,
  main = "UMAP on iris",
  xlab = "1st UMAP component",
  ylab = "2nd UMAP component",
  zlab = "3rd UMAP component"
)