12  Decompose

Use available_decomposition() to list the decomposition (dimensionality reduction) algorithms available in rtemis:

available_decomposition()
   ICA: Independent Component Analysis
   PCA: Principal Component Analysis
  tSNE: t-distributed Stochastic Neighbor Embedding
  UMAP: Uniform Manifold Approximation and Projection

Decomposition algorithms can be divided into linear methods (e.g. PCA and ICA) and nonlinear methods, also called manifold learning (e.g. t-SNE and UMAP).
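To make the linear/nonlinear distinction concrete: in a linear method, each new component is a fixed weighted sum of the original (centered) variables, so the whole projection is a single matrix multiplication. Nonlinear methods such as t-SNE and UMAP have no such matrix. The sketch below checks this for PCA using base R's prcomp() on the four numeric iris columns (all 150 rows). It uses prcomp() only as an illustration and is not a claim about how decomp() computes PCA internally.

```r
# Linear dimensionality reduction as a matrix product (base R only)
X <- as.matrix(iris[, 1:4])

pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Center the data the same way prcomp() did
X_centered <- scale(X, center = pca$center, scale = FALSE)

# Component scores = centered data %*% loading (rotation) matrix
scores_manual <- X_centered %*% pca$rotation

# A k = 2 projection simply keeps the first two columns
scores_2d <- scores_manual[, 1:2]

all.equal(unname(scores_manual), unname(pca$x))  # TRUE
```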

Note

The rtemis decomposition function is called decomp() to avoid clashing with the stats::decompose() built-in function.

12.1 Linear Dimensionality Reduction

As a simple example, let’s look at the famous iris dataset. We use it only to demonstrate usage: with just 4 variables, it is not a good test of how effective decomposition algorithms are.

The decomposition algorithms will use only the four numeric variables (x[, 1:4]); the Species column holds the group labels, which we keep only to color the plots. Since the iris dataset includes one duplicate observation, we first remove it using preprocess(). This step is required for t-SNE to work.

x <- preprocess(
  iris,
  setup_Preprocessor(remove_duplicates = TRUE)
)["preprocessed"]
2025-06-07 00:48:28 Removing 1 duplicate case...
2025-06-07 00:48:28 Preprocessing completed.
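For comparison, base R's duplicated() flags rows that are identical to an earlier row, which is enough to see the single duplicate reported above. This is only an illustration and does not reproduce everything preprocess() may do. Duplicates matter for t-SNE because, for example, the widely used Rtsne package stops on duplicated rows by default (its check_duplicates argument); which t-SNE backend decomp() calls is not stated here.

```r
# Find and drop exact duplicate rows in iris (base R)
dup <- duplicated(iris)   # TRUE for rows that repeat an earlier row
which(dup)                # index of the duplicated row
x_base <- iris[!dup, ]    # keep first occurrences only
nrow(x_base)              # 149, matching the preprocess() log above
```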

Now, let’s try a few different algorithms, projecting to two dimensions and visualizing the results with draw_scatter(). Note that we use the true species labels only to color the points; the algorithms themselves never see them.

12.1.1 Principal Component Analysis (PCA)

iris_PCA <- decomp(
  x[, 1:4],
  algorithm = "PCA",
  parameters = setup_PCA(k = 2L)
)
2025-06-07 00:48:28 👽Hello. [decomp]

  Input: 149 cases x 4 features.
2025-06-07 00:48:28 Decomposing with PCA... [decomp]
2025-06-07 00:48:28 Checking unsupervised data... [check_unsupervised_data]
2025-06-07 00:48:28 Done in 1.2e-04 minutes (Real: 0.01; User: 3e-03; System: 1e-03). [decomp]
draw_scatter(
  iris_PCA$transformed[, 1],
  iris_PCA$transformed[, 2],
  group = x$Species,
  main = "PCA on iris",
  xlab = "1st PCA component",
  ylab = "2nd PCA component"
)
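Principal components capture decreasing shares of the total variance, which is one way to judge whether two components are enough. A minimal base R sketch with prcomp() on the same 149 rows, using unscaled variables (prcomp()'s default); whether decomp() centers or scales the data in the same way is an assumption not covered here:

```r
# Proportion of variance explained (PVE) by each principal component
x_num <- as.matrix(iris[!duplicated(iris), 1:4])
pca <- prcomp(x_num, center = TRUE, scale. = FALSE)

pve <- pca$sdev^2 / sum(pca$sdev^2)   # variance of each component / total variance
round(pve, 3)
cumsum(pve)[2]                        # share retained by a 2-component projection
```

On unscaled iris, the first component alone carries over 90% of the variance, largely because petal length has the largest spread.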

12.1.2 Independent Component Analysis (ICA)

iris_ICA <- decomp(
  x[, 1:4],
  algorithm = "ICA",
  parameters = setup_ICA(k = 2L)
)
2025-06-07 00:48:28 👽Hello. [decomp]

  Input: 149 cases x 4 features.
2025-06-07 00:48:28 Decomposing with ICA... [decomp]
2025-06-07 00:48:28 Checking unsupervised data... [check_unsupervised_data]

Centering
colstandard
Whitening
Symmetric FastICA using logcosh approx. to neg-entropy function
Iteration 1 tol=0.114263
Iteration 2 tol=0.000000
2025-06-07 00:48:28 Done in 2.3e-04 minutes (Real: 0.01; User: 4e-03; System: 2e-03). [decomp]
draw_scatter(
  iris_ICA$transformed[, 1],
  iris_ICA$transformed[, 2],
  group = x$Species,
  main = "ICA on iris",
  xlab = "1st ICA component",
  ylab = "2nd ICA component"
)
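The ICA log above lists centering and whitening before the iterations. Whitening rotates and rescales the centered data so that the retained components are uncorrelated with unit variance; ICA then searches for a further rotation that makes them as non-Gaussian (and hence independent) as possible. The base R sketch below shows only the whitening step, via an eigendecomposition of the covariance matrix with k = 2; it is not the FastICA rotation itself, and the ICA implementation used by decomp() may differ in details such as variance normalization.

```r
# Whitening, the preprocessing step before the ICA rotation (base R only)
x_num <- as.matrix(iris[!duplicated(iris), 1:4])
k <- 2

x_centered <- scale(x_num, center = TRUE, scale = FALSE)
eig <- eigen(cov(x_centered), symmetric = TRUE)   # eigenvalues sorted in decreasing order

# Whitening matrix: top-k eigenvectors scaled by 1 / sqrt(eigenvalue)
K <- eig$vectors[, 1:k] %*% diag(1 / sqrt(eig$values[1:k]), nrow = k)
z <- x_centered %*% K

round(cov(z), 10)   # k x k identity: uncorrelated components with unit variance
```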

12.2 Nonlinear Dimensionality Reduction

12.2.1 t-distributed Stochastic Neighbor Embedding (t-SNE)

iris_tSNE <- decomp(
  x[, 1:4],
  algorithm = "tSNE",
  parameters = setup_tSNE(k = 2L)
)
2025-06-07 00:48:28 👽Hello. [decomp]

  Input: 149 cases x 4 features.
2025-06-07 00:48:28 Decomposing with tSNE... [decomp]
2025-06-07 00:48:28 Checking unsupervised data... [check_unsupervised_data]
2025-06-07 00:48:28 Done in 3.7e-03 minutes (Real: 0.22; User: 0.20; System: 0.01). [decomp]
draw_scatter(
  iris_tSNE$transformed[, 1],
  iris_tSNE$transformed[, 2],
  group = x$Species,
  main = "tSNE on iris",
  xlab = "1st tSNE component",
  ylab = "2nd tSNE component"
)
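t-SNE compares two sets of pairwise affinities: Gaussian-based affinities P between points in the original space and Student-t (one degree of freedom) affinities Q between points in the 2D embedding. It then moves the embedding points to minimize the Kullback-Leibler divergence KL(P || Q). The heavy tails of the t kernel are what let well-separated groups, such as the iris species, spread apart in the plot. The base R sketch below computes P, Q, and the KL divergence for a random starting layout. It assumes a single fixed bandwidth sigma for every point (real t-SNE tunes one per point to match a target perplexity) and omits the gradient descent, so it illustrates the objective rather than implementing t-SNE.

```r
# Quantities t-SNE optimizes (illustration only, not a full t-SNE)
# Assumption: one fixed Gaussian bandwidth `sigma` for all points

sq_dist <- function(m) unname(as.matrix(dist(m))^2)

# High-dimensional affinities P: symmetrized Gaussian conditionals
tsne_p <- function(x, sigma = 1) {
  w <- exp(-sq_dist(x) / (2 * sigma^2))
  diag(w) <- 0
  p_cond <- w / rowSums(w)              # p_{j|i}: each row sums to 1
  (p_cond + t(p_cond)) / (2 * nrow(x))  # p_ij: whole matrix sums to 1
}

# Low-dimensional affinities Q: Student-t kernel with 1 degree of freedom
tsne_q <- function(y) {
  w <- 1 / (1 + sq_dist(y))
  diag(w) <- 0
  w / sum(w)
}

# Cost minimized by t-SNE: KL(P || Q), skipping zero entries of P
kl_div <- function(p, q) {
  keep <- p > 0
  sum(p[keep] * log(p[keep] / q[keep]))
}

x_num <- as.matrix(iris[!duplicated(iris), 1:4])
set.seed(2025)
y_init <- matrix(rnorm(nrow(x_num) * 2, sd = 1e-4), ncol = 2)  # random 2D start

P <- tsne_p(x_num, sigma = 1)
Q <- tsne_q(y_init)
kl_div(P, Q)
```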

12.2.2 Uniform Manifold Approximation and Projection (UMAP)

iris_UMAP <- decomp(
  x[, 1:4],
  algorithm = "UMAP",
  parameters = setup_UMAP(k = 2L)
)
2025-06-07 00:48:28 👽Hello. [decomp]

  Input: 149 cases x 4 features.
2025-06-07 00:48:28 Decomposing with UMAP... [decomp]
2025-06-07 00:48:29 Checking unsupervised data... [check_unsupervised_data]
2025-06-07 00:48:29 Done in 0.02 minutes (Real: 1.00; User: 0.93; System: 0.04). [decomp]
draw_scatter(
  iris_UMAP$transformed[, 1],
  iris_UMAP$transformed[, 2],
  group = x$Species,
  main = "UMAP on iris",
  xlab = "1st UMAP component",
  ylab = "2nd UMAP component"
)
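UMAP starts from a weighted k-nearest-neighbor graph in the original space. For each point, distances to its k neighbors are shifted by the distance rho to its nearest neighbor and scaled by a bandwidth sigma, chosen by binary search so that the neighbor weights sum to log2(k); the directed weights are then combined with a fuzzy union, a + b - ab. The base R sketch below builds only this graph, with k = 15 (the default number of neighbors in common UMAP implementations) and simplified details. The step that optimizes the 2D layout is omitted, and this is not the code decomp() runs.

```r
# UMAP's fuzzy k-nearest-neighbor graph (simplified sketch, base R only)
# The layout optimization that produces the 2D embedding is omitted
umap_fuzzy_graph <- function(x, k = 15, n_iter = 64, tol = 1e-5) {
  d <- as.matrix(dist(x))
  n <- nrow(d)
  w <- matrix(0, n, n)
  target <- log2(k)
  for (i in seq_len(n)) {
    di <- d[i, ]
    di[i] <- Inf                        # a point is not its own neighbor
    nn <- order(di)[seq_len(k)]         # k nearest neighbors of point i
    dn <- pmax(di[nn] - di[nn[1]], 0)   # shift by rho_i (nearest-neighbor distance)
    lo <- 0
    hi <- Inf
    sigma <- 1
    # Binary search for sigma_i so that the neighbor weights sum to log2(k)
    for (iter in seq_len(n_iter)) {
      s <- sum(exp(-dn / sigma))
      if (abs(s - target) < tol) break
      if (s > target) {
        hi <- sigma                     # weights too large: shrink sigma
        sigma <- (lo + hi) / 2
      } else {
        lo <- sigma                     # weights too small: grow sigma
        sigma <- if (is.finite(hi)) (lo + hi) / 2 else sigma * 2
      }
    }
    w[i, nn] <- exp(-dn / sigma)        # directed membership strengths
  }
  w + t(w) - w * t(w)                   # fuzzy union symmetrizes the graph
}

x_num <- as.matrix(iris[!duplicated(iris), 1:4])
G <- umap_fuzzy_graph(x_num, k = 15)
```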
© 2025 E.D. Gennatas