12  Mass-Univariate Analysis

12.1 Background

Mass-univariate analysis is a statistical approach that tests the relationship between a predictor variable and each of many response variables, one response at a time. It is particularly useful for high-dimensional data, such as those found in genomics or neuroimaging, and can be an informative first step ahead of predictive modeling.
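One way to write this down, for a single predictor x observed on n cases and p response variables, is as p separate regressions. The notation is introduced here for illustration, and the Gaussian linear form is an assumption that matches the synthetic example in this chapter rather than a requirement of the approach:

```latex
y_{ij} = \beta_{0j} + \beta_{1j}\, x_i + \varepsilon_{ij},
\qquad i = 1, \dots, n, \quad j = 1, \dots, p
```

Each of the p models yields its own slope estimate and a p-value for the null hypothesis that the slope is zero. Because p tests are run, some form of multiple-comparison correction is usually considered when interpreting them.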

12.2 Setup

12.2.1 Packages

library(rtemis)
  .:rtemis 0.99.97 🌊 aarch64-apple-darwin20
library(data.table)

12.2.2 Data

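Let’s start with a synthetic dataset, which helps illustrate how the process works and what the results look like in an artificial, clear-cut case. The data consist of 200 cases, one predictor, and 500 response variables, three of which are constructed to depend on the predictor.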

set.seed(2025)
# single predictor
x <- data.table(x1 = rnorm(200, mean = 12))
# multiple response variables
y <- data.table(rnormmat(200, 500, mean = 2))
names(y) <- paste0("feature_", seq_len(ncol(y)))
# create some associations
y[, 3] <- y[, 3] + 0.8 * x[, x1]
y[, 24] <- y[, 24] + 0.5 * x[, x1]
y[, 415] <- y[, 415] - 0.7 * x[, x1]

12.3 massGLM

12.3.1 Fit

The massGLM() function performs mass-univariate analysis by fitting one generalized linear model (GLM) per response variable. It outputs a MassGLM object.

xy_massglm <- massGLM(x, y)
2025-07-04 17:34:19 Hello. [massGLM]
2025-07-04 17:34:19 Scaling and centering 500 numeric features... [preprocess]
2025-07-04 17:34:19 Preprocessing done. [preprocess]
2025-07-04 17:34:19 Fitting 500 GLMs of family gaussian with 1 predictor each... [massGLM]
2025-07-04 17:34:19 Done in 0.47 seconds. [massGLM]
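
To make the log messages above concrete, here is a minimal base-R sketch of the same idea: standardize each response column, then fit one Gaussian GLM per column with x1 as the only predictor. This is not the massGLM() implementation. The smaller data dimensions, the use of scale() and glm(), and all variable names are assumptions made for illustration.

```r
# Base-R sketch of a mass-univariate GLM fit (illustrative, not rtemis internals)
set.seed(2025)
n <- 200
x1 <- rnorm(n, mean = 12)                       # single predictor
Y <- matrix(rnorm(n * 50, mean = 2), nrow = n,
            dimnames = list(NULL, paste0("feature_", 1:50)))
Y[, 3] <- Y[, 3] + 0.8 * x1                     # plant one association

# Center each response to mean 0 and scale it to unit standard deviation
Y_scaled <- scale(Y)

# Fit one Gaussian GLM per response; keep the slope, its SE, and its p-value
glm_summary <- t(apply(Y_scaled, 2, function(yj) {
  s <- coef(summary(glm(yj ~ x1, family = gaussian())))
  s["x1", c("Estimate", "Std. Error", "Pr(>|t|)")]
}))

# Responses with the smallest p-values come first
head(glm_summary[order(glm_summary[, "Pr(>|t|)"]), ])
```

Each row of glm_summary describes one response variable, so the result has as many rows as there are responses and can be sorted or thresholded like any other table.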

12.3.2 Volcano plot

The plot() method for MassGLM objects visualizes the results of the mass-univariate analysis as a volcano plot. In genomics, volcano plots conventionally show the log fold change on the x-axis and the -log10 p-value on the y-axis. Here, the x-axis shows the coefficient of the predictor variable, and the y-axis again shows the -log10 p-value. Points are colored by significance, with a default p-value threshold of 0.05. The printed group counts are consistent with the synthetic data: one feature with a significant negative coefficient (Low), two with a significant positive coefficient (High), and 497 non-significant features (NS).

plot(xy_massglm)
Group counts:
group
 Low   NS High 
   1  497    2 
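
The grouping and the plot can be approximated in base R, as in the sketch below. It is self-contained and does not use the MassGLM object. The Holm correction, the colors, and the variable names are assumptions; the text above does not state whether the package applies the 0.05 threshold to raw or adjusted p-values.

```r
# Base-R sketch of a coefficient-based volcano plot (illustrative only)
set.seed(42)
n <- 200
x1 <- rnorm(n)
Y <- matrix(rnorm(n * 100), nrow = n)
Y[, 1] <- Y[, 1] + 0.8 * x1                     # planted positive association
Y[, 2] <- Y[, 2] - 0.7 * x1                     # planted negative association

# One linear model per response: slope estimate and its p-value
est_p <- t(apply(Y, 2, function(yj) {
  coef(summary(lm(yj ~ x1)))["x1", c("Estimate", "Pr(>|t|)")]
}))
coefficient <- est_p[, "Estimate"]
p_adj <- p.adjust(est_p[, "Pr(>|t|)"], method = "holm")  # assumption: Holm

# Low = significant negative, High = significant positive, NS = not significant
group <- factor(
  ifelse(p_adj >= 0.05, "NS", ifelse(coefficient < 0, "Low", "High")),
  levels = c("Low", "NS", "High")
)
print(table(group))

plot(coefficient, -log10(p_adj),
     col = c("blue", "grey60", "red")[group], pch = 16,
     xlab = "Coefficient", ylab = "-log10(adjusted p-value)")
abline(h = -log10(0.05), lty = 2)               # significance threshold
```

Points above the dashed line are the ones assigned to Low or High, depending on the sign of their coefficient.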

12.3.3 Manhattan plot

The plot_manhattan() method for MassGLM objects visualizes the results of the mass-univariate analysis as a bar plot of the -log10 p-values, one bar per response variable, often called a Manhattan plot.

plot_manhattan(xy_massglm)
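
For comparison, a similar bar plot can be drawn in base R. The sketch below is self-contained and makes several assumptions: it obtains p-values with cor.test(), which for a single predictor gives the same p-value as the slope test in a simple linear regression, and it draws a Bonferroni line purely as a visual reference. Neither choice is taken from plot_manhattan().

```r
# Base-R sketch of a Manhattan-style bar plot (illustrative only)
set.seed(7)
n <- 200
x1 <- rnorm(n)
Y <- matrix(rnorm(n * 60), nrow = n,
            dimnames = list(NULL, paste0("feature_", 1:60)))
Y[, 10] <- Y[, 10] + 0.8 * x1                   # planted association

# One p-value per response variable
p_values <- apply(Y, 2, function(yj) cor.test(x1, yj)$p.value)
neg_log_p <- -log10(p_values)

# Assumption: Bonferroni-corrected 0.05 threshold as a reference line
threshold <- 0.05 / length(p_values)

barplot(neg_log_p, axisnames = FALSE, border = NA,
        col = ifelse(p_values < threshold, "red", "grey60"),
        xlab = "Feature", ylab = "-log10(p-value)")
abline(h = -log10(threshold), lty = 2)
```

The planted feature stands out as a single tall bar, while the remaining bars stay close to the baseline.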
© 2025 E.D. Gennatas