library(rtemis)
.:rtemis 0.99.97 🌊 aarch64-apple-darwin20
library(data.table)

Mass-univariate analysis is a statistical approach that models the relationship between a predictor variable and each of many response variables, one at a time. It is particularly useful for high-dimensional data, such as in genomics or neuroimaging, and can be an informative step ahead of predictive modeling.
Let’s start with a synthetic dataset, which helps illustrate how the process works and what the results look like in an artificially clear-cut case.
set.seed(2025)
# single predictor
x <- data.table(x1 = rnorm(200, mean = 12))
# multiple response variables
y <- data.table(rnormmat(200, 500, mean = 2))
names(y) <- paste0("feature_", seq_len(ncol(y)))
# create some associations
y[, 3] <- y[, 3] + 0.8 * x[, x1]
y[, 24] <- y[, 24] + 0.5 * x[, x1]
y[, 415] <- y[, 415] - 0.7 * x[, x1]

The massGLM() function performs mass-univariate analysis by training multiple GLMs, one per response variable. It outputs a MassGLM object.
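Conceptually, this amounts to fitting the same model once per response column and collecting the predictor's coefficient and p-value from each fit. The following is a minimal base R sketch of that idea on a smaller simulated dataset. It is not the code behind massGLM(); the helper `fit_one` and the column names of `res` are illustrative assumptions.

```r
# Base R sketch of a mass-univariate analysis: one GLM per response column.
# Illustration only; this is not how massGLM() is implemented.
set.seed(2025)
n <- 200
p <- 50
x1 <- rnorm(n, mean = 12)
y <- matrix(rnorm(n * p, mean = 2), nrow = n, ncol = p)
colnames(y) <- paste0("feature_", seq_len(p))

# Add known associations to two of the response columns
y[, 3] <- y[, 3] + 0.8 * x1
y[, 24] <- y[, 24] - 0.7 * x1

# Hypothetical helper: fit response ~ predictor for column j and
# return the slope estimate and its p-value
fit_one <- function(j) {
  fit <- glm(y[, j] ~ x1, family = gaussian())
  coef(summary(fit))["x1", c("Estimate", "Pr(>|t|)")]
}

# One row per response variable
res <- t(vapply(seq_len(p), fit_one, numeric(2)))
res <- data.frame(
  feature = colnames(y),
  coefficient = res[, 1],
  p_value = res[, 2]
)

# Features with the smallest p-values
head(res[order(res$p_value), ], 3)
```

With rtemis, the same kind of analysis on the full dataset is a single call: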
xy_massglm <- massGLM(x, y)

The plot() method for MassGLM objects visualizes the results of the mass-univariate analysis as a volcano plot. In genomics, volcano plots typically show the log fold change on the x-axis and the -log10 p-value on the y-axis. Here, the x-axis shows the coefficient of the predictor variable, and the y-axis again shows the -log10 p-value. Points are colored by the significance of their p-values, with a default threshold of 0.05.
plot(xy_massglm)

Group counts:
group
 Low   NS High
   1  497    2
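A grouping of this kind can be approximated in base R by combining the sign of each coefficient with a significance threshold. The sketch below is an assumption-laden illustration on a smaller simulated dataset: the text does not state whether or how the p-values are adjusted for multiple testing, so the Holm correction used here is an assumption, as are the colors and the exact grouping rule.

```r
# Base R sketch of a volcano-style grouping and plot.
# Assumptions: Holm-adjusted p-values and a 0.05 threshold; the actual
# rule used by plot() on a MassGLM object may differ.
set.seed(2025)
n <- 200
p <- 50
x1 <- rnorm(n, mean = 12)
y <- matrix(rnorm(n * p, mean = 2), nrow = n)
y[, 3] <- y[, 3] + 0.8 * x1
y[, 24] <- y[, 24] - 0.7 * x1

# Slope estimate and p-value for each response column
stats <- t(apply(y, 2, function(col) {
  coef(summary(lm(col ~ x1)))["x1", c("Estimate", "Pr(>|t|)")]
}))
coefs <- stats[, 1]
p_adj <- p.adjust(stats[, 2], method = "holm")

# Low: significant negative coefficient; High: significant positive; NS: rest
group <- ifelse(p_adj >= 0.05, "NS", ifelse(coefs < 0, "Low", "High"))
group <- factor(group, levels = c("Low", "NS", "High"))
table(group)

# Coefficient on the x-axis, -log10 adjusted p-value on the y-axis
plot(
  coefs, -log10(p_adj),
  col = c("blue", "grey50", "red")[group], pch = 16,
  xlab = "Coefficient", ylab = "-log10(adjusted p-value)"
)
abline(h = -log10(0.05), lty = 2)
```

The two features with simulated associations should land in the High and Low groups, matching the sign of the effect that was added to each.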
The plot_manhattan() method for MassGLM objects visualizes the results of the mass-univariate analysis using a barplot of the -log10 p-values, often called a Manhattan plot.
plot_manhattan(xy_massglm)
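For comparison, a similar display can be sketched in base R with barplot(). This is only an illustration on a smaller simulated dataset, using unadjusted p-values as an assumption; it is not the implementation of plot_manhattan().

```r
# Base R sketch of a Manhattan-style bar plot of -log10 p-values.
# Assumption: unadjusted p-values; plot_manhattan() may differ.
set.seed(2025)
n <- 200
p <- 50
x1 <- rnorm(n, mean = 12)
y <- matrix(rnorm(n * p, mean = 2), nrow = n)
y[, 3] <- y[, 3] + 0.8 * x1
y[, 24] <- y[, 24] - 0.7 * x1

# p-value of the x1 slope for each response column
p_values <- apply(y, 2, function(col) {
  coef(summary(lm(col ~ x1)))["x1", "Pr(>|t|)"]
})
neglog_p <- -log10(p_values)

# One bar per feature; the dashed line marks p = 0.05
barplot(
  neglog_p, names.arg = seq_len(p), border = NA,
  xlab = "Feature index", ylab = "-log10(p-value)"
)
abline(h = -log10(0.05), lty = 2)
```

The bars for the features with simulated associations stand well above the rest, which is the pattern a Manhattan plot is meant to make visible.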