library(rtemis)
.:rtemis 0.99.97 🌊 aarch64-apple-darwin20
library(data.table)
Mass-univariate analysis examines the relationship between a predictor variable and each of many response variables, one at a time, by fitting a separate model for each response. It is particularly useful for high-dimensional data, such as in genomics or neuroimaging. It can also be an informative step before predictive modeling, for example to see which features are individually associated with the predictor.
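The idea can be sketched in a few lines of base R, independent of rtemis: for each response column, fit a model with the predictor and keep the predictor's coefficient and p-value. The helper name mass_univariate() is hypothetical, and the choice of a Gaussian GLM is an assumption for illustration; massGLM() may differ in its details.

```r
# Hypothetical helper (not part of rtemis): fit one Gaussian GLM per
# response column and keep the predictor's coefficient and p-value.
mass_univariate <- function(x, Y) {
  res <- t(apply(Y, 2, function(yj) {
    fit <- glm(yj ~ x, family = gaussian())
    # Row "x" of the coefficient table holds the predictor's estimate
    summary(fit)$coefficients["x", c("Estimate", "Pr(>|t|)")]
  }))
  data.frame(feature = colnames(Y), coef = res[, "Estimate"],
             p = res[, "Pr(>|t|)"], row.names = NULL)
}

# Usage: 50 random features, one of which (feature_7) depends on x
set.seed(1)
x <- rnorm(100)
Y <- matrix(rnorm(100 * 50), nrow = 100,
            dimnames = list(NULL, paste0("feature_", 1:50)))
Y[, 7] <- Y[, 7] + 0.8 * x  # plant one true association
res <- mass_univariate(x, Y)
res[which.min(res$p), ]
```

Each model is fit independently, so the number of tests equals the number of response variables, which matters when choosing a significance threshold.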
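Let’s start with a synthetic dataset: 200 cases, one predictor, and 500 response variables, of which only three are truly associated with the predictor. This artificial, clear-cut case illustrates how the process works and what the results look like.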
set.seed(2025)
# single predictor
x <- data.table(x1 = rnorm(200, mean = 12))
# multiple response variables
y <- data.table(rnormmat(200, 500, mean = 2))
names(y) <- paste0("feature_", seq_len(ncol(y)))
# create some associations
y[, 3] <- y[, 3] + 0.8 * x[, x1]
y[, 24] <- y[, 24] + 0.5 * x[, x1]
y[, 415] <- y[, 415] - 0.7 * x[, x1]
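Before any modeling, a quick check can confirm that the planted associations stand out. The sketch below rebuilds an equivalent dataset in base R, using matrix(rnorm()) in place of rnormmat() (an assumption made so the block runs without rtemis, so the random values will differ from the run above), and ranks features by their absolute correlation with the predictor.

```r
set.seed(2025)
x1 <- rnorm(200, mean = 12)
Y <- matrix(rnorm(200 * 500, mean = 2), nrow = 200,
            dimnames = list(NULL, paste0("feature_", 1:500)))
# Same planted effects as above
Y[, 3] <- Y[, 3] + 0.8 * x1
Y[, 24] <- Y[, 24] + 0.5 * x1
Y[, 415] <- Y[, 415] - 0.7 * x1

# Pearson correlation of every feature with the predictor
r <- cor(Y, x1)[, 1]
# The three features with the largest absolute correlation
top3 <- names(sort(abs(r), decreasing = TRUE))[1:3]
top3
```

With 200 cases, null correlations rarely exceed about 0.25 in absolute value, while the planted effects produce correlations of roughly 0.45 to 0.6, so the three planted features should top the ranking.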
The massGLM() function performs mass-univariate analysis by fitting a separate GLM for each response variable. It returns a MassGLM object.
xy_massglm <- massGLM(x, y)
The plot() method for MassGLM objects visualizes the results of the mass-univariate analysis as a volcano plot. In genomics, volcano plots typically show the log fold change on the x-axis and the -log10 p-value on the y-axis. Here, the x-axis shows the coefficient of the predictor variable, and the y-axis again shows the -log10 p-value. Points are colored by significance, using a p-value threshold of 0.05 by default.
plot(xy_massglm)
Group counts:
group
 Low   NS High 
   1  497    2 
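The group counts report 3 significant features out of 500. Under an unadjusted 0.05 threshold, about 25 null features would be expected to pass by chance alone (500 × 0.05), so the default threshold is probably applied to multiple-testing-adjusted p-values. That is an inference from the output, so check ?massGLM for the actual default. The base-R sketch below draws a similar volcano plot and compares grouping with raw versus Bonferroni-adjusted p-values. Fitting lm() with a matrix response, the Bonferroni method, and the Low/NS/High rule (sign of the coefficient among significant features, matching the group names printed above) are all assumptions chosen for illustration.

```r
set.seed(2025)
x <- rnorm(200, mean = 12)
Y <- matrix(rnorm(200 * 500, mean = 2), nrow = 200)
Y[, c(3, 24, 415)] <- Y[, c(3, 24, 415)] + outer(x, c(0.8, 0.5, -0.7))

# lm() with a matrix response fits one model per column (an "mlm" object)
fit <- lm(Y ~ x)
coefs <- coef(fit)["x", ]
pvals <- sapply(summary(fit), function(s) s$coefficients["x", "Pr(>|t|)"])

# Hypothetical labeling rule: "NS" if not significant, otherwise by sign
label_groups <- function(p, b) {
  factor(ifelse(p >= 0.05, "NS", ifelse(b < 0, "Low", "High")),
         levels = c("Low", "NS", "High"))
}
raw_groups <- label_groups(pvals, coefs)
bonf_groups <- label_groups(p.adjust(pvals, method = "bonferroni"), coefs)
table(raw_groups)
table(bonf_groups)

# Volcano plot colored by the Bonferroni-based groups
cols <- c(Low = "steelblue", NS = "grey60", High = "firebrick")
plot(coefs, -log10(pvals), pch = 16,
     col = cols[as.character(bonf_groups)],
     xlab = "Coefficient of x", ylab = "-log10 p-value")
abline(h = -log10(0.05 / length(pvals)), lty = 2)  # Bonferroni cutoff
```

The raw grouping typically flags a couple of dozen extra features, while the adjusted grouping isolates the planted ones.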
The plot_manhattan() method for MassGLM objects visualizes the results of the mass-univariate analysis using a barplot of the -log10 p-values, often called a Manhattan plot.
plot_manhattan(xy_massglm)
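A rough base-R approximation of the same display: one bar per feature, in column order, with height -log10(p). The p-values here come from cor.test(), which for a single predictor gives the same p-value as the t-test on the slope of a simple linear regression. The Bonferroni line and highlighting are illustrative choices, not necessarily what plot_manhattan() draws.

```r
set.seed(2025)
x <- rnorm(200, mean = 12)
Y <- matrix(rnorm(200 * 500, mean = 2), nrow = 200)
Y[, c(3, 24, 415)] <- Y[, c(3, 24, 415)] + outer(x, c(0.8, 0.5, -0.7))

# For a single predictor, cor.test() gives the same p-value as the
# slope test in a simple linear regression
pvals <- apply(Y, 2, function(yj) cor.test(x, yj)$p.value)
neglogp <- -log10(pvals)

# One bar per feature, in column order; highlight Bonferroni-significant ones
sig <- p.adjust(pvals, method = "bonferroni") < 0.05
barplot(neglogp, space = 0, border = NA,
        col = ifelse(sig, "firebrick", "grey60"),
        xlab = "Feature index", ylab = "-log10 p-value")
abline(h = -log10(0.05 / length(pvals)), lty = 2)  # Bonferroni cutoff
```

Unlike the volcano plot, this view drops the effect direction and size but shows where along the feature index the strongest associations fall.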