7 Draw
Visualization is a central part of any data analysis pipeline. Ideally, you should visualize your data before and after each operation. Depending on the kind and amount of data you are working with, this can range from straightforward to rather challenging, but it’s always worthwhile.
The new rtemis package (v.0.99+) provides the draw_* family of functions, which use plotly to create visualizations that:
- are interactive when viewed in an IDE or web browser.
- can be exported to static images (e.g. SVG, PDF, or PNG).
- can be used in Quarto documents and shiny apps.
Interactive graphics offer a flexible and dynamic way to communicate information, great for websites / web applications and live demonstrations. rtemis uses the powerful plotly open source graphing library and other libraries built on top of that.
While viewing these graphs, use the mouse to hover over and click on the graphic elements to interact with them, especially in the 3D plots.
7.1 Overview
You can print the available draw_* functions using the available_draw() function:
available_draw()
7.2 Density and Histograms
draw_dist(iris$Sepal.Length)
To plot multiple traces, you can either pass a list or define groups by passing a factor to the group argument. By default, mode = "overlap", which draws all traces in the same plot.
draw_dist(iris$Sepal.Length, group = iris$Species)
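Since passing a list and passing a factor to group are two ways of defining the same traces, the factor version can be turned into the list version with base R's split(). This is a minimal sketch that assumes draw_dist() groups the data this way; by_species is just an illustrative name.

```r
# Base R only: split one numeric vector into a named list using a factor.
# Assumption: this mirrors how a factor passed to `group` defines traces.
by_species <- split(iris$Sepal.Length, iris$Species)

# One list element per factor level, named after the level
str(by_species)
```

Passing by_species to draw_dist() should then draw the same three traces as the group call above.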
When you pass a data.frame, non-numeric columns are automatically omitted. Set mode = "ridge" to create a multiplot (ridge plot):
draw_dist(iris, mode = "ridge")
By default, “ridge” mode orders the subplots by variable mean. Set ridge_order_on_mean = FALSE to maintain the original group order, for example, when groups represent temporal information.
xl <- list(
  mango = rnorm(200, 7, 1),
  banana = rnorm(200, 10, .8),
  tangerine = rnorm(400, 0, 2),
  sugar = rnorm(500, 3, 1.5)
)
draw_dist(xl)
draw_dist(xl, mode = 'ridge', ridge_order_on_mean = FALSE)
draw_dist(xl, mode = 'ridge') # default is TRUE
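To see what ordering by mean amounts to, the sketch below computes each element's mean and reorders the list in base R. It assumes ascending order; the direction rtemis actually uses is an assumption, not documented behavior. The seed is added only to make the result reproducible.

```r
set.seed(2025)
# Same structure as `xl` above: a named list of numeric vectors
xl <- list(
  mango = rnorm(200, 7, 1),
  banana = rnorm(200, 10, .8),
  tangerine = rnorm(400, 0, 2),
  sugar = rnorm(500, 3, 1.5)
)

# Mean of each list element
xl_means <- vapply(xl, mean, numeric(1))

# Reorder the list by mean (ascending here, which is an assumption)
xl_by_mean <- xl[order(xl_means)]
names(xl_by_mean)
```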
7.3 Scatter plots
set.seed(2025)
n <- 500
x <- rnorm(n)
y <- x^2 + rnorm(n, 2, 1)
z <- x^3 + rnorm(n, 3, 2)
draw_scatter(x, y)
Add a fit line using the fit argument, which accepts the name of any supervised learner available in rtemis:
draw_scatter(x, y, fit = "gam")
Add a confidence interval using the se_fit argument, which is only available for “GLM” and “GAM” fits:
draw_scatter(x, y, fit = "gam", se_fit = TRUE)
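A GAM fit line with a standard-error band can be approximated outside rtemis with mgcv, the GAM package that ships with R as a recommended package. This sketch fits y ~ s(x), predicts on a grid with standard errors, and builds a band of +/- 1.96 standard errors; whether rtemis uses mgcv or the same multiplier is an assumption.

```r
library(mgcv)  # recommended package distributed with R

set.seed(2025)
n <- 500
x <- rnorm(n)
y <- x^2 + rnorm(n, 2, 1)

# Fit a GAM with a smooth term for x
fit <- gam(y ~ s(x))

# Predict on an ordered grid of x values, requesting standard errors
grid <- data.frame(x = seq(min(x), max(x), length.out = 100))
pred <- predict(fit, newdata = grid, se.fit = TRUE)
fit_vals <- as.numeric(pred$fit)
se_vals <- as.numeric(pred$se.fit)

# Approximate 95% band: fitted value +/- 1.96 standard errors (assumed multiplier)
band <- data.frame(
  x = grid$x,
  fit = fit_vals,
  lower = fit_vals - 1.96 * se_vals,
  upper = fit_vals + 1.96 * se_vals
)
head(band)
```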
Have fun with other learners:
draw_scatter(x, y, fit = "cart")
Lists (and therefore data.frames) are also supported:
draw_scatter(x, list(Square = y, Cube = z),
fit = "gam", se_fit = TRUE)
7.3.1 Scatterplot + Cluster
We already saw that any learner can be used to draw a fit line in a scatter plot. Similarly, you can use any clustering algorithm to cluster the data and color the points by cluster membership. Learn more in the Clustering chapter.
draw_scatter(
  iris$Sepal.Width,
  iris$Petal.Width,
  cluster = "NeuralGas",
  fit = "gam",
  se_fit = TRUE
)
Input: 150 cases x 2 features.
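Coloring by cluster boils down to assigning each case a cluster label. Neural gas is not available in base R, so this sketch swaps in k-means (stats::kmeans()) on the same two iris columns; the number of clusters (3) is an assumption chosen for illustration.

```r
# Same two features as the draw_scatter() call above
x2d <- cbind(iris$Sepal.Width, iris$Petal.Width)

# k-means stands in for "NeuralGas" here (a swapped-in algorithm)
set.seed(2025)
km <- kmeans(x2d, centers = 3, nstart = 10)

# Cluster membership as a factor, one label per case
membership <- factor(km$cluster)
table(membership)
```

A membership factor like this is the kind of label used to color points; how draw_scatter() computes and applies it internally is not shown here.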
7.4 3D Scatter plots
The function for 3D scatter plots is draw_3Dscatter().
You can:
- Specify x, y, and z individually.
- Pass a list or data.frame with at least 3 elements/columns.
If there are more than 3 columns, the first 3 will be used:
draw_3Dscatter(iris)
draw_3Dscatter(iris, fit = "gam")
The group argument works as expected:
draw_3Dscatter(iris, group = iris$Species)
7.4.1 Glass-cut plots
We can plot fitted surfaces using the fit argument. The dependent variable is z, i.e. we fit a model of the type z ~ x + y.
Use the mouse to rotate the plot:
set.seed(2019)
x1 <- rnorm(500)
x2 <- rnorm(500)
y <- x1^2 + x2^3 + 3 + rnorm(500) * 3
draw_3Dscatter(x1, x2, y, fit = "gam")
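A glass-cut surface is a fitted model evaluated over a regular grid of the two predictors. The sketch below reproduces that idea with mgcv, fitting an additive model y ~ s(x1) + s(x2) and predicting on a 30 x 30 grid; the smooth terms and grid size are assumptions, not necessarily what draw_3Dscatter() does.

```r
library(mgcv)

set.seed(2019)
x1 <- rnorm(500)
x2 <- rnorm(500)
y <- x1^2 + x2^3 + 3 + rnorm(500) * 3

# Additive model of the form y ~ f(x1) + g(x2)
fit <- gam(y ~ s(x1) + s(x2))

# Regular 30 x 30 grid spanning the observed predictors
g1 <- seq(min(x1), max(x1), length.out = 30)
g2 <- seq(min(x2), max(x2), length.out = 30)
grid <- expand.grid(x1 = g1, x2 = g2)

# expand.grid varies x1 fastest, so column-major filling gives
# rows indexed by x1 and columns indexed by x2
surface <- matrix(predict(fit, newdata = grid), nrow = 30, ncol = 30)
```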
With groups:
draw_3Dscatter(iris, fit = "glm", group = iris$Species)
draw_3Dscatter(iris, fit = "gam", group = iris$Species)
7.5 Heatmaps
x <- rnormmat(20, 20, seed = 2018)
x_cor <- cor(x)
draw_heatmap(x_cor)
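A correlation heatmap displays a symmetric matrix with ones on the diagonal and values in [-1, 1]. rnormmat() is an rtemis helper; the sketch below substitutes a base R matrix of standard normal draws (an assumption about what rnormmat() generates) and checks those properties of cor().

```r
set.seed(2018)
# Hypothetical stand-in for rnormmat(): 20 x 20 matrix of standard normal draws
x <- matrix(rnorm(20 * 20), nrow = 20, ncol = 20)

# Pairwise Pearson correlations between the 20 columns
x_cor <- cor(x)
dim(x_cor)
```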
7.6 Barplots
draw_bar(VADeaths)
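VADeaths is a matrix from R's built-in datasets package (death rates per 1000 in Virginia, 1940), so draw_bar() receives a matrix rather than a vector here. Inspecting its shape shows what is being plotted; how rows and columns map to bars and colors is left to draw_bar() and not assumed here.

```r
# VADeaths ships with base R (datasets package)
dim(VADeaths)       # 5 age groups x 4 population groups
rownames(VADeaths)  # age groups
colnames(VADeaths)  # rural/urban by sex
```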
7.7 Boxplots
Some synthetic data:
set.seed(1999)
x <- list(mango = rnorm(200, 1, 1),
          banana = rpois(500, sample(c(0, 1, 2), 500, TRUE)),
          tangerine = rbinom(500, 1, .3),
          sugar = rgamma(400, shape = 1))
draw_box(x)
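Each box summarizes a distribution with five numbers. Base R's boxplot.stats() returns them (lower whisker, lower hinge, median, upper hinge, upper whisker) along with the outliers lying more than 1.5 x IQR beyond the hinges. This sketch applies it to the same synthetic data; plotly may compute quartiles with a slightly different method, so treat the numbers as approximate.

```r
set.seed(1999)
x <- list(mango = rnorm(200, 1, 1),
          banana = rpois(500, sample(c(0, 1, 2), 500, TRUE)),
          tangerine = rbinom(500, 1, .3),
          sugar = rgamma(400, shape = 1))

# Five numbers behind each box: whiskers, hinges, and median
box_stats <- lapply(x, function(v) boxplot.stats(v)$stats)

# Number of points flagged as outliers for each variable
n_out <- vapply(x, function(v) length(boxplot.stats(v)$out), integer(1))
n_out
```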
7.8 Violin Plots
Violin plots extend boxplots by drawing each variable's estimated density around the standard boxplot, showing the actual shape of the distribution.
draw_box(x, type = "violin")
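The outline of a violin is a kernel density estimate. This sketch uses base R's density() (Gaussian kernel, default bandwidth) on one of the variables and checks that the estimated curve integrates to about 1; the kernel and bandwidth plotly uses for violins may differ.

```r
set.seed(1999)
mango <- rnorm(200, 1, 1)

# Kernel density estimate evaluated on an evenly spaced grid
d <- density(mango)

# Trapezoidal approximation of the area under the estimated curve
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
area
```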
7.9 Pie Charts
Pie charts are best avoided, but if you need them, there’s draw_pie().
Some real population data:
x <- structure(list(Continent = structure(c(2L, 1L, 3L, 6L, 4L, 5L),
                                          .Label = c("Africa", "Asia",
                                                     "Europe", "North America",
                                                     "Oceania", "South America"),
                                          class = "factor"),
                    Population = c(4601371198, 1308064195, 747182751,
                                   427199446, 366600964, 42128035)),
               class = "data.frame",
               row.names = c(NA, -6L))
draw_pie(x)
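A pie chart encodes each category's share of the total as the angle of its slice. The shares behind the chart can be computed directly; this sketch rebuilds the same table with plain character labels for readability.

```r
# Same population data as above, built directly
x <- data.frame(
  Continent = c("Asia", "Africa", "Europe", "South America",
                "North America", "Oceania"),
  Population = c(4601371198, 1308064195, 747182751,
                 427199446, 366600964, 42128035)
)

# Each slice's angle is proportional to its share of the total
share <- x$Population / sum(x$Population)
setNames(round(100 * share, 1), x$Continent)
```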