obsplot is an R package that allows you to use the Observable Plot library to create charts as HTML widgets. Observable Plot is a free, open-source JavaScript visualisation library developed by Mike Bostock and Philippe Rivière at Observable.
obsplot is still at an early stage; in particular, its API could change in the future, either for self-improvement or to follow Observable Plot evolutions. It may not be suitable for production use right now.
Also note that obsplot is not suitable for charting very large datasets: the generated plots are in SVG format, and when it is used in R Markdown or Shiny, the underlying data are included in the output as JSON.
obsplot is not on CRAN yet, but can be installed from GitHub with:
remotes::install_github("juba/obsplot")
Or from R-universe with:
install.packages("obsplot", repos = "https://juba.r-universe.dev")
Don’t forget to load the library with:
library(obsplot)
Suppose we want to create a very simple dot chart from the penguins dataset of the palmerpenguins package:
library(palmerpenguins)
data(penguins)
To create such a chart, we first initialise it with obsplot(), passing as argument the data frame containing the data to plot:
obsplot(penguins)
We then add a graphical mark to create our chart. Here we use the dot mark by piping the mark_dot() function. We pass as arguments the x and y channels, giving the corresponding data frame columns:
obsplot(penguins) |>
mark_dot(x = bill_length_mm, y = bill_depth_mm)
Here we passed the data frame columns as symbols, but we can also use character strings instead:
obsplot(penguins) |>
mark_dot(x = "bill_length_mm", y = "bill_depth_mm")
We can add other channels, for example by changing the dots' color according to another variable:
obsplot(penguins) |>
mark_dot(x = bill_length_mm, y = bill_depth_mm, stroke = island)
We can also add constant options to a mark to modify an attribute in the same way for all dots:
obsplot(penguins) |>
mark_dot(x = bill_length_mm, y = bill_depth_mm, stroke = island, r = 2)
We can also add global options to the chart with the opts() function:
obsplot(penguins) |>
mark_dot(x = bill_length_mm, y = bill_depth_mm, stroke = island, r = 2) |>
opts(grid = TRUE)
Finally, we can modify the way variable values are mapped to graphical attributes by using scale functions:
obsplot(penguins) |>
mark_dot(x = bill_length_mm, y = bill_depth_mm, stroke = island, r = 2) |>
scale_color(scheme = "set1") |>
opts(grid = TRUE)
To go a bit deeper, we have to look at the fundamental concepts of Observable Plot: marks, faceting, scales and transforms.
Marks are the fundamental building blocks of Observable Plot charts. Each mark is a graphical representation of some data in a specific form: dot, line, area, text…
In Observable Plot, marks are defined by passing a marks JavaScript array to the Plot.plot() function. In obsplot, this is done by piping one or more of the mark_* family of functions. In the following example, we add three different marks to create a scatterplot with two rules for the x and y mean values:
mean_length <- mean(penguins$bill_length_mm, na.rm = TRUE)
mean_depth <- mean(penguins$bill_depth_mm, na.rm = TRUE)
obsplot(penguins) |>
mark_ruleY(y = mean_depth) |>
mark_ruleX(x = mean_length) |>
mark_dot(x = bill_length_mm, y = bill_depth_mm)
A mark function takes several arguments. The first one is an optional data object. If not specified, it is inherited from the one passed to obsplot(). Other named arguments are called mark constructors and can be of several types:
- column channels: a column of data, given as a string ("col") or a symbol (col)
- vector channels: a vector of values passed directly to the constructor
- constants
- JavaScript code, passed with the JS() function and evaluated at runtime
For example, in mark_dot(x = bill_length_mm, y = bill_depth_mm, stroke = "red"), both x and y are column channels, whereas stroke is a constant. In fact, values passed to a color constructor (stroke or fill) are automatically considered as constants if they look like a CSS color name or definition.
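A minimal sketch of the color rule, reusing the penguins data from earlier; the color name "steelblue" is an arbitrary choice. A string that looks like a CSS color gives a constant, whereas a column gives a channel:

```r
library(obsplot)
library(palmerpenguins)

# "steelblue" looks like a CSS color name, so stroke is a constant:
# every dot gets the same outline color
p_const <- obsplot(penguins) |>
  mark_dot(x = bill_length_mm, y = bill_depth_mm, stroke = "steelblue")

# island is a data frame column, so stroke becomes a channel:
# the outline color depends on each penguin's island
p_channel <- obsplot(penguins) |>
  mark_dot(x = bill_length_mm, y = bill_depth_mm, stroke = island)
```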
If we want to highlight some points by adding a text label, we can do so by giving a specific data argument to mark_text():
metros_10m <- subset(metros, POP_2015 > 10000000)
obsplot(metros) |>
mark_dot(x = POP_1980, y = POP_2015, stroke = "#D00") |>
mark_text(metros_10m, x = POP_1980, y = POP_2015, text = nyt_display, dy = -10)
We can also use JavaScript code. For example, we can use accessors to convert population values to millions of people:
obsplot(metros) |>
mark_dot(
x = JS("d => d.POP_1980 / 1000000"),
y = JS("d => d.POP_2015 / 1000000"),
stroke = "#D00"
)
We can also provide data directly to one of the channels (in Observable Plot, this is only possible by specifying a corresponding indexed data argument of the same length; obsplot does this automatically):
obsplot() |>
mark_lineY(y = cumsum(rnorm(100))) |>
mark_ruleY(0)
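The rules to determine a channel type are as follows (they may be subject to change in the future):
- if the value is passed with JS(), it is JavaScript code
- if the value is a number passed to r, strokeOpacity, fillOpacity, fontSize or rotate, it is a constant
- if the value looks like a CSS color name or definition and is passed to fill or stroke, it is a constant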
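The following sketch combines the channel types in a single mark. It assumes the rules above; the unit conversion in the JavaScript accessor is only there for illustration:

```r
library(obsplot)
library(palmerpenguins)

p <- obsplot(penguins) |>
  mark_dot(
    x = JS("d => d.bill_length_mm / 10"),  # JavaScript accessor (length in cm)
    y = bill_depth_mm,                      # symbol matching a column: column channel
    stroke = "species",                     # string matching a column: column channel
    fill = "white",                         # CSS color name passed to fill: constant
    r = 3,                                  # number passed to r: constant radius
    strokeOpacity = 0.7                     # number passed to strokeOpacity: constant
  )
```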
You can explicitly specify that a channel is a vector channel by using the as_data() helper function. In the following example, without as_data() the code would raise an error, as it would look for a "Paris" column in the data:
obsplot(metros) |>
mark_dot(x = POP_1980, y = POP_2015) |>
mark_dot(x = 9000000, y = 10600000, stroke = "red") |>
mark_text(x = 9000000, y = 10600000, text = as_data("Paris"), dy = -10)
When a column or vector channel is of type Date or POSIXt in R, it is automatically converted to a Date in JavaScript, and Observable Plot takes this into account for scale specification.
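A minimal sketch with a hypothetical daily time series (the df_dates object and its values are made up for illustration). Since date is of class Date in R, it should reach Observable Plot as a JavaScript Date, which then uses a time scale for x:

```r
library(obsplot)

# Hypothetical data: 100 consecutive days and a random walk
set.seed(1)
df_dates <- data.frame(
  date = seq(as.Date("2021-01-01"), by = "day", length.out = 100),
  value = cumsum(rnorm(100))
)

# date is a Date column, converted to JavaScript Date objects
p <- obsplot(df_dates) |>
  mark_lineY(x = date, y = value)
```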
Here is the list of the different mark functions currently available in obsplot:
mark_area
mark_areaX
mark_areaY
mark_barX
mark_barY
mark_cell
mark_cellX
mark_cellY
mark_dot
mark_dotX
mark_dotY
mark_frame
mark_function
mark_image
mark_line
mark_lineX
mark_lineY
mark_link
mark_rect
mark_rectX
mark_rectY
mark_ruleX
mark_ruleY
mark_svg
mark_text
mark_textX
mark_textY
mark_tickX
mark_tickY
To get a complete list of channels and options accepted or required by the different available marks, take a look at the marks API reference. For examples in obsplot, see the marks gallery.
Faceting allows you to create a grid of comparable grouped charts. In Observable Plot, faceting is used by adding a facet option to Plot.plot(). In obsplot, it is achieved by piping the facet() function.
Here, we create a horizontal set of scatterplots by passing an x channel to facet():
obsplot(penguins) |>
mark_dot(x = bill_length_mm, y = bill_depth_mm, stroke = sex) |>
facet(x = island)
To get vertical faceting, define y instead of x. We can also add a frame around each subchart by using mark_frame():
obsplot(penguins) |>
mark_dot(x = bill_length_mm, y = bill_depth_mm, stroke = sex) |>
mark_frame() |>
facet(y = island)
Finally, it is also possible to create a trellis of charts by specifying both x and y:
obsplot(penguins) |>
mark_dot(x = bill_length_mm, y = bill_depth_mm, stroke = sex) |>
mark_frame() |>
facet(x = species, y = island)
For more information and examples on faceting and the available options, take a look at the facet options API reference and the facets section of the transforms gallery.
Scale functions allow you to modify the way a data value is mapped to a visual attribute such as position, size or color.
scale_color
scale_fx
scale_fy
scale_opacity
scale_r
scale_x
scale_y
Modifying scales in obsplot is done by piping one of the scale_* family of functions:
- scale_x and scale_y allow you to change the x and y scales
- scale_color and scale_opacity modify the mappings on the fill and stroke channels, and on the fillOpacity and strokeOpacity channels, respectively
- scale_r modifies the scale of the radius (r) channel
- scale_fx and scale_fy are used to modify the band scales added when using faceting
For example, we could make the x and y scales logarithmic and change their labels:
metros$evo <- (metros$POP_2015 - metros$POP_1980) / metros$POP_1980
obsplot(metros) |>
mark_dot(x = POP_1980, y = POP_2015, stroke = evo) |>
scale_x(type = "log", label = "Population 1980") |>
scale_y(type = "log", label = "Population 2015")
Scales can also be used to specify a color palette, or even to modify tick values with JavaScript code:
obsplot(metros) |>
mark_dot(x = POP_1980, y = POP_2015, stroke = evo) |>
scale_x(type = "log", label = "Pop 1980 (millions)", tickFormat = JS("d => d / 1000000")) |>
scale_y(type = "log", label = "Pop 2015 (millions)", tickFormat = JS("d => d / 1000000")) |>
scale_color(scheme = "viridis")
For a comprehensive list of scales arguments, see the scale options API reference.
Transforms are used to filter, modify or compute new data before plotting them.
Every mark accepts a set of basic transforms: filter, sort and reverse. These can be used by passing them directly as arguments to a mark function, with JavaScript code where needed:
obsplot(metros) |>
mark_dot(
x = POP_1980, y = POP_2015, stroke = "#D00",
filter = JS("d => d.POP_1980 > 2000000")
)
The transforms notebook provides more examples of these three transforms.
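As a further sketch, the filter and sort options can be combined in one mark. This assumes, as in the previous example, that the metros dataset is available, and that the sort option accepts a JavaScript accessor function as it does in Observable Plot:

```r
library(obsplot)

p <- obsplot(metros) |>
  mark_dot(
    x = POP_1980, y = POP_2015, r = 4,
    fill = "#D00", stroke = "white",
    filter = JS("d => d.POP_1980 > 2000000"),  # keep only large cities
    sort = JS("d => d.POP_2015")               # draw dots by increasing 2015 population
  )
```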
Transform functions are a set of functions which take mark channels and options as input and compute a new set of channels and options. They are used, for example, to bin data to create a histogram, to group them to compute a bar chart, etc.
In Observable Plot, transforms are functions (Plot.bin, Plot.windowX…) passed as options to a mark. In obsplot, the corresponding transform function (transform_bin(), transform_windowX()…) is called and passed as an argument to a mark function.
For example, to create a histogram, we have to apply binning by calling transform_binX() inside mark_rectY():
obsplot(penguins) |>
mark_rectY(
transform_binX(y = "count", x = bill_depth_mm)
)
Note that data columns can be passed as symbols (bill_depth_mm), but other arguments have to be character strings ("count").
To create a cell chart of the cross tabulation of two categorical variables, we have to apply transform_group() before calling mark_cell() and mark_text():
obsplot(penguins) |>
mark_cell(
transform_group(fill = "count", x = island, y = species)
) |>
mark_text(
transform_group(text = "count", x = island, y = species)
) |>
scale_color(scheme = "PuRd")
Some transform functions take a specific first argument: either outputs for transform_bin, transform_binX, transform_binY, transform_group, transform_groupX, transform_groupY, transform_groupZ and transform_map, or a map for transform_mapX and transform_mapY. By default, the first argument passed is considered as the unique output or map, whereas the other ones are options. If you must specify several outputs, or if an output has the same name as an option, wrap them in a list():
obsplot(penguins) |>
mark_dot(y = species, x = body_mass_g) |>
mark_ruleY(
transform_groupY(
list(x1 = "min", x2 = "max"),
y = species, x = body_mass_g
)
) |>
mark_tickX(
transform_groupY(
list(x = "median"),
y = species, x = body_mass_g, stroke = "red"
)
) |>
scale_x(inset = 6) |>
scale_y(label = NULL)
Transforms can be composed, and you can store a transform in an R object and reuse it.
df <- data.frame(
index = 1:100,
value = rnorm(100)
)
xy <- transform_mapY("cumsum", y = value, x = index, k = 20)
obsplot(df) |>
mark_lineY(xy) |>
mark_lineY(
transform_windowY(xy), stroke = "red"
)
For more information about transforms, see the transforms notebook, the transforms API reference and the obsplot transforms gallery.
You can define global options, such as layout options or top-level options like grid, inset, round, etc., either directly in obsplot() or by piping the opts() function:
obsplot(metros) |>
mark_dot(
x = POP_1980, y = POP_2015, stroke = "#D00"
) |>
opts(grid = TRUE, marginLeft = 80, nice = TRUE)
opts() can also be used to add a caption.
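A minimal sketch, assuming opts() forwards a caption argument to Observable Plot's top-level caption option (the caption text is arbitrary):

```r
library(obsplot)

p <- obsplot(metros) |>
  mark_dot(x = POP_1980, y = POP_2015, stroke = "#D00") |>
  opts(caption = "Population of metropolitan areas, 1980 vs 2015")
```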
Plot sizing can be specified by giving height and width arguments to obsplot().
The default value of width and height is "auto": in this case, height and width are computed by htmlwidgets and passed to Observable Plot, which should give a plot adjusted to its container's size:
obsplot(metros) |>
mark_tickX(x = POP_2015, strokeOpacity = .2)
When height or width values are specified, both Observable Plot and htmlwidgets use them:
obsplot(metros, height = 60) |>
mark_tickX(x = POP_2015, strokeOpacity = .2)
Finally, when height and width are set to NULL, the chart dimensions in pixels are determined by Observable Plot. Note that these dimensions may differ from the HTML widget dimensions, which can produce large margins:
obsplot(penguins, height = NULL, width = NULL) |>
mark_dot(x = bill_length_mm, y = bill_depth_mm, stroke = sex)
When obsplot is used in a Shiny app with a responsive layout such as fluidPage, it is recommended to use "auto" (the default), at least for width, so that the chart redraws itself when its container is resized.
Style options allow you to customize plot appearance via CSS rules. They can be specified by piping the style() function.
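A minimal sketch, assuming style() accepts CSS properties as named arguments in the camelCase form used by Observable Plot's style option (the property values are arbitrary):

```r
library(obsplot)
library(palmerpenguins)

p <- obsplot(penguins) |>
  mark_dot(x = bill_length_mm, y = bill_depth_mm, stroke = sex) |>
  # Assumed: each named argument becomes a CSS property of the plot
  style(backgroundColor = "#F5F5F5", fontSize = "14px")
```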
A “gear” menu can be added on the right side of the plots, with additional features such as SVG export. This can be enabled by specifying menu = TRUE.
You can also enable the gear menu globally in an R session, a Shiny app or an R Markdown document with:
options("obsplot_menu" = TRUE)
Data conversion from R to JavaScript is handled by htmlwidgets via JSON serialization. As a general rule, a data.frame in R is converted to a d3-style data array (an array of objects), a list in R is converted to an object, a vector of size > 1 is converted to an array, and a vector of size 1 is converted to a number or character string.
obsplot includes some helpers to automatically detect when an object is of class Date or POSIXt and convert it back to a JavaScript Date object.
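The conversion rules can be checked with jsonlite, which htmlwidgets uses for serialization. This sketch assumes that obsplot serializes data frames row-wise (dataframe = "rows"), which is what produces the d3-style array of objects described above; auto_unbox = TRUE matches the htmlwidgets default for length-1 vectors. Note how a Date becomes a plain string, which is why obsplot needs its own helpers to turn it back into a JavaScript Date:

```r
library(jsonlite)

# Serialize roughly as assumed for obsplot: data frames as arrays of row
# objects, length-1 vectors as scalars
to_js <- function(x) toJSON(x, dataframe = "rows", auto_unbox = TRUE)

json_df   <- to_js(data.frame(a = 1:2, b = c("x", "y")))  # array of objects
json_list <- to_js(list(n = 1, v = c(1, 2)))               # object
json_vec  <- to_js(c(1, 2, 3))                              # array
json_one  <- to_js("x")                                     # single string
json_date <- to_js(as.Date("2021-01-01"))                   # plain string, not a Date
```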
There are several differences between obsplot and Observable Plot, mainly:
- data can be declared in obsplot() and inherited by the chart marks, whereas in Observable Plot it must be declared for each mark.
- When values are passed directly to a channel and no data has been declared, an indexed data argument of the same length is automatically added.
- A value is explicitly marked as a vector channel with as_data() in obsplot, instead of [] in JavaScript.
When the plotted data are stored in a data frame, obsplot currently has no way to determine which columns are used. This is not a problem in an interactive session, but when obsplot is used in an R Markdown document, the whole dataset is embedded in the output document in JSON format, which can quickly increase the document size.
One solution is to preselect the needed data in R before calling obsplot().
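For example, a sketch keeping only the three columns used by the chart (subset() could similarly drop unneeded rows):

```r
library(obsplot)
library(palmerpenguins)

# Keep only the columns actually used by the chart, so that only they
# are embedded as JSON in the output document
penguins_small <- penguins[, c("bill_length_mm", "bill_depth_mm", "sex")]

p <- obsplot(penguins_small) |>
  mark_dot(x = bill_length_mm, y = bill_depth_mm, stroke = sex)
```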
You can predefine transform arguments in a list for reuse:
xy <- list(x = "island", y = "species")
obsplot(penguins, height = 100) |>
mark_cell(
transform_group(fill = "count", xy)
) |>
mark_text(
transform_group(text = "count", xy)
) |>
scale_color(scheme = "PuRd")
Note that in this case, all arguments including data column names must be passed as strings, not as symbols.
If you want to add new arguments to this predefined list, you have to use append() and put the new arguments themselves in a list:
xy <- list(x = "island", y = "species")
obsplot(penguins, height = 100) |>
mark_cell(
transform_group(fill = "count", xy)
) |>
mark_text(
transform_group(
text = "count",
append(
xy,
list(fill = "black", fontWeight = "bold", fontSize = 16, stroke = "#FFF")
)
)
) |>
scale_color(scheme = "PuRd")
To make interactive usage simpler, obsplot allows you to pass column names as symbols instead of character strings.
If the symbol matches both a data column and an environment object, the data column has priority.
df <- data.frame(x = c("A", "B", "C"))
x <- 1:5
obsplot(df, height = 60) |>
mark_dotX(x = x)
Only single symbols can be used as data columns; any other type of expression is evaluated in the current environment.
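A short sketch of the difference, reusing the penguins data: a bare symbol is looked up as a column, whereas a function call is evaluated in the current environment and its result, of the same length as the data, is used as a vector channel:

```r
library(obsplot)
library(palmerpenguins)

p <- obsplot(penguins) |>
  mark_dot(
    x = bill_length_mm,              # single symbol: looked up as a data column
    y = log(penguins$bill_depth_mm)  # expression: evaluated in the current
                                     # environment, giving a vector channel
  )
```

Writing y = log(bill_depth_mm) instead would presumably fail with an "object not found" error, since bill_depth_mm is not an R object in the calling environment.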
The same rules apply when symbols are used in facet().
In transform functions, data columns can also be passed as symbols, but the rules are a bit different, because the transform doesn't have direct access to the data to check whether the symbol name is a column. If a symbol doesn't exist in the calling environment, it is considered a data column:
df <- data.frame(
v1 = rnorm(100)
)
obsplot(df, height = 120) |>
mark_rectY(
transform_binX(y = "count", x = v1)
)
Symbols that are names of functions, such as max, min, range, etc., are also considered data columns:
df <- data.frame(
max = rnorm(100)
)
obsplot(df, height = 120) |>
mark_rectY(
transform_binX(y = "count", x = max)
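)
Otherwise, if the symbol exists in the calling environment, its value is used instead of a data column:
rnd <- rnorm(100)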
obsplot(df, height = 120) |>
mark_rectY(
transform_binX(y = "count", x = rnd)
)
What may be confusing here is that the priority is reversed compared to mark or facet functions: if a symbol exists in the calling environment, it has priority over a data column of the same name.
df <- data.frame(
x = rnorm(100)
)
x <- 1000:1100
obsplot(df, height = 120) |>
mark_rectY(
transform_binX(y = "count", x = x)
)
In this case you can use a character string instead of a symbol if you want to be sure that a channel will be seen as a data column.
obsplot(df, height = 120) |>
mark_rectY(
transform_binX(y = "count", x = "x")
)
When using JavaScript in obsplot with JS(), both the d3 and Plot libraries are available. You can therefore directly call d3 functions or Plot formats in your code:
obsplot() |>
mark_lineY(JS("d3.cumsum({length: 300}, d3.randomNormal())")) |>
scale_x(axis = NULL)
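Another sketch uses d3 to format axis ticks, assuming the metros dataset from the earlier examples. d3.format(".2s") returns a function that formats numbers with SI prefixes, which can be passed as tickFormat:

```r
library(obsplot)

# d3 is available inside JS() code: d3.format(".2s") evaluates to a
# formatting function (for instance 1000000 is displayed as "1.0M")
p <- obsplot(metros) |>
  mark_dot(x = POP_1980, y = POP_2015, stroke = "#D00") |>
  scale_x(type = "log", tickFormat = JS("d3.format('.2s')")) |>
  scale_y(type = "log", tickFormat = JS("d3.format('.2s')"))
```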