obsplot is an R package that lets you use the Observable Plot
library to create charts as HTML widgets. Observable Plot is a free,
open-source JavaScript visualisation library developed by Mike Bostock and Philippe Rivière at Observable.
obsplot is still at an early stage. In particular, its
API could change in the future, either to improve the package itself or to
follow the evolution of Observable Plot. It may not be suitable for production
use right now.
Also note that obsplot is not suitable for
charting very large datasets: the generated plots are in SVG format,
and when it is used in RMarkdown or Shiny the underlying data are included
in the output as JSON.
obsplot is not on CRAN yet, but can be installed from
GitHub with:

remotes::install_github("juba/obsplot")

Or from R-universe with:

install.packages("obsplot", repos = "https://juba.r-universe.dev")

Don’t forget to load the library with:
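A minimal sketch of that loading step, assuming the package was installed under the name obsplot as in the commands above:

```r
# Attach obsplot so that obsplot(), mark_*() and the other helpers are available
library(obsplot)
```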
Suppose we want to create a very simple dot chart from the
penguins dataset of the palmerpenguins package:

library(palmerpenguins)
data(penguins)

To create such a chart we first initialise it with
obsplot(). We pass the data frame containing
the data to plot as argument:
obsplot(penguins)

We then add a graphical mark to create our chart. Here we use the dot
mark by piping the mark_dot function. We pass as arguments
the x and y channels giving the
corresponding data frame columns:
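As a sketch of this step, the following reuses the column names that appear in the later examples of this document. It assumes both obsplot and palmerpenguins are installed:

```r
library(obsplot)
library(palmerpenguins)
data(penguins, package = "palmerpenguins")

# x and y are given as symbols naming columns of penguins
obsplot(penguins) |>
  mark_dot(x = bill_length_mm, y = bill_depth_mm)
```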
Here we passed the data frame columns as symbols, but we can also use character strings instead:
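A sketch of the character string form, under the same assumptions (obsplot and palmerpenguins installed, same two columns as in the other examples):

```r
library(obsplot)
library(palmerpenguins)
data(penguins, package = "palmerpenguins")

# Same chart, with the column names quoted instead of given as symbols
obsplot(penguins) |>
  mark_dot(x = "bill_length_mm", y = "bill_depth_mm")
```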
We can add other channels, for example by changing the dot color according to another variable:
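For illustration, the sketch below maps the stroke channel to the island column, which is the mapping used by the later examples in this document:

```r
library(obsplot)
library(palmerpenguins)
data(penguins, package = "palmerpenguins")

# stroke is a column channel: one color per value of island
obsplot(penguins) |>
  mark_dot(x = bill_length_mm, y = bill_depth_mm, stroke = island)
```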
We can also add constant options to a mark to modify an attribute in the same way for all dots:
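A sketch of a constant option, using the radius value r = 2 that also appears in the later examples of this document:

```r
library(obsplot)
library(palmerpenguins)
data(penguins, package = "palmerpenguins")

# r = 2 is a constant: every dot gets the same radius
obsplot(penguins) |>
  mark_dot(x = bill_length_mm, y = bill_depth_mm, stroke = island, r = 2)
```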
We can also add global options to the chart with the
opts() function:

obsplot(penguins) |>
    mark_dot(x = bill_length_mm, y = bill_depth_mm, stroke = island, r = 2) |>
    opts(grid = TRUE)

Finally, we can modify the way variable values are mapped to graphical attributes by using scale functions:
obsplot(penguins) |>
    mark_dot(x = bill_length_mm, y = bill_depth_mm, stroke = island, r = 2) |>
    scale_color(scheme = "set1") |>
    opts(grid = TRUE)

To go a bit deeper, we have to take a look at the fundamental concepts of Observable Plot: marks, faceting, scales and transforms.

Marks are the fundamental building blocks of Observable Plot charts. Each mark is a graphical representation of some data in a specific form: dot, line, area, text…

In Observable Plot, marks are defined by giving a marks
JavaScript array to the Plot.plot() function. In
obsplot, it is done by piping one or more of the
mark_* family of functions. In the following example we add
three different marks to create a scatterplot with two rules for
x and y mean values:
mean_length <- mean(penguins$bill_length_mm, na.rm = TRUE)
mean_depth <- mean(penguins$bill_depth_mm, na.rm = TRUE)
obsplot(penguins) |>
    mark_ruleY(y = mean_depth) |>
    mark_ruleX(x = mean_length) |>
    mark_dot(x = bill_length_mm, y = bill_depth_mm)

A mark function takes several arguments. The first one is an optional
data object. If not specified, it is inherited from the one
passed to obsplot. Other named arguments are called mark
constructors and can be of several types:
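- a column channel: a column of data, given as a string ("col") or a symbol (col)
- a vector channel: a vector of values provided directly
- a constant: a single value applied to the whole mark
- JavaScript code: defined with the JS() function and evaluated at runtime

In the following example, both x and y are
column channels, whereas stroke is a constant. In fact,
values passed to a color constructor (stroke or
fill) are automatically considered as constants if they look
like a CSS color name or definition.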
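A sketch of such a call, built from the columns and color used in the other metros examples of this document. It assumes the metros data frame ships with obsplot and can be loaded with data():

```r
library(obsplot)
# Assumption: metros is a dataset bundled with obsplot
data(metros, package = "obsplot")

# x and y name columns of metros; "#D00" looks like a CSS color,
# so stroke is treated as a constant
obsplot(metros) |>
  mark_dot(x = POP_1980, y = POP_2015, stroke = "#D00")
```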
If we want to highlight some points by adding a text label, we can do
it by giving a specific data argument to
mark_text:

metros_10m <- subset(metros, POP_2015 > 10000000)
obsplot(metros) |>
  mark_dot(x = POP_1980, y = POP_2015, stroke = "#D00") |>
  mark_text(metros_10m, x = POP_1980, y = POP_2015, text = nyt_display, dy = -10)

We can also use JavaScript code. For example, we can use accessors to convert population values to millions of people:
obsplot(metros) |>
  mark_dot(
    x = JS("d => d.POP_1980 / 1000000"),
    y = JS("d => d.POP_2015 / 1000000"),
    stroke = "#D00"
  )

We can also provide data directly to one of the channels (in
Observable Plot, this requires specifying a corresponding
indexed data argument of the same length; obsplot does this
automatically):
obsplot() |>
  mark_lineY(y = cumsum(rnorm(100))) |>
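  mark_ruleY(0)

The rules to determine a channel type are as follows (this may be subject to change in the future):

- a value wrapped in JS() is JavaScript code
- for r, strokeOpacity, fillOpacity, fontSize and rotate, a single number is a constant
- for fill and stroke, a value that looks like a CSS color is a constant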
You can explicitly specify that a channel is a vector channel by
using the as_data() helper function. In the following
example, without as_data the code would raise an error as
it would look for a "Paris" column in the data:
obsplot(metros) |>
  mark_dot(x = POP_1980, y = POP_2015) |>
  mark_dot(x = 9000000, y = 10600000, stroke = "red") |>
  mark_text(x = 9000000, y = 10600000, text = as_data("Paris"), dy = -10)

When a column or vector channel is of type Date or
POSIXt in R, it is automatically converted to
Date in JavaScript, and Observable Plot will take it into
account for scale specification:
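As an illustration, the sketch below builds a small hypothetical data frame with a Date column (the names df, day and value are made up for this example) and plots it with mark_line:

```r
library(obsplot)

# Hypothetical data: 30 consecutive days with a random walk
df <- data.frame(
  day = seq(as.Date("2022-01-01"), by = "day", length.out = 30),
  value = cumsum(rnorm(30))
)

# day is a Date column, so it should reach Observable Plot as JavaScript dates
obsplot(df) |>
  mark_line(x = day, y = value)
```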
Here is the list of the different mark functions currently available
in obsplot:

mark_area, mark_areaX, mark_areaY, mark_barX, mark_barY, mark_cell, mark_cellX, mark_cellY, mark_dot, mark_dotX, mark_dotY, mark_frame, mark_function, mark_image, mark_line, mark_lineX, mark_lineY, mark_link, mark_rect, mark_rectX, mark_rectY, mark_ruleX, mark_ruleY, mark_svg, mark_text, mark_textX, mark_textY, mark_tickX, mark_tickY

To get a complete list of channels and options accepted or required
by the different available marks, take a look at the marks API
reference. For examples in obsplot, see the marks
gallery.
Faceting makes it possible to create a grid of comparable grouped charts. In Observable Plot,
faceting is enabled by adding a facet option to
Plot.plot(). In obsplot it is achieved by
piping the facet function.

Here, we create a horizontal set of scatterplots by passing an
x channel to facet():
obsplot(penguins) |>
  mark_dot(x = bill_length_mm, y = bill_depth_mm, stroke = sex) |>
  facet(x = island)

To get vertical faceting, define y instead of
x. We can also add a frame around each subchart by using
mark_frame():
obsplot(penguins) |>
  mark_dot(x = bill_length_mm, y = bill_depth_mm, stroke = sex) |>
  mark_frame() |>
  facet(y = island)

Finally, it is also possible to create a trellis of charts by
specifying both x and y:
obsplot(penguins) |>
  mark_dot(x = bill_length_mm, y = bill_depth_mm, stroke = sex) |>
  mark_frame() |>
  facet(x = species, y = island)

For more information and examples on faceting and the available options, take a look at the facet options API reference and the facets section of the transforms gallery.

Scales are a family of functions that modify the way a data value is mapped to a visual attribute such as position, size or color:

scale_color, scale_fx, scale_fy, scale_opacity, scale_r, scale_x, scale_y

Modifying scales in obsplot is done by piping one of the
scale_ family of functions:
- scale_x and scale_y change the x and y scales
- scale_color and scale_opacity modify the mappings on fill, stroke, fillOpacity and strokeOpacity channels
- scale_r modifies the scale of the radius r channel
- scale_fx and scale_fy modify the band scales added when using faceting

For example, we could make the x and y
scales logarithmic and change their labels:
metros$evo <- (metros$POP_2015 - metros$POP_1980) / metros$POP_1980
obsplot(metros) |>
  mark_dot(x = POP_1980, y = POP_2015, stroke = evo) |>
  scale_x(type = "log", label = "Population 1980") |>
  scale_y(type = "log", label = "Population 2015")

Scales can also be used to specify a color palette, or even modify tick values with JavaScript code:
obsplot(metros) |>
  mark_dot(x = POP_1980, y = POP_2015, stroke = evo) |>
  scale_x(type = "log", label = "Pop 1980 (millions)", tickFormat = JS("d => d / 1000000")) |>
  scale_y(type = "log", label = "Pop 2015 (millions)", tickFormat = JS("d => d / 1000000")) |>
  scale_color(scheme = "viridis")

For a comprehensive list of scale arguments, see the scale options API reference.

Transforms are used to filter, modify or compute new data before plotting them.

Every mark accepts a set of basic transforms:
filter, sort and reverse. These
can be used by passing JavaScript code directly as an argument to a mark
function:
obsplot(metros) |>
  mark_dot(
    x = POP_1980, y = POP_2015, stroke = "#D00",
    filter = JS("d => d.POP_1980 > 2000000")
  )

The transforms notebook provides more examples of these three transforms.

Transform functions are a set of functions that take mark channels and options as input and compute a new set of channels and options. They are used, for example, to bin data to create a histogram, group them to compute a bar chart, etc.

In Observable Plot, transforms are functions (Plot.bin,
Plot.windowX…) passed as options to a mark. In
obsplot, a corresponding transform function
(transform_bin(), transform_windowX()) is
called and passed as an argument to a mark function.

For example, if we want to create a histogram, we have to apply
binning by calling transform_binX inside a
mark_rectY:
obsplot(penguins) |>
  mark_rectY(
    transform_binX(y = "count", x = bill_depth_mm)
  )

Note that data columns can be passed as symbols
(bill_depth_mm), but other arguments have to be character
strings ("count").

To create a cell chart of the cross tabulation of two categorical
variables, we have to apply a transform_group inside
mark_cell and mark_text:
obsplot(penguins) |>
  mark_cell(
    transform_group(fill = "count", x = island, y = species)
  ) |>
  mark_text(
    transform_group(text = "count", x = island, y = species)
  ) |>
  scale_color(scheme = "PuRd")

Some transform functions take a specific first argument: either
outputs for transform_bin,
transform_binX, transform_binY,
transform_group, transform_groupX,
transform_groupY, transform_groupZ and
transform_map, or a map for
transform_mapX and transform_mapY. By default,
the first argument passed is considered as the unique output or map,
whereas the other ones are options. If you must specify several outputs,
or if an output has the same name as an option, wrap them into a
list():
obsplot(penguins) |>
    mark_dot(y = species, x = body_mass_g) |>
    mark_ruleY(
        transform_groupY(
          list(x1 = "min", x2 = "max"),
          y = species, x = body_mass_g
        )
    ) |>
    mark_tickX(
      transform_groupY(
        list(x = "median"),
        y = species, x = body_mass_g, stroke = "red"
      )
    ) |>
    scale_x(inset = 6) |>
    scale_y(label = NULL)

Transforms can be composed, and you can store a transform in an R object and reuse it:
df <- data.frame(
  index = 1:100,
  value = rnorm(100)
)
xy <- transform_mapY("cumsum", y = value, x = index, k = 20)
obsplot(df) |>
  mark_lineY(xy) |>
  mark_lineY(
    transform_windowY(xy), stroke = "red"
  )

For more information about transforms, see the transforms
notebook, the transforms API
reference and the obsplot transforms
gallery.

You can define global options such as layout
options or top-level options like grid,
inset, round, etc. either directly in
obsplot() or by piping the opts() function:
obsplot(metros) |>
  mark_dot(
    x = POP_1980, y = POP_2015, stroke = "#D00"
  ) |>
  opts(grid = TRUE, marginLeft = 80, nice = TRUE)

opts can also be used to add a caption:
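A sketch of a captioned chart. It assumes that opts() accepts a caption argument named after the top-level caption option of Observable Plot, and that the metros data frame ships with obsplot. The caption text itself is only a placeholder:

```r
library(obsplot)
# Assumption: metros is a dataset bundled with obsplot
data(metros, package = "obsplot")

caption_text <- "Population of metropolitan areas, 1980 vs 2015"

obsplot(metros) |>
  mark_dot(x = POP_1980, y = POP_2015, stroke = "#D00") |>
  # Assumption: caption is forwarded as a top-level Plot option
  opts(grid = TRUE, caption = caption_text)
```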
Plot sizing can be specified by giving height and
width arguments in obsplot().

The default value of both width and height is
"auto": in this case height and width are computed by
htmlwidgets and passed to Observable Plot, which should
give a plot adjusted to its container’s size:
obsplot(metros) |>
  mark_tickX(x = POP_2015, strokeOpacity = .2)

When height or width values are specified, both Observable Plot and
htmlwidgets will use these values:
obsplot(metros, height = 60) |>
  mark_tickX(x = POP_2015, strokeOpacity = .2)

Finally, when height and width are set to
NULL, the chart dimensions in pixels will be determined by
Observable Plot. Note that these dimensions may not be the same as the
HTML widget dimensions, which can produce big margins:
obsplot(penguins, height = NULL, width = NULL) |>
  mark_dot(x = bill_length_mm, y = bill_depth_mm, stroke = sex)

When obsplot is used in a Shiny app with a responsive
layout such as fluidPage, it is recommended to use
"auto" (the default) at least for width so that the chart
will redraw itself accordingly when its container is resized.

Style options allow you to customize plot appearance via CSS rules. They
can be specified by piping the style() function:
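A sketch of a styled chart. The exact signature of style() is not shown in this document, so this assumes that it forwards named arguments as CSS properties, in the camelCase form used by the style option of Observable Plot:

```r
library(obsplot)
library(palmerpenguins)
data(penguins, package = "palmerpenguins")

obsplot(penguins) |>
  mark_dot(x = bill_length_mm, y = bill_depth_mm) |>
  # Assumption: each named argument becomes one CSS property of the plot
  style(fontSize = "14px", color = "#333")
```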
A “gear” menu can be added on the right side of the plots with
additional features such as SVG export. This can be enabled by
specifying menu = TRUE:
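A sketch of a chart with the menu enabled. The text above does not say where menu = TRUE goes, so this assumes it is an argument of obsplot():

```r
library(obsplot)
library(palmerpenguins)
data(penguins, package = "palmerpenguins")

# Assumption: menu is an argument of obsplot()
obsplot(penguins, menu = TRUE) |>
  mark_dot(x = bill_length_mm, y = bill_depth_mm)
```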
You can also enable the gear menu globally in an R session, a Shiny app or an RMarkdown document with:

options("obsplot_menu" = TRUE)

Data conversion from R to JavaScript is handled by
htmlwidgets via JSON serialization. As a general rule, a
data.frame in R is converted to a d3-style data array (an
array of objects), a list in R is converted to an object, a
vector of size > 1 is converted to an array, and a vector of size 1
is converted to a number or character string.
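The mapping described above can be previewed with the jsonlite package. This is only an illustration: the serializer arguments obsplot actually uses are not shown in this document, so dataframe = "rows" and auto_unbox = TRUE are assumptions chosen to reproduce the described behaviour.

```r
library(jsonlite)

df <- data.frame(name = c("a", "b"), value = c(1, 2))

# data.frame -> array of objects, one object per row
df_json <- toJSON(df, dataframe = "rows")

# list -> object
list_json <- toJSON(list(grid = TRUE, inset = 5), auto_unbox = TRUE)

# vector of size > 1 -> array
vec_json <- toJSON(c(1, 2, 3))

# vector of size 1 -> single value
scalar_json <- toJSON(5, auto_unbox = TRUE)

print(df_json)
print(list_json)
print(vec_json)
print(scalar_json)
```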
obsplot includes some helpers to automatically detect
when an object is of class Date or POSIXt, and
convert it back to a JavaScript Date object.
There are several differences between obsplot and
Observable Plot, mainly:

- data can be declared in obsplot() and inherited by the chart marks, whereas in Observable Plot it must be declared for each mark.
- When a vector channel is provided and no data has been declared, an indexed data argument of the same length is automatically added.
- A single value is declared as a vector channel with as_data() in obsplot instead of [] in JavaScript.

When the plotted data are stored in a data frame,
obsplot currently has no way to determine which columns are
used or not. This is not a problem in an interactive session, but when
used in an RMarkdown document, the whole dataset will be embedded in the
output document in JSON format, which can make the document size go up
quickly.

One solution is to preselect the needed data in R before calling
obsplot:
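A sketch of this preselection with base R subsetting. The name penguins_small is made up for this example, and the three columns are those used by the dot charts elsewhere in this document:

```r
library(obsplot)
library(palmerpenguins)
data(penguins, package = "palmerpenguins")

# Keep only the three columns the chart actually uses
penguins_small <- penguins[, c("bill_length_mm", "bill_depth_mm", "island")]

obsplot(penguins_small) |>
  mark_dot(x = bill_length_mm, y = bill_depth_mm, stroke = island)
```

Only the selected columns are then passed to the widget, rather than the whole dataset.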
You can predefine transform arguments in a list for reuse:
xy <- list(x = "island", y = "species")
obsplot(penguins, height = 100) |>
  mark_cell(
    transform_group(fill = "count", xy)
  ) |>
  mark_text(
    transform_group(text = "count", xy)
  ) |>
  scale_color(scheme = "PuRd")

Note that in this case, all arguments including data column names must be passed as strings, not as symbols.

If you want to add new arguments to this predefined list, you’ll have
to use append and put the new arguments themselves in a
list:
xy <- list(x = "island", y = "species")
obsplot(penguins, height = 100) |>
  mark_cell(
    transform_group(fill = "count", xy)
  ) |>
  mark_text(
    transform_group(
      text = "count",
      append(
        xy,
        list(fill = "black", fontWeight = "bold", fontSize = 16, stroke = "#FFF")
      )
    )
  ) |>
  scale_color(scheme = "PuRd")

To make interactive usage simpler, obsplot allows you to
pass column names as symbols instead of character strings.

If the symbol matches both a data column and an environment object, the data column has priority.
df <- data.frame(x = c("A", "B", "C"))
x <- 1:5
obsplot(df, height = 60) |>
  mark_dotX(x = x)

Only single symbols can be used as data columns; any other type of expression will be evaluated in the current environment.

The same rules apply when symbols are used in
facet().

In transform functions, data columns can also be passed
as symbols, but in this case the rules are a bit different because the
transform doesn’t have direct access to the data to check whether the
symbol name is a column.

If the symbol does not exist in the calling environment, it is treated as a column name:
df <- data.frame(
  v1 = rnorm(100)
)
obsplot(df, height = 120) |>
  mark_rectY(
    transform_binX(y = "count", x = v1)
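  )

If the symbol exists in the calling environment but is a function, it is also treated as a column name. This makes it possible to use column names such as min, range, etc.:

df <- data.frame(
  max = rnorm(100)
)
obsplot(df, height = 120) |>
  mark_rectY(
    transform_binX(y = "count", x = max)
  )

Otherwise, the symbol is evaluated in the calling environment: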
rnd <- rnorm(100)
obsplot(df, height = 120) |>
  mark_rectY(
    transform_binX(y = "count", x = rnd)
  )

What may be confusing here is that the priority is reversed compared with
mark or facet functions: if a symbol exists
in the calling environment, it has priority over a data column of
the same name.
df <- data.frame(
  x = rnorm(100)
)
x <- 1000:1100
obsplot(df, height = 120) |>
  mark_rectY(
    transform_binX(y = "count", x = x)
  )

In this case you can use a character string instead of a symbol if you want to be sure that a channel will be seen as a data column:
obsplot(df, height = 120) |>
  mark_rectY(
    transform_binX(y = "count", x = "x")
  )

JS()

When using JavaScript in obsplot with JS(),
both the d3 and Plot libraries are available. You
can then directly call d3 functions or
Plot formats
in your code.
obsplot() |>
    mark_lineY(JS("d3.cumsum({length: 300}, d3.randomNormal())")) |>
    scale_x(axis = NULL)