Functions 2
In this session, we will discuss:
{{}} for <data-masking> functions
For coding, we will use r-programming-exercises: R/functions-02-01-embrace.R, etc.
Sometimes, you want to generalize a certain type of plot - let's say a histogram:
diamonds |>
ggplot(aes(x = carat)) +
geom_histogram(binwidth = 0.1)
diamonds |>
ggplot(aes(x = carat)) +
geom_histogram(binwidth = 0.05)
"I want to choose only the data, variable, and bin-width"
aes() is a data-masking function; you can embrace 🤗
Look for <data-masking> in help
histogram <- function(df, var, binwidth = NULL) {
df |>
ggplot(aes(x = {{ var }})) +
geom_histogram(binwidth = binwidth)
}
histogram(diamonds, carat, 0.1)
Complete this function yourself:
histogram <- function(df, var, binwidth = NULL) {
df |>
ggplot(aes()) +
geom_histogram(binwidth = binwidth)
}
Try with other df and var, e.g. starwars, mtcars.
Using this function as is, how can you:
add theme_minimal()?
fill the bars with "steelblue"?
Adding a theme: histogram() returns a ggplot object, so you can add a theme in the "usual" way:
histogram(starwars, height) + theme_minimal()
As is, there is no easy way to specify "steelblue".
However, you can build an escape hatch.
... are called "dot-dot-dot" or "dots".
# `...` passed to `geom_histogram()`
histogram <- function(df, var, ..., binwidth = NULL) {
df |>
ggplot(aes(x = {{var}})) +
geom_histogram(binwidth = binwidth, ...)
}
Dots pass unspecified arguments from your function on to another function (tell your users where they go).
The Tidyverse Design Guide has more details.
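The same forwarding pattern can be sketched in base R, without ggplot2. The helper label_values() and its prefix argument are hypothetical names chosen for illustration; the dots are handed to paste(), so callers can reach paste() arguments such as sep.

```r
# Hypothetical helper: anything in `...` is forwarded to `paste()`
label_values <- function(x, ..., prefix = "value") {
  paste(prefix, x, ...)
}

# No extra arguments: `paste()` uses its default separator, a space
label_values(1:2)
## [1] "value 1" "value 2"

# `sep` is not an argument of `label_values()`; it travels through the dots
label_values(1:2, sep = "-")
## [1] "value-1" "value-2"
```

As with histogram(), the documentation for such a helper should say that the dots go to paste().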
Incorporate dot-dot-dot into histogram():
histogram <- function(df, var, ..., binwidth = NULL) {
df |>
ggplot(aes(x = {{var}})) +
geom_histogram(binwidth = binwidth, ...)
}
Try, e.g.:
histogram(starwars, height, binwidth = 5, fill = "steelblue")
You write a function to make some things easier.
The cost is that some things become more difficult.
This is unavoidable; the best you can do is be deliberate about what you make easier and what you make more difficult.
How do you build a string using variable names and values?
rlang::englue() was built for this purpose:
{{}} to interpolate variable names
{} to interpolate values
temp <- function(varname, value) {
rlang::englue("You chose varname: {{ varname }} and value: {value}")
}
temp(val, 0.4)
[1] "You chose varname: val and value: 0.4"
Adapt histogram() to include a title that describes which var is binned, and the binwidth:
histogram <- function(df, var, ..., binwidth = NULL) {
df |>
ggplot(aes(x = {{ var }})) +
geom_histogram(binwidth = binwidth, ...) +
labs(
title = rlang::englue("")
)
}
Try:
histogram(starwars, height, binwidth = 5)
histogram(starwars, height) # "extra credit"
histogram <- function(df, var, ..., binwidth = NULL) {
df |>
ggplot(aes(x = {{ var }})) +
geom_histogram(binwidth = binwidth, ...) +
labs(
title = rlang::englue(
"Histogram of {{ var }}, with binwidth {binwidth %||% 'default'}"
)
)
}
histogram(starwars, height, binwidth = 5)
Your function can also include pre-processing of data.
sorted_bars <- function(df, var) {
df |>
mutate({{ var }} := {{ var }} |> fct_infreq() |> fct_rev()) |>
ggplot(aes(y = {{ var }})) +
geom_bar()
}
If using {{ }} to specify a new column, use :=, not =.
fct_infreq() reorders by decreasing frequency.
fct_rev() reverses the order, as the y-axis starts at the bottom.
Our turn: let's try it
use embracing {{}} to interpolate bare column-names
look for <data-masking> in the help
rlang::englue() to interpolate variables {{}} and values {}
... is a useful "escape hatch" in function design
Restart R, open functions-02-02-style.R
Use a descriptive name; it usually starts with a verb, unless the function returns a well-known noun.
required: arguments without default values
dots: can be passed on to functions that your function calls
optional: arguments with default values
Tidyverse Design:
Our histogram function:
histogram <- function(df, var, ..., binwidth = NULL) {
df |>
ggplot(aes(x = {{var}})) +
geom_histogram(binwidth = binwidth, ...)
}
required: df, var
dots: ...
optional: binwidth
Why optional after dots?
The user must name optional arguments, in this case binwidth.
This makes code easier to read when optional arguments are used.
More reasoning is given in the Tidyverse design guide.
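A small base-R sketch shows why an argument placed after the dots has to be named. The helper round_value() is hypothetical and exists only for this illustration; it collects the dots so you can see where an unnamed value ends up.

```r
# Hypothetical helper: `digits` comes after the dots, so it must be named
round_value <- function(x, ..., digits = 2) {
  extras <- list(...)  # collect whatever landed in the dots
  list(value = round(x, digits), n_extras = length(extras))
}

# Named: `digits` is matched as intended
round_value(3.14159, digits = 1)$value
## [1] 3.1

# Unnamed: the 1 is swallowed by the dots, and `digits` keeps its default
res <- round_value(3.14159, 1)
res$value
## [1] 3.14
res$n_extras
## [1] 1
```

In histogram(), the same rule means that binwidth is only set when the caller writes binwidth = explicitly.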
When we write filter(), do we mean…
Three ways to sort this out:
library("conflicted"), suitable for R scripts
package::function(), used in package functions
#' @importFrom, also used (sparingly) in packages
conflicted lets you know when you use a function that exists in two or more packages that you've loaded.
To avoid conflicts, declare a preference:
# put in a conspicuous place, near the top of your script
conflicts_prefer(dplyr::filter)
In functions-02-02-style.R:
library("tidyverse")
mtcars |> filter(cyl == 6)
Add library("conflicted"), run again
Add a conflicts_prefer() directive
package::function()
This is the usual way when writing a function for a package:
histogram <- function(df, var, ..., binwidth = NULL) {
df |>
ggplot2::ggplot(ggplot2::aes(x = {{ var }})) +
ggplot2::geom_histogram(binwidth = binwidth, ...)
}
There is a balance to be struck.
#' @importFrom
Use this when you have a lot of calls to a given external function.
Put this in {packagename}-package.R:
#' @importFrom ggplot2 ggplot aes geom_histogram
NULL
Alternatively, from the R command prompt:
usethis::use_import_from("ggplot2", c("ggplot", "aes", "geom_histogram"))
#' @importFrom
histogram <- function(df, var, ..., binwidth = NULL) {
df |>
ggplot(aes(x = {{ var }})) +
geom_histogram(binwidth = binwidth, ...)
}
This makes your code less verbose, but also less transparent.
To mitigate: put @importFrom directives in one conspicuous file: {packagename}-package.R
Also, look at tidyverse code on GitHub (my favorite is {usethis}).
Restart R, open functions-02-03-side-effects.R
A function uses side effects if it:
depends on something other than its inputs, e.g. read.csv()
or makes a change in the environment, e.g. print()
add <- function(x, y) {
x + y
}
This function is pure: the return value depends only on the inputs.
Pure functions are easier to test.
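To make the contrast concrete, here is a minimal sketch that puts the pure add() next to an impure variant. The names add_and_count() and n_calls are hypothetical, introduced only for this illustration.

```r
# Pure: the result depends only on `x` and `y`
add <- function(x, y) {
  x + y
}

# Hypothetical impure variant: it also modifies a variable outside itself
n_calls <- 0
add_and_count <- function(x, y) {
  n_calls <<- n_calls + 1  # side effect: changes state outside the function
  x + y
}

add(1, 2)
## [1] 3
add_and_count(1, 2)
## [1] 3
n_calls
## [1] 1
```

Both calls return the same value, but only the second leaves a trace behind, so a test for it also has to account for n_calls.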
Side effects can slow down your function:
Depending on side effects can introduce uncertainty:
what does file.csv contain?
Side effects aren't necessarily bad, but you need to take them into account:
Discuss with your neighbor: are these function calls pure, or do they use side effects?
In functions-02-03-side-effects.R:
It can be useful to consult devtools::session_info():
# using `info = "platform"` to fit output on screen
devtools::session_info(info = "platform")
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.1 (2023-06-16)
os Ubuntu 22.04.3 LTS
system x86_64, linux-gnu
ui X11
language (EN)
collate C.UTF-8
ctype C.UTF-8
tz UTC
date 2023-09-18
pandoc 3.1.1 @ /opt/quarto/bin/tools/ (via rmarkdown)
──────────────────────────────────────────────────────────────────────────────
Side effects can include:
Sys.setenv()
options()
set.seed()
setwd()
{withr} makes it a lot easier to "leave no footprints".
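As a rough illustration of what "leaving no footprints" involves, here is a base-R sketch that sets an option and restores it with on.exit(). The helper name format_with_digits() is hypothetical, and this is not a description of how {withr} is implemented; it only shows the save-and-restore idea that {withr} packages up for you.

```r
# Hypothetical helper: change `digits` only for the duration of the call
format_with_digits <- function(x, digits) {
  old <- options(digits = digits)    # options() returns the previous values
  on.exit(options(old), add = TRUE)  # restore them when the function exits
  format(x)
}

before <- getOption("digits")
format_with_digits(pi, 3)
## [1] "3.14"
identical(getOption("digits"), before)  # the global option is unchanged
## [1] TRUE
```

Writing this by hand for every option, locale, or working directory gets tedious, which is the gap the with_*() and local_*() helpers fill.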
sort() uses locale (environment) for string-sorting rules
(temp <- Sys.getlocale("LC_COLLATE"))
## [1] "C.UTF-8"
sort(c("apple", "Banana", "candle"))
## [1] "apple" "Banana" "candle"
Sys.setlocale("LC_COLLATE", "C")
## [1] "C"
sort(c("apple", "Banana", "candle"))
## [1] "Banana" "apple" "candle"
Sys.setlocale("LC_COLLATE", temp)
## [1] "C.UTF-8"
To temporarily set locale:
withr::with_locale(
new = list(LC_COLLATE = "C"),
sort(c("apple", "Banana", "candle"))
)
## [1] "Banana" "apple" "candle"
Sys.getlocale("LC_COLLATE")
## [1] "C.UTF-8"
c_sort <- function(...) {
# set only within function block
withr::local_locale(list(LC_COLLATE = "C"))
sort(...)
}
c_sort(c("apple", "Banana", "candle"))
## [1] "Banana" "apple" "candle"
Sys.getlocale("LC_COLLATE")
## [1] "C.UTF-8"
The withr::local_*() functions take effect within the enclosing curly brackets; this applies to function blocks and also to {testthat} blocks.
?dplyr::arrange()
arrange() uses the "C" locale by default
library("testthat")
test_that("mtcars has expected columns", {
expect_type(mtcars$cy, "double")
})
Test passed
This passes, but R is doing partial matching on the $.
Modify the test_that() block to warn on partial matching.
You can get the current setting using:
getOption("warnPartialMatchDollar")
[1] FALSE
Hint: use withr::local_options().
test_that("mtcars has expected columns", {
withr::local_options(list(warnPartialMatchDollar = TRUE))
expect_type(mtcars$cy, "double")
})
── Warning ('<text>:5:3'): mtcars has expected columns ─────────────────────────
partial match of 'cy' to 'cyl'
Backtrace:
1. testthat::expect_type(mtcars$cy, "double")
2. testthat::quasi_label(enquo(object), arg = "object")
3. rlang::eval_bare(expr, quo_get_env(quo))
And yet…
getOption("warnPartialMatchDollar")
[1] FALSE
You can use tidy evaluation in {ggplot2} to specify aesthetics, add labels, and include {dplyr} preprocessing:
{{}} for <data-masking> functions
<tidy-select> functions work differently
Using tidyverse style and design can make things easier for you, your users, and future you.
Be mindful of side effects; use {withr} to manage global state.