In the ideal case, all it takes to start using futures in R is to replace any standard assignment (<-) in your R code with a future assignment (%<=%) and make sure your right-hand side (RHS) expression is within curly brackets ({ ... }). Also, if you assign these to lists (e.g. in a for loop), you need to use a list environment (listenv) instead of a plain list.
However, there are a few cases where you have to take extra precautions. These are often related to how global variables are falsely identified in non-standard evaluation, e.g. subset(data, x < 3). Global variables need to be identified when futures are created, but this is a particularly hard task when non-standard evaluation is involved.
If you identify other use cases, please consider reporting them so they can be documented here and possibly even be fixed.
Consider the following use of subset():
> data <- data.frame(x=1:5, y=1:5)
> v <- subset(data, x < 3)$y
> v
[1] 1 2
From a static code inspection point of view, the expression x < 3 asks for variable x to be compared to 3, and there is nothing specifying that x is part of data and not the global environment. That x is indeed part of the data object can only safely be inferred at run time when subset() is called. This is not a problem in the above snippet, but when using futures all global/unknown variables need to be captured when the future is created (it is too late to do it when the future is resolved), e.g.
> library(future)
> plan(lazy)
> data <- data.frame(x=1:5, y=1:5)
> v %<=% subset(data, x < 3)$y
Error in globalsOf(expr, envir = envir, tweak = tweakExpression, dotdotdot = "return", :
Identified a global by static code inspection, but failed to locate the corresponding
object in the relevant environments: 'x'
Above, code inspection of the future expression subset(data, x < 3)$y incorrectly identifies x as a global variable that needs to be captured (“frozen”) for the (lazy) future. Since no such variable x exists, we get an error.
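The static inspection step can be reproduced outside of the future package. The following is a minimal sketch that calls codetools::findGlobals() directly, on the assumption that it approximates what the globals package does internally. Wrapping the expression in an anonymous function is only a device for handing it to findGlobals().

```r
library(codetools)

# Wrap the future expression in a function so that findGlobals() can inspect it.
# Nothing is evaluated here; 'data' does not even need to exist.
candidates <- findGlobals(function() subset(data, x < 3)$y)

# 'x' is reported as a global alongside 'data' and 'subset', because the
# inspector cannot know that subset() evaluates 'x < 3' within 'data'.
print(candidates)
print("x" %in% candidates)
```

Because the inspection never runs the code, it has no way of telling that x will be found inside data at run time.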
The same error would be reported when using plan(eager, globals=TRUE) or plan(multicore, globals=TRUE), both of which validate globals before the future is created.
The clearest and most backward-compatible solution to this problem is to explicitly specify the context of x, i.e.
> data <- data.frame(x=1:5, y=1:5)
> v %<=% subset(data, data$x < 3)$y
> v
[1] 1 2
An alternative is to use a dummy variable. In contrast to the code-inspection algorithm used to identify globals, we know from reading the documentation that subset() will look for x in the data object, not in the parent environments. Armed with this knowledge, we can trick the future package (more precisely the globals package) into picking up a dummy variable x instead, e.g.
> data <- data.frame(x=1:5, y=1:5)
> x <- NULL ## To please future et al.
> v %<=% subset(data, x < 3)$y
> v
[1] 1 2
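That subset() never uses the dummy variable can be checked in plain R, without any futures involved. A small sketch (the names v_nse and v_explicit are chosen here for illustration only):

```r
data <- data.frame(x = 1:5, y = 1:5)
x <- NULL  # dummy; only there to satisfy the static code inspection

v_nse      <- subset(data, x < 3)$y       # 'x' is found in 'data', not the dummy
v_explicit <- subset(data, data$x < 3)$y  # context of 'x' spelled out

print(v_nse)
print(identical(v_nse, v_explicit))
```

Both forms select the same rows, so the dummy only influences which globals are identified, not the value that is computed.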
Another common use case for non-standard evaluation is when creating ggplot2 figures. For instance, in
> library(ggplot2)
> p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
> p
fields mpg and wt of the mtcars data object are plotted against each other. That mpg and wt are actually fields of mtcars cannot be inferred from code inspection alone; you need to know that this is how ggplot2 works. Analogously to the above subset() example, this explains why we get the following error:
> library(future)
> plan(lazy)
> library(ggplot2)
> p %<=% { ggplot(mtcars, aes(wt, mpg)) + geom_point() }
Error in globalsOf(expr, envir = envir, tweak = tweakExpression, dotdotdot = "return", :
Identified a global by static code inspection, but failed to locate the corresponding
object in the relevant environments: 'wt'
A few comments are needed here. First of all, because %<=% has higher precedence than +, we need to place all of the ggplot2 expression within curly brackets, otherwise we get an error. Second, only wt is listed as a missing global variable, and not mpg, because the latter is (incorrectly) located as ggplot2::mpg, a data set that ships with ggplot2.
One workaround is to make use of the *_string() functions of ggplot2, e.g.
> p %<=% { ggplot(mtcars, aes_string('wt', 'mpg')) + geom_point() }
> p
Another is to explicitly specify mtcars$wt and mtcars$mpg, which may become a bit tedious. A third alternative is to make use of dummy variables wt and mpg, i.e.
> p %<=% {
+ wt <- mpg <- NULL
+ ggplot(mtcars, aes(wt, mpg)) + geom_point()
+ }
> p
By the way, since all futures are evaluated in a local environment, the dummy variables are not assigned to the calling environment.
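Why the dummy variables help can be seen by inspecting the two expressions statically. This is a sketch that uses codetools::findGlobals() as a stand-in for the inspection done by the globals package (an assumption). Nothing is evaluated, so ggplot2 does not need to be attached for it to run.

```r
library(codetools)

# Without dummies: 'wt' and 'mpg' are both reported as globals.
without_dummies <- findGlobals(function() {
  ggplot(mtcars, aes(wt, mpg)) + geom_point()
})

# With dummies assigned inside the expression: both become local variables.
with_dummies <- findGlobals(function() {
  wt <- NULL
  mpg <- NULL
  ggplot(mtcars, aes(wt, mpg)) + geom_point()
})

print(c("wt", "mpg") %in% without_dummies)
print(c("wt", "mpg") %in% with_dummies)
```

Once wt and mpg are assigned inside the expression, the inspector treats them as local variables and no longer tries to locate them in the calling environment.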
In plain R, it is straightforward to understand what value is assigned to x:
> a <- 1
> x <- { y <- 2*a; a <- 2; a*y }
> rm(a)
> x
[1] 4
If we analogously use a future assignment, we instead get:
> library(future)
> plan(lazy)
> a <- 1
> x %<=% { y <- 2*a; a <- 2; a*y }
> rm(a)
> x
Error in eval(expr, envir, enclos) : object 'a' not found
(For lazy futures it can be even worse: if we don't remove a but instead reassign it, say, a <- 3, we will get the incorrect value x == 12, because y <- 2*a is evaluated only when the lazy future is resolved.)
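The effect of delayed evaluation on an uncaptured variable can be imitated in base R. The sketch below uses an ordinary closure, the hypothetical lazy_x(), as a stand-in for a lazy future. It is an analogy only and does not use the future package.

```r
a <- 1
# Hypothetical stand-in for a lazy future: the body runs only when called.
lazy_x <- function() { y <- 2*a; a <- 2; a*y }

a <- 3          # reassigned before the "future" is resolved
x <- lazy_x()   # y <- 2*a now sees a == 3
print(x)        # 12, not 4
print(a)        # still 3; the assignment a <- 2 was local to the function
```

Since a was not frozen when the expression was created, the value it holds at evaluation time is the one that is used.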
The reason for this problem is that the static code inspection fails to identify that the first occurrence of a is a global variable whereas the second occurrence creates a local variable with the same name. The latter “hides” the former, so it fails to be identified. This is a limitation of the globals and codetools packages. In order to avoid this problem, it is useful to understand that from the code inspector's point of view the order of expressions within futures makes no difference, that is, y <- 2*a; a <- 2 and a <- 2; y <- 2*a are identical when it comes to identifying globals.
The solution to this problem is to avoid using local variables with the same name as global ones. For example,
> library(future)
> plan(lazy)
> a <- 1
> x %<=% { y <- 2*a; b <- 2; b*y }
> rm(a)
> x
[1] 4
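The order independence of the code inspection, and the effect of renaming the local variable, can be checked directly with codetools::findGlobals(). This is a sketch under the assumption that codetools is what identifies the globals, as described above. The names g1, g2 and g3 are illustrative.

```r
library(codetools)

# Local 'a' hides the global 'a', regardless of statement order.
g1 <- findGlobals(function() { y <- 2*a; a <- 2; a*y })
g2 <- findGlobals(function() { a <- 2; y <- 2*a; a*y })

# With the local variable renamed to 'b', 'a' is identified as a global.
g3 <- findGlobals(function() { y <- 2*a; b <- 2; b*y })

print("a" %in% g1)
print(setequal(g1, g2))
print("a" %in% g3)
```

Only the renamed version reports a as a global, which is why it can be captured when the future is created.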
The future assignment operator %<=% is a binary infix operator, which means it has higher precedence than most other binary operators but also higher than some of the unary operators in R. For instance, this explains why we get the following error:
> x %<=% 2 * runif(1)
Error in x %<=% 2 * runif(1) : non-numeric argument to binary operator
What effectively is happening here is that because of the higher priority of %<=%, we first create a future x %<=% 2 and then we try to multiply it (not its value) with the value of runif(1), which makes no sense. In order to properly assign the future variable, we need to put the future expression within curly brackets:
> x %<=% { 2 * runif(1) }
> x
[1] 1.030209
Parentheses will also do. For details on the precedence of operators in R, see Section 'Infix and prefix operators' in the 'R Language Definition' document.
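How R groups such an expression can be inspected with quote(), which parses the code without evaluating it. A small sketch in base R (the names e1 and e2 are illustrative):

```r
# quote() only parses; the operator does not have to be defined.
e1 <- quote(x %<=% 2 * runif(1))
e2 <- quote(x %<=% { 2 * runif(1) })

# The outermost call in e1 is the multiplication, i.e. (x %<=% 2) * runif(1).
print(as.character(e1[[1]]))
print(deparse(e1[[2]]))

# With curly brackets, the outermost call is the future assignment.
print(as.character(e2[[1]]))
```

The same check works for any %any% operator, since they all share the same precedence.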
Copyright Henrik Bengtsson, 2015