Debugging R with restore points

2013-08-28

Author: Sebastian Kranz

Email: sebastian.kranz@uni-ulm.de

Project website: https://github.com/skranz/restorepoint

Abstract:

The package restorepoint allows to debug R functions via restore points instead of break points. When called inside a function, a restore point stores all local variables. These can be restored for later debugging purposes by simply copy & pasting the body of the function from the source code editor to the R console. This vignette briefly illustrates the use of restore points and compares advantages and drawbacks compared to the traditional method of setting break points via browser(). Restore points are particularly convenient when using an IDE like RStudio that allows to quickly run selected code from a script in the R Console.

Installation of newest version

To install the newest version of the package from github first install and load the devtools package and then type in the R console:

install_github(repo="restorepoint", username="skranz")

To install the (typically older) CRAN version of restorepoint, type

install.packages("restorepoint")

A simple example of debugging with restore points

Consider a function swap.in.vector that shall split a vector at a given position and then swap the left and right part of the vector. Here is an example of a call to a correct implementation:

swap.in.vector(1:5,3)
## [1] 3 4 5 1 2

Here is a faulty implementation that we want to debug:

library(restorepoint)
swap.in.vector = function(vec, swap.ind) {
    restore.point("swap.in.vector", to.global = FALSE)
    left = vec[1:(swap.ind - 1)]
    right = vec[swap.ind:nrow(vec)]
    c(right, left)
}
swap.in.vector(1:10, 4)
## Error: argument of length 0

The first line in the function specifies a restore point. The behavior of restore.point depends whether it is called inside a function or directly pasted in the console.

restore.point called inside a function

When restore.point(name) is called inside a function, it stores the current values of all local variables under the specified name. In the example, these local variable are vec and swap.ind and the name is ``swap.in.vector''.

restore.point is called directly in the R console

When restore.point("swap.vector", to.global=FALSE) is called directly in the R console the following happens:

In effect, we can now debug the function by simply running line by line the source code from the function body inside the R console. If you don't use RStudio, you have to copy each line (or several lines at the same time) from your script and paste them in the R console.

With RStudio one can do that in a very convenient fasion. Just select in the source code window the lines inside the function starting with the first line that calls restore.point and press Ctrl-Enter. The marked lines will then be automatically run in the R console.

Since the call to restore.point restores the local variables to the state they have been in when restore.point has earlier been called during the function evaluation, the pasted lines of the function code (typically) behave in the same fashion as they did when the function was called the last time. We can inspect the variables and code by simply typing any desired command into the R console.

  restore.point("swap.in.vector", to.global=FALSE)
    left  = vec[1:(swap.ind-1)]
    right = vec[swap.ind:nrow(vec)]
## Error: argument of length 0
    c(right,left)
## Error: object 'right' not found

The error occurred in the third line. We can inspect the variables in more detail to narrow down the error.

swap.ind
## [1] 4
vec
##  [1]  1  2  3  4  5  6  7  8  9 10
swap.ind:nrow(vec)
## Error: argument of length 0
nrow(vec)
## NULL

There is the culprit. The command nrow returns NULL for a vector. We want to use length(vec) or NROW(vec) instead.

# Try an alternative formulation
length(vec)
## [1] 10

We can correct the code in our script and directly test it by pasting again the whole function body. There is no need to call the function again, since the parameter from the previous function call are still stored under the name ``swap.in.vector''.

Test the inside of the function by copy & paste it into the R console.

restore.point("swap.in.vector", to.global = FALSE)
left = vec[1:(swap.ind - 1)]
right = vec[swap.ind:length(vec)]
c(right, left)
##  [1]  4  5  6  7  8  9 10  1  2  3

The corrected function seems to work fine so far (indeed there is an error left that we remove in Section 3). Pressing ESC returns to the normal evaluation mode of the R Console.

Why I prefer restore points over break points

The standard method to debug a function is to set break points, yet in my experience restore point are much more convenient. In R break points can be set via a call to browser() inside the function. When during execution of the function, browser() is called, the R console immediately changes into an interactive debugging mode that allows to step through the code and enter any R expressions. In contrast, when restore.point is called inside the function there are no direct visible effects: the debugging mode starts afterward, when we decide to paste parts of the function source into the R console.

I prefer restore points over break points, mainly for the following reasons:

  1. When debugging nested function calls, handling several break points can become very tedious, since the program flow is interrupted with every break point. Despite using traceback(), it is often not clear where exactly the error has occured. As a consequence, I tend to set too many break points and the program flow is interrupted too often.

  2. When I want to turn off invocation of the browser, I have to comment out #browser() and source again the function body. That can become quite tedious. When using restore points, I typically just keep the calls to restore.point in the code even if I may not need them at the moment. Calls to restore.point are simply not very obtrusive. They just make silently a copy of the data. While there is some memory overhead and execution may slow down a bit, but usually I find that negligible. I basically have a call to restore.point in every function, which allows me to always find out, step by step what has happened the last time some function was called.

  3. I often would like to restart from the break point after I changed something in the function, to test whether the new code works. But with nested function calls, e.g. inside an optimization procedure, for which an error only occurred under certain parameter constellations, it can sometimes be quite time consuming until the state in which the error has occurred is reached again. This problem does not arise for restore points: I can always restart at the restore point and test my modified function code for the function call that caused the error.

  4. The interactive browser used by browser() has a own set of command, e.q. pressing "Q" quits the browser or pressing "n" debugs the next function. For that reason, one cannot always simply copy & paste R code into the browser. (E.g. if you ) In contrast, the only special key in the debug mode of restore point is Escape, which brings you back to the standard R console. The restore point browser makes debugging via copy & paste from your R script (or in RStudio, select code and press CTRL+Enter) much easier.

  5. One is automatically thrown out the debugging mode of browser() once a line with an error is pasted. This does not happen in the restore point browser. I find it much more convenient to stay in the debug mode. It allows me to paste all the code until an error has occurred and to check only afterward the values of local expressions.

The restore point console vs restoring into global environment

The example above used the call to restore.point with the option to.global=FALSE, which has the effect that future commands are evaluated in the restore point console. The main difference to the standard R console is that expressions are not evaluated in the global environment, but in an environment that emulates the local envirnment of the function that we want to debug. However, by default we have to.global = TRUE and debuggin takes a much simpler but quite dirty form. All stored objects are just copied into the global environment and the usual R console stays in place. You can test the example (make sure you have left the restore point console by pressing Esc).

  # If to.global=TRUE or not set, objects are restored into the
  # the global environment
  restore.point("swap.in.vector")
## Restored: swap.ind,vec
    left  = vec[1:(swap.ind-1)]
  #...

While this approach is quite dirty, I often prefer it for beeing slightly more convenient. Here are some disadvantages of the restore point console that make me often prefer the global environment approach.

On the other hand, there are several advantages of using the restore point console instead of simply copying the variables into the global environment.

Even though the points in favor of just using the global environment may seem small, I nevertheless typically prefer that dirty approach in my workflow and therefore made it the default.

Here are some more points that seem noteworthy.

Some advice and examples on using restore.points

When to set restore points

When writing a new function, I tend to always add a restore point in the first line, with name equal to the function name.

my.fun = function(par1, par2 = 0) {
    restore.point("my.fun")
    # ... code here ...
}

Unlike break points (see discussion below), restore points don't interrupt program execution. Even though most errors are found quickly, there are also often errors that remain hidden for a while. Therefore having restore points in all functions can be quite convenient, in particular in complex code: One is always prepared to debug.

One does not have to set restore points at the beginning of a function, but can put them also somewhere else in a function.

Setting restore point options

The function set.restore.point.options allows to set some global options for restore points, e.g. whether to use the restore point console or not. Take a look at the help file.

Nested function calls

Restore points are particularly useful when debugging nested function calls and in situations in which errors arise only under specific parameter constellations (possibly randomly drawn ones). Here is an example of a faulty function that shall draw 10 random swap.point for a given vector and print the swapped version.


# Randomly choose 10 swap points
f = function(v) {
    restore.point("f")
    for (i in 1:10) {
        rand.swap.point = sample(1:length(vec), 1)
        sw = swap.in.vector(v, rand.swap.point)
        print(sw)
    }
}

set.seed(12345)
f(v = 1:5)
##  [1] NA NA NA  5  1  2  3  4  5 NA NA
##  [1] NA NA NA NA  5  1  2  3  4  5 NA NA NA
##  [1] NA NA NA  5  1  2  3  4  5 NA NA
##  [1] NA NA NA NA  5  1  2  3  4  5 NA NA NA
## [1] 5 1 2 3 4
## [1] 2 3 4 5 1
## [1] 4 5 1 2 3
## [1] NA  5  1  2  3  4  5
##  [1] NA NA NA  5  1  2  3  4  5 NA NA
##  [1] NA NA NA NA NA  5  1  2  3  4  5 NA NA NA NA

The result looks strange. There is a mistake either in function f or in swap.in.vector or in both. It is convenient to stop the execution whenever an obviously wrong result is encountered. For this purpose, we modify f by stopping execution if the length of the result is different than the length of the original vector. We also add a restore.point with name ``f.in.loop'' inside the loop.


# Randomly choose 10 swap points
f = function(v) {
    restore.point("f")
    for (i in 1:10) {
        rand.swap.point = sample(1:length(v), 1)
        sw = swap.in.vector(v, rand.swap.point)
        print(sw)
        restore.point("f.in.loop")
        stopifnot(length(sw) == length(v))
    }
}

set.seed(12345)
f(v = 1:5)
## [1] 4 5 1 2 3
## [1] 5 1 2 3 4
## [1] 4 5 1 2 3
## [1] 5 1 2 3 4
## [1] 3 4 5 1 2
## [1] 1 2 3 4 5 1
## Error: length(sw) == length(v) is not TRUE

The error may have occurred in swap.in.vector or in f or in both. By restoring the restore point in swap.in.vector, we first have a look at the parameters of the last function call before execution has been stopped.

#swap.in.vector = function(vec,swap.ind) {
    restore.point("swap.in.vector")
## Restored: swap.ind,vec
    swap.ind
## [1] 1
    vec
## [1] 1 2 3 4 5
# vec has different values than the parameter v=1:5
# with which we have called f

We seem to call swap.in.vector, with a swap.point that is larger than the vector!. This suggests that there is an error in the function f. We restore our restore point ``f.in.loop'' and examine the local variables.

restore.point("f.in.loop")
## Restored: i,rand.swap.point,sw,v
v
## [1] 1 2 3 4 5
rand.swap.point
## [1] 1
# There must be a mistake when rand.swap.point is drawn
rand.swap.point = sample(1:length(vec), 1)
# Indeed, we use the wrong variable: vec instead of v Corrected:
rand.swap.point = sample(1:length(vec), 1)

It can be helpful to include a restore point within the for loop in order to analyze the values of the local variables before the error has been thrown.

Known Caveats and Issues

Variables that are passed by reference (e.g. environments)

By default restore.point does not make a deep copy of R variables that are passed by reference, like environments. However, one can set the option deep.copy = TRUE to make such deep copies. Although I try to make deep.copy comprehensive, e.g. searching for enviornments within lists, there probably will still remain some issues. Please let me know.

Parsing multiline expressions in restore point console

So far my code for parsing multiline expressions in the restore point console is very rough and relies on checking the text of an error message in a tryCatch statement. I am not sure, whether it works in all non-english versions of R. I would be very happy, if somebody knows a better solution to the problem described here:

http://stackoverflow.com/questions/13752933/parse-multiline-expressions-in-r-emulating-r-console