Profiling in R

May 13, 2010

I'm giving a quick talk about vectorising code in R next month, so I need to learn about profiling in R. This is what I have discovered in the last hour or so (when I should have been working):

Rprof()

This is the basic profiler in R. One turns it on:
Rprof("profile.out")
One runs some code … and one turns it off using, as always, R’s intuitive API:
Rprof(NULL)
To see the results, you can use `summaryRprof("profile.out")`, which parses the file you made using `Rprof()` and presents it in a nice clear format. For a profile I had written to `method1.out`, it looks something like:
> summaryRprof("method1.out")
$by.self
               self.time self.pct total.time total.pct
"strsplit"          4.08     94.0       4.10      94.5
"+"                 0.12      2.8       0.12       2.8
"is.character"      0.12      2.8       0.12       2.8
"as.logical"        0.02      0.5       0.02       0.5
$by.total
               total.time total.pct self.time self.pct
"strsplit"           4.10      94.5      4.08     94.0
"+"                  0.12       2.8      0.12      2.8
"is.character"       0.12       2.8      0.12      2.8
"as.logical"         0.02       0.5      0.02      0.5
There’s more detail here about `Rprof()` and `summaryRprof()`.
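
The whole start/run/stop/summarise cycle can be sketched in one go. This is only a minimal sketch: the `split_one_by_one` workload and the temporary file name are made up for illustration (they are not the code behind the output above), and the timings will differ from machine to machine.

```r
# Hypothetical workload: split many strings one at a time (deliberately unvectorised)
split_one_by_one <- function(x) {
  lapply(x, function(s) strsplit(s, ",")[[1]])
}

x <- rep("a,b,c,d", 2e5)

prof_file <- tempfile(fileext = ".out")  # assumed name; any writable path works
Rprof(prof_file)                         # turn the profiler on
parts <- split_one_by_one(x)             # the code being profiled
Rprof(NULL)                              # turn the profiler off

prof <- summaryRprof(prof_file)          # parse the recorded samples
head(prof$by.self)                       # time spent in each function itself
```

`prof` is an ordinary list, so `prof$by.total` gives the other table shown above.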

proftools

This package presents an alternative to summaryRprof, using graphviz to generate teh pretteh. Not sure how massively useful it would be in practice, unless your code got pretty complex! I’d be interested to know how complicated people’s functions actually got when using R.
To make the graph, make sure you have graph and Rgraphviz properly installed, read the profile in with `readProfileData()`, then bash out
plotProfileCallGraph(readProfileData("profile.out"))
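
If graphviz is more than you want, proftools can also print a plain table from the same data. A minimal sketch, assuming proftools is installed; the workload and file name are invented for illustration, and as far as I can tell from the package documentation the proftools functions want the parsed object from `readProfileData()` rather than the file name.

```r
# Record a small profile to a temporary file (hypothetical workload)
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)
x <- rep("a,b,c,d", 2e5)
parts <- lapply(x, function(s) strsplit(s, ",")[[1]])
Rprof(NULL)

# Only touch proftools if it is actually installed
if (requireNamespace("proftools", quietly = TRUE)) {
  pd <- proftools::readProfileData(prof_file)  # parse the Rprof output
  print(head(proftools::flatProfile(pd)))      # tabular view, no graphviz needed
  # With graph and Rgraphviz installed, the call graph comes from the same object:
  # proftools::plotProfileCallGraph(pd)
}
```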

profr

I kind of want everything Hadley Wickham does to be awesome, so when I see he’s made a profiling tool, I give it a lot more attention than it necessarily deserves. For example, you won’t find me hunting for documentation through anyone else’s GitHub account. Nevertheless, this is what I find myself doing for `profr` (which in my head I will be pronouncing “proffer”). Note: it does turn out that I could have simply typed `?profr` in my R terminal (does this always work?).
To use, simply run
out <- profr(code_to_profile)
then to inspect it use the normal `head`, `tail` and `summary`, noting that `out` is (pretty much) a `data.frame`:
> head(out)
             f level time start  end  leaf source
8      example     1 0.44  0.00 0.44 FALSE  utils
9  <Anonymous>     2 0.04  0.00 0.04 FALSE   <NA>
10     library     2 0.02  0.04 0.06 FALSE   base
11      source     2 0.38  0.06 0.44 FALSE   base
12  prepare_Rd     3 0.04  0.00 0.04 FALSE   <NA>
13        %in%     3 0.02  0.04 0.06 FALSE   base
> summary(out)
       f               level            time           start
match        : 7   Min.   : 1.000   Min.   :0.02   Min.   :0.0000
%in%         : 6   1st Qu.: 5.000   1st Qu.:0.02   1st Qu.:0.1200
is.factor    : 6   Median : 7.000   Median :0.02   Median :0.1600
eval.with.vis: 6   Mean   : 8.113   Mean   :0.04   Mean   :0.1685
inherits     : 5   3rd Qu.:10.500   3rd Qu.:0.02   3rd Qu.:0.2200
<Anonymous>  : 3   Max.   :19.000   Max.   :0.44   Max.   :0.4200
(Other)      :82
     end             leaf             source
Min.   :0.0200   Mode :logical   Length:115
1st Qu.:0.1400   FALSE:99        Class :character
Median :0.2000   TRUE :16        Mode  :character
Mean   :0.2085   NA's :0
3rd Qu.:0.2500
Max.   :0.4400

Here I should say that `code_to_profile` was the standard `example(glm)`. One can also use `plot()` or `ggplot()` to plot the ‘call tree’. I’m not quite sure what a ‘call tree’ is, and to be honest the (admittedly pretty) graph that `ggplot(out)` produces doesn’t enlighten me a great deal. That’s OK though – all I really wanted was the time spent.
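
Since the result is (pretty much) a `data.frame`, getting at "the time spent" needs nothing beyond base R. A minimal sketch: `time_by_function` is a hypothetical helper of mine, not part of profr, and the toy table below uses made-up numbers in the same column layout as `head(out)`. Summing only the `leaf` rows is a rough proxy for the time spent in each function itself, not an exact figure.

```r
# Hypothetical helper: total time per function, counting only leaf calls
time_by_function <- function(prof) {
  prof <- as.data.frame(prof)                  # drop any extra classes
  leaves <- prof[prof$leaf, c("f", "time")]    # calls with nothing running beneath them
  totals <- aggregate(time ~ f, data = leaves, FUN = sum)
  totals[order(totals$time, decreasing = TRUE), ]
}

# Made-up rows with the columns profr returns (f, time, leaf)
toy <- data.frame(
  f    = c("match", "%in%", "match", "source"),
  time = c(0.02, 0.04, 0.06, 0.38),
  leaf = c(TRUE, TRUE, TRUE, FALSE)
)

time_by_function(toy)
```

With profr installed, `time_by_function(profr(example(glm)))` should work the same way on the real thing.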

A quick point about Hadley’s code that I think highlights something wrong with R in general (and right with Python): with `profr`, from a user’s perspective, I don’t really have to learn anything new to make it work. I apply a function (handily called `profr`) to some code and I get back a `data.frame` on which I can use very standard tools to investigate. I don’t have to write to a file (or generate a temp file to write to, or anything), I don’t have to learn to stop and start hidden things, or that to stop profiling I pass `NULL` to the function that started profiling. I don’t have to use a whole new tool to parse what is basically a table. And to plot I can just use `plot` rather than having to disappear down the (admittedly fun) hole that is graphviz. It feels like every new thing I come across in R means learning a whole API and set of concepts and object structure. (Admittedly ggplot is the total opposite of this, but hey, I guess that was the point.)
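
To make the point concrete, the temp file and the start/stop dance can be hidden behind one function using nothing but base R. This is a minimal sketch of the idea, not how `profr` is actually implemented: `profile_to_df` is a hypothetical name, and the workload is invented for illustration.

```r
# Hypothetical helper (not part of base R or profr): profile an expression
# and hand back summaryRprof()'s by.self table, which is a data.frame.
profile_to_df <- function(expr, interval = 0.02) {
  prof_file <- tempfile(fileext = ".out")
  # Always stop the profiler and remove the temp file, even on error
  on.exit({
    Rprof(NULL)
    unlink(prof_file)
  }, add = TRUE)

  Rprof(prof_file, interval = interval)
  force(expr)   # lazy evaluation: the expression actually runs here
  Rprof(NULL)

  # summaryRprof() stops with an error if no samples were recorded, so the
  # profiled code needs to run for noticeably longer than `interval`.
  summaryRprof(prof_file)$by.self
}

timings <- profile_to_df({
  x <- rep("a,b,c,d", 2e5)
  parts <- lapply(x, function(s) strsplit(s, ",")[[1]])
})
head(timings)
```

Because `timings` is a plain `data.frame`, `head`, `summary` and subsetting all work on it without learning anything new.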
