-> Going both ways in R <-

May 19, 2010

Andy, in response to my griping on #Rstats, asks “What’s wrong with an arrow going both ways!?!?!” In R one can perform assignment in three ways:

a = 2
a <- 2
2 -> a

I’m pretty sure all of these are equivalent. The only subtlety I’ve noticed so far is that to set default argument values in a function definition you have to use `=`.
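Here is a minimal sketch of the equivalence, and of the place where `=` and `<-` part ways. The names `a`, `b`, `d`, `f` and `x` are illustrative only. The last lines rely on standard R semantics. Inside a function call, `=` names an argument. By contrast, `<-` is an ordinary assignment, evaluated in the caller's environment, and its value is then passed positionally.

```r
# Three ways to bind the value 2 to a name
a = 2
b <- 2
2 -> d
stopifnot(identical(a, b), identical(b, d))

# The parser turns `->` into `<-` with the operands swapped
print(quote(2 -> d))   # prints: d <- 2

# Default argument values in a function definition use `=`
f <- function(x = 10) x * 2
f()          # 20: the default is used
f(x = 3)     # 6: inside a call, `=` matches the argument named x

# Inside a call, `x <- 3` is an assignment in the calling environment;
# its value (3) is then passed positionally as f's first argument
f(x <- 3)    # 6 as well...
x            # ...but x now also exists in the calling environment, with value 3
```

This difference is one reason style guides such as the tidyverse guide recommend `<-` for assignment and keep `=` for arguments.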

This redundancy drives me absolutely nuts as an R beginner. From a human point of view, code has two main jobs: to be written and to be read. For the first job, maybe having three different ways to assign a value to a variable is nice? I’m not sure how, but I guess it might sometimes be handy to write out a function, realise you hadn’t assigned it to a variable, and do so at the end instead.

For the second job, though, reading R is awful. I’ve heard horror stories about Perl, so maybe I’m just used to readable code and, in the grand scheme of things, R is nice and readable. But yesterday, ploughing through someone’s metaprogramming and tearing my hair out over the obtuseness of everything, I was foiled for about an hour because I hadn’t realised that the fsking arrow could go both ways.

I think it highlights a big flaw in R. Sometimes it feels as though no “destructive” design decisions have been taken since the original work of Ihaka and Gentleman. I spend so much time messing about trying to convert a data.frame of factors into a list of “vectors” of strings or something. It seems that no one ever said “OK, we need a way to assign a value to a variable: let’s pick one”. Instead, three have emerged. There are two different kinds of class (S3 and S4), which only seem to have any real effect on the help files. There are loads of data types in regular use, with somewhat bastardised names (e.g. a ‘vector’, ‘matrix’, ‘list’ and ‘array’ are all different things; a vector doesn’t seem to be a subclass of matrix, for example).
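As a sketch, here is one way to do the data.frame-of-factors conversion mentioned above. A data.frame is a list of equal-length columns, so `lapply()` can apply `as.character()` to each one. The data are hypothetical. The call passes `stringsAsFactors = TRUE` explicitly so that the columns really are factors, whatever R version runs this.

```r
# Hypothetical data.frame whose columns are factors
df <- data.frame(gene   = c("BRCA1", "TP53", "EGFR"),
                 tissue = c("liver", "lung", "lung"),
                 stringsAsFactors = TRUE)
sapply(df, class)              # both columns are "factor"

# A data.frame is a list of columns, so lapply() visits each column.
# as.character() returns the factor labels (as.numeric() would give the codes).
cols <- lapply(df, as.character)
str(cols)                      # a plain list of two character vectors

# To keep the data.frame but swap its factor columns for character ones,
# assign into df[] so the data.frame structure is preserved
df[] <- lapply(df, as.character)
sapply(df, class)              # now both "character"
```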

Now, I’m sure the pluralities in R could be held up as a good thing. They probably encode subtleties that I don’t understand, and my bitching probably just exposes my naivety about the language. I also know that R is descended from Lisp and S, and that its user base grew up either with R or with one of these other languages. So things that feel natural to someone from that background will seem alien to someone like me, coming from MATLAB and Python. And there’s backwards compatibility to consider, and so on.

So don’t get me wrong – I think R is an amazing piece of work, and the communities around R are incredibly devoted and responsive. It’s kind of like the “first” three episodes of Star Wars: there’s an amazing film in there somewhere, it just needs some editorial decisions to be made. Maybe a new “Hadley” language will be born whereby R is completely re-API’d into a single language that only uses data.frames…


4 Responses to “-> Going both ways in R <-”

  1. Byron Says:

    “Character variables passed to data.frame are converted to factor columns unless protected by I” — the data.frame documentation.
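Here is a small sketch of the two escape hatches: wrapping a vector in `I()`, or passing `stringsAsFactors = FALSE`. The gene symbols are made up for illustration. R 4.0.0 changed the default of `stringsAsFactors` to `FALSE`, so on current versions character columns stay character unless you ask otherwise. Passing the argument explicitly keeps the behaviour the same across versions.

```r
genes <- c("BRCA1", "TP53", "EGFR")   # hypothetical gene symbols

# I() marks the vector "as is", so data.frame() does not convert it
d1 <- data.frame(symbol = I(genes))
is.character(d1$symbol)   # TRUE (its class also includes "AsIs")

# stringsAsFactors = FALSE leaves every character column alone
d2 <- data.frame(symbol = genes, stringsAsFactors = FALSE)
is.character(d2$symbol)   # TRUE

# stringsAsFactors = TRUE requests the conversion explicitly
d3 <- data.frame(symbol = genes, stringsAsFactors = TRUE)
is.factor(d3$symbol)      # TRUE
levels(d3$symbol)         # "BRCA1" "EGFR" "TP53" (levels are sorted)
```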

    Matrices are vectors with a dimension attribute:

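    x = c(1,2,3,4)
    y = matrix(x,2,2)        # the same four numbers, plus a dim attribute of c(2, 2)
    y
    attributes(y)$dim = NULL # strip the dim attribute...
    y                        # ...and the plain vector comes back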

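    Lists are vectors too (generic vectors whose elements can be any R object, including other vectors), so they can also be given dimensions:

    x = list(c(1,2,3,4),c(5,6,7,8))
    y = matrix(x,2)   # a 2 x 1 matrix whose cells are the two list elements
    y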
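To make the "matrix is a vector with a `dim` attribute" point concrete, here is a short sketch using only base R. `dim(y) <- NULL` is the more common spelling of `attributes(y)$dim = NULL`. The example also shows why `is.vector()` can surprise. It returns `FALSE` for any object carrying attributes other than names, so a matrix fails the test even though it is stored as a plain vector.

```r
x <- c(1, 2, 3, 4)
y <- matrix(x, 2, 2)

typeof(y)          # "double": same storage type as x
attributes(y)      # only $dim, which is c(2, 2)
is.vector(y)       # FALSE: is.vector() rejects objects with a dim attribute
dim(y) <- NULL     # drop the dim attribute
identical(y, x)    # TRUE: what is left is the original vector

# A list is a "generic" vector, so it can be given dimensions too
l <- list(1:4, c("a", "b"))
is.vector(l)       # TRUE
lmat <- matrix(l, 2, 1)
dim(lmat)          # 2 1
lmat[[2, 1]]       # "a" "b": each cell holds a whole list element
```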

  2. mikedewar Says:

    Thanks for the clarification of vectors, matrices and lists. Most helpful.

    However, I hadn’t realised that so many things were vectors:

    is.vector("bob")

    returns TRUE! Yet it’s a vector with one element, and that element is “bob”. Sigh. Also:

    is.vector(2)

    returns TRUE. My concept of “vector” is being really messed around with.
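The surprise makes more sense once you separate the length of a character vector from the number of characters in its element. The following sketch uses base functions only:

```r
length("bob")              # 1: a character vector holding one string
nchar("bob")               # 3: the number of characters in that string
length(c("bob", "alice"))  # 2
nchar(c("bob", "alice"))   # 3 5: nchar() works element by element

is.vector(2)               # TRUE: 2 is a numeric vector of length 1
identical(2, c(2))         # TRUE: c() of a single value changes nothing
strsplit("bob", "")[[1]]   # "b" "o" "b": splitting gives a length-3 vector
```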

    The stuff about converting characters to factors in data.frames is very useful to know – and not exactly intuitive! When I’ve got 16K unique gene symbols in a data frame, having them as factors is BAD. This will save me pain. Thanks Byron!

  3. Ken Williams Says:

    I think this is one of the wounds that Python has inflicted on the world: it makes people think that a rigid, Guido’s-way-or-the-highway philosophy in a language automatically makes code more readable. It does tend to let people quickly reach a shallow, surface-level understanding of code, but you pay a price in expressive power. That means you often can’t write things the way you really Want To write them, i.e. the way that matches your task most closely and expresses your intent most clearly. To a Perl programmer, there’s a semantic difference between “if (…) die …” and “die … if …”, even if the compiler treats them the same.

    Plus, I think it’s more fun to work with a language when I think the authors of the language are having fun too.

    Regarding vectors of length 1: this really makes everything way easier and is a major reason R gets its power. Since pretty much everything in the language operates on vectors anyway, the only thing a special data type for scalars could really do is Break Stuff and create more special cases.

  4. Byron Says:

    Yes, R doesn’t actually have a scalar type, they’re just vectors of length 1. Most of the time you don’t notice due to R’s element recycling.
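Here is a sketch of the recycling rule that makes length-1 vectors behave like scalars. When two vectors of different lengths meet in an arithmetic operation, R repeats the shorter one to match the longer one.

```r
v <- c(10, 20, 30, 40)

v * 2           # 20 40 60 80: the length-1 vector 2 is recycled four times
v + c(1, 100)   # 11 120 31 140: c(1, 100) is recycled twice

# If the longer length is not a multiple of the shorter one,
# R still recycles but also emits a warning
suppressWarnings(1:3 + 1:2)   # 2 4 4
```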

