R for the C# Dev

I’ve been a C# developer for about a year now. For much of that time, I’ve also been working on the Data Science Specialization. I’ve had the experience of learning C# and then immediately turning around and learning R. Here are a few insights.

Terminology and Syntax

Some parts of R’s terminology and syntax just look weird coming from C-style languages.

  1. The . is not an operator. It’s a valid character for identifiers. read.table() is the name of a function that reads tables. It is not the table() part of an object called read.
  2. They use <- for assignment sometimes. (See gotchas below.)
  3. They call enumerated types factors. For example, if I have some data, and one column is only ever allowed to have four levels (say, the four different types of treatment we’re studying), then that is a column of class factor in my dataframe.
  4. The boolean literals and the null keyword are written in all caps (NULL, TRUE, FALSE).
  5. Simple operations operate on whole vectors. (A vector is like a resizing array.) If you add two vectors, the result is a new vector created by item-wise addition. Note that if one vector is shorter than the other, R will recycle it, starting again from its first element, and it only warns when the longer length is not a multiple of the shorter one.
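A few of the points above are easiest to see at the console. This is only a minimal sketch; the variable names and the treatment labels are made up for illustration.

```r
# The dot is just part of the name: this is one ordinary variable.
my.total <- 10

# A factor is R's version of an enumerated type.
# The four treatment names here are hypothetical.
treatment <- factor(c("placebo", "drugA", "drugB", "placebo"),
                    levels = c("placebo", "drugA", "drugB", "drugC"))
levels(treatment)  # all four allowed values, including the unused "drugC"

# Arithmetic is item-wise on vectors, and the shorter vector is recycled.
long.vec  <- c(1, 2, 3, 4, 5, 6)
short.vec <- c(10, 20)
summed <- long.vec + short.vec  # short.vec is reused as 10, 20, 10, 20, 10, 20
summed
# [1] 11 22 13 24 15 26
```

Because 6 is a multiple of 2, the recycling here happens without any warning.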

For Loops are a Bad Sign

This is actually a similarity between C# and R, I think. In C#, if you’re using for or foreach everywhere, there’s a good chance you should be using LINQ instead, and in R if you’re using for everywhere, you should be using one of the apply() functions.

When I started R, I found the apply functions unnecessarily confusing. It’s like 17 different people implemented their own versions of Select() (or map() if JavaScript is your thing). In fact, these functions all have sort of specific uses, and learning them will help you write (and read) R code a lot.
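To make the comparison with Select() concrete, here is a small sketch of the same computation written as a loop and then with one of the apply functions. The variable names are mine, and sapply() is just one of several functions that would work here.

```r
values <- c(1, 2, 3, 4)

# Loop version: pre-allocate the result, then fill it in one slot at a time.
squares.loop <- numeric(length(values))
for (i in seq_along(values)) {
  squares.loop[i] <- values[i]^2
}

# apply-family version: sapply() calls the function on each element
# and simplifies the results into a vector.
squares.apply <- sapply(values, function(x) x^2)

identical(squares.loop, squares.apply)
# [1] TRUE
```

For something this simple, values^2 would also do the job, since arithmetic already works on whole vectors. The apply functions earn their keep when the function being applied is not vectorized.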

I found one useful post on the different types of apply functions. I’ll try to provide an even shorter version here. (One frustration I’ve had with R documentation, in general, is that I’m trying to subset a list, and all the blogs I can find are trying to write generalized linear models for predicting Alzheimer’s diagnoses, and I have to dig through 50 lines of math-heavy code to find the operation that actually subsets the list.)

  1. tapply() is for applying a function to each group of values in a vector, where the groups are defined by a factor.
  2. lapply() is like Select(). It takes a list and a function, and returns a new list of the same length, with each new element being the corresponding old element transformed by the function.
  3. apply() is for manipulating matrices. You give it a matrix, a margin (1 for rows, 2 for columns, c(1,2) for both), and a function. I find this one least intuitive, so here’s a quick example of each function.

** example of tapply(): **

     groups <- factor(c(rep("a", 5), rep("b", 5), rep("c", 5))) # 5 each of 'a','b','c'
     values <- 1:15 # integer range
     tapply(values, groups, mean)
     #  a  b  c 
     #  3  8 13 

** example of lapply(): **

myList <- list(a = 1:5, b = 6:10, c = 11:15)
squares <- lapply(myList, function(x) {x * x})
squares
# $a
# [1]  1  4  9 16 25
#
# $b
# [1]  36  49  64  81 100
#
# $c
# [1] 121 144 169 196 225

** example of apply(): **

m <- matrix(1:12, 3, 4)
m
#      [,1] [,2] [,3] [,4]
# [1,]    1    4    7   10
# [2,]    2    5    8   11
# [3,]    3    6    9   12
s <- apply(m, 1, sum) # margin 1: one sum per row
s
# [1] 22 26 30
s2 <- apply(m, 2, sum) # margin 2: one sum per column
s2
# [1]  6 15 24 33
s3 <- apply(m, c(1,2), sum) # both margins: sum is applied to each cell on its own
s3
#      [,1] [,2] [,3] [,4]
# [1,]    1    4    7   10
# [2,]    2    5    8   11
# [3,]    3    6    9   12

Gotchas

These are the bugs that I’ve most often written accidentally in R.

Leaving off the minus sign in assignment

#code code code
myExistingVariable < makeOtherVariable()

Did you spot the bug? I wanted to tell R “call makeOtherVariable() and assign its return value to the variable called myExistingVariable.” I accidentally told R, “Call makeOtherVariable(), now compute a boolean vector that is true everywhere an item in myExistingVariable is less than the corresponding value in the return from makeOtherVariable(), then discard that vector.” This operation will succeed without error (R will do implicit conversions to make the comparison possible), and worse, the only warning you’ll get will be at runtime, and only if makeOtherVariable() returns a vector whose length will not divide evenly into the length of myExistingVariable. Sadface.
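Here is a small sketch of that bug in action. The body of makeOtherVariable() is a made-up stand-in; only its name comes from the snippet above.

```r
# Hypothetical stand-in that returns a numeric vector.
makeOtherVariable <- function() c(5, 1, 9)

myExistingVariable <- c(2, 2, 2)

# The typo: "<" instead of "<-". R evaluates a comparison and keeps nothing.
myExistingVariable < makeOtherVariable()
# [1]  TRUE FALSE  TRUE

# The variable still holds its old value.
unchanged <- identical(myExistingVariable, c(2, 2, 2))
unchanged
# [1] TRUE

# What was meant:
myExistingVariable <- makeOtherVariable()
myExistingVariable
# [1] 5 1 9
```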

Forgetting how Deep in Nested Collections I am

AKA “functions that return lists sometimes return lists of length 1”. Sometimes, I’ve taken a function that returns a list, and tried to immediately use the result elsewhere:

thoughtItWasAVector <- transFormListOfVectors(otherVector)
confusingResult <- transFormVector(thoughtItWasAVector)
#what I meant:
thoughtItWasAVector <- transFormListOfVectors(otherVector)
confusingResult <- transFormVector(thoughtItWasAVector[[1]])
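The difference is easier to see with a concrete value. In this sketch the body of transFormListOfVectors() is invented purely for illustration; the point is only that it returns a list of length 1.

```r
# Hypothetical stand-in for a function that returns a list of vectors.
transFormListOfVectors <- function(v) list(v * 2)

result <- transFormListOfVectors(c(1, 2, 3))

is.list(result)       # TRUE
length(result)        # 1: a list holding one vector, not a vector of length 3

wrapped <- result[1]  # single brackets: still a list of length 1
inner <- result[[1]]  # double brackets: the numeric vector inside

# Many functions refuse the list but are happy with the vector.
failed <- inherits(try(sum(result), silent = TRUE), "try-error")
failed
# [1] TRUE
total <- sum(inner)
total
# [1] 12
```

When a result confuses me, str(result) is a quick way to see how deeply nested it really is.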

No static type checking

My biggest general complaint about R is the extent to which it muddles through. Function was expecting a list but got a vector? It will probably do some sort of cast. Using < or > between things that aren’t the same type? Implicit cast. Typo a variable name? Null reference exception (or whatever R calls them) at runtime. If anyone knows how to do use strict; in R, please let me know. In the meantime, I hope this post will do people a bit of good.
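A few of these conversions in action, plus one partial workaround. This is a sketch, not a real substitute for use strict; the helper meanOfNumbers() and the misspelled variable name are hypothetical, and checking arguments by hand with stopifnot() is just one defensive habit.

```r
# Implicit conversions: these all run without complaint.
mixed <- c(1, "a")  # the number is silently converted to the string "1"
class(mixed)
# [1] "character"

odd.compare <- "10" < 9  # 9 becomes "9", then the two strings are compared
odd.compare
# [1] TRUE

# A typo'd variable name only fails when the line actually runs.
typo.result <- tryCatch(undefinedVariabel + 1,
                        error = function(e) conditionMessage(e))
typo.result  # an "object not found" error message, caught at runtime

# One partial defence: check argument types by hand at the top of a function.
meanOfNumbers <- function(x) {
  stopifnot(is.numeric(x))
  mean(x)
}
checked <- meanOfNumbers(c(2, 4, 6))
checked
# [1] 4
```

Calling meanOfNumbers("a") stops with an error right away instead of muddling through, which is about as close to a type check as this approach gets.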

Also, .NET Rocks did a recent show on learning R, and the guests pointed out something very important: R is a domain-specific language. That domain is stats. I find that as I learn more stats, R makes more sense; the language is built to do the sort of things that statisticians like to do. I do wish there had been one or two professional software engineers around when they designed the language; at least then we’d have consistent rules for how to name functions. Nevertheless, a little bit of Coursera and a little bit of practice with the apply() family will go a long way.

Till next week, happy learning.

-Will
