I’ve been a C# developer for about a year now. For much of that time, I’ve been working on the Data Science Specialization. I’ve had the experience of learning C# and then immediately turning around and learning R. Here are a few insights.

## Terminology and Syntax

There are some parts of R’s terminology and syntax that just look weird coming from C-style languages.

- The `.` is not an operator. It’s a valid character for identifiers. `read.table()` is the name of a function that reads tables. It is not the `table()` part of an object called `read`.
- They use `<-` for assignment sometimes. (See gotchas below.)
- They call enumerated types `factors`. For example, if I have some data, and one column is only ever allowed to have four levels (say, the four different types of treatment we’re studying), then that is a column of class `factor` in my dataframe.
- The boolean literals and null keyword are printed in all caps (`NULL`, `TRUE`, `FALSE`).
- Simple operations operate on vectors. (A vector is like a resizing array.) If you add two vectors, the result is a new vector created by item-wise addition. *Note that if the second vector is too short, R will just start re-using it.*
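Here is a minimal sketch of a few of the points above. The variable names and the treatment labels are made up for illustration; only base R functions are used.

```r
# A dot is just part of the name, not member access
my.total <- 42

# Item-wise addition of two vectors of equal length
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30, 40)
summed <- a + b           # 11 22 33 44

# Recycling: the shorter vector is reused from the start
short <- c(100, 200)
recycled <- a + short     # 101 202 103 204

# A factor is R's version of an enumerated type (labels are hypothetical)
treatment <- factor(c("placebo", "drugA", "drugB", "drugA"))
class(treatment)          # "factor"
levels(treatment)         # "drugA" "drugB" "placebo"
```

Recycling is silent here because the length of `short` divides evenly into the length of `a`.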

## For Loops are a Bad Sign

This is actually a similarity between C# and R, I think. In C#, if you’re using `for` or `foreach` everywhere, there’s a good chance you should be using `LINQ` instead, and in R if you’re using `for` everywhere, you should be using one of the `apply()` functions.

When I started R, I found the apply functions unnecessarily confusing. It’s like 17 different people implemented their own versions of `Select()` (or `map()` if JavaScript is your thing). In fact, each of these functions has a fairly specific use, and learning them will make R code much easier to write (and read).

I found one useful post on the different types of apply functions. I’ll try to provide an even shorter version here. (One frustration I’ve had with R documentation, in general, is that I’m trying to subset a list, and all the blogs I can find are trying to write generalized linear models for predicting Alzheimer’s diagnoses, and I have to dig through 50 lines of math-heavy code to find the operation that actually subsets the list.)

- `tapply()` is for applying a function to each subset of a vector, where a factor defines the groups.
- `lapply()` is like `Select()`. It takes a list and a function, and returns a new list of the same length, with each new element being the corresponding old element transformed by the function.
- `apply()` is for manipulating matrices. You give it a matrix, a margin (1 for rows, 2 for columns, `c(1,2)` for both), and a function.

I find `apply()` the least intuitive, so here are some quick examples.

**example of `tapply()`:**

```
groups <- factor(c(rep("a", 5), rep("b", 5), rep("c", 5))) # 5 each of 'a','b','c'
values <- 1:15 # integer range
tapply(values, groups, mean)
#  a  b  c
#  3  8 13
```

**example of `lapply()`:**

```
myList <- list(a = 1:5, b = 6:10, c = 11:15)
squares <- lapply(myList, function(x) {x * x}) # square every element of each vector
squares
# $a
# [1]  1  4  9 16 25
#
# $b
# [1]  36  49  64  81 100
#
# $c
# [1] 121 144 169 196 225
```

**example of `apply()`:**

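```
m <- matrix(1:12, 3, 4) # 3 rows, 4 columns, filled column by column
m
#      [,1] [,2] [,3] [,4]
# [1,]    1    4    7   10
# [2,]    2    5    8   11
# [3,]    3    6    9   12
s <- apply(m, 1, sum) # margin 1: one sum per row
s
# [1] 22 26 30
s2 <- apply(m, 2, sum) # margin 2: one sum per column
s2
# [1]  6 15 24 33
s3 <- apply(m, c(1,2), sum) # both margins: each cell summed on its own
s3
#      [,1] [,2] [,3] [,4]
# [1,]    1    4    7   10
# [2,]    2    5    8   11
# [3,]    3    6    9   12
```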
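To tie this back to the point about `for` loops, here is a rough sketch of the same job done both ways. The `measurements` list is made-up data. `sapply()` is not covered above; it is another member of the base R apply family that simplifies the result to a vector when it can.

```r
# Hypothetical data: three named vectors of measurements
measurements <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# Loop version: set up an empty result, then fill it one element at a time
means_loop <- vector("list", length(measurements))
names(means_loop) <- names(measurements)
for (name in names(measurements)) {
  means_loop[[name]] <- mean(measurements[[name]])
}

# lapply version: same result in one line
means_apply <- lapply(measurements, mean)

identical(means_loop, means_apply)  # TRUE

# sapply returns a named numeric vector instead of a list
means_vector <- sapply(measurements, mean)
means_vector
# a b c
# 2 5 8
```

The loop needs three lines of bookkeeping before it does any real work, which is most of why the apply version reads better.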

## Gotchas

These are the bugs that I’ve most often written accidentally in R.

**Leaving off the minus sign in assignment**

```
#code code code
myExistingVariable < makeOtherVariable()
```

Did you spot the bug? I wanted to tell R “call `makeOtherVariable()` and assign its return value to the variable called `myExistingVariable`.” I accidentally told R, “Call `makeOtherVariable()`, now compute a boolean vector that is true everywhere an item in `myExistingVariable` is less than the corresponding value in the return from `makeOtherVariable()`, then discard that vector.” This operation will succeed without error (R will do implicit conversions to make the comparison possible), and worse, the only warning you’ll get will be at runtime, and only if `makeOtherVariable()` returns a vector whose length will not divide evenly into the length of `myExistingVariable`.

Sadface.
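A small sketch of the bug in action. `makeOtherVariable()` here is a stand-in I made up so the snippet runs on its own; the values are arbitrary.

```r
# Hypothetical stand-in for the real function
makeOtherVariable <- function() c(5, 6, 7)
myExistingVariable <- c(1, 2, 3)

# The typo: a comparison, not an assignment. No error, no warning.
myExistingVariable < makeOtherVariable()   # TRUE TRUE TRUE, then thrown away
afterTypo <- myExistingVariable            # still 1 2 3

# What I meant
myExistingVariable <- makeOtherVariable()
afterFix <- myExistingVariable             # 5 6 7
```

At the console the comparison at least prints its result. Inside a function or a sourced script, nothing shows up at all.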

**Forgetting how deep in nested collections I am**

AKA “functions that return lists sometimes return lists of length 1”. Sometimes, I’ve taken a function that returns a list, and tried to immediately use the result elsewhere:

```
thoughtItWasAVector <- transFormListOfVectors(otherVector)
confusingResult <- transFormVector(thoughtItWasAVector)
#what I meant:
thoughtItWasAVector <- transFormListOfVectors(otherVector)
confusingResult <- transFormVector(thoughtItWasAVector[[1]])
```
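The functions above are placeholders, so here is a runnable sketch of the same trap using `strsplit()`, a base R function that always returns a list, even when you give it a single string. The input string is made up.

```r
# strsplit() returns a list with one element per input string
parts <- strsplit("alpha,beta,gamma", ",")
class(parts)               # "list"
length(parts)              # 1, not 3

# Single brackets keep the list wrapper
stillAList <- parts[1]
class(stillAList)          # "list"

# Double brackets pull out the character vector inside
theVector <- parts[[1]]
length(theVector)          # 3
toupper(theVector)         # "ALPHA" "BETA" "GAMMA"
```

Checking `class()` and `length()` on a result before passing it along is a cheap way to catch this.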

**No static type checking**

My biggest general complaint about R is the extent to which it muddles through. Function was expecting a list but got a vector? It will probably do some sort of cast. Using `<` or `>` between things that aren’t the same type? Implicit cast. Typo a variable name? Null reference exception (R calls it an “object not found” error) at runtime. If anyone knows how to do `use strict;` in R, please let me know. In the meantime, I hope this post will do people a bit of good.
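A few concrete cases of the muddling through, as a sketch. The variable names, including the misspelled `totalCuont`, are made up for illustration.

```r
# Comparing a string with a number: the number is converted to a string,
# and the two are then compared as text, so "10" sorts before "9"
stringVsNumber <- "10" < 9
stringVsNumber                     # TRUE

# Mixing types in a vector: everything is silently converted to character
mixedVector <- c(1, "a", TRUE)
mixedVector                        # "1" "a" "TRUE"

# A typo in a variable name only fails when the line actually runs
typoMessage <- tryCatch(
  totalCuont + 1,                  # misspelled on purpose, never defined
  error = function(e) conditionMessage(e)
)
typoMessage                        # "object 'totalCuont' not found"
```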

Also, .NET Rocks did a recent show on learning R, and the guests pointed out something very important: R is a domain-specific language. That domain is stats. I find that as I learn more stats, R makes more sense; the language is built to do the sort of things that statisticians like to do. I do wish there had been one or two professional software engineers around when they designed the language. At least then we’d have consistent rules for how to name functions. Nevertheless, a little bit of Coursera and a little bit of practice with the `apply()` family will go a long way.

Till next week, happy learning.

-Will