Rescue your Writing with Richard Lanham

Long ago, even before I was a high school teacher, I was a technical writer. I worked with some epidemiologists. Their job was to do epidemiology, and write reports. My job, basically, was to read reports and make sure that they said what the epidemiologists meant, to make sure the math and the English said the same thing. It was a pretty fun job.

During that time I got to read Revising Prose by Richard Lanham. It’s a hilarious and helpful little book on how to fix bad writing. (Let’s pause for a moment to appreciate the existence of a writing textbook that is fun to read, a phenomenon which is surprisingly, and worryingly, rare.) When I started this post, I was planning a series on writing for developers, but then I realized that I would have to copy large parts of Lanham’s book. So instead, here’s a recommendation and a summary. But seriously, go check out this book.

I recommend Revising Prose to developers in particular because they often see what Lanham calls “the official style.” In case you can’t imagine it from the name: “in order to facilitate the processing of new batches by the accounts receivables department, methods and protocols, some of them manual and some automatic, will need to be put in place. One significant requirement of this project is that the accounting team be able to interpret the output of the reporting software…” Yikes! The official style is the insomnia-curing, bureaucratic nonsense that has you peeking at the last page number minutes after you start reading. It’s miserable. The good news is, it’s avoidable. Lanham’s book teaches the reader to take sentences like that terrifying sample and rewrite them so that other people can read them. (And all those words for a sentence that should say: “We need automated reports that accountants can read.”)

When people can read one another’s writing, software development is easier. It goes faster. There are fewer meetings. You blow fewer deadlines. Richard Campbell, in a recent .NET Rocks episode, summarized a software team problem like this: “On time, on budget, built the wrong thing.” That is a serious risk. Do not build the wrong thing; do not let your team build the wrong thing. Instead, write clearly.

Lanham even gives us a numerical measure of progress, the ‘lard factor’:

You find the lard factor by dividing the difference between the number of words in the original and the revision by the number of words in the original.

In the example above, we cut 53 words (“in order to … the reporting software”) down to 8, for a lard factor of 45 / 53 = 85%. Now, that was a contrived example; your mileage may vary. But Lanham does say to expect a 50-67% reduction in word count. Think of the time and headache that will be saved!
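Here is a minimal sketch of that arithmetic in R. The function names count_words and lard_factor are made up for illustration, and splitting on whitespace is only a rough way to count words.

```r
# Count words by splitting on runs of whitespace (a rough approximation).
count_words <- function(text) {
  words <- strsplit(trimws(text), "\\s+")[[1]]
  length(words[nzchar(words)])
}

# Lard factor: the share of the original's words that the revision cut.
lard_factor <- function(original, revision) {
  n_original <- count_words(original)
  (n_original - count_words(revision)) / n_original
}

# With the word counts quoted above (53 words cut down to 8):
round((53 - 8) / 53, 2)   # 0.85

# With actual text: 8 words cut down to 3.
lard_factor("In order to facilitate the creation of reports",
            "We created reports")   # 0.625
```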

Rather than spoil the whole book for you, I’m going to propose a list of what I call “writing smells.” (cf. code smells). Writing may smell if:

  1. It has many prepositions (of, by, from, for, about, etc.)
  2. It uses “is” a lot (and “was” and “may be” and so on)
  3. All of the sentences are the same length, especially if all the sentences are long.
  4. It has lots of nouns that are secretly verbs.

Let me explain that last one for a minute: A noun that is secretly a verb (the linguistic term is nomen actionis) is a noun that just means “the act of whatevering.” Speculation is the nomen actionis of speculate. Generation is the nomen actionis of generate. Fortunately, these nouns are fairly easy to spot in English. They often end with tion (creation, generation, migration, instantiation, propagation, etc.), and occasionally they’ll just end with ing (processing, duplicating, replicating). If you start to see lots of these nouns, try turning one into a verb. Suddenly “to facilitate the creation of” is just “we created.”
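A rough sketch of how smells 1, 2, and 4 could be counted in R. The function name smell_report and both word lists are assumptions made for illustration; they are not Lanham’s, and a crude count like this will miscount plenty of real sentences.

```r
# Rough counts of three "writing smells" in one piece of text.
# The word lists below are illustrative, not exhaustive.
smell_report <- function(text) {
  words <- tolower(unlist(strsplit(text, "[^A-Za-z']+")))
  words <- words[nzchar(words)]
  prepositions <- c("of", "by", "from", "for", "about", "in", "with", "on")
  to_be <- c("is", "are", "was", "were", "be", "been", "being")
  c(
    words        = length(words),
    prepositions = sum(words %in% prepositions),
    to_be        = sum(words %in% to_be),
    tion_nouns   = sum(grepl("tions?$", words))  # words ending in -tion(s)
  )
}

smell_report("In order to facilitate the creation of reports, the generation of output is required.")
# words = 14, prepositions = 3, to_be = 1, tion_nouns = 2
```

A high count is only a hint to reread the sentence. Words like “station” also end in tion.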

Fixing these smells is a bit like stumbling on that perfect refactoring when you’re writing code: all of a sudden your methods have sensible names and get shorter.

So be on the lookout for those prose smells, and give that book a try. Till next week, happy learning!

-Will

R for the C# Dev

I’ve been a C# developer for about a year now. For much of that time, I’ve been working on the Data Science Specialization. I’ve had the experience of learning C# and then immediately turning around and learning R. Here are a few insights.

Terminology and Syntax

There are some parts of R’s terminology and syntax that just look weird coming from C-style languages.

  1. The . is not an operator. It’s a valid character for identifiers. read.table() is the name of a function that reads tables. It is not the table() part of an object called read.
  2. They use <- for assignment sometimes. (See gotchas below.)
  3. They call enumerated types factors. For example, if I have some data, and one column is only ever allowed to have four levels (say, the four different types of treatment we’re studying), then that is a column of class factor in my dataframe.
  4. The boolean literals and null keyword are written in all caps. (NULL, TRUE, FALSE).
  5. Simple operations operate on vectors. (A vector is like a resizing array.) If you add two vectors, the result is a new vector created by item-wise addition. Note that if the second vector is too short, R will just start re-using it.
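A few of the points above, as a short R sketch. The variable names (my.total, treatment) are made up for illustration.

```r
# "." is just part of a name, not member access.
my.total <- sum(c(1, 2, 3))          # my.total is 6

# Arithmetic works element-wise on whole vectors.
c(1, 2, 3, 4) + c(10, 20, 30, 40)    # 11 22 33 44

# A shorter vector is recycled: here it is reused as 10, 20, 10, 20.
c(1, 2, 3, 4) + c(10, 20)            # 11 22 13 24

# A factor stores its allowed levels along with the data.
treatment <- factor(c("a", "b", "a", "d"), levels = c("a", "b", "c", "d"))
levels(treatment)                    # "a" "b" "c" "d"
table(treatment)                     # counts per level; "c" shows up with a count of 0
```

Recycling is silent when the longer length is a multiple of the shorter one. Otherwise R still recycles but gives a warning.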

For Loops are a Bad Sign

This is actually a similarity between C# and R, I think. In C#, if you’re using for or foreach everywhere, there’s a good chance you should be using LINQ instead, and in R if you’re using for everywhere, you should be using one of the apply() functions.

When I started R, I found the apply functions unnecessarily confusing. It’s like 17 different people implemented their own versions of Select() (or map() if JavaScript is your thing). In fact, these functions all have sort of specific uses, and learning them will help you write (and read) R code a lot.

I found one useful post on the different types of apply functions. I’ll try to provide an even shorter version here. (One frustration I’ve had with R documentation, in general, is that I’m trying to subset a list, and all the blogs I can find are trying to write generalized linear models for predicting Alzheimer’s diagnoses, and I have to dig through 50 lines of math-heavy code to find the operation that actually subsets the list.)

  1. tapply() is for applying a function to each group of values, where the groups are given by a factor.
  2. lapply() is like Select(). It takes a list and a function, and returns a new list of the same length, with each new element being the corresponding old element transformed by the function.
  3. apply() is for manipulating matrices. You give it a matrix, a margin (1 for rows, 2 for columns, c(1,2) for both), and a function. I find this one least intuitive, so there’s a quick example of it after the other two.

Example of tapply():

groups <- factor(c(rep("a", 5), rep("b", 5), rep("c", 5))) # 5 each of 'a','b','c'
values <- 1:15 # integer range
tapply(values, groups, mean) # mean of the values in each group
#  a  b  c 
#  3  8 13 

Example of lapply():

myList <- list(a = 1:5, b = 6:10, c=11:15)
squares <- lapply(myList, function(x) {x * x})
squares
$a
[1]  1  4  9 16 25

$b
[1]  36  49  64  81 100

$c
[1] 121 144 169 196 225

Example of apply():

> m <- matrix(1:12, 3, 4)
> m
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> s <- apply(m, 1, sum) # margin 1: one sum per row
> s
[1] 22 26 30
> s2 <- apply(m, 2, sum) # margin 2: one sum per column
> s2
[1]  6 15 24 33
> s3 <- apply(m, c(1,2), sum) # both margins: the function sees one cell at a time
> s3
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

Gotchas

These are the bugs that I’ve most often written accidentally in R.

Leaving off the minus sign in assignment

#code code code
myExistingVariable < makeOtherVariable()

Did you spot the bug? I wanted to tell R “call makeOtherVariable() and assign its return value to the variable called myExistingVariable.” I accidentally told R, “Call makeOtherVariable(), now compute a boolean vector that is true everywhere an item in myExistingVariable is less than the corresponding value in the return from makeOtherVariable(), then discard that vector.” This operation will succeed without error (R will do implicit conversions to make the comparison possible), and worse, the only warning you’ll get will be at runtime, and only if makeOtherVariable() returns a vector whose length will not divide evenly into the length of myExistingVariable. Sadface.
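Here is a small runnable version of that bug. The body of makeOtherVariable() is a hypothetical stand-in so the example is self-contained.

```r
makeOtherVariable <- function() c(5, 1, 7)   # hypothetical stand-in

myExistingVariable <- c(1, 2, 3)

# Bug: "<" compares element by element; nothing is assigned.
myExistingVariable < makeOtherVariable()     # TRUE FALSE TRUE
myExistingVariable                           # still 1 2 3

# Fix: "<-" assigns.
myExistingVariable <- makeOtherVariable()
myExistingVariable                           # now 5 1 7
```

Spacing matters here too: x<-3 assigns 3 to x, while x < -3 compares x with negative three.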

Forgetting how Deep in Nested Collections I am

AKA “functions that return lists sometimes return lists of length 1”. Sometimes, I’ve taken a function that returns a list, and tried to immediately use the result elsewhere:

thoughtItWasAVector <- transFormListOfVectors(otherVector)
confusingResult <- transFormVector(thoughtItWasAVector)
#what I meant:
thoughtItWasAVector <- transFormListOfVectors(otherVector)
confusingResult <- transFormVector(thoughtItWasAVector[[1]])
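To see the difference, here is a sketch with a made-up transFormListOfVectors() that returns a list of length 1. The function body is an assumption for illustration only.

```r
# Hypothetical stand-in for a function that returns a list of vectors.
transFormListOfVectors <- function(v) list(v * 2)

result <- transFormListOfVectors(c(1, 2, 3))

class(result)        # "list": a list of length 1, not a vector
class(result[1])     # "list": single brackets keep the list wrapper
class(result[[1]])   # "numeric": double brackets pull out the element

sum(result[[1]])     # 12
# sum(result) stops with an error, because sum() cannot add up a list
```

Calling str() on a result is a quick way to check how deeply it is nested before passing it on.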

No static type checking

My biggest general complaint about R is the extent to which it muddles through. Function was expecting a list but got a vector? It will probably do some sort of cast. Using < or > between things that aren’t the same type? Implicit cast. Typo a variable name? An “object not found” error (R’s answer to a null reference exception) at runtime. If anyone knows how to do “use strict” in R, please let me know. In the meantime, I hope this post will do people a bit of good.
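A short sketch of that muddling through, using made-up variable names:

```r
# Mixing types in one vector silently converts everything to character.
mixed <- c(1, "a", TRUE)
class(mixed)    # "character"

# Comparing a string with a number compares both as strings.
"10" < 9        # TRUE, because "10" sorts before "9" as text

# A misspelled variable name only fails when that line actually runs.
total <- 42
msg <- tryCatch(totl + 1, error = function(e) conditionMessage(e))
msg             # the error message names the missing object, totl
```

None of these produce a warning ahead of time, so tests that exercise every line are the main safety net.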

Also, .NET Rocks did a recent show on learning R, and the guests pointed out something very important: R is a domain-specific language. That domain is stats. I find that as I learn more stats, R makes more sense; the language is built to do the sort of things that statisticians like to do. I do wish there had been one or two professional software engineers around when they designed the language; at least then we’d have consistent rules for how to name functions. Nevertheless, a little bit of Coursera and a little bit of practice with the apply() family will go a long way.

Till next week, happy learning.

-Will

IDEs and Version Control

Welcome back! This is another post aimed at total beginners. It’s a follow-up to last week’s post.

We ended last week with a simple setup so that we can make and run computer programs on our own computers. This week, we’re going to talk about two other important concepts for getting started: IDEs and version control.

Do I want an IDE?

An IDE is an integrated development environment. But what are we integrating?

An IDE integrates a toolchain. A toolchain is the series of programs that takes your code from the text files you wrote to instructions that the computer can execute. The advantage of an IDE is that the toolchain is set up for you in one step. Instead of using makefiles or something to organize your build process, you just have a button that says “run.”

Many developers use IDEs because they provide a fast and simple setup, even for languages that might have a more complicated toolchain. On the other hand, some developers avoid IDEs because they make a lot of decisions for you. For me, the choice of whether to use an IDE depends a lot on the language and on the project. I do use an IDE at work. (I use Visual Studio at work. If you’re writing in a .NET language on Windows, you should certainly use Visual Studio; the languages and the tools are built together, and really complement each other well.)

If you’re a beginner, and there is a good, free IDE for your language, I recommend two things: First, install that IDE, because it will lower your barrier to getting started. Second, learn what the IDE is doing so that you’ll be able to go without it or troubleshoot it later.

Version Control All the Things

In this section of the tutorial, you should first install Git on your computer, if you don’t have it already. I also recommend you go through the try Git tutorial, and create an account on GitHub. For the rest of this post, I’m going to assume you’ve done those things. If they give you any trouble, leave a comment and I’ll update this post with further instructions here.

And now, a sad story about not using source control:

One morning I was sitting at a coffee shop before work. I was running a little late, but I was determined to finish a coding exercise from a Coursera course before work that day. I didn’t have an editor I was in love with for that language (R, in this case), so I was jumping back and forth between Notepad++ and Visual Studio Code, depending on what I was trying to do. (Mixing editors mid-flight was a mistake.) I finally got something right, pressed CTRL+S, pressed ALT+TAB, and there was this dialog box: “File has changed on disk. Reload?” I made the wrong choice. Boom. About two hours of code accidentally deleted because I was in a hurry and using two text editors.

Fortunately, the story is not that sad. I lost two hours of work on a personal project. It’s not like I lost two weeks of work on my boss’s projects. Still, I was frustrated, and now even more behind.

So here’s the moral of my short sad story: always use source control. Always. Always. Somewhere on his blog, Joel Spolsky wrote that all non-trivial projects need source control. I’m going to add that all trivial projects need source control too.

Typing ‘git init’ in your new directory is free, gives you good practice for your future job, and gives you the ability to track different changes and versions. Plus, if you’re going to mess up source control, you might as well get it out of your system while it’s your own homework and not your company’s source code.

How I Start a New Project:

  1. Make a new directory for it, with a name I can remember. (I keep all my projects at D:\Projects\<projectname> because then I don’t need to look for them.)
  2. If I’m using an IDE, tell the IDE to create a new project there; otherwise, start writing a source file and save it there.
  3. Create a new file called “.gitignore” in that directory.
  4. I get a premade “.gitignore” template and paste it into my new .gitignore file. If I’m using Visual Studio, for example, I’ll use a .gitignore template for Visual Studio.
  5. I type ‘git init’. This creates a new repository.
  6. Every time I make a change I like, I commit it at least to the local repository.
  7. If this project is even slightly long-lived or complex, I’ll put it on GitHub. (Note: anyone can read a public repository, so make sure there aren’t things like passwords or API keys in your public git repo.)
  8. I start actually writing lots of code.
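Steps 1, 3, 4, and 5 could be scripted. Here is a minimal sketch in R (the language from the coffee-shop story). The function name start_project and the three .gitignore entries are assumptions for illustration; a real template for your language will be longer.

```r
# Hypothetical helper: create a project folder, a starter .gitignore,
# and (if git is installed) a new local repository.
start_project <- function(root, name,
                          ignore_lines = c(".Rhistory", ".RData", ".Rproj.user/")) {
  project_dir <- file.path(root, name)
  dir.create(project_dir, recursive = TRUE, showWarnings = FALSE)

  # Steps 3 and 4: a .gitignore listing files git should not track.
  writeLines(ignore_lines, file.path(project_dir, ".gitignore"))

  # Step 5: run "git init <directory>", but only if git is on the PATH.
  if (nzchar(Sys.which("git"))) {
    system2("git", c("init", shQuote(project_dir)), stdout = FALSE, stderr = FALSE)
  }
  project_dir
}

# Try it in a temporary directory first.
project <- start_project(tempdir(), "my-first-project")
list.files(project, all.files = TRUE, no.. = TRUE)
```

Committing (step 6) is left to you on purpose, since each commit should be a change you actually like.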

Now that you can track changes to your source code, and can get it to execute locally, and can back it up to a remote source code repository, you’re in business.

In a future post (soon, I think) I’ll talk about choosing an easy starter project, since that’s something a lot of people ask about at programming meetups.

Till next week, happy learning!

-Will

Your First Bite of the Elephant

I hope to keep this blog useful both to more experienced programmers, who might find the topics interesting or discover little nuggets of knowledge but don’t need me to tell them how to get to Hello World, and to total beginners, who were the original intended audience. This post is for total beginners.

So, dear total beginners, welcome! I’m glad you’re here.

This post is about taking your first bite of the elephant. That is, it is about going from never having written code to having written code. So, here goes. My first step-by-step lesson:

Rubrics for Code?

Teachers make rubrics fairly often. A rubric is a tool for attaching a numerical score to something of subjective quality, like an essay. Rubrics are important because they keep you from passing students just because you like them, or failing students just because you were hungry and tired when you read their essay.

Good rubrics are a series of objective questions. “Does the essay have a works cited page? Is the page correctly formatted? Did the essay have a coherent thesis? Did it cite evidence?” Rubrics have their failings, of course. In particular, they’re weak on the high end of the spectrum; they tend to put everything in a few buckets: bad / ok / good is about the most detailed sorting you can expect from a rubric. Rubrics are also terrible at distinguishing between two pieces of really excellent writing. In short, they allow a teacher to answer the question, “is this essay good enough to meet the requirements for this class” quickly and fairly, but they don’t tell you whether you like Hemingway or Faulkner more. 

How could rubrics be applied to software development? I can think of two interesting use cases: code reviews and MVPs. During a code review, we might want to answer a simple question: is this checkin good enough to let into the codebase? And when examining an MVP, we might be answering another simple question: is this product good enough to share?

Rubric for a Code Review

For rubrics to be easy to apply, they should be a series of yes/no or bad/ok/good questions that can be answered pretty quickly. Here are some examples of what I think might make good code review rubric items:

  1. How are the names of things? Bad / ok / good
  2. Are there any really obvious inefficiencies? e.g., does the method do the same computational work twice, or does it use obviously wrong data structures?
  3. Does it follow the style guide? That is, does it have the same casing, naming conventions, indenting structure, etc., as the rest of the code?
  4. Is there any reasonable execution path that could throw a boneheaded exception? (e.g., in C#, can you get a NullReferenceException or an IndexOutOfRangeException?)
  5. Does it do what it was supposed to do?
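To make the idea concrete, here is one way such a rubric could be written down and applied, sketched in R. The item names are shortened versions of the questions above, reworded so that “yes” is always the good answer. The function score_review and the pass rule (no “bad” or “no” answers) are assumptions for illustration.

```r
# Apply a rubric: the review passes only if no item is answered "bad" or "no".
score_review <- function(answers) {
  failing <- names(answers)[answers %in% c("bad", "no")]
  list(accepted = length(failing) == 0, failing = failing)
}

# One reviewer's answers for one checkin.
review <- c(
  names_are_clear        = "ok",    # bad / ok / good
  no_obvious_waste       = "yes",
  follows_style_guide    = "no",
  no_careless_exceptions = "yes",
  does_what_it_should    = "yes"
)

result <- score_review(review)
result$accepted   # FALSE
result$failing    # "follows_style_guide"
```

The output tells the author exactly which item to fix, which is the point of using a rubric.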

Rubric for a Minimum Viable Product (MVP)

Basically, we’re driving at the question: Is this a thing people can use and would want? Did we build enough software that you can use this without knowing how to write code?

  1. Does it run without erroring out the vast majority of the time?
  2. Is it reasonably fast?
  3. Can people who haven’t used it before quickly figure out how to do obvious tasks?

The rest of the rubric for a minimum viable product is going to vary quite a bit between the different projects that you might work on. It might be a list of cool features that are or are not present. As long as it’s a short list of easy-to-answer, objective questions, it can probably be used as a rubric.

Why Rubrics?

Rubrics are really good at making a complex, subjective assessment come down to a handful of numbers and yes/no questions that most people can agree on. If you get thirty people in a room and thirty essays about the Stamp Act, and just ask everyone “which essays are good,” you will get a bunch of arguing. If you ask them, “which essays have a works cited page,” you will get a reasonable answer in a reasonable amount of time. Rubrics limit the subjectivity and scope of a decision.

Here are a few expected benefits of using a rubric:

  1. The review meeting has a definite end state: when all the rubric questions are answered, you know whether you’re done with the project or not, or whether to accept the code or not.
  2. People’s feelings get less hurt. Telling someone their code is terrible can be very discouraging. Telling them their code doesn’t have camelCasing or might throw an error in some situation is demonstrably true. It’s a good way of making sure the comments are about the code and not about the developer.
  3. It makes things fair between instances. Without rubrics, personal bias, whether conscious or not, can affect people’s performance reviews, and that’s not fair. A rubric-based score varies much less with personal factors.

Rubrics are really a way of making a subjective decision into two objective steps: Which criteria constitute success? and Which criteria does this object satisfy? I have often seen meetings drag on because people circle back. They start to answer a question, decide they don’t like the implication of an earlier decision, and start second-guessing something everyone agreed to ten minutes before. The key to using the rubric is that it can’t be revised while it’s being applied. Write a rubric, then grade the paper (review the code, assess the project, whatever). After the assessment is over, look at the rubric: did it penalize things that were good or permit things that were bad? Revise it for next time. But revising a rubric, or going in without a rubric, when you’re trying to assess something is a good recipe for having a very long meeting.

I have seen code reviews and performance reviews and whatnot that have some objective criteria, but I think formalizing these criteria as a rubric is a good idea. I think many qualitative assessments can be accomplished by first creating a set of objective questions, whose answers indicate quality, and then applying them. Knowing that that’s the process one is trying to follow will make it easier to follow. 

Have you seen rubrics or rubric-like documents at different kinds of reviews? Are there any glaring omissions in my list? Please leave a comment below.

Till next week, happy learning,

-Will