Counting Fruit in Rust, Part 2

This is the third part of a series on beginning rust. You may wish to read part 1 and part 2 first.

I want to be able to visit all the nodes of a tree in Rust. Last time, I showed the implementation of the tree itself, and code that would generate a random tree. This time, I’m going to show and describe code for walking around a tree and counting which nodes have a flag set. The code we’ll talk about this episode is here:

There are two functions here: count_fruit and recursive_count_fruit. The reason that there are two functions is because the scope that own the root of the tree doesn’t have it in a box, but every subsequent node will be in a box. recursive_count_fruit is expecting an Option<Box<Node>>, but the root will just be Node, so I made an intermediate helper function that wraps the node and passes it to the recursive function. Note that line 20 doesn’t contain the return keyword or end with a semicolon. Rust will implicitly return the value of the last expression of a functions body if the types match and there’s no semicolon. (If there were a semicolon, the type of the last line would be (), because the semicolon makes an expression into a statement. If I added a semicolon, I would also need to add return.)

The real work in this file happens in recursive_count_fruit. The whole body of the function is a match statement. Because an Option<T> is an enum with the choices Some(T) and None, I have to match both of them. In the above code, if I comment out line 15 (None => 0), compilation will fail with the following error:

error[E0004]: non-exhaustive patterns: `None` not covered
 --> src/main.rs:7:11
  |
7 |     match node {
  |           ^^^^ pattern `None` not covered

As usual in Rust, this is a safety feature. If I’m matching on an enum, the compiler requires some instruction for what to do with every case of the enum. (You can add _ => do_default_thing() as a way of covering a default case.)

The None => 0 is the breaking case for our recursive function. It means, “If we’re at an empty subtree, return zero and stop trying to recurse.” If the subtree is not empty, then we need to need to check whether the current node has fruit, and then count the fruit in each subtree of the current node.

Note line 9: Rust’s if statements are really expressions, and the value of the expression is the value of the last line within the if or the else that gets evaluated. This can lead to some surprising errors when compiling Rust: You can get a type mismatch error from an if statement. If you have a Rust if statement raise a type mismatch error when it’s compiled, check whether you use the return value. If you don’t, both branches of the if/else need to return (). If you use the return value, make sure both branches return the same type.

Back to recursive_count_fruit. After the function has decided whether we need to add 1 for the current node, we dereference the current node. If we remove line 10 in the snippet above (and rename things appropriately), we get a surprising error:

error[E0382]: use of moved value: `val`
  --> src/main.rs:13:40
   |
12 |(our_fruit + recursive_count_fruit(val.left_child) +
   |                                   -------------- value moved here
13 |      recursive_count_fruit(val.right_child))
   |                            ^^^^^^^^^^^^^^^ value used here after move
   |
   = note: move occurs because `val.left_child` has type `std::option::Option<std::boxed::Box<node::Node>>`, which does no
t implement the `Copy` trait

What? Why does accessing a member of val move val? I would understand if it moved left_child, but why does it move val?

It turns out that taking ownership of a member a heap-allocated struct counts as “moving” the struct, because we moved a member of the struct, so now references to the struct will be invalid – that member is fair game to free or mutate, so the struct has moved. Also, the struct could have custom dereference behavior, and that behavior might need to check part of the struct that has already been moved, so the second access to a member of the struct on the heap is illegal, because the dereference code might fail, because the compiler no longer knows that the struct is in a consistent state.

In order to get around this, we dereference the box, which explicitly moves the entire contained struct into our stack frame local variable. (Please note: This “move” is a useful mental model for who owns the memory, but the compiled executable may not actually move the bytes in memory; the compiler considers the value to have “moved” into our function, but for performance reasons at runtime it may stay where it is.) Now, we can pull it apart and the compiler will know what’s happening to it, because the current stack frame will own it.

One import thing to understand is that the reason we need to dereference the entire box is because the contents of the box are a struct that cannot be copied. If the box contained a struct whose members could be copied, say, a Point whose members, x and y, were just floats, then I could just follow a reference to the float, get the float into the stack frame by copying it, and not have to move the struct or the box.

I’ve made this Rust sample to illustrate the difference:

https://play.rust-lang.org/?gist=b16ab5ab88916ebb102ebdf1a08f058b&version=stable&backtrace=0

In case the rust playground link stops working, I’ve also made a Gist.

It’s interesting to see how rust deals with the safety of a recursive structure. Please note that Box<T> is very special in its dereference behavior. I was really only using it in this case because it is the accepted way of making a recusrive struct in Rust. If we didn’t need a tree, we would have written this code differently. But that is another project for another day.

I would also specifically like to thank kimundi for patiently answering all my questions on the Rust beginners IRC channel.

Till next time, happy learning!

-Will

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s