This is the second post in a series about building a calculator REPL in Rust. You may want to start with the first post. Today I’ll talk about how the tokenizer is built.
The tokenizer is, basically, a function that looks at a string and recognizes the chunks that are meaningful to the program. In the calculator REPL I wrote, these chunks (the tokens) are represented as an enumerated type, because each one can be a delimiter, an operator, or a value.
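As a rough sketch, such an enumerated type might look like the following. The names Token, Opcode, and is_delimiter, as well as the exact set of variants, are assumptions made for illustration; the real calculator’s type may differ.

```rust
/// Arithmetic operators the calculator understands (hypothetical set).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Opcode {
    Add,
    Subtract,
    Multiply,
    Divide,
}

/// One meaningful chunk of input: a delimiter, an operator, or a value.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Token {
    LeftParen,        // delimiter
    RightParen,       // delimiter
    Operator(Opcode), // operator
    Operand(f64),     // value
}

impl Token {
    /// True for the two parenthesis tokens.
    pub fn is_delimiter(&self) -> bool {
        matches!(self, Token::LeftParen | Token::RightParen)
    }
}
```

Using an enum means the evaluator can later match on the variant and the compiler will complain if a case is forgotten.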
In principle, we could write the tokenizer by looping over the string and checking a bunch of if statements to see which token we’re working on, but this would be very prone to bugs and hard to read. Instead, I used a crate called nom, which lets me write this:
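To give a feel for the shape of such a small parser without depending on the crate, here is a minimal std-only sketch. It is not nom’s API and not the code from the calculator: Token, ParseResult, tag, left_paren, and right_paren are hypothetical names chosen for illustration, and the sketch works on &str for simplicity.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Token {
    LeftParen,
    RightParen,
}

/// On success: the unconsumed rest of the input plus the parsed value.
pub type ParseResult<'a, T> = Option<(&'a str, T)>;

/// Match a literal prefix and, if it is there, produce `token`.
/// (Hypothetical helper, loosely modeled on the idea of a "tag" parser.)
pub fn tag<'a>(literal: &str, token: Token, input: &'a str) -> ParseResult<'a, Token> {
    input.strip_prefix(literal).map(|rest| (rest, token))
}

/// Recognize a left paren at the start of the input.
pub fn left_paren(input: &str) -> ParseResult<'_, Token> {
    tag("(", Token::LeftParen, input)
}

/// Recognize a right paren at the start of the input.
pub fn right_paren(input: &str) -> ParseResult<'_, Token> {
    tag(")", Token::RightParen, input)
}
```

Each function either consumes its literal and hands back the rest of the input, or returns None so that another parser can be tried.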
This is a basic part of the parser. It does things like map a substring containing just the character ( to the token that represents a left paren. So far it doesn’t feel very magical. The real magic comes from nom’s ability to nest parsers:
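The nesting idea can be sketched with plain functions: small parsers that each recognize one kind of token, and a single_token parser that simply tries them in order. This is a dependency-free stand-in, not nom’s API and not the calculator’s actual code; the Token variants, the operator set +-*/, and the digits-and-dots number format are all assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Token {
    LeftParen,
    RightParen,
    Operator(char),
    Operand(f64),
}

fn left_paren(input: &str) -> Option<(&str, Token)> {
    input.strip_prefix('(').map(|rest| (rest, Token::LeftParen))
}

fn right_paren(input: &str) -> Option<(&str, Token)> {
    input.strip_prefix(')').map(|rest| (rest, Token::RightParen))
}

fn operator(input: &str) -> Option<(&str, Token)> {
    let c = input.chars().next()?;
    if "+-*/".contains(c) {
        Some((&input[c.len_utf8()..], Token::Operator(c)))
    } else {
        None
    }
}

fn operand(input: &str) -> Option<(&str, Token)> {
    // Take the longest run of digits and dots, then let f64 validate it.
    let end = input
        .find(|c: char| !(c.is_ascii_digit() || c == '.'))
        .unwrap_or(input.len());
    let (number, rest) = input.split_at(end);
    number
        .parse::<f64>()
        .ok()
        .map(|value| (rest, Token::Operand(value)))
}

/// A single token is whichever of the smaller parsers succeeds first.
pub fn single_token(input: &str) -> Option<(&str, Token)> {
    left_paren(input)
        .or_else(|| right_paren(input))
        .or_else(|| operator(input))
        .or_else(|| operand(input))
}
```

Adding a new kind of token then means writing one small parser and adding one more line to the chain.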
So here we go. A single token is either a left paren, a right paren, an operator, or an operand. Nice declarative syntax. I love how the single_token function is basically just a list of legal tokens, and it works! Then, if we want more than one token, we can just tell nom, “hey, this string is a whitespace-delimited list of substrings that should match single_token,” and we get our parser for free. Very fun.
This lets me have my overall parsing function be super clean:
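A minimal std-only sketch of what such a top-level function could look like is below. It is an illustration under assumptions rather than the calculator’s code: it takes a &str instead of a byte array, ParseError is a hypothetical unit struct, and single_token here is a simplified stand-in for the nom-built parser.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Token {
    LeftParen,
    RightParen,
    Operator(char),
    Operand(f64),
}

/// Returned when the input is not a valid sequence of tokens.
#[derive(Debug, PartialEq)]
pub struct ParseError;

/// Recognize one token at the start of `input` (simplified stand-in).
fn single_token(input: &str) -> Option<(&str, Token)> {
    let first = input.chars().next()?;
    let rest = &input[first.len_utf8()..];
    match first {
        '(' => Some((rest, Token::LeftParen)),
        ')' => Some((rest, Token::RightParen)),
        '+' | '-' | '*' | '/' => Some((rest, Token::Operator(first))),
        _ => {
            // Longest run of digits and dots, validated by f64's parser.
            let end = input
                .find(|c: char| !(c.is_ascii_digit() || c == '.'))
                .unwrap_or(input.len());
            let (number, rest) = input.split_at(end);
            number
                .parse::<f64>()
                .ok()
                .map(|value| (rest, Token::Operand(value)))
        }
    }
}

/// Turn the whole input into tokens, skipping whitespace between them.
pub fn parse(input: &str) -> Result<Vec<Token>, ParseError> {
    let mut tokens = Vec::new();
    let mut remaining = input.trim_start();
    while !remaining.is_empty() {
        let (rest, token) = single_token(remaining).ok_or(ParseError)?;
        tokens.push(token);
        remaining = rest.trim_start();
    }
    Ok(tokens)
}
```

In this sketch whitespace between tokens is optional, so both "(1 + 2)" and "(1+2)" produce the same list.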
And hurray! We call parse, pass in our input, and it either returns a vector of tokens or, if the byte array can’t be parsed as a valid sequence of tokens, an error.
One feature that I’d like to add, which would be a bit more work I believe, is trying to tell the user what specifically went wrong with the tokens that they tried to parse. Having at least “unexpected token: foo” would be a nice future improvement.
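One possible starting point, sketched with only the standard library: if the tokenizer can hand back the part of the input it failed to consume, the first whitespace-delimited chunk of that remainder is a reasonable candidate for the offending token. UnexpectedToken and unexpected_token are hypothetical names, and how the unparsed remainder is obtained from nom is left open here.

```rust
use std::fmt;

/// Error carrying the chunk of input that could not be tokenized.
#[derive(Debug, PartialEq)]
pub struct UnexpectedToken(pub String);

impl fmt::Display for UnexpectedToken {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unexpected token: {}", self.0)
    }
}

/// Given the part of the input the tokenizer could not consume,
/// report the first whitespace-delimited chunk of it, if any.
pub fn unexpected_token(unparsed: &str) -> Option<UnexpectedToken> {
    unparsed
        .split_whitespace()
        .next()
        .map(|chunk| UnexpectedToken(chunk.to_string()))
}
```

Printing the returned value with {} gives a message in the “unexpected token: foo” form.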
And now we have a file that will turn user input strings into lists of tokens that mean something to the program. You can see the whole tokenizer here. Next time, we’ll learn to evaluate a list of tokens to get the result of the calculation.
Till then, happy learning!