Calculator REPL Part 2: Tokenizing the Input

This is the second post in a series about building a calculator REPL in Rust. You may want to start with the first post. Today I’ll talk about how the tokenizer is built.

The tokenizer is basically a function that looks at a string and recognizes the chunks that are meaningful to the program. In the calculator REPL I wrote, these chunks (the tokens) are represented as an enumerated type, because each one is either a delimiter, an operator, or a value.
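To make that concrete, here is a minimal sketch of what such an enumerated type might look like. The names `Token`, `LeftParen`, `RightParen`, `Operator`, and `Operand` are assumptions made for illustration, as is the choice to store operators as a `char` and values as an `i64`; the real definitions may differ.

```rust
/// One meaningful chunk of calculator input.
/// Hypothetical names and payload types, chosen only for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    /// Delimiter: "("
    LeftParen,
    /// Delimiter: ")"
    RightParen,
    /// Operator such as '+', '-', '*' or '/' (assumed set)
    Operator(char),
    /// A value; an integer here to keep the sketch small
    Operand(i64),
}
```

Because every kind of token lives in one type, a whole line of input can later be stored as a plain `Vec<Token>`.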

In principle, we could write the tokenizer by looping over the string and checking a bunch of if statements to see which token we’re working on, but this would be very prone to bugs and hard to read. Instead, I used a crate called nom.
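For comparison, the loop-and-if-statements approach might look roughly like the sketch below. It is not code from the calculator; the `Token` type, the function name `tokenize_by_hand`, and the supported operators are all assumptions. Notice how much of it is index bookkeeping rather than a description of the tokens.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    LeftParen,
    RightParen,
    Operator(char),
    Operand(i64),
}

/// Hand-rolled tokenizer: one loop, one branch per kind of token.
pub fn tokenize_by_hand(input: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = input.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c == '(' {
            tokens.push(Token::LeftParen);
            i += 1;
        } else if c == ')' {
            tokens.push(Token::RightParen);
            i += 1;
        } else if "+-*/".contains(c) {
            tokens.push(Token::Operator(c));
            i += 1;
        } else if c.is_ascii_digit() {
            // Multi-character tokens need their own inner loop,
            // and forgetting to advance `i` anywhere means an infinite loop.
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            let value = text.parse::<i64>().map_err(|e| e.to_string())?;
            tokens.push(Token::Operand(value));
        } else {
            return Err(format!("unexpected character: {}", c));
        }
    }
    Ok(tokens)
}
```

It works, but every new kind of token means another branch and another chance to get the index arithmetic wrong.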

nom lets me write this instead:

This is a basic part of the parser. It does things like map a substring containing just the character ( to the token that represents a left paren. So far it doesn’t feel very magical. The real magic comes from nom’s ability to nest parsers:
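As a stand-in for that idea, here is a small sketch written with only the standard library, so it is not nom's API. It mimics the combinator style: each parser returns the unconsumed rest of the input together with the value it recognized, and bigger parsers are built by combining smaller ones. The names `Parsed`, `left_paren`, `right_paren`, `either`, and `paren` are hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    LeftParen,
    RightParen,
    Operator(char),
    Operand(i64),
}

/// On success: the unconsumed rest of the input plus the recognized token.
type Parsed<'a> = Option<(&'a str, Token)>;

/// Leaf parser: a leading "(" becomes Token::LeftParen.
fn left_paren(input: &str) -> Parsed<'_> {
    input.strip_prefix('(').map(|rest| (rest, Token::LeftParen))
}

/// Leaf parser: a leading ")" becomes Token::RightParen.
fn right_paren(input: &str) -> Parsed<'_> {
    input.strip_prefix(')').map(|rest| (rest, Token::RightParen))
}

/// Nesting: build a new parser from two existing ones by trying them in order.
fn either<'a>(
    first: impl Fn(&'a str) -> Parsed<'a>,
    second: impl Fn(&'a str) -> Parsed<'a>,
) -> impl Fn(&'a str) -> Parsed<'a> {
    move |input| first(input).or_else(|| second(input))
}

/// A parser defined purely in terms of other parsers.
fn paren(input: &str) -> Parsed<'_> {
    either(left_paren, right_paren)(input)
}
```

The leaf parsers are tiny and boring on their own; the payoff is that `paren` contains no string handling at all, only a statement of which parsers it is made of.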

So here we go. A single token is either a left paren, or a right paren, or an operator, or an operand. Nice declarative syntax. I love how the single_token function is basically just a list of legal tokens, and it works! Then, if we want more than one token, we can just tell nom, “hey, this string is a whitespace-delimited list of substrings that should match single_token,” and we get our parser for free. Very fun.
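The same shape can be sketched without nom, using only the standard library. This is an illustration of the idea rather than the calculator's actual code: `single_token` is the name used above, but `Parsed`, `operator`, `operand`, `token_list`, and the `Token` type are assumptions, and whitespace between tokens is treated as optional here.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    LeftParen,
    RightParen,
    Operator(char),
    Operand(i64),
}

/// On success: the unconsumed rest of the input plus the recognized token.
type Parsed<'a> = Option<(&'a str, Token)>;

fn left_paren(input: &str) -> Parsed<'_> {
    input.strip_prefix('(').map(|rest| (rest, Token::LeftParen))
}

fn right_paren(input: &str) -> Parsed<'_> {
    input.strip_prefix(')').map(|rest| (rest, Token::RightParen))
}

fn operator(input: &str) -> Parsed<'_> {
    let c = input.chars().next()?;
    if "+-*/".contains(c) {
        Some((&input[c.len_utf8()..], Token::Operator(c)))
    } else {
        None
    }
}

fn operand(input: &str) -> Parsed<'_> {
    // Length in bytes of the leading run of ASCII digits.
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    // An empty run (or a number too big for i64) simply does not match.
    let value = input[..end].parse::<i64>().ok()?;
    Some((&input[end..], Token::Operand(value)))
}

/// A single token is the first alternative that matches:
/// the body reads as a list of the legal tokens.
fn single_token(input: &str) -> Parsed<'_> {
    left_paren(input)
        .or_else(|| right_paren(input))
        .or_else(|| operator(input))
        .or_else(|| operand(input))
}

/// Zero or more tokens separated by optional whitespace.
/// Returns whatever input could not be tokenized, plus the tokens found.
fn token_list(input: &str) -> (&str, Vec<Token>) {
    let mut rest = input.trim_start();
    let mut tokens = Vec::new();
    while let Some((after, token)) = single_token(rest) {
        tokens.push(token);
        rest = after.trim_start();
    }
    (rest, tokens)
}
```

Calling `token_list("1 foo")` in this sketch yields the single token `Operand(1)` with `"foo"` left over, which is how a caller can tell that the input was not fully valid.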

This lets me have my overall parsing function be super clean:

And hurray! We call parse, pass in our input, and it either makes a vector of tokens or, if the byte array can’t be parsed as a valid set of tokens, we return an error.
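Here is a rough, std-only sketch of a top-level function with that contract: bytes in, and either a vector of tokens or an error out. Only the name `parse` comes from the text above. The `ParseError` type, its variants, and the compact `single_token` helper are assumptions for illustration and do not use nom.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    LeftParen,
    RightParen,
    Operator(char),
    Operand(i64),
}

/// Hypothetical error type; the post does not show the real one.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    InvalidUtf8,
    InvalidTokens,
}

/// Recognize one token at the start of `input`, returning the rest too.
fn single_token(input: &str) -> Option<(&str, Token)> {
    let c = input.chars().next()?;
    match c {
        '(' => Some((&input[1..], Token::LeftParen)),
        ')' => Some((&input[1..], Token::RightParen)),
        '+' | '-' | '*' | '/' => Some((&input[1..], Token::Operator(c))),
        '0'..='9' => {
            let end = input
                .find(|ch: char| !ch.is_ascii_digit())
                .unwrap_or(input.len());
            let value = input[..end].parse::<i64>().ok()?;
            Some((&input[end..], Token::Operand(value)))
        }
        _ => None,
    }
}

/// Turn raw input bytes into tokens, or fail if any part is not a valid token.
pub fn parse(input: &[u8]) -> Result<Vec<Token>, ParseError> {
    let text = std::str::from_utf8(input).map_err(|_| ParseError::InvalidUtf8)?;
    let mut rest = text.trim_start();
    let mut tokens = Vec::new();
    while !rest.is_empty() {
        let (after, token) = single_token(rest).ok_or(ParseError::InvalidTokens)?;
        tokens.push(token);
        rest = after.trim_start();
    }
    Ok(tokens)
}
```

The caller only ever sees the `Result`, so the REPL loop can match on it and either evaluate the tokens or print a message.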

One feature that I’d like to add, which would be a bit more work I believe, is trying to tell the user what specifically went wrong with the tokens that they tried to parse. Having at least “unexpected token: foo” would be a nice future improvement.
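One possible shape for that improvement, sketched with only the standard library: when no token matches, grab the whitespace-delimited word at the failure point and report it along with its byte offset. Everything here (`UnexpectedToken`, `tokenize`, the `single_token` helper, the `Token` type) is hypothetical, and getting the same information out of nom would look different.

```rust
use std::fmt;

#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    LeftParen,
    RightParen,
    Operator(char),
    Operand(i64),
}

/// Hypothetical error carrying the offending text and where it starts.
#[derive(Debug, PartialEq)]
pub struct UnexpectedToken {
    pub text: String,
    pub offset: usize, // byte offset into the original input
}

impl fmt::Display for UnexpectedToken {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unexpected token: {}", self.text)
    }
}

fn single_token(input: &str) -> Option<(&str, Token)> {
    let c = input.chars().next()?;
    match c {
        '(' => Some((&input[1..], Token::LeftParen)),
        ')' => Some((&input[1..], Token::RightParen)),
        '+' | '-' | '*' | '/' => Some((&input[1..], Token::Operator(c))),
        '0'..='9' => {
            let end = input
                .find(|ch: char| !ch.is_ascii_digit())
                .unwrap_or(input.len());
            let value = input[..end].parse::<i64>().ok()?;
            Some((&input[end..], Token::Operand(value)))
        }
        _ => None,
    }
}

pub fn tokenize(input: &str) -> Result<Vec<Token>, UnexpectedToken> {
    let mut rest = input.trim_start();
    let mut tokens = Vec::new();
    while !rest.is_empty() {
        match single_token(rest) {
            Some((after, token)) => {
                tokens.push(token);
                rest = after.trim_start();
            }
            None => {
                // `rest` is non-empty and starts with a non-whitespace
                // character, so the first word is the offending chunk.
                let word = rest.split_whitespace().next().unwrap_or(rest);
                return Err(UnexpectedToken {
                    text: word.to_string(),
                    offset: input.len() - rest.len(),
                });
            }
        }
    }
    Ok(tokens)
}
```

With this sketch, `tokenize("1 + foo")` fails with an error that displays as `unexpected token: foo` and points at byte offset 4.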

And now we have a file that will turn user input strings into lists of tokens that mean something to the program. You can see the whole tokenizer here. Next time, we’ll learn to evaluate a list of tokens to get the result of the calculation.

Till then, happy learning!

