Building Parser Combinators (Part 1)

Ledger Mac App: Building Parser Combinators (Part 1)

In this episode16:22

This episode is freely available thanks to the support of our subscribers

Subscribers get exclusive access to new and all previous subscriber-only episodes, video downloads, and 30% discount for team members. Become a Subscriber →

Join us in the functional programming gym to stretch your object-oriented comfort zone while we lay the groundwork for a parser combinator library.

00:06 In our last episode, we talked about different approaches to writing a parser. For our Ledger GUI app, we chose to use a parser combinator library, which allows us to write parser code that's similar to how you would write down the grammar for the parser. The downside is that this code is initially very foreign and hard to understand. In this episode, we'll take a deeper look at parser combinators and write the basic elements of such a library from scratch.

00:39 The purpose of this exercise isn't necessarily for you to implement your own parser combinator library from scratch. In many cases, it's probably better to use one of the existing libraries. However, implementing the basic elements of parser combinators will help you understand how parser combinators work, so that you can use one of the existing libraries more easily when you need to.

00:58 Furthermore, implementing some parser combinators from scratch is a good mental workout. Doing so will push you out of your comfort zone if you've lived in the imperative object-oriented paradigm for a long time. As mentioned above, the code we're going to write, which comes from the functional programming paradigm, will be very foreign at first. Don't become discouraged — it takes a while to wrap your head around it. And once you've internalized the basic ideas, you can apply them in many other places as well.

02:24 So let's get started! This is the code we showed last time as an example of what parser combinator code looks like. It parses simplified expressions containing addition and multiplication operators:

let digit = character(condition: { CharacterSet.decimalDigits.contains($0.unicodeScalar) })
let int = digit.many.map { characters in Int(String(characters))! }

let multiplication = curry({ x, y in x * (y ?? 1) }) <^> int <*> (string("*") *> int).optional
let addition = curry({ x, y in x + (y ?? 0) }) <^> multiplication <*> (string("+") *> multiplication).optional

02:40 Our goal for this episode is to implement the elements needed for the first half of this code, i.e. we want to be able to write the integer parser.

Defining the `Parser` Type

02:49 First we have to think about what a parser actually is. A first approximation could be a function that takes a string and returns a tuple containing some kind of result and the remaining string. In the case of our expression parser, the result could be just an integer. Additionally, we mark the whole return type as optional, since the parsing might fail:

typealias Parser = (String) -> (Int, String)?

03:35 However, it turns out that for a parser, it's a bad idea to operate on a string directly, since this is very costly from a performance perspective. Instead, we'll use String.CharacterView and define a type alias, Stream, for this:

typealias Stream = String.CharacterView
typealias Parser = (Stream) -> (Int, Stream)?

04:11 The next thing we have to improve is the result type. The result of a parser won't always be an integer, even in our case of the expression parser (for example, when we parse an operator symbol). So the result type needs to be generic. We can change our definition like this:

typealias Stream = String.CharacterView
typealias Parser<A> = (Stream) -> (A, Stream)?

04:39 Lastly, instead of using a simple typealias, we'll define a struct, Parser, that's generic over the result type. The advantage of this approach is that we can write extensions on this struct later:

struct Parser<A> {
    typealias Stream = String.CharacterView
    let parse: (Stream) -> (A, Stream)?
}

Writing a Character Parser

05:22 With this parser struct at hand, we can write the character parser as the first example. We start by writing a function that takes a condition, which checks for a certain character and returns a Parser<Character>:

func character(condition: @escaping (Character) -> Bool) -> Parser<Character> {
	// ...
}

06:03 character is a function that creates a parser that can parse a particular character. Therefore, we know that our implementation has to return a parser struct. We can start implementing the function's body like this:

func character(condition: @escaping (Character) -> Bool) -> Parser<Character> {
    return Parser(parse: { stream in
		// ...
    })
}

06:35 Now we have to implement the parse function of the parser struct we're returning. The first task is to check whether there are any characters in the input stream at all. If there's at least one character, we check if it matches the condition that got passed into the character function. We combine both of these checks in one guard statement:

func character(condition: @escaping (Character) -> Bool) -> Parser<Character> {
    return Parser(parse: { stream in
        guard let char = stream.first, condition(char) else { return nil }
		// ...
    })
}

07:09 After the guard statement, we have to return a tuple of type (Character, Stream). The first element of the tuple is simply the character that we just checked using the condition. The second element is the remaining input stream. We use dropFirst to create this remainder:

func character(condition: @escaping (Character) -> Bool) -> Parser<Character> {
    return Parser(parse: { stream in
        guard let char = stream.first, condition(char) else {
            return nil
        }
        return (char, stream.dropFirst())
    })
}

08:06 Now we can define the digit parser and test it with a string. The parser expects a String.CharacterView as input, so we have to convert the string before passing it in:

digit.parse("123".characters)

08:28 The result is a tuple of a single digit and the remainder of the string that hasn't yet been parsed:

("1", "23")

08:37 To simplify executing the parser and to format its result more nicely for running it in a playground, we add a convenience run method in an extension to Parser:

extension Parser {
    func run(_ string: String) -> (A, String)? {
        guard let (result, remainder) = parse(string.characters) else { return nil }
        return (result, String(remainder))
    }

10:17 Now we can execute the digit simply, like this:

digit.run("123")

Adding the `many` Combinator

10:41 Next, we can add the many combinator so that we can parse multiple digits at once. The many function is structurally very similar to the character function. So before we write many, let's clean up character a little bit by using a trailing closure to remove the unnecessary parentheses:

func character(condition: @escaping (Character) -> Bool) -> Parser<Character> {
    return Parser { stream in
        guard let char = stream.first, condition(char) else {
            return nil
        }
        return (char, stream.dropFirst())
    }
}

11:17 Since we want to use the many combinator as a property (e.g. digit.many), we have to define it in an extension on Parser. We can write it as a computed property of type Parser<[A]>. The result type of Parser is [A], because we'll execute the current parser multiple times, yielding a series of results of type A:

extension Parser {
    var many: Parser<[A]> {
		// ...
    }
}

11:40 In the implementation, we return a new parser that collects all the results of executing parse on self in an array. Each call to parse gets the remainder of the previous call as input so that we don't get stuck in an infinite loop. Once the call of parse fails, we return a tuple with the results and the remainder:

extension Parser {
    var many: Parser<[A]> {
        return Parser<[A]> { stream in
            var result: [A] = []
            var remainder = stream
            while let (element, newRemainder) = self.parse(remainder) {
                result.append(element)
                remainder = newRemainder
            }
            return (result, remainder)
        }
    }
}

13:29 Now we can run digit.many on our example string:

(["1", "2", "3"], "")

13:38 If we add some characters to the end of the example that the digit parser is unable to parse, it'll stop at this point and we'll have a non-empty remainder:

digit.many.run("123abc")

(["1", "2", "3"], "abc")

Writing `map` on Parser

13:52 The last step to make the integer parser work is to transform the array of digit characters into a string, and then into an integer. For this, we have to implement a map function that we can use, like so:

digit.many.map { characters in Int(String(characters))! }

14:16 We write map in an extension on Parser. map functions all have the same structure: they have a generic parameter for the result, T; they expect a transform function as argument, which converts the current value into a value of type T; and finally, they return the result containing the transformed value. In our case, the result type is Parser<T>:

extension Parser {
    func map<T>(_ transform: @escaping (A) -> T) -> Parser<T> {
		// ...
    }
}

14:54 In the implementation, we return a new parser that executes parse on self and transforms the result using the transform function. In case parse fails, we simply return nil:

extension Parser {
    func map<T>(_ transform: @escaping (A) -> T) -> Parser<T> {
        return Parser<T> { stream in
            guard let (result, remainder) = self.parse(stream) else { return nil }
            return (transform(result), remainder)
        }
    }
}

15:44 Using the int parser on the same example input, "123", now correctly yields an integer result:

int.run("123")

(123, "")

16:04 The next step would be to define more functions on Parser, and to eventually define custom operators for these functions, which allow us to combine parsers in different ways. We'll take a look at this in the next episode.