00:06 In our last episode, we talked about different approaches to writing
a parser. For our Ledger GUI app, we
chose to use a parser combinator library, which allows us to write parser code
that's similar to how you would write down the grammar for the parser. The
downside is that this code is initially very foreign and hard to understand. In
this episode, we'll take a deeper look at parser combinators and write the basic
elements of such a library from scratch.
00:39 The purpose of this exercise isn't necessarily for you to implement
your own parser combinator library from scratch. In many cases, it's probably
better to use one of the existing libraries. However, implementing the basic
elements of parser combinators will help you understand how parser combinators
work, so that you can use one of the existing libraries more easily when you
need to.
00:58 Furthermore, implementing some parser combinators from scratch is a
good mental workout. Doing so will push you out of your comfort zone if you've
lived in the imperative object-oriented paradigm for a long time. As mentioned
above, the code we're going to write, which comes from the functional
programming paradigm, will be very foreign at first. Don't become discouraged —
it takes a while to wrap your head around it. And once you've internalized the
basic ideas, you can apply them in many other places as well.
02:24 So let's get started! This is the code we showed last time as an
example of what parser combinator code looks like. It parses simplified
expressions containing addition and multiplication operators:
let digit = character(condition: { CharacterSet.decimalDigits.contains($0.unicodeScalar) })
let int = digit.many.map { characters in Int(String(characters))! }
let multiplication = curry({ x, y in x * (y ?? 1) }) <^> int <*> (string("*") *> int).optional
let addition = curry({ x, y in x + (y ?? 0) }) <^> multiplication <*> (string("+") *> multiplication).optional
02:40 Our goal for this episode is to implement the elements needed for
the first half of this code, i.e. we want to be able to write the integer
parser.
Defining the Parser
Type
02:49 First we have to think about what a parser actually is. A first
approximation could be a function that takes a string and returns a tuple
containing some kind of result and the remaining string. In the case of our
expression parser, the result could be just an integer. Additionally, we mark
the whole return type as optional, since the parsing might fail:
typealias Parser = (String) -> (Int, String)?
03:35 However, it turns out that for a parser, it's a bad idea to
operate on a string directly, since this is very costly from a performance
perspective. Instead, we'll use String.CharacterView
and define a type alias,
Stream
, for this:
typealias Stream = String.CharacterView
typealias Parser = (Stream) -> (Int, Stream)?
04:11 The next thing we have to improve is the result type. The result
of a parser won't always be an integer, even in our case of the expression
parser (for example, when we parse an operator symbol). So the result type needs
to be generic. We can change our definition like this:
typealias Stream = String.CharacterView
typealias Parser<A> = (Stream) -> (A, Stream)?
04:39 Lastly, instead of using a simple typealias
, we'll define a
struct, Parser
, that's generic over the result type. The advantage of this
approach is that we can write extensions on this struct later:
struct Parser<A> {
typealias Stream = String.CharacterView
let parse: (Stream) -> (A, Stream)?
}
Writing a Character Parser
05:22 With this parser struct at hand, we can write the character parser
as the first example. We start by writing a function that takes a condition,
which checks for a certain character and returns a Parser<Character>
:
func character(condition: @escaping (Character) -> Bool) -> Parser<Character> {
}
06:03 character
is a function that creates a parser that can parse a
particular character. Therefore, we know that our implementation has to return a
parser struct. We can start implementing the function's body like this:
func character(condition: @escaping (Character) -> Bool) -> Parser<Character> {
return Parser(parse: { stream in
})
}
06:35 Now we have to implement the parse
function of the parser struct
we're returning. The first task is to check whether there are any characters in
the input stream at all. If there's at least one character, we check if it
matches the condition that got passed into the character
function. We combine
both of these checks in one guard statement:
func character(condition: @escaping (Character) -> Bool) -> Parser<Character> {
return Parser(parse: { stream in
guard let char = stream.first, condition(char) else { return nil }
})
}
07:09 After the guard statement, we have to return a tuple of type
(Character, Stream)
. The first element of the tuple is simply the character
that we just checked using the condition. The second element is the remaining
input stream. We use dropFirst
to create this remainder:
func character(condition: @escaping (Character) -> Bool) -> Parser<Character> {
return Parser(parse: { stream in
guard let char = stream.first, condition(char) else {
return nil
}
return (char, stream.dropFirst())
})
}
08:06 Now we can define the digit
parser and test it with a string.
The parser expects a String.CharacterView
as input, so we have to convert the
string before passing it in:
digit.parse("123".characters)
08:28 The result is a tuple of a single digit and the remainder of the
string that hasn't yet been parsed:
("1", "23")
08:37 To simplify executing the parser and to format its result more
nicely for running it in a playground, we add a convenience run
method in an
extension to Parser
:
extension Parser {
func run(_ string: String) -> (A, String)? {
guard let (result, remainder) = parse(string.characters) else { return nil }
return (result, String(remainder))
}
10:17 Now we can execute the digit
simply, like this:
digit.run("123")
Adding the many
Combinator
10:41 Next, we can add the many
combinator so that we can parse
multiple digits at once. The many
function is structurally very similar to the
character
function. So before we write many
, let's clean up character
a
little bit by using a trailing closure to remove the unnecessary parentheses:
func character(condition: @escaping (Character) -> Bool) -> Parser<Character> {
return Parser { stream in
guard let char = stream.first, condition(char) else {
return nil
}
return (char, stream.dropFirst())
}
}
11:17 Since we want to use the many
combinator as a property (e.g.
digit.many
), we have to define it in an extension on Parser
. We can write it
as a computed property of type Parser<[A]>
. The result type of Parser
is
[A]
, because we'll execute the current parser multiple times, yielding a
series of results of type A
:
extension Parser {
var many: Parser<[A]> {
}
}
11:40 In the implementation, we return a new parser that collects all
the results of executing parse
on self
in an array. Each call to parse
gets the remainder of the previous call as input so that we don't get stuck in
an infinite loop. Once the call of parse
fails, we return a tuple with the
results and the remainder:
extension Parser {
var many: Parser<[A]> {
return Parser<[A]> { stream in
var result: [A] = []
var remainder = stream
while let (element, newRemainder) = self.parse(remainder) {
result.append(element)
remainder = newRemainder
}
return (result, remainder)
}
}
}
13:29 Now we can run digit.many
on our example string:
(["1", "2", "3"], "")
13:38 If we add some characters to the end of the example that the digit
parser is unable to parse, it'll stop at this point and we'll have a non-empty
remainder:
digit.many.run("123abc")
(["1", "2", "3"], "abc")
Writing map
on Parser
13:52 The last step to make the integer parser work is to transform the
array of digit characters into a string, and then into an integer. For this, we
have to implement a map
function that we can use, like so:
digit.many.map { characters in Int(String(characters))! }
14:16 We write map
in an extension on Parser
. map
functions all
have the same structure: they have a generic parameter for the result, T
; they
expect a transform function as argument, which converts the current value into a
value of type T
; and finally, they return the result containing the
transformed value. In our case, the result type is Parser<T>
:
extension Parser {
func map<T>(_ transform: @escaping (A) -> T) -> Parser<T> {
}
}
14:54 In the implementation, we return a new parser that executes
parse
on self
and transforms the result using the transform
function. In
case parse
fails, we simply return nil
:
extension Parser {
func map<T>(_ transform: @escaping (A) -> T) -> Parser<T> {
return Parser<T> { stream in
guard let (result, remainder) = self.parse(stream) else { return nil }
return (transform(result), remainder)
}
}
}
15:44 Using the int
parser on the same example input, "123"
, now
correctly yields an integer result:
int.run("123")
(123, "")
16:04 The next step would be to define more functions on Parser
, and
to eventually define custom operators for these functions, which allow us to
combine parsers in different ways. We'll take a look at this in the next
episode.