Swift Talk # 203

Building a Template Language: Parsing

We start building an HTML template language, implementing the parser in a test-driven way.

00:06 In this episode, we start working on a new project. We recently needed an HTML template language for something we were working on, and we thought it might be interesting to build this language together.

00:45 Building a custom template language is usually a bad idea because many ready-to-use solutions already exist, and it takes a lot of work to get the whole process right: from parsing input strings and evaluating the results, to making sure useful error messages are generated whenever anything goes wrong.

01:25 But sometimes you just can't avoid it. In our case, we want to ship a simple template language along with an app because we don't want to ask our users to know Swift, and because we want control over the error messaging.

01:49 More specifically, we want to use variables in an HTML template, and depending on where a variable is used, its value needs to be escaped differently: inside body text, we need to escape ampersands and angle brackets, but inside an attribute, we only need to escape quote characters. Because of these specific requirements, we can't use existing template languages like Stencil or Mustache.
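
To make the escaping requirement concrete, here is a minimal sketch of the two rules described above. The property names escapedForBody and escapedForAttribute are hypothetical and are not part of the episode's code. The sketch assumes attribute values are wrapped in double quotes.

```swift
extension String {
    // Hypothetical helper: escape text that appears between tags.
    var escapedForBody: String {
        var result = ""
        for character in self {
            switch character {
            case "&": result += "&amp;"
            case "<": result += "&lt;"
            case ">": result += "&gt;"
            default: result.append(character)
            }
        }
        return result
    }

    // Hypothetical helper: escape text inside a double-quoted attribute value.
    var escapedForAttribute: String {
        var result = ""
        for character in self {
            if character == "\"" {
                result += "&quot;"
            } else {
                result.append(character)
            }
        }
        return result
    }
}

let title = "Tom & Jerry <live>"
print(title.escapedForBody)        // Tom &amp; Jerry &lt;live&gt;

let tooltip = "Say \"hi\""
print(tooltip.escapedForAttribute) // Say &quot;hi&quot;
```

The same variable value therefore produces different output depending on where it is used, which is why the template language has to know the context of each variable.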

Input Sample

02:46 To start things off, let's look at a sample of what we want to parse at the end of this project. In a test function — which we prefix with an underscore because we won't be able to make it pass for a while — we write an HTML string that includes statements such as variables, loops, and conditions:

final class ParserTests: XCTestCase {
    func _testSyntax() {
        let input = """
        <head><title>{ title }</title></head>
        <body>
          <ul>
            { for post in posts }
              { if post.published }
                <li>{ post.title }</li>
              { end }
            { end }
          </ul>
        </body>
        """
    }
}

05:12 In a later step, we'll also cover the evaluation logic that's needed to support dynamic values for attributes, such as the following URL that's used in an a tag:

<li><a href={post.url}>{ post.title }</a></li>

Parsing Variables

05:55 The above sample requires a lot of features, but we won't build them all at once. As a first step, we want to parse a single variable:

final class ParserTests: XCTestCase {
    
    func testVariable() throws {
        let input = "{ foo }"
        
    }
    
    // ...
}

06:49 We define an enum for the various types of expressions that we parse. This enum will have more cases later, but for now, its only case is a variable expression:

enum Expression {
    case variable(name: String)
}

07:07 In an extension of String, we write a parsing method that returns an expression:

extension String {
    func parse() throws -> Expression {
        
    }
}

07:31 In order to avoid expensive string copying while we're parsing, we will actually work with the Substring type, which is basically a view on the base string defined by a start index and an end index. Because of this representation, removing a character from the beginning of the substring comes down to mutating the start index — which is a very lightweight operation:

extension String {
    func parse() throws -> Expression {
        var remainder = self[...]
        return try remainder.parse()
    }
}

extension Substring {
    mutating func parse() throws -> Expression {
        
    }
}
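
The view behavior of Substring can be checked in isolation. The following sketch is not from the episode; it only shows that removing a character from the substring leaves the base string untouched.

```swift
let base = "{ foo }"
var remainder = base[...]   // Substring covering the whole string

// Dropping the first character only changes what the view covers.
let removed = remainder.removeFirst()

print(removed)             // {
print(String(remainder))   // " foo }" (printed without the quotes)
print(base)                // { foo }
```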

We'll implement proper error throwing in a later phase, but for now, we'll just call fatalError wherever an error should be thrown.

08:50 In order to make our first test pass, we check that the string starts with an opening curly brace, and if it does, we remove the first character and continue parsing the rest of the string:

extension Substring {
    mutating func parse() throws -> Expression {
        guard let f = first else { fatalError("TODO") }
        if f == "{" {
            
        } else {
            fatalError("Unexpected token")
        }
    }
}

09:46 After the curly brace, we want to skip over any whitespace that follows, and then we want to parse an identifier:

extension Substring {
    mutating func parse() throws -> Expression {
        guard let f = first else { fatalError("TODO") }
        if f == "{" {
            removeFirst()
            skipWS()
            let name = try parseIdentifier()
            
        } else {
            fatalError("Unexpected token")
        }
    }
}

10:19 We also need to make sure that the variable is followed by a closing curly brace, possibly after more whitespace characters. We're already seeing a pattern of inspecting the beginning of the string and removing it if it's a match, so what if we could use a helper like this:

extension Substring {
    mutating func parse() throws -> Expression {
        if remove(prefix: "{") {
            skipWS()
            let name = try parseIdentifier()
            skipWS()
            guard remove(prefix: "}") else {
                fatalError()
            }
            return .variable(name: name)
        } else {
            fatalError("Unexpected token")
        }
    }
}

12:33 That looks much better, so let's go ahead and write the helper method that checks whether or not the substring starts with the given prefix, and if it does, removes the prefix:

extension Substring {
    mutating func remove(prefix: String) -> Bool {
        guard hasPrefix(prefix) else { return false }
        removeFirst(prefix.count)
        return true
    }
    // ...
}
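
The two outcomes of this helper can be seen in a small standalone sketch. The helper is repeated so that the snippet runs on its own; the input string is made up for illustration.

```swift
extension Substring {
    // Same helper as in the listing above.
    mutating func remove(prefix: String) -> Bool {
        guard hasPrefix(prefix) else { return false }
        removeFirst(prefix.count)
        return true
    }
}

var input = "</p>rest"[...]
let matched = input.remove(prefix: "</p>")   // match: the prefix is consumed
let mismatched = input.remove(prefix: "<")   // no match: input stays as it is
print(matched, mismatched, input)            // true false rest
```

Because a failed match leaves the substring unchanged, the helper can be used directly as the condition of an if or guard statement.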

14:50 We also need the helper that removes characters off the string's start as long as they're whitespace characters:

extension Substring {
    // ...
    mutating func skipWS() {
        while first?.isWhitespace == true {
            removeFirst()
        }
    }
    // ...
}
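
As a quick check, the sketch below runs skipWS on made-up inputs. It shows that tabs and newlines count as whitespace, and that comparing the optional result with true keeps the loop safe on an empty substring.

```swift
extension Substring {
    // Same helper as in the listing above.
    mutating func skipWS() {
        while first?.isWhitespace == true {
            removeFirst()
        }
    }
}

var input = " \t\n foo }"[...]
input.skipWS()
print(input)   // foo }

var empty = ""[...]
empty.skipWS() // first is nil here, so the loop body never runs
print(empty.isEmpty) // true
```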

15:46 The last missing piece is the parseIdentifier method, which returns a string of letter characters:

extension Substring {
    // ...
    mutating func parseIdentifier() throws -> String {
        // Collect leading identifier characters into a mutable string.
        var result = ""
        while first?.isIdentifier == true {
            result.append(removeFirst())
        }
        guard !result.isEmpty else { fatalError() }
        return result
    }
}

extension Character {
    var isIdentifier: Bool {
        isLetter
    }
}

By writing a separate isIdentifier property, we can later change our definition of an identifier without having to change the parseIdentifier method. Using this property also makes it easier to read our parsing method.
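
As an illustration of that flexibility, the sketch below swaps in a broader definition that also accepts numbers and underscores. This definition is hypothetical; the episode only accepts letters. The parsing loop itself is the same as in parseIdentifier.

```swift
extension Character {
    // Hypothetical broader definition, not the one used in the episode.
    var isIdentifier: Bool {
        isLetter || isNumber || self == "_"
    }
}

extension Substring {
    // Same loop as the episode's parseIdentifier; only isIdentifier changed.
    mutating func parseIdentifier() throws -> String {
        var result = ""
        while first?.isIdentifier == true {
            result.append(removeFirst())
        }
        guard !result.isEmpty else { fatalError() }
        return result
    }
}

var input = "post_2 }"[...]
// try! is fine for this demo because nothing is thrown yet.
let name = try! input.parseIdentifier()
print(name)   // post_2
```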

17:37 The parse method is now ready to be used in our test:

final class ParserTests: XCTestCase {
    
    func testVariable() throws {
        let input = "{ foo }"
        XCTAssertEqual(try input.parse(), .variable(name: "foo"))
    }
    
    // ...
}

18:02 In order to use them from the test target, the String.parse function and the Expression enum need to be public. XCTAssertEqual also requires its arguments to be Equatable, so we conform Expression to Hashable, which includes Equatable:

public enum Expression: Hashable {
    case variable(name: String)
}

extension String {
    public func parse() throws -> Expression {
        var remainder = self[...]
        return try remainder.parse()
    }
}

18:43 We run the test and see that it passes. We can also add variants with different amounts of whitespace, which should all produce the same result:

final class ParserTests: XCTestCase {
    
    func testVariable() throws {
        for input in ["{ foo }", "{foo}"] {
            XCTAssertEqual(try input.parse(), .variable(name: "foo"))
        }
    }

    // TODO: test that identifier is not an empty string
    // ...
}

19:24 We add a note that we should still test that the parsing fails if the input string doesn't contain an identifier. We wrote our parser that way but — because of the fatal errors — we can't yet verify that it works correctly.

Parsing Tags

19:57 Next, let's work on parsing tags:

final class ParserTests: XCTestCase {
    // ...

    func testTag() throws {
        let input = "<p></p>"
        XCTAssertEqual(try input.parse(), .tag(name: "p"))
    }

    // ...
}

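20:36 We add an Expression.tag case with associated values to hold the tag's name, its attributes, and its body. Because the attributes and body values are themselves Expressions, the enum is recursive, and we mark it as indirect. Strictly speaking, the dictionary and the array already store their elements indirectly, so the keyword isn't required for this particular case: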

public indirect enum Expression: Hashable {
    case variable(name: String)
    case tag(name: String, attributes: [String:Expression] = [:], body: [Expression] = [])
}
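
The default values for attributes and body are what allow the test to write .tag(name: "p"). The standalone sketch below (with the public modifier dropped, and with made-up sample values) shows that the short and the explicit spelling compare as equal, and that the synthesized Hashable conformance also works for nested expressions.

```swift
indirect enum Expression: Hashable {
    case variable(name: String)
    case tag(name: String, attributes: [String: Expression] = [:], body: [Expression] = [])
}

let short = Expression.tag(name: "p")
let explicit = Expression.tag(name: "p", attributes: [:], body: [])
print(short == explicit)   // true

// A nested value: an attribute and a body that both hold expressions.
let link = Expression.tag(
    name: "a",
    attributes: ["href": .variable(name: "url")],
    body: [.variable(name: "title")]
)
let uniqueExpressions: Set<Expression> = [short, explicit, link]
print(uniqueExpressions.count)   // 2
```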

21:28 Now we can extend the parse method to look for an opening angle bracket:

extension Substring {
    // ...
    mutating func parse() throws -> Expression {
        if remove(prefix: "{") {
            skipWS()
            let name = try parseIdentifier()
            skipWS()
            guard remove(prefix: "}") else {
                fatalError()
            }
            return .variable(name: name)
        } else if remove(prefix: "<") {
            
        } else {
            fatalError("Unexpected token")
        }
    }
}

22:11 In order to parse the tag name, we copy the identifier parsing method:

extension Substring {
    // ...
    mutating func parse() throws -> Expression {
        if remove(prefix: "{") {
            // ...
        } else if remove(prefix: "<") {
            let name = try parseTagName()
            
        } else {
            fatalError("Unexpected token")
        }
    }
    
    mutating func parseTagName() throws -> String {
        var result = ""
        while first?.isTagName == true {
            result.append(removeFirst())
        }
        guard !result.isEmpty else { fatalError() }
        return String(result)
    }

    mutating func parseIdentifier() throws -> String {
        // ...
    }
}

extension Character {
    var isIdentifier: Bool {
        isLetter
    }

    var isTagName: Bool {
        isLetter
    }
}

23:04 After the tag name, we expect to see a closing angle bracket, followed by a closing tag. Finally, we can return the parsed tag:

extension Substring {
    // ...
    mutating func parse() throws -> Expression {
        if remove(prefix: "{") {
            // ...
        } else if remove(prefix: "<") {
            let name = try parseTagName()
            guard remove(prefix: ">") else { fatalError() }
            let closingTag = "</\(name)>"
            guard remove(prefix: closingTag) else { fatalError() }
            return .tag(name: name)
        } else {
            fatalError("Unexpected token")
        }
    }
    // ...
}

24:32 We can refactor one piece of this in order to remove some duplication from the parsing of tag names and identifiers. This is done by writing a method that removes leading characters as long as a given condition is met.

The implementation uses only Collection APIs: it advances an index past each leading element that satisfies the given condition. This means the same code would also work for other collections whose SubSequence is the collection itself, even though we define it on Substring here:

extension Substring {
    mutating func remove(while cond: (Element) -> Bool) -> SubSequence {
        var current = startIndex
        while current < endIndex, cond(self[current]) {
            formIndex(after: &current)
        }
        let result = self[startIndex..<current]
        self = self[current...]
        return result
    }
}
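
As a sketch of the more general form, the same body can be moved into a constrained Collection extension. The constraint SubSequence == Self is an assumption made here so that the sliced result can be assigned back to self; the sample inputs are made up.

```swift
extension Collection where SubSequence == Self {
    // Removes and returns the longest prefix whose elements satisfy `cond`.
    mutating func remove(while cond: (Element) -> Bool) -> SubSequence {
        var current = startIndex
        while current < endIndex, cond(self[current]) {
            formIndex(after: &current)
        }
        let result = self[startIndex..<current]
        self = self[current...]
        return result
    }
}

// Works for Substring, as in the parser...
var text = "span>rest"[...]
let tagName = text.remove(while: { $0.isLetter })
print(tagName, text)   // span >rest

// ...and for other self-slicing collections such as ArraySlice.
var numbers = [2, 4, 6, 7, 8][...]
let evens = numbers.remove(while: { $0 % 2 == 0 })
print(Array(evens), Array(numbers))   // [2, 4, 6] [7, 8]
```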

27:10 Now we can update parseTagName and parseIdentifier to make use of this method:

extension Substring {
    // ...
    mutating func parseTagName() throws -> String {
        let result = remove(while: { $0.isTagName })
        guard !result.isEmpty else { fatalError() }
        return String(result)
    }

    mutating func parseIdentifier() throws -> String {
        let result = remove(while: { $0.isIdentifier })
        guard !result.isEmpty else { fatalError() }
        return String(result)
    }
}

Nested Tags

27:59 Next, we want to be able to parse a tag with a body so that we can have nested expressions:

final class ParserTests: XCTestCase {
    // ...
    func testTagBody() throws {
        let input = "<p><span>{ foo }</span></p>"
        XCTAssertEqual(try input.parse(), .tag(name: "p", body: [
            .tag(name: "span", body: [
                .variable(name: "foo")
            ])
        ]))
    }
    // ...
}

29:26 In the parse method, we now need to try parsing an expression in between the opening and closing tags. We can do this by recursively calling parse inside a while loop that is repeated until we encounter the closing tag:

extension Substring {
    // ...
    mutating func parse() throws -> Expression {
        if remove(prefix: "{") {
            skipWS()
            let name = try parseIdentifier()
            skipWS()
            guard remove(prefix: "}") else {
                fatalError()
            }
            return .variable(name: name)
        } else if remove(prefix: "<") {
            let name = try parseTagName()
            guard remove(prefix: ">") else { fatalError() }
            let closingTag = "</\(name)>"
            var body: [Expression] = []
            while !remove(prefix: closingTag)  {
                body.append(try parse())
            }
            return .tag(name: name, body: body)
        } else {
            fatalError("Unexpected token")
        }
    }
    // ...
}

30:39 And that's all we need to do to make the test pass. We can even make it more interesting by having multiple expressions inside the outer tag:

final class ParserTests: XCTestCase {
    // ...
    func testTagBody() throws {
        let input = "<p><span>{ foo }</span><div></div></p>"
        XCTAssertEqual(try input.parse(), .tag(name: "p", body: [
            .tag(name: "span", body: [
                .variable(name: "foo")
            ]),
            .tag(name: "div")
        ]))
    }
    // ...
}
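
To try the parser outside of a test target, the pieces from this episode can be combined into a single file. The sketch below does that, with two changes made for illustration: the public modifiers are dropped, and the fatalError calls carry messages that aren't in the episode.

```swift
indirect enum Expression: Hashable {
    case variable(name: String)
    case tag(name: String, attributes: [String: Expression] = [:], body: [Expression] = [])
}

extension Character {
    var isIdentifier: Bool { isLetter }
    var isTagName: Bool { isLetter }
}

extension Substring {
    mutating func remove(prefix: String) -> Bool {
        guard hasPrefix(prefix) else { return false }
        removeFirst(prefix.count)
        return true
    }

    mutating func skipWS() {
        while first?.isWhitespace == true { removeFirst() }
    }

    mutating func remove(while cond: (Element) -> Bool) -> SubSequence {
        var current = startIndex
        while current < endIndex, cond(self[current]) {
            formIndex(after: &current)
        }
        let result = self[startIndex..<current]
        self = self[current...]
        return result
    }

    mutating func parseTagName() throws -> String {
        let result = remove(while: { $0.isTagName })
        guard !result.isEmpty else { fatalError("Expected a tag name") }
        return String(result)
    }

    mutating func parseIdentifier() throws -> String {
        let result = remove(while: { $0.isIdentifier })
        guard !result.isEmpty else { fatalError("Expected an identifier") }
        return String(result)
    }

    mutating func parse() throws -> Expression {
        if remove(prefix: "{") {
            skipWS()
            let name = try parseIdentifier()
            skipWS()
            guard remove(prefix: "}") else { fatalError("Expected }") }
            return .variable(name: name)
        } else if remove(prefix: "<") {
            let name = try parseTagName()
            guard remove(prefix: ">") else { fatalError("Expected >") }
            let closingTag = "</\(name)>"
            var body: [Expression] = []
            // Parse child expressions until the matching closing tag appears.
            while !remove(prefix: closingTag) {
                body.append(try parse())
            }
            return .tag(name: name, body: body)
        } else {
            fatalError("Unexpected token")
        }
    }
}

extension String {
    func parse() throws -> Expression {
        var remainder = self[...]
        return try remainder.parse()
    }
}

let result = try! "<p><span>{ foo }</span><div></div></p>".parse()
print(result == .tag(name: "p", body: [
    .tag(name: "span", body: [.variable(name: "foo")]),
    .tag(name: "div")
]))   // true
```

Note that text and whitespace between tags aren't handled at this stage, so an input such as "<p> </p>" ends in the "Unexpected token" fatal error.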

31:10 The next step is making sure we report proper errors. As it is now, we hit a fatal error inside the parser's code whenever something unexpected happens, and this makes it difficult to see what's wrong with the input string.

Resources

  • Sample Code

    Written in Swift 5
