Splitting Arrays

In this episode24:07

This episode is freely available thanks to the support of our subscribers

Subscribers get exclusive access to new and all previous subscriber-only episodes, video downloads, and 30% discount for team members. Become a Subscriber →

We talk about a familiar but surprisingly tricky problem: splitting an array into groups of elements. We discuss the pros and cons of our own solutions along with the solutions people sent us via Twitter!

00:06 Today let's look at splitting arrays, a problem we encountered when building a tool for ourselves. A playground contains two kinds of things: documentation and code. Therefore, we have to separate all the code blocks from the rest of our Markdown and build up an array where the code is split from the rest.

00:51 The key issue is that we want to group all the consecutive non-code blocks in the Markdown and not have separate array entries for each block.

01:06 It sounds like an easy problem to solve, but it turned out to be trickier than expected. We weren't happy with any of our approaches, which all felt a bit hacky. So we asked for suggestions on Twitter, and some people replied with really good ideas. We'll work our way through them by refactoring our solution, and we'll end up at some of the solutions people suggested.

01:35 To start, we have two enums defined here. First is the Block enum, which we used in the episodes on Rendering CommonMark and which represents a block-level element in Markdown. It has cases for code, text, and headers, but for the sake of simplicity, we left out all the other cases. Second, we have a PlaygroundElement enum, and this has two cases — either something is code or it's documentation:

enum Block: Equatable {
    case code(String)
    case text(String)
    case header(String)
    // ...
}

enum PlaygroundElement: Equatable {
    case code(String)
    case documentation([Block])
}

02:11 Then we have a function, playground, which takes an array of Markdown blocks and should return an array of PlaygroundElements:

func playground(blocks: [Block]) -> [PlaygroundElement] {
    return []
}

02:24 We already created a test, which has a sample input and an expected result. We can use it to verify our implementation:

class SplittingArraysTests: XCTestCase {
    func testPlayground() {
        let sample: [Block] = [
            .code("swift sample code"),
            .header("My Header"),
            .text("Hello"),
            .code("more")
        ]
        
        let result: [PlaygroundElement] = [
            .code("swift sample code"),
            .documentation([Block.header("My Header"), Block.text("Hello")]),
            .code("more")
        ]
        AssertEqual(playground(blocks: sample), result)
    }
}

A First Attempt

02:47 Running the tests yields an error because we haven't yet implemented anything. So we start with a result variable. Then we iterate over the blocks and switch on each block within the loop's body. If we encounter a .code block, we append a .code playground element. If we're dealing with any other Markdown block, the default case, we simply append that block by wrapping it in a documentation element. Finally, we return the result:

func playground(blocks: [Block]) -> [PlaygroundElement] {
    var result: [PlaygroundElement] = []
    for block in blocks {
        switch block {
        case .code(let code):
            result.append(.code(code))
        default:
            result.append(.documentation([block]))
        }
    }
    return result
}

04:19 The tests still fail because our code doesn't yet group consecutive non-code blocks. The output of the tests is really hard to read though, so let's write a little test helper that generates more readable output.

04:45 We'll provide a custom variant of the AssertEqual helper. This is generic over A, and A needs to be Equatable so that we can compare two values. The helper has two parameters: one for each array we want to compare. In the helper, we can generate a custom message. There's also a function called dump, which produces structured output. By default, dump prints to the standard output, but it has a second parameter of type TextOutputStream. The String type implements the TextOutputStream protocol.

05:56 We can initialize a String variable and pass it in as the second parameter to dump and build up our message this way. Then we append newlines and append the dump of the second parameter. Finally, we call XCTAssertEqual with the two parameters and our custom message:

func AssertEqual<A: Equatable>(_ l: [A], _ r: [A]) {
    var message = ""
    dump(l, to: &message)
    message += "\n\nis not equal to:\n\n"
    dump(r, to: &message)
    XCTAssertEqual(l, r, message)
}

06:42 Once we run our tests, we can see that the console output has improved a lot. We can also see that we have two consecutive documentation elements in our output that should be grouped together.

07:27 While our test helper makes the output more readable, it introduces another problem: the error message now shows up in the helper function and not at the point where we call it in the test. We can fix this by adding two parameters: a file parameter with a default value of #file, and a line parameter with a default value of #line. When our function gets called, #file and #line get replaced with the file name and the line number of the caller. Likewise, for the line number, we can use #line as the default value and this will get the value of the line number of the caller. We can pass it on to XCTAssertEqual, and now the test failure gets reported at the correct location:

func AssertEqual<A: Equatable>(_ l: [A], _ r: [A], file: StaticString = #file, line: UInt = #line) {
    var message = ""
    dump(l, to: &message)
    message += "\n\nis not equal to:\n\n"
    dump(r, to: &message)
    XCTAssertEqual(l, r, message, file: file, line: line)
}

08:24 Back to our implementation: now we'll get into the details of our original problem. First, we'll create a temporary array where we accumulate non-code blocks. We'll start with an empty array, and whenever we encounter a non-code block, we can just append it to the temporary array. Once we hit a code block, we check if the nonCodeBlocks array isn't empty. If this check succeeds, we insert the contents of nonCodeBlocks into the result array as a single .documentation element. Then we also have to remember to flush the nonCodeBlocks array by setting it to an empty array:

func playground(blocks: [Block]) -> [PlaygroundElement] {
    var result: [PlaygroundElement] = []
    var nonCodeBlocks: [Block] = []
    for block in blocks {
        switch block {
        case .code(let code):
            if !nonCodeBlocks.isEmpty {
                result.append(.documentation(nonCodeBlocks))
                nonCodeBlocks = []
            }
            result.append(.code(code))
        default:
            nonCodeBlocks.append(block)
        }
    }
    return result
}

10:23 Our code isn't correct yet. We'll duplicate our original test and modify it by removing the trailing code block in the input and the expected result. Now we can see that the test fails and we can fix our code. If the last element in our input array isn't a code block, we also need to flush our nonCodeBlocks array. We can simply copy-paste our code from before and our tests will succeed:

func playground(blocks: [Block]) -> [PlaygroundElement] {
    var result: [PlaygroundElement] = []
    var nonCodeBlocks: [Block] = []
    for block in blocks {
        switch block {
        case .code(let code):
            if !nonCodeBlocks.isEmpty {
                result.append(.documentation(nonCodeBlocks))
                nonCodeBlocks = []
            }
            result.append(.code(code))
        default:
            nonCodeBlocks.append(block)
        }
    }
    if !nonCodeBlocks.isEmpty {
        result.append(.documentation(nonCodeBlocks))
        nonCodeBlocks = []
    }
    return result
}

A Working Solution

11:31 We can abstract this and put the code into a function called flush. This was suggested by Nick Lockwood, and it happened to be our original solution too:

func playground(blocks: [Block]) -> [PlaygroundElement] {
    var result: [PlaygroundElement] = []
    var nonCodeBlocks: [Block] = []
    func flush() {
        if !nonCodeBlocks.isEmpty {
            result.append(.documentation(nonCodeBlocks))
            nonCodeBlocks = []
        }
    }
    for block in blocks {
        switch block {
        case .code(let code):
            flush()
            result.append(.code(code))
        default:
            nonCodeBlocks.append(block)
        }
    }
    flush()
    return result
}

12:06 But this solution feels too complicated. We have to remember to call flush in two places, and we have to remember to empty the array after appending. So because we couldn't come up with better solutions at first, we asked on Twitter.

12:47 So far, our approach has been to look at each Markdown block in the input and build up an intermediate array. Instead of the intermediate array, we could check the last element of our result array and mutate that if possible. If there's a documentation element there, we can mutate it and append to its array.

No More `flush` Function

13:38 Now we don't need the nonCodeBlocks array or the flush function. We can remove the calls to flush as well and leave the .code case as is. In the default case, we can check if the last element in the result is a .documentation element. When we have no .documentation case at the end of the array, we can simply append a new .documentation value with an array containing the block. Otherwise, we have to remove the last element of the array and replace it with a new .documentation case that contains the existing blocks together with the new block:

func playground(blocks: [Block]) -> [PlaygroundElement] {
    var result: [PlaygroundElement] = []
    for block in blocks {
        switch block {
        case .code(let code):
            result.append(.code(code))
        default:
            if case let .documentation(existingBlocks)? = result.last {
                result.removeLast()
                result.append(.documentation(existingBlocks + [block]))
            } else {
                result.append(.documentation([block]))
            }
        }
    }
    return result
}

15:41 This code works, but there's still some room for improvement, depending on taste. For example, we can move the append out of the if statement:

func playground(blocks: [Block]) -> [PlaygroundElement] {
    var result: [PlaygroundElement] = []
    for block in blocks {
        switch block {
        case .code(let code):
            result.append(.code(code))
        default:
            var newBlocks = [block]
            if case let .documentation(existingBlocks)? = result.last {
                result.removeLast()
                newBlocks = existingBlocks + newBlocks
            }
            result.append(.documentation(newBlocks))
        }
    }
    return result
}

16:55 We can also remove the switch statement and use an if case let instead, but that's a matter of taste:

func playground(blocks: [Block]) -> [PlaygroundElement] {
    var result: [PlaygroundElement] = []
    for block in blocks {
        if case let .code(code) = block {
            result.append(.code(code))
        } else {
            var newBlocks = [block]
            if case let .documentation(existingBlocks)? = result.last {
                result.removeLast()
                newBlocks = existingBlocks + newBlocks
            }
            result.append(.documentation(newBlocks))
        }
    }
    return result
}

17:37 The nice thing about this approach is that the complexity is only in one place — where we call removeLast() — which is better than having complexity in two places. There's yet another approach, where we switch on the combination of the current block and the last element in the results array.

18:12 In case of the code block, we're not interested in the last element, and we simply append to the result. In case we have a non-code block and a .documentation element, we replace the last element in the array. In case we have a non-code block and another element, we simply create a new .documentation element:

func playground(blocks: [Block]) -> [PlaygroundElement] {
    var result: [PlaygroundElement] = []
    for block in blocks {
        switch (block, result.last) {
        case let (.code(code), _):
            result.append(.code(code))
        case let (_, .documentation(existing)?):
            result.removeLast()
            result.append(.documentation(existing + [block]))
        default:
            result.append(.documentation([block]))
        }
    }
    return result
}

20:43 The tests still succeed, so the code is correct.

What we like about this approach is that all three cases that have to be handled are spelled out underneath one another. On the flip side, it's not so nice that we access result.last unnecessarily. Also, we have to keep in mind that the order of the case statements is important.

This solution is very close to a suggestion we got on GitHub by odnoletkov, who took a more functional approach without a for loop.

A Function Version

21:54 We can remove the result variable and reduce on the blocks array with an empty array as the initial value. Instead of appending to the results array, we have to return a new array with the element appended to it. Instead of using removeLast(), we use dropLast(). The tests still succeed. This approach is nice as well; even though there might be performance differences, it still works the same way:

func playground_finished(blocks: [Block]) -> [PlaygroundElement] {
    return blocks.reduce([]) { result, block in
        switch (block, result.last) {
        case let (.code(code), _):
            return result + [.code(code)]
        case let (_, .documentation(existing)?):
            return result.dropLast() + [.documentation(existing + [block])]
        default:
            return result + [.documentation([block])]
        }
    }
}

23:36 We wonder if there's still a simpler solution out there, and we encourage everyone who knows of one to send it to us. We'll discuss this in an upcoming episode, unless, of course, we've reached the inherent complexity of the problem.