00:06 Today let's look at splitting arrays, a problem we encountered when
building a tool for ourselves. A playground contains two kinds of things:
documentation and code. Therefore, we have to separate all the code blocks from
the rest of our Markdown and build up an array where the code is split from the
rest.
00:51 The key issue is that we want to group all the consecutive non-code
blocks in the Markdown and not have separate array entries for each block.
01:06 It sounds like an easy problem to solve, but it turned out to be
trickier than expected. We weren't happy with any of our approaches, which all
felt a bit hacky. So we asked for suggestions on
Twitter, and some
people replied with really good ideas. We'll work our way through them by
refactoring our solution, and we'll end up at some of the solutions people
suggested.
01:35 To start, we have two enums defined here. First is the Block
enum, which we used in the episodes on Rendering
CommonMark and which
represents a block-level element in Markdown. It has cases for code, text, and
headers, but for the sake of simplicity, we left out all the other cases.
Second, we have a PlaygroundElement
enum, and this has two cases — either
something is code or it's documentation:
enum Block: Equatable {
case code(String)
case text(String)
case header(String)
}
enum PlaygroundElement: Equatable {
case code(String)
case documentation([Block])
}
02:11 Then we have a function, playground
, which takes an array of
Markdown
blocks and should return an array of PlaygroundElement
s:
func playground(blocks: [Block]) -> [PlaygroundElement] {
return []
}
02:24 We already created a test, which has a sample input and an
expected result. We can use it to verify our implementation:
class SplittingArraysTests: XCTestCase {
func testPlayground() {
let sample: [Block] = [
.code("swift sample code"),
.header("My Header"),
.text("Hello"),
.code("more")
]
let result: [PlaygroundElement] = [
.code("swift sample code"),
.documentation([Block.header("My Header"), Block.text("Hello")]),
.code("more")
]
AssertEqual(playground(blocks: sample), result)
}
}
A First Attempt
02:47 Running the tests yields an error because we haven't yet
implemented anything. So we start with a result
variable. Then we iterate over
the blocks and switch on each block within the loop's body. If we encounter a
.code
block, we append a .code
playground element. If we're dealing with any
other Markdown block, the default
case, we simply append that block by
wrapping it in a documentation element. Finally, we return the result:
func playground(blocks: [Block]) -> [PlaygroundElement] {
var result: [PlaygroundElement] = []
for block in blocks {
switch block {
case .code(let code):
result.append(.code(code))
default:
result.append(.documentation([block]))
}
}
return result
}
04:19 The tests still fail because our code doesn't yet group
consecutive non-code blocks. The output of the tests is really hard to read
though, so let's write a little test helper that generates more readable output.
04:45 We'll provide a custom variant of the AssertEqual
helper. This
is generic over A
, and A
needs to be Equatable
so that we can compare two
values. The helper has two parameters: one for each array we want to compare. In
the helper, we can generate a custom message. There's also a function called
dump
, which produces structured output. By default, dump
prints to the
standard output, but it has a second parameter of type TextOutputStream
. The
String
type implements the TextOutputStream
protocol.
05:56 We can initialize a String
variable and pass it in as the second
parameter to dump
and build up our message this way. Then we append newlines
and append the dump
of the second parameter. Finally, we call XCTAssertEqual
with the two parameters and our custom message:
func AssertEqual<A: Equatable>(_ l: [A], _ r: [A]) {
var message = ""
dump(l, to: &message)
message += "\n\nis not equal to:\n\n"
dump(r, to: &message)
XCTAssertEqual(l, r, message)
}
06:42 Once we run our tests, we can see that the console output has
improved a lot. We can also see that we have two consecutive documentation
elements in our output that should be grouped together.
07:27 While our test helper makes the output more readable, it
introduces another problem: the error message now shows up in the helper
function and not at the point where we call it in the test. We can fix this by
adding two parameters: a file
parameter with a default value of #file
, and a
line
parameter with a default value of #line
. When our function gets called,
#file
and #line
get replaced with the file name and the line number of the
caller. Likewise, for the line number, we can use #line
as the default value
and this will get the value of the line number of the caller. We can pass it on
to XCTAssertEqual
, and now the test failure gets reported at the correct
location:
func AssertEqual<A: Equatable>(_ l: [A], _ r: [A], file: StaticString = #file, line: UInt = #line) {
var message = ""
dump(l, to: &message)
message += "\n\nis not equal to:\n\n"
dump(r, to: &message)
XCTAssertEqual(l, r, message, file: file, line: line)
}
08:24 Back to our implementation: now we'll get into the details of our
original problem. First, we'll create a temporary array where we accumulate
non-code blocks. We'll start with an empty array, and whenever we encounter a
non-code block, we can just append it to the temporary array. Once we hit a code
block, we check if the nonCodeBlocks
array isn't empty. If this check
succeeds, we insert the contents of nonCodeBlocks
into the result
array as a
single .documentation
element. Then we also have to remember to flush the
nonCodeBlocks
array by setting it to an empty array:
func playground(blocks: [Block]) -> [PlaygroundElement] {
var result: [PlaygroundElement] = []
var nonCodeBlocks: [Block] = []
for block in blocks {
switch block {
case .code(let code):
if !nonCodeBlocks.isEmpty {
result.append(.documentation(nonCodeBlocks))
nonCodeBlocks = []
}
result.append(.code(code))
default:
nonCodeBlocks.append(block)
}
}
return result
}
10:23 Our code isn't correct yet. We'll duplicate our original test and
modify it by removing the trailing code block in the input and the expected
result. Now we can see that the test fails and we can fix our code. If the last
element in our input array isn't a code block, we also need to flush our
nonCodeBlocks
array. We can simply copy-paste our code from before and our
tests will succeed:
func playground(blocks: [Block]) -> [PlaygroundElement] {
var result: [PlaygroundElement] = []
var nonCodeBlocks: [Block] = []
for block in blocks {
switch block {
case .code(let code):
if !nonCodeBlocks.isEmpty {
result.append(.documentation(nonCodeBlocks))
nonCodeBlocks = []
}
result.append(.code(code))
default:
nonCodeBlocks.append(block)
}
}
if !nonCodeBlocks.isEmpty {
result.append(.documentation(nonCodeBlocks))
nonCodeBlocks = []
}
return result
}
A Working Solution
11:31 We can abstract this and put the code into a function called
flush
. This was suggested by Nick
Lockwood, and it
happened to be our original solution too:
func playground(blocks: [Block]) -> [PlaygroundElement] {
var result: [PlaygroundElement] = []
var nonCodeBlocks: [Block] = []
func flush() {
if !nonCodeBlocks.isEmpty {
result.append(.documentation(nonCodeBlocks))
nonCodeBlocks = []
}
}
for block in blocks {
switch block {
case .code(let code):
flush()
result.append(.code(code))
default:
nonCodeBlocks.append(block)
}
}
flush()
return result
}
12:06 But this solution feels too complicated. We have to remember to
call flush
in two places, and we have to remember to empty the array after
appending. So because we couldn't come up with better solutions at first, we
asked on Twitter.
12:47 So far, our approach has been to look at each Markdown block in
the input and build up an intermediate array. Instead of the intermediate array,
we could check the last element of our result array and mutate that if possible.
If there's a documentation element there, we can mutate it and append to its
array.
No More flush
Function
13:38 Now we don't need the nonCodeBlocks
array or the flush
function. We can remove the calls to flush
as well and leave the .code
case
as is. In the default
case, we can check if the last element in the result is
a .documentation
element. When we have no .documentation
case at the end of
the array, we can simply append a new .documentation
value with an array
containing the block. Otherwise, we have to remove the last element of the array
and replace it with a new .documentation
case that contains the existing
blocks together with the new block:
func playground(blocks: [Block]) -> [PlaygroundElement] {
var result: [PlaygroundElement] = []
for block in blocks {
switch block {
case .code(let code):
result.append(.code(code))
default:
if case let .documentation(existingBlocks)? = result.last {
result.removeLast()
result.append(.documentation(existingBlocks + [block]))
} else {
result.append(.documentation([block]))
}
}
}
return result
}
15:41 This code works, but there's still some room for improvement,
depending on taste. For example, we can move the append
out of the if
statement:
func playground(blocks: [Block]) -> [PlaygroundElement] {
var result: [PlaygroundElement] = []
for block in blocks {
switch block {
case .code(let code):
result.append(.code(code))
default:
var newBlocks = [block]
if case let .documentation(existingBlocks)? = result.last {
result.removeLast()
newBlocks = existingBlocks + newBlocks
}
result.append(.documentation(newBlocks))
}
}
return result
}
16:55 We can also remove the switch statement and use an if case let
instead, but that's a matter of taste:
func playground(blocks: [Block]) -> [PlaygroundElement] {
var result: [PlaygroundElement] = []
for block in blocks {
if case let .code(code) = block {
result.append(.code(code))
} else {
var newBlocks = [block]
if case let .documentation(existingBlocks)? = result.last {
result.removeLast()
newBlocks = existingBlocks + newBlocks
}
result.append(.documentation(newBlocks))
}
}
return result
}
17:37 The nice thing about this approach is that the complexity is only
in one place — where we call removeLast()
— which is better than having
complexity in two places. There's yet another approach, where we switch on the
combination of the current block and the last element in the results
array.
18:12 In case of the code block, we're not interested in the last
element, and we simply append to the result. In case we have a non-code block
and a .documentation
element, we replace the last element in the array. In
case we have a non-code block and another element, we simply create a new
.documentation
element:
func playground(blocks: [Block]) -> [PlaygroundElement] {
var result: [PlaygroundElement] = []
for block in blocks {
switch (block, result.last) {
case let (.code(code), _):
result.append(.code(code))
case let (_, .documentation(existing)?):
result.removeLast()
result.append(.documentation(existing + [block]))
default:
result.append(.documentation([block]))
}
}
return result
}
20:43 The tests still succeed, so the code is correct.
What we like about this approach is that all three cases that have to be handled
are spelled out underneath one another. On the flip side, it's not so nice that
we access result.last
unnecessarily. Also, we have to keep in mind that the
order of the case
statements is important.
This solution is very close to a suggestion we got on GitHub by
odnoletkov,
who took a more functional approach without a for loop.
A Function Version
21:54 We can remove the result
variable and reduce on the blocks
array with an empty array as the initial value. Instead of appending to the
results
array, we have to return a new array with the element appended to it.
Instead of using removeLast()
, we use dropLast()
. The tests still succeed.
This approach is nice as well; even though there might be performance
differences, it still works the same way:
func playground_finished(blocks: [Block]) -> [PlaygroundElement] {
return blocks.reduce([]) { result, block in
switch (block, result.last) {
case let (.code(code), _):
return result + [.code(code)]
case let (_, .documentation(existing)?):
return result.dropLast() + [.documentation(existing + [block])]
default:
return result + [.documentation([block])]
}
}
}
23:36 We wonder if there's still a simpler solution out there, and we
encourage everyone who knows of one to send it to us. We'll discuss this in an
upcoming episode, unless, of course, we've reached the inherent complexity of
the problem.