Building a Custom IndexSet Collection

In this episode18:52

This episode is freely available thanks to the support of our subscribers

Subscribers get exclusive access to new and all previous subscriber-only episodes, video downloads, and 30% discount for team members. Become a Subscriber →

We build the basics for a custom index set collection type.

00:06 Today we'll begin building a naive implementation of IndexSet. This type already exists in Foundation, but it serves as a good example for building a more complex custom collection type. Because of the scope of this project, we'll do it over the course of a few episodes. First, we'll just build the type itself, and in future episodes, we'll add more functionality by conforming to the Sequence and Collection protocols.

00:43 An IndexSet is a type where you can store indices. However, the indices are stored internally as ranges of indices. It's a very efficient implementation: often you're dealing with adjacent indices — for example, when you want to store the indices of a table view that are currently visible. Instead of storing the indices individually, IndexSet stores a single range.

01:09 We won't implement IndexSet in the most time-efficient way, but it will be space efficient. We'll be using a test-driven approach — not to exhaustively test all possible cases, but to better show you how we expect the data structure to work.

Creating the IndexSet Struct

01:51 We create an IndexSet struct, and we store sortedRanges as a property inside the struct. As an invariant, the contents of that array will be sorted by the lowerBound:

struct IndexSet {
    // Invariant: sorted by lower bound
    var sortedRanges: [ClosedRange<Int>] = []
}

02:21 A closed range is written with three dots (e.g. 1...10), and the value on the right-hand side of the dots is included in the range. We add an insert method, which takes a closed range and inserts it. We start by appending the element, and we also introduce a typealias for the range:

struct IndexSet {
    typealias RangeType = ClosedRange<Int>
    // Invariant: sorted by lower bound
    var sortedRanges: [RangeType] = []
    
    mutating func insert(_ insertedRange: RangeType) {
        sortedRanges.append(insertedRange)
    }
}

Implementing Insertion

03:06 Now we write a failing test for insertion: we start with an empty IndexSet, and then we insert two ranges. We expect the ranges to be sorted by their lower bounds, but that currently doesn't work:

func testInsertion() {
    var set = IndexSet()
    set.insert(5...6)
    set.insert(1...2)
    XCTAssert(set.sortedRanges == [1...2, 5...6])
}

04:28 To make the test pass, we use the simplest solution to keep the ranges sorted: we simply call sort on the sortedRanges array after appending. Then we can specify that the ranges should be sorted by lowerBound. A more efficient implementation would insert the new range at the correct point to begin with. But since we ultimately want to focus on conforming to collection protocols, we use the naive implementation, as this gets us to the protocol part sooner:

mutating func insert(_ element: RangeType) {
    sortedRanges.append(element)
    sortedRanges.sort { $0.lowerBound < $1.lowerBound }
}

05:23 Now the test for insertion passes. Next, we add another failing test: testMerging. When we insert two overlapping ranges, the IndexSet shouldn't contain both ranges. Instead, they should be merged:

func testMerging() {
    var set = IndexSet()
    set.insert(5...6)
    set.insert(3...6)
    XCTAssert(set.sortedRanges == [3...6])
}

06:17 We can make this test pass by adding a new private method, merge, which we call from IndexSet:

mutating func insert(_ element: RangeType) {
    sortedRanges.append(element)
    sortedRanges.sort { $0.lowerBound < $1.lowerBound }
    merge()
}

06:30 To implement merge, we have to iterate over all the ranges, and for each range, check if it overlaps with the next range. If it overlaps, we replace the overlapping ranges with one unified range. However, we can't simply loop over sortedArray.indices, because when we change the array, the indices become invalid.

07:28 To avoid the problem of invalid indices, we use reduce for our implementation. We start with an empty array, and the combine function takes two parameters: the new ranges, and the range we're currently inspecting. If newRanges contains a last element, and if that element overlaps with the current range, then we have to replace the last element with a new range merging the two. To be able to mutate newRanges, we have to copy it into a var:

private mutating func merge() {
    sortedRanges = sortedRanges.reduce([]) { (newRanges: [RangeType], range) in
        var copy = newRanges
        if let last = newRanges.last, last.overlaps(range) {
            copy[newRanges.endIndex-1] = last.merge(range)
        } else {
            copy.append(range)
        }
        return copy
    }
}

09:38 To make this work, we still have to define the merge method on ClosedRange. This method simply returns a new range that reaches from the minimum of the lower bounds up to the maximum of the upper bounds:

extension ClosedRange {
    func merge(_ other: ClosedRange) -> ClosedRange {
        return min(lowerBound, other.lowerBound)...max(upperBound, other.upperBound)
    }
}

10:32 Now our tests pass again, and we add a final failing test, which tests merging adjacent ranges: inserting 3...4 and 5...6 should result in a single range, 3...6. In other words, adjacent ranges (with no element missing in between) should be merged as well:

func testMergingAdjacent() {
    var set = IndexSet()
    set.insert(5...6)
    set.insert(3...4)
    XCTAssert(set.sortedRanges == [3...6])
}

11:33 To make this test pass, we replace the call to overlaps in IndexSet's merge method with a call to overlapsOrAdjacent, which we still have to define. To write overlapsOrAdjacent, we have to compute with the ranges' bounds: at some point we have to say something like lowerBounds - 1 or upperBounds + 1 to check whether or not the two ranges are adjacent. However, the bounds of a ClosedRange are only Comparable, i.e. we can't compute with them at all.

Using CountableClosedRange

12:34 To solve this issue, we have to switch from using ClosedRange to using CountableClosedRange. A CountableClosedRange's bounds are Strideable, and the stride has to be a SignedInteger so we can calculate with the bounds. When we change our extension from ClosedRange to CountableClosedRange, we also have to replace our usage of min and max in the implementation of merge with Swift.min and Swift.max; CountableClosedRange is a collection, and collections have min and max methods as well, whereas we want to refer to the global min and max functions:

extension CountableClosedRange {
    func merge(_ other: CountableClosedRange) -> CountableClosedRange {
        return Swift.min(lowerBound, other.lowerBound)...Swift.max(upperBound, other.upperBound)
    }
}

13:22 Now we can implement overlapsOrAdjacent. We first extend self on both bounds by one. Then we can simply check if the two ranges overlap:

extension CountableClosedRange {
    func overlapsOrAdjacent(_ other: CountableClosedRange) -> Bool {
        return ((lowerBound.advanced(by: -1))...(upperBound.advanced(by: 1))).overlaps(other)
    }
}

14:34 We also still have to change our RangeType type alias in the IndexSet to CountableClosedRange:

typealias RangeType = CountableClosedRange<Int>

Reduce with inout

14:50 All the tests pass now. Before we continue, we first refactor the current code. Our merge method on IndexSet uses reduce, and within the function we pass to reduce, we have to make a mutable copy of newRanges that we can change. To avoid making this copy, we add a new version of reduce that takes an inout parameter.

15:39 First, we write our new version of reduce as an extension on Array. In the regular reduce definition, combine is of type (A, Element) -> A. We want to move the return value into an inout parameter so that the type of combine will be (inout A, Element) -> ():

extension Array {
    func reduce<A>(_ initial: A, combine: (inout A, Element) -> ()) -> A {
        var result = initial
        for element in self {
            combine(&result, element)
        }
        return result
    }
}

17:17 To make it clear to the compiler that we want to use this version of reduce in the merge method, we have to add a type to the parameter list. Then we can get rid of the mutable intermediate array and the return statement:

    private mutating func merge() {
        sortedRanges = sortedRanges.reduce([]) { (newRanges: inout [RangeType], range) in
            if let last = newRanges.last, last.overlapsOrAdjacent(range) {
                newRanges[newRanges.endIndex-1] = last.merge(range)
            } else {
                newRanges.append(range)
            }
        }
    }

17:55 Finally, we can make the inout version of reduce available on all sequences. To accomplish this, we only have to change Element to Iterator.Element:

extension Sequence {
    func reduce<A>(_ initial: A, combine: (inout A, Iterator.Element) -> ()) -> A {
        var result = initial
        for element in self {
            combine(&result, element)
        }
        return result
    }
}

18:26 Now that we have the basics of our IndexSet type in place, in future episodes we'll look at how to make it conform to Sequence and Collection.