Conforming IndexSet to Collection

In this episode16:52

This episode is freely available thanks to the support of our subscribers

Subscribers get exclusive access to new and all previous subscriber-only episodes, video downloads, and 30% discount for team members. Become a Subscriber →

To conform IndexSet to the Collection protocol we implement a custom index type along the way.

00:06 In this episode, we'll continue adding functionality to IndexSet by conforming to more protocols. By conforming to Sequence in the last episode, we were able to iterate over the values, and we gained helper methods like map, filter, and contains.

Now we want to make our IndexSet conform to Collection as well, which will look similar to a lot of stuff we did when we implemented SortedArray, which was our first example of a custom collection. This time, it'll be more complicated because IndexSet doesn't have a simple index type like Int. We need a new index type that allows us to index into the ranges and consequently into an element of a range.

01:37 We create an Index struct nested inside IndexSet. It has two properties: an index for the range, and one for the element in that range, for which we use the index type of episode #38's RangeType:

extension IndexSet {
	struct Index {
		let rangeIndex: Int
		let elementIndex: RangeType.Index
	}
}

To conform to the Collection protocol, we have to implement startIndex, endIndex, subscript, and index(after:).

Creating startIndex and endIndex

02:22 The startIndex is the index of the first element of the first range, so its rangeIndex will simply be the startIndex of ranges, and the elementIndex is the first range's startIndex. However, this is an optional because there could be no ranges at all. In such a case, we return a zero for elementIndex, which isn't an integer but rather a RangeType.Index. For this dummy zero index, we return the startIndex of a fake range:

var startIndex: Index {
	let zero = (0...0).startIndex
	return Index(rangeIndex: ranges.startIndex, elementIndex: ranges.first)?.startIndex ?? zero
}

03:40 The endIndex is somewhat similar but perhaps less straightforward. It has to meet three conditions:

The endIndex must be out of bounds.
The endIndex must be equal to startIndex if the collection is empty.
All valid indices of the collection should compare as less than the endIndex.

For the rangeIndex, we can return the endIndex of ranges, which already places the entire index out of bounds. For elementIndex, we can use the same dummy value as before, which satisfies the second condition:

var endIndex: Index {
	let zero = (0...0).startIndex
	return Index(rangeIndex: ranges.endIndex, elementIndex: zero)
}

Comparable

04:59 To meet the third condition, the index type needs to be Comparable. We make our Index conform to this protocol in another extension by writing the functions == and <. The equals function is pretty easy; it's true if both subindices are equal:

extension IndexSet.Index: Comparable {
	static func ==(lhs: IndexSet.Index, rhs: IndexSet.Index) -> Bool {
		return lhs.rangeIndex == rhs.rangeIndex && lhs.elementIndex == rhs.elementIndex
	}
	
	// ...
}

The less than comparison has some more cases to consider. Firstly, we can compare the rangeIndex of both sides — if they compare less than, we can already return true. If they compare equal, we're in the same range and we can return the comparison of the two sides' elementIndex. In all other cases, we can return false:

extension IndexSet.Index: Comparable {
	// ...
	
	static func <(lhs: IndexSet.Index, rhs: IndexSet.Index) -> Bool {
		if lhs.rangeIndex < rhs.rangeIndex { return true }
		if lhs.rangeIndex == rhs.rangeIndex {
			return lhs.elementIndex < rhs.elementIndex
		}
		return false
	}
}

Subscript

07:38 The subscript function takes an Index and returns the collection's element type: Int. We can simply return the corresponding value without checking the bounds. This means we can cause a crash, but that's actually the expected behavior:

subscript(index: Index) -> Int {
	return ranges[index.rangeIndex][index.elementIndex]
}

Advancing with index(after:)

08:27 The tricky part is index(after:), in which we have to return the next index. Again, there are a few cases: we have to return either a next index within the same range or the first index of a next range, or we should return endIndex, which signifies the end of the collection.

First we'll take the range of the given Index. If the range's next index is still in bounds, we can return it:

func index(after index: Index) -> Index {
	let range = ranges[index.rangeIndex]
	let nextElementIndex = range.index(after: index.elementIndex)
	if nextElement < range.endIndex {
		return Index(rangeIndex: index.rangeIndex, elementIndex: nextElementIndex)
	}
	// ...
}

Otherwise, we try to get the next range's index, which could be within or out of bounds. If it's within bounds, we return the first index of that range. Otherwise, we can return our endIndex:

func index(after index: Index) -> Index {
	// ...
	let nextRangeIndex = ranges.index(after: index.rangeIndex)
	if nextRangeIndex < ranges.endIndex {
		let nextRange = ranges[nextRangeIndex]
		return Index(rangeIndex: nextRangeIndex, elementIndex: nextRange.startIndex)
	} else {
		return endIndex
	}
}

11:35 That's a lot of typing, and it's easy to make mistakes here. Plus, writing an integer-based collection would be less complicated than our compound index type, so the question is how much we've gained by doing it this way. For example, we can try to use subscript with startIndex to get the first index element out:

var set = IndexSet()
set.insert(4...5)
set.insert(0...2)

let idx = set.startIndex
set[idx] // returns 0

12:31 We have to remember to actually declare our Collection conformance:

extension IndexSet: Collection {
	// ...
}

Choosing Protocols

13:04 By implementing Collection on top of Sequence, we didn't really gain a lot of extra functionality. We can count:

set.count // returns 5

But for other methods, like last, we'd have to also implement BidirectionalCollection by adding index(before:).

14:08 You could look at conforming to protocols as a statement about your type, and not just as a means to gain functionality. By implementing Collection, you're saying it's OK to iterate over this type multiple times. With a Sequence, you might only be able to iterate over it once, e.g. standard input. Our IndexSet can be iterated over as many times as we want, so making it into a Collection makes more sense.

14:39 Also, a Collection promises that using subscript takes a constant time, which, in this case, is a requirement we can fulfill. Array, on the other hand, also implements RandomAccessCollection, which we could technically implement, but the protocol has the soft requirement that you can perform count in constant time. We can't promise that because our type has different characteristics. As such, we shouldn't conform to that protocol.

15:17 Another important aspect to note when working with indices is the fact that stored indices might become invalid when you mutate the collection. For example, if you remove an element from an IndexSet, its range could split into two ranges. The same goes for Arrays: if you remove an element, all the indices after that element will point to different elements or they'll be out of bounds.

16:21 So that's it for our custom collection type. Implementing these protocols turned out to be not much work. If you have a data type that works like a collection, it could be worth the effort to improve readability or encapsulate functionality.