We use Swift's pointer APIs to read a text file and split it into lines without using Swift's collection and string types.

00:06 Today we'll have some fun with Swift's pointer APIs. We generally only have to deal with pointers when working with C libraries, though in some special cases, we might want to use them for the raw performance. As an exercise to see what working with pointers looks like, we're going to read data from a text file and split the text into lines, all without using Swift arrays or strings.

Swift Version

00:49 But first, we'll implement the assignment by writing it the way we normally would in Swift, which is done with two lines of code: we decode a string from the ASCII file, and we split the string by line endings. Printing the result, we see an array of strings — one string for each line in the file:

let url = FileManager.default.homeDirectoryForCurrentUser.appendingPathComponent("Downloads/test.txt")

let str = try! String(contentsOf: url, encoding: .ascii)
let lines = str.split(separator: "\n")
print(lines)

/*
["Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed", "diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat,", ... ]
*/

Opening and Reading the File

01:48 That was super easy in Swift. Now we'll do the same thing using only pointers and C functions.

The first step is to open the file for reading:

let path = FileManager.default.homeDirectoryForCurrentUser.appendingPathComponent("Downloads/test.txt").path

guard let file = fopen(path, "r") else { fatalError() }

02:50 We want to read to the end of the file, but we don't yet know the length of the file's contents. We use fseek to place the current position at the end of the file, and we store this position as the size of the file, which is the number of bytes. Afterward, we rewind to set the current position back to the beginning of the file:

fseek(file, 0, SEEK_END)
let size = ftell(file)
rewind(file)
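The snippet above ignores the return values of fseek and ftell. As a minimal sketch, assuming we want failures reported as nil instead of ignored, the size lookup could be wrapped like this (fileSize(atPath:) is a hypothetical helper name, not something from the episode):

```swift
import Foundation

/// Hypothetical helper (not from the episode): returns the size of a file
/// in bytes, or nil if the file can't be opened or measured.
func fileSize(atPath path: String) -> Int? {
    guard let file = fopen(path, "r") else { return nil }
    defer { fclose(file) }

    // fseek returns 0 on success.
    guard fseek(file, 0, SEEK_END) == 0 else { return nil }

    // ftell returns the current position, or -1 on failure.
    let size = ftell(file)
    guard size >= 0 else { return nil }
    return size
}
```

Because the helper closes the file again, it's only useful for checking the size. The episode keeps the file open and rewinds so it can read from it afterward.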

03:50 To read the file, we call the function fread, which writes the file contents to an UnsafeMutableRawPointer that we pass in. We create the pointer by allocating the number of bytes we need, and we have to specify how the buffer is aligned in memory. The alignment doesn't matter in our case, so we specify 1, meaning the buffer can be placed anywhere. We also have to pass in the size of items to read (one byte in our case), the number of items (the number of bytes, i.e. size), and the file to read from:

let ptr = UnsafeMutableRawPointer.allocate(bytes: size, alignedTo: 1)
fread(ptr, 1, size, file)

04:58 We check that fread returns the correct number of items, and we close the file:

guard fread(ptr, 1, size, file) == size else { fatalError() }
fclose(file)

05:23 Anytime we open a file, we have to be careful to close it after we're done with it. If we put the file closing in a defer statement, we can move the entire thing to where we open the file, keeping the opening and closing of the file together in code:

guard let file = fopen(path, "r") else { fatalError() }
defer { fclose(file) }
// use `file` here
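Putting the steps so far together, a rough sketch of the whole read could look like the following. The function name readBytes(atPath:) is an assumption made for illustration, and the sketch uses the current spellings allocate(byteCount:alignment:) and deallocate(), which replaced the allocate(bytes:alignedTo:) and deallocate(capacity:) labels in Swift 4.1:

```swift
import Foundation

/// Hypothetical helper: reads a whole file into a freshly allocated raw buffer.
/// The caller owns the returned pointer and has to call `deallocate()` on it.
func readBytes(atPath path: String) -> (pointer: UnsafeMutableRawPointer, size: Int)? {
    guard let file = fopen(path, "r") else { return nil }
    // Runs on every exit path below, so the file is always closed.
    defer { fclose(file) }

    guard fseek(file, 0, SEEK_END) == 0 else { return nil }
    let size = ftell(file)
    guard size >= 0 else { return nil }
    rewind(file)

    // One byte per item, `size` items; alignment 1 places no constraint on the address.
    let ptr = UnsafeMutableRawPointer.allocate(byteCount: size, alignment: 1)
    guard fread(ptr, 1, size, file) == size else {
        ptr.deallocate()
        return nil
    }
    return (ptr, size)
}
```

Returning nil instead of calling fatalError is a design choice of this sketch; the episode's version traps, which is fine for a small exercise.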

05:54 Another thing we should pay attention to: whenever we allocate memory, we have to remember to free it, so we add a comment to remind ourselves to deallocate the pointer later:

let ptr = UnsafeMutableRawPointer.allocate(bytes: size, alignedTo: 1)
// TODO dealloc

Splitting into Lines

06:02 Moving on, we want to do something with the UnsafeMutableRawPointer. Currently, there is no type information about what's stored in this buffer. By binding the memory to a type, we can turn the raw pointer into a typed pointer, UnsafeMutablePointer<CChar>:

let chars = ptr.bindMemory(to: CChar.self, capacity: size)
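To see the difference between the raw and the typed pointer in isolation, here is a small sketch that doesn't need a file. The function name and the three bytes are made up for the example, and it uses the current allocate(byteCount:alignment:) spelling:

```swift
import Foundation

/// Hypothetical demo: writes three bytes through a raw pointer, then reads
/// the first and last of them back through a typed pointer.
func typedView() -> (first: CChar, last: CChar) {
    let byteCount = 3
    let raw = UnsafeMutableRawPointer.allocate(byteCount: byteCount, alignment: 1)
    defer { raw.deallocate() }

    // The raw pointer has no element type, so we name the type on every store.
    raw.storeBytes(of: 72, as: UInt8.self)                   // "H"
    raw.storeBytes(of: 105, toByteOffset: 1, as: UInt8.self) // "i"
    raw.storeBytes(of: 10, toByteOffset: 2, as: UInt8.self)  // line feed

    // After binding, the pointer knows its element type and can be subscripted.
    let chars = raw.bindMemory(to: CChar.self, capacity: byteCount)
    return (chars[0], chars[byteCount - 1])
}
```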

07:09 We want to end up with something like an array of strings. In C terms, that means a pointer to pointers to CChar. To construct this outer pointer, we need to find out how big it should be (how many lines there are), which we do in two steps. First we iterate over the entire string and count the number of lines, and then we allocate a buffer with room for one pointer per line.

07:40 Compare this procedure to working with a Swift array, where we'd normally create an empty array and then append lines to it. The difference is that now we don't have the ability to append, so we have to create an array with a certain size upfront and subsequently assign values to the array's elements. This is basically what we're doing, but with pointers instead of arrays and strings.

08:03 To iterate over chars, we have to cheat a little bit and use a Swift range, because we no longer have C-style for-loops in Swift:

for idx in 0..<size {
    // ...
}

08:24 To count the lines, we check each character to see if it's a line feed. We can't directly compare a CChar to "\n", so we convert "\n" into a C string, take its first character, and store it as a constant:

let lineFeed = "\n".utf8CString.first!
var lineCount = 0
for idx in 0..<size {
    if chars[idx] == lineFeed {
        lineCount += 1
    }
}
// lineCount: 12

09:35 We could've written the same thing with filter and count, but those aren't allowed in today's challenge.

09:54 Again, in Swift we would create an array of strings. But in C, a string is a pointer to characters, so we want to create a pointer to pointers to characters. In other words, we need to allocate a buffer of memory that will contain 12 pointers, and each of these pointers should point to a buffer containing the actual characters of a line:

let lines = UnsafeMutablePointer<UnsafeMutablePointer<CChar>>.allocate(capacity: lineCount)
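The two levels of indirection can be tried out on their own. The following is only an illustration: the function name and the two strings are invented, and strdup with free stands in for the manual copying that the episode does next:

```swift
import Foundation

/// Hypothetical demo: builds a two-element "array of C strings" and
/// returns the combined length of both strings.
func combinedLength() -> Int {
    let count = 2
    // Outer buffer: room for `count` pointers, not for any characters.
    let lines = UnsafeMutablePointer<UnsafeMutablePointer<CChar>>.allocate(capacity: count)
    defer { lines.deallocate() }

    // strdup returns a malloc'ed, null-terminated copy, which is released with free.
    lines.initialize(to: strdup("first")!)
    (lines + 1).initialize(to: strdup("second")!)
    defer {
        free(lines[0])
        free(lines[1])
    }

    // Subscripting the outer pointer yields an inner pointer, i.e. one C string.
    return strlen(lines[0]) + strlen(lines[1])
}
```

The defer blocks run in reverse order, so the inner strings are freed before the outer buffer is deallocated.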

11:23 With this buffer in place, we want to loop over all the characters from our file and pull out the lines. We look for line feeds again, this time calculating the position and length of the current line:

var lineOffset = 0
for idx in 0..<size {
    guard chars[idx] == lineFeed else { continue }
    let lineLength = idx - lineOffset

    // ...

    lineOffset = idx + 1
}

13:14 Having found the start position and the length of a line, we want to copy everything in this range into a new buffer that represents the line. C strings are null-terminated, so we increase the capacity of the buffer by one byte, and we set the last byte to 0 through subscript:

var lineOffset = 0
for idx in 0..<size {
    guard chars[idx] == lineFeed else { continue }
    let lineLength = idx - lineOffset

    let line = UnsafeMutablePointer<CChar>.allocate(capacity: lineLength + 1)
    line.initialize(from: chars.advanced(by: lineOffset), count: lineLength)
    line[lineLength] = 0

    // ...

    lineOffset = idx + 1
}

15:06 We assign the newly created line pointer to the lines buffer, for which we need to keep track of a line index:

var lineOffset = 0
var lineIdx = 0
for idx in 0..<size {
    guard chars[idx] == lineFeed else { continue }
    let lineLength = idx - lineOffset

    let line = UnsafeMutablePointer<CChar>.allocate(capacity: lineLength + 1)
    line.initialize(from: chars.advanced(by: lineOffset), count: lineLength)
    line[lineLength] = 0

    lines[lineIdx] = line

    lineIdx += 1
    lineOffset = idx + 1
}
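For experimenting outside the file-reading code, the two passes can be wrapped into one function. This is a sketch under a few assumptions: splitLines is a hypothetical name, the line feed is written as the ASCII value 10, and, like the loop above, it only produces lines that end in a line feed, so a trailing fragment without one is ignored:

```swift
import Foundation

/// Hypothetical wrapper around the two passes from the episode.
/// The caller owns the outer buffer and every inner line buffer.
func splitLines(_ chars: UnsafePointer<CChar>, size: Int)
    -> (lines: UnsafeMutablePointer<UnsafeMutablePointer<CChar>>, count: Int) {
    let lineFeed = CChar(10) // ASCII line feed

    // Pass 1: count the line feeds to learn how many pointers we need.
    var lineCount = 0
    for idx in 0..<size where chars[idx] == lineFeed { lineCount += 1 }

    // Pass 2: copy each line into its own null-terminated buffer.
    let lines = UnsafeMutablePointer<UnsafeMutablePointer<CChar>>.allocate(capacity: lineCount)
    var lineOffset = 0
    var lineIdx = 0
    for idx in 0..<size where chars[idx] == lineFeed {
        let lineLength = idx - lineOffset
        let line = UnsafeMutablePointer<CChar>.allocate(capacity: lineLength + 1)
        line.initialize(from: chars + lineOffset, count: lineLength)
        line[lineLength] = 0
        (lines + lineIdx).initialize(to: line)
        lineIdx += 1
        lineOffset = idx + 1
    }
    return (lines, lineCount)
}
```

A quick way to try it is to pass in a string literal's C representation, for example "ab\ncd\n".withCString { splitLines($0, size: strlen($0)) }, which should yield two lines.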

15:52 We add two more reminders: one to deallocate the outer lines pointer, and one for the inner line pointers. But we remove the reminder we wrote before; we can deallocate the chars pointer now that we're done with it:

chars.deallocate(capacity: size)

Printing Lines

16:47 To see if everything worked, we want to loop over the lines and print them. Since variadic C functions like printf aren't available in Swift, we cheat and convert the CChar pointers to Swift strings in order to print them:

for lineIdx in 0..<lineCount {
    let line = lines[lineIdx]
    let str = String(cString: line)
    print(str)
}
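If we wanted to avoid Swift strings here as well, one option is a non-variadic C function such as puts, which Swift does import. The sketch below is an alternative to the loop above, not what the episode does, and printLines is a made-up name:

```swift
import Foundation

/// Hypothetical helper: prints each C string with puts and returns
/// how many lines were written successfully.
func printLines(_ lines: UnsafePointer<UnsafeMutablePointer<CChar>>, count: Int) -> Int {
    var printed = 0
    for lineIdx in 0..<count {
        // puts writes the string followed by a newline and
        // returns a non-negative value on success.
        if puts(lines[lineIdx]) >= 0 {
            printed += 1
        }
    }
    return printed
}
```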

18:19 Thinking about memory management, we know that we no longer need the line pointers after printing the lines. So in the same loop, we can deallocate each line pointer. We have to pass in the capacity again, which is the length of the line, plus one for the terminating zero:

for lineIdx in 0..<lineCount {
    let line = lines[lineIdx]
    let str = String(cString: line)
    print(str)
    line.deallocate(capacity: strlen(line) + 1)
}

19:08 Finally, we deallocate the lines pointer:

lines.deallocate(capacity: lineCount)
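Since Swift 4.1, deallocate() no longer takes a capacity, so on a current toolchain the cleanup gets a little simpler. A sketch of the same two-level cleanup, assuming every buffer was created with allocate(capacity:) as in the episode (freeLines is a hypothetical name):

```swift
import Foundation

/// Hypothetical helper: releases every inner line buffer and then the outer
/// buffer. Returns the number of line buffers that were released.
@discardableResult
func freeLines(_ lines: UnsafeMutablePointer<UnsafeMutablePointer<CChar>>, count: Int) -> Int {
    var released = 0
    for lineIdx in 0..<count {
        // Inner buffers first; after this the stored pointer is dangling.
        lines[lineIdx].deallocate()
        released += 1
    }
    // Outer buffer last, once nothing needs to read from it anymore.
    lines.deallocate()
    return released
}
```

The order matters: deallocating the outer buffer first would leave us without a valid way to reach the inner pointers.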

Discussion

19:32 That was way more work than the two lines of the Swift example, but it's cool that we can do this low-level stuff. And we could go even lower if we wanted, by using malloc.

19:57 Swift gives us many different levels of abstraction, and it's up to us as developers to decide what to use. The lowest level we saw today was that of untyped pointer APIs like UnsafeRawPointer, which doesn't offer any information about what's stored in its buffer. We also worked with UnsafeMutablePointer, which is typed. On top of that, a wrapper like UnsafeBufferPointer adds collection conformance by storing a count, which makes it easy to iterate over the pointer's buffer. We could very well have used buffer pointers, and it would've made things much easier.
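As a rough illustration of what the buffer pointer buys us, here is the line count from earlier written with collection APIs. It steps outside the rules of today's challenge, and countLines is a hypothetical name:

```swift
import Foundation

/// Hypothetical helper: counts line feeds by treating the characters
/// as a collection instead of indexing a bare pointer.
func countLines(_ chars: UnsafePointer<CChar>, size: Int) -> Int {
    let lineFeed = CChar(10) // ASCII line feed
    // The buffer pointer pairs the start address with a count,
    // which is what gives it Collection conformance.
    let buffer = UnsafeBufferPointer(start: chars, count: size)
    // `lazy` avoids building an intermediate array of matches.
    return buffer.lazy.filter { $0 == lineFeed }.count
}
```

The buffer pointer doesn't own or copy the memory; it's only a view, so the underlying allocation still has to be freed separately.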

Debugging

20:42 We usually work in playgrounds for these episodes, but this time we wrote the code in a command line project. Doing so allows us to enable Address Sanitizer (on the Diagnostics tab of the project's scheme settings), which gives us information about, for example, a heap buffer overflow that occurs when we try to write to the wrong address.

21:25 In the sanitizer's report, Xcode tells us the address we're trying to write to is located next to a 61-byte region. We also see that the current line length is 60. If we add one for the terminating zero, we can figure out that we're trying to write immediately to the right of the line's buffer. This is very useful information for tracking down mistakes, especially if you're not used to writing this kind of code.

22:13 And that's probably enough of pointers for today!