00:06 Today we'll have some fun with Swift's pointer APIs. We generally
only have to deal with pointers when working with C libraries, though in some
special cases, we might want to use them for raw performance. As an exercise
to see what working with pointers looks like, we're going to read data from a
text file and split the text into lines, all without using Swift arrays or
strings.
Swift Version
00:49 But first, we'll solve the task the way we normally would in Swift,
which takes just two lines of code: we decode a string from the ASCII file, and
we split that string at line feeds. Printing the result, we see an array of
substrings, one for each line in the file:
import Foundation

let url = FileManager.default.homeDirectoryForCurrentUser.appendingPathComponent("Downloads/test.txt")
let str = try! String(contentsOf: url, encoding: .ascii)
let lines = str.split(separator: "\n") // [Substring], one element per non-empty line
print(lines)
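One detail matters when comparing the two versions: by default, split(separator:) drops empty subsequences, so blank lines disappear from the result, while the pointer-based loop later in this episode keeps them. The following sketch uses a made-up string in place of the file to show the omittingEmptySubsequences: parameter:

```swift
import Foundation

// Made-up stand-in for the file contents: an empty line in the middle
// and a trailing line feed at the end.
let text = "alpha\n\nbeta\n"

// Default behavior: empty subsequences are omitted.
let compact = text.split(separator: "\n")

// Keeping empty subsequences preserves blank lines; note that split also
// yields a final empty piece after the trailing line feed.
let exact = text.split(separator: "\n", omittingEmptySubsequences: false)

print(compact)  // ["alpha", "beta"]
print(exact)    // ["alpha", "", "beta", ""]
```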
Opening and Reading the File
01:48 That was super easy in Swift. Now we'll do the same thing using
only pointers and C functions.
The first step is to open the file for reading:
let path = FileManager.default.homeDirectoryForCurrentUser.appendingPathComponent("Downloads/test.txt").path
guard let file = fopen(path, "r") else { fatalError() }
02:50 We want to read to the end of the file, but we don't yet know the
length of the file's contents. We use fseek to move the current position to the
end of the file, and ftell then returns that position, which is the size of the
file in bytes. Afterward, we call rewind to set the current position back to
the beginning of the file:
fseek(file, 0, SEEK_END)
let size = ftell(file)
rewind(file)
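These three calls can be wrapped in a small helper that also checks for errors: fseek returns a nonzero value on failure, and ftell returns -1. In this sketch, fileSize is a hypothetical name, and tmpfile() stands in for the test file so the example doesn't depend on ~/Downloads:

```swift
import Foundation

/// Hypothetical helper: returns the size of an open file in bytes, or nil on failure.
/// Like the code above, it leaves the file position back at the beginning.
func fileSize(_ file: UnsafeMutablePointer<FILE>) -> Int? {
    guard fseek(file, 0, SEEK_END) == 0 else { return nil }  // jump to the end
    let size = ftell(file)                                    // position at end == byte count
    guard size >= 0 else { return nil }                       // ftell returns -1 on error
    rewind(file)                                              // back to the start for reading
    return size
}

// tmpfile() creates an anonymous temporary file opened for reading and writing.
guard let tmp = tmpfile() else { fatalError("tmpfile failed") }
fputs("first\nsecond\n", tmp)
let measured = fileSize(tmp)
fclose(tmp)
print(measured ?? -1)  // 13
```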
03:50 To read the file, we call the function fread, which writes the
file contents into an UnsafeMutableRawPointer that we pass in. We create the
pointer by allocating the number of bytes we need, and we have to specify how
the buffer is aligned in memory. Since we only store single bytes, the alignment
doesn't matter in our case, so we specify 1, meaning the buffer can start at any
address. We also have to pass in the size of each item to read (one byte in our
case), the number of items (the number of bytes, i.e. size), and the file to
read from:
let ptr = UnsafeMutableRawPointer.allocate(byteCount: size, alignment: 1)
fread(ptr, 1, size, file)
04:58 Rather than ignoring the result of fread, we call it inside a guard
that checks it returns the expected number of items, and then we close the file:
guard fread(ptr, 1, size, file) == size else { fatalError() }
fclose(file)
05:23 Anytime we open a file, we have to be careful to close it after
we're done with it. If we put the fclose call in a defer statement, we can move
it right up to where we open the file, keeping the opening and closing of the
file together in code:
guard let file = fopen(path, "r") else { fatalError() }
defer { fclose(file) }
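Inside a function, defer has a clear scope: the fclose call runs on every return path, including early ones. The sketch below combines the steps so far into a hypothetical readWholeFile(atPath:) helper, using the allocate(byteCount:alignment:) spelling from current Swift and freeing the buffer if fread comes up short. The temporary file and its contents are assumptions for the demo:

```swift
import Foundation

/// Hypothetical helper: reads an entire file into a newly allocated raw buffer.
/// On success the caller owns `bytes` and must call `deallocate()` on it.
func readWholeFile(atPath path: String) -> (bytes: UnsafeMutableRawPointer, count: Int)? {
    guard let file = fopen(path, "r") else { return nil }
    defer { fclose(file) }  // runs on every exit path below

    guard fseek(file, 0, SEEK_END) == 0 else { return nil }
    let size = ftell(file)
    guard size >= 0 else { return nil }
    rewind(file)

    let bytes = UnsafeMutableRawPointer.allocate(byteCount: size, alignment: 1)
    guard fread(bytes, 1, size, file) == size else {
        bytes.deallocate()  // don't leak the buffer when the read comes up short
        return nil
    }
    return (bytes: bytes, count: size)
}

// Demo: write a small file to the temporary directory, then read it back.
let demoPath = URL(fileURLWithPath: NSTemporaryDirectory())
    .appendingPathComponent("pointer-demo.txt").path
try! "one\ntwo\n".write(toFile: demoPath, atomically: true, encoding: .ascii)

guard let result = readWholeFile(atPath: demoPath) else { fatalError("read failed") }
let byteCount = result.count
let firstByte = result.bytes.load(as: UInt8.self)
result.bytes.deallocate()
print(byteCount)  // 8
```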
05:54 Another thing we should pay attention to: whenever we allocate
memory, we have to remember to free it, so we add a comment to remind ourselves
to deallocate the pointer later:
let ptr = UnsafeMutableRawPointer.allocate(byteCount: size, alignment: 1) // TODO: deallocate ptr
Splitting into Lines
06:02 Moving on, we want to do something with the
UnsafeMutableRawPointer. Currently, there is no information about what's
stored in this buffer. By binding the memory to CChar, we can turn the raw
pointer into a typed pointer, UnsafeMutablePointer<CChar>:
let chars = ptr.bindMemory(to: CChar.self, capacity: size)
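To see what binding does in isolation, here is a small sketch with made-up contents: bytes are copied into a raw buffer with copyMemory(from:byteCount:), and bindMemory(to:capacity:) then returns a typed pointer to the same memory, so subscripting yields individual CChar values without copying anything:

```swift
import Foundation

let byteCount = 3
let raw = UnsafeMutableRawPointer.allocate(byteCount: byteCount, alignment: 1)

// Copy three bytes ("h", "i", line feed) into the untyped buffer.
"hi\n".withCString { source in
    raw.copyMemory(from: source, byteCount: byteCount)
}

// Binding gives typed access to the same memory; no bytes are copied.
let chars = raw.bindMemory(to: CChar.self, capacity: byteCount)
let lineFeed = "\n".utf8CString.first!
let thirdIsLineFeed = chars[2] == lineFeed
let firstChar = chars[0]

// The raw and typed pointers share one allocation, so it is freed once.
raw.deallocate()
print(thirdIsLineFeed)  // true
```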
07:09 We want to end up with something like an array of strings. In C
terms, that means a pointer to pointers to CChar. To construct this outer
pointer, we need to know how big it should be, i.e. how many lines there are,
so we work in two steps. First we iterate over the entire text and count the
lines, and then we allocate a buffer with room for one pointer per line.
07:40 Compare this procedure to working with a Swift array, where we'd
normally create an empty array and then append lines to it. The difference is
that now we don't have the ability to append, so we have to create an array with
a certain size upfront and subsequently assign values to the array's elements.
This is basically what we're doing, but with pointers instead of arrays and
strings.
08:03 To iterate over chars, we have to cheat a little bit and use a
Swift range, because Swift no longer has C-style for loops:
for idx in 0..<size {
}
08:24 To count the lines, we check each character to see if it's a line
feed. We can't directly compare a CChar to "\n", so we convert "\n" into a C
string, take its first character, and store it as a constant:
let lineFeed = "\n".utf8CString.first!
var lineCount = 0
for idx in 0..<size {
    if chars[idx] == lineFeed {
        lineCount += 1
    }
}
09:35 We could've written the same thing with filter and count, but
those aren't allowed in today's challenge.
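For comparison, a sketch of both approaches on a made-up input: the loop from above, wrapped in a hypothetical countLineFeeds function, and the filter/count version. The latter needs an UnsafeBufferPointer, because a bare typed pointer has no count and isn't a collection:

```swift
import Foundation

let lineFeed = "\n".utf8CString.first!

/// The same loop as above, wrapped in a hypothetical function.
func countLineFeeds(_ chars: UnsafePointer<CChar>, size: Int) -> Int {
    var lineCount = 0
    for idx in 0..<size where chars[idx] == lineFeed {
        lineCount += 1
    }
    return lineCount
}

let (loopCount, filterCount) = "a\nbb\n\nccc\n".withCString { (start) -> (Int, Int) in
    let size = strlen(start)  // number of bytes before the terminating zero
    let viaLoop = countLineFeeds(start, size: size)
    // UnsafeBufferPointer pairs the pointer with a count and conforms to
    // Collection, so filter and count become available.
    let buffer = UnsafeBufferPointer(start: start, count: size)
    let viaFilter = buffer.filter { $0 == lineFeed }.count
    return (viaLoop, viaFilter)
}
print(loopCount, filterCount)  // 4 4
```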
09:54 Again, in Swift we would create an array of strings. But in C, a
string is a pointer to characters, so we want to create a pointer to pointers to
characters. In other words, we need to allocate a buffer of memory with room for
lineCount pointers (12 for our test file), and each of these pointers should
point to a buffer containing the actual characters of one line:
let lines = UnsafeMutablePointer<UnsafeMutablePointer<CChar>>.allocate(capacity: lineCount)
11:23 With this buffer in place, we want to loop over all the characters
from our file and pull out the lines. We look for line feeds again, this time
calculating the position and length of the current line:
var lineOffset = 0
for idx in 0..<size {
    guard chars[idx] == lineFeed else { continue }
    let lineLength = idx - lineOffset
    lineOffset = idx + 1
}
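Tracing the offset bookkeeping on a small, made-up input makes it easier to check: each line feed ends a line that starts at lineOffset and is idx - lineOffset bytes long, and the next line starts one byte after the line feed. This sketch collects the (offset, length) pairs in a Swift array purely so they can be inspected:

```swift
import Foundation

let lineFeed = "\n".utf8CString.first!
var spans: [(offset: Int, length: Int)] = []  // collected only for inspection

"ab\n\ncd\n".withCString { (chars) -> Void in
    let size = strlen(chars)
    var lineOffset = 0
    for idx in 0..<size {
        guard chars[idx] == lineFeed else { continue }
        let lineLength = idx - lineOffset  // bytes from the line start up to this line feed
        spans.append((offset: lineOffset, length: lineLength))
        lineOffset = idx + 1               // the next line starts after the line feed
    }
}
print(spans.map { "\($0.offset):\($0.length)" })  // ["0:2", "3:0", "4:2"]
```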
13:14 Having found the start position and the length of a line, we want
to copy everything in this range into a new buffer that represents the line. C
strings are null-terminated, so we allocate one extra byte and set that last
byte to 0 using a subscript:
var lineOffset = 0
for idx in 0..<size {
    guard chars[idx] == lineFeed else { continue }
    let lineLength = idx - lineOffset
    let line = UnsafeMutablePointer<CChar>.allocate(capacity: lineLength + 1) // + 1 for the terminator
    line.initialize(from: chars.advanced(by: lineOffset), count: lineLength)
    line[lineLength] = 0
    lineOffset = idx + 1
}
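The copy-and-terminate step can also be pulled out into a helper. In this sketch, copyLine is a hypothetical name; it allocates length + 1 bytes, copies the characters with initialize(from:count:), and writes the terminating zero, so the result works with C functions like strlen as well as with String(cString:):

```swift
import Foundation

/// Hypothetical helper: copies `length` characters starting at `start` into a new,
/// null-terminated buffer. The caller must call `deallocate()` on the result.
func copyLine(from start: UnsafePointer<CChar>, length: Int) -> UnsafeMutablePointer<CChar> {
    let line = UnsafeMutablePointer<CChar>.allocate(capacity: length + 1)
    line.initialize(from: start, count: length)  // copy the line's characters
    line[length] = 0                             // C strings end with a zero byte
    return line
}

let (copiedLength, copiedText) = "ab\ncd\n".withCString { (chars) -> (Int, String) in
    // In this input, the second line starts at offset 3 and is 2 bytes long.
    let line = copyLine(from: chars.advanced(by: 3), length: 2)
    defer { line.deallocate() }
    return (strlen(line), String(cString: line))
}
print(copiedLength, copiedText)  // 2 cd
```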
15:06 We assign the newly created line pointer to the lines buffer, for
which we need to keep track of a line index:
var lineOffset = 0
var lineIdx = 0
for idx in 0..<size {
    guard chars[idx] == lineFeed else { continue }
    let lineLength = idx - lineOffset
    let line = UnsafeMutablePointer<CChar>.allocate(capacity: lineLength + 1)
    line.initialize(from: chars.advanced(by: lineOffset), count: lineLength)
    line[lineLength] = 0
    lines[lineIdx] = line
    lineIdx += 1
    lineOffset = idx + 1
}
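Put together as functions, the two passes look roughly like this. It's a sketch under a few assumptions: splitLines and freeLines are hypothetical names, the input is a typed pointer plus a byte count, and, like the loop above, a final line without a trailing line feed is ignored. freeLines frees each inner line before freeing the outer buffer that holds the pointers to them:

```swift
import Foundation

let lineFeed = "\n".utf8CString.first!

/// Hypothetical helper: splits `size` characters at line feeds into newly
/// allocated C strings. Free the result with `freeLines`.
func splitLines(_ chars: UnsafePointer<CChar>, size: Int)
    -> (lines: UnsafeMutablePointer<UnsafeMutablePointer<CChar>>, count: Int) {
    // Pass 1: count the line feeds to size the outer buffer.
    var lineCount = 0
    for idx in 0..<size where chars[idx] == lineFeed { lineCount += 1 }

    // Pass 2: copy each line into its own null-terminated buffer.
    let lines = UnsafeMutablePointer<UnsafeMutablePointer<CChar>>.allocate(capacity: lineCount)
    var lineOffset = 0
    var lineIdx = 0
    for idx in 0..<size {
        guard chars[idx] == lineFeed else { continue }
        let lineLength = idx - lineOffset
        let line = UnsafeMutablePointer<CChar>.allocate(capacity: lineLength + 1)
        line.initialize(from: chars.advanced(by: lineOffset), count: lineLength)
        line[lineLength] = 0
        lines[lineIdx] = line
        lineIdx += 1
        lineOffset = idx + 1
    }
    return (lines: lines, count: lineCount)
}

/// Frees every inner line first, then the outer buffer.
func freeLines(_ lines: UnsafeMutablePointer<UnsafeMutablePointer<CChar>>, count: Int) {
    for idx in 0..<count { lines[idx].deallocate() }
    lines.deallocate()
}

let (demoCount, demoSecond) = "one\n\nthree\n".withCString { (chars) -> (Int, String) in
    let (lines, count) = splitLines(chars, size: strlen(chars))
    defer { freeLines(lines, count: count) }
    return (count, String(cString: lines[1]))
}
print(demoCount, demoSecond.isEmpty)  // 3 true
```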
15:52 We add two more reminders: one to deallocate the outer lines
pointer, and one for the inner line pointers. But we remove the reminder we
wrote before, since we're done with the chars pointer and can deallocate it now:
chars.deallocate()
Printing Lines
16:47 To see if everything worked, we want to loop over the lines and
print them. Since Swift doesn't import variadic C functions like printf, we have
to cheat and convert the CChar pointers to Swift strings in order to print them:
for lineIdx in 0..<lineCount {
    let line = lines[lineIdx]
    let str = String(cString: line)
    print(str)
}
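Not every C output function is off-limits, though: Swift can't import variadic functions like printf, but non-variadic ones such as puts are available, and puts takes a C string directly and appends a line feed. A minimal sketch with a hand-built buffer:

```swift
import Foundation

// Build a null-terminated line by hand: 5 characters plus the terminating zero.
let text = "hello"
let line = UnsafeMutablePointer<CChar>.allocate(capacity: 6)
text.withCString { line.initialize(from: $0, count: 6) }  // copies the terminator too

// puts writes the C string plus a line feed to standard output and
// returns a nonnegative value on success (EOF on error).
let status = puts(line)
line.deallocate()
```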
18:19 Thinking about memory management, we know that we no longer need
the line pointers after printing the lines. So in the same loop, we can
deallocate each line pointer. Older Swift versions required deallocate(capacity:)
with the capacity again, i.e. the length of the line (strlen(line)) plus one for
the terminating zero; since Swift 4.1, deallocate() frees the whole allocation
without a count:
for lineIdx in 0..<lineCount {
    let line = lines[lineIdx]
    let str = String(cString: line)
    print(str)
    line.deallocate()
}
19:08 Finally, we deallocate the lines pointer:
lines.deallocate()
Discussion
19:32 That was way more work than the two lines of the Swift example,
but it's cool that we can do this low-level stuff. And we could go even lower if
we wanted, by managing memory with malloc and free directly.
19:57 Swift gives us many different levels of abstraction, and it's up
to us as developers to decide which one to use. The lowest level we saw today was
that of untyped pointer APIs like UnsafeRawPointer, which doesn't carry any
information about what's stored in its buffer. We also worked with
UnsafeMutablePointer, which is typed. On top of that, a wrapper like
UnsafeBufferPointer adds Collection conformance by storing a count alongside the
pointer, which makes it easy to iterate over the buffer. We could very well have
used these buffer pointers, and they would have made things much easier.
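As a rough illustration of that last point, wrapping the bytes in an UnsafeBufferPointer makes the standard collection algorithms available, so splitting no longer needs manual offset bookkeeping. This sketch, with a made-up input, still uses a Swift array for the result; the slices point into the original memory and are only valid while it is alive. Unlike the loop in this episode, it also returns a final line that has no trailing line feed:

```swift
import Foundation

let lineFeed = "\n".utf8CString.first!

let (lineCount, firstLine) = "first\nsecond\n\nfourth".withCString { (start) -> (Int, String) in
    // Pair the pointer with its count to get a Collection.
    let buffer = UnsafeBufferPointer(start: start, count: strlen(start))
    // Each element is a Slice over the same memory; no bytes are copied here.
    let lines = buffer.split(separator: lineFeed, omittingEmptySubsequences: false)
    // Decode one slice as UTF-8 text (CChar may be signed, so convert to UInt8).
    let first = String(decoding: lines[0].map { UInt8(truncatingIfNeeded: $0) }, as: UTF8.self)
    return (lines.count, first)
}
print(lineCount, firstLine)  // 4 first
```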
Debugging
20:42 We usually work in playgrounds for these episodes, but this time
we wrote the code in a command line project. Doing so allows us to enable
Address Sanitizer (on the Diagnostics tab of the Run action in the project's
scheme settings), which reports errors such as a heap buffer overflow: a write
to an address outside of an allocated buffer.
21:25 In that report, Xcode tells us the address we're trying to write to
is located right next to a 61-byte region. We also see that the current line
length is 60. Adding one for the terminating zero gives 61 bytes, so we can
figure out that we're trying to write immediately to the right of the line's
buffer. This is very useful information for tracking down mistakes, especially
if you're not used to writing this kind of code.
22:13 And that's probably enough pointers for today!