00:06 Today, we're starting a new series on Conflict-Free Replicated Data
Types (CRDTs). We stumbled upon this concept while trying to build an app for
food intake, which we wanted to have on both iPhone and Mac, with the ability to
sync between the two. Just thinking about how to make this work almost made us
want to give up — syncing seems very complicated.
00:56 After doing some research, we found out about CRDTs. They've been
around for years, but they aren't used a lot — though Apple's Notes app does use
them internally for syncing. Essentially, a CRDT is a data type that defines a
way to merge any changes from multiple instances, deterministically and without
conflicts.
01:38 In other words, given that we have two (or more) instances of an
app that uses a CRDT as its data model, the state of those instances may
diverge, but they're always able to merge their changes and get back in sync.
This merging is guaranteed to have the same result, regardless of how often or
in which order changes are merged.
02:03 At first, this sounded too good to be true. So let's get started
with a simple CRDT and explore what we can do with it. Later on, we'll see how
simple CRDTs compose into larger CRDTs.
Sample App
02:23 The app we want to eventually build allows runners to enter their
food intake:
After eating a serving of fruit, we click the category, and we get two points.
As we eat more fruit, the number of points for that category decreases for the
selected day. Categories to avoid, such as fried foods, give negative points.
02:56 The app's storage is a dictionary of dictionaries. At the top
level, we map days to a dictionary of entries. Each dictionary of entries stores
the number of servings (as an integer) per food category.
03:26 All of this works offline: we can enter data without needing to be
online. But when we're online, the app automatically syncs to other instances.
We demonstrate this by running both the macOS app and the iOS app (in the
simulator).
03:54 To make the connection between instances, we implemented a Bonjour
service that connects to any peer it finds on the local network — which is
definitely not suitable for production. As soon as we launch both apps, we see
that they connect and sync. When we take one of the apps offline, enter some
food servings on both sides, and then bring them back online, the changes are
synced back and forth. Thus, CRDTs give us offline capability, peer-to-peer
syncing, and real-time collaboration.
05:13 As with most things, it's easy to start out with a simple data
model. And as our data model becomes more complicated, we'll have to invest more
time and attention into its adherence to the CRDT rules — but at least the
theory isn't that hard.
Stepper Example
05:34 Let's start with a simpler example: a stepper control. We'll use
our multi-peer implementation to connect instances of the app and to sync the
stepper's value between them:
06:17 First, we create a Session
state object to establish the
multi-peer connection. Session
is generic over the type of value it sends to
peers. This can be any Codable
type, so we can specify Int
to send the
stepper's value:
struct ContentView: View {
@StateObject var session = Session<Int>()
@State var int = 0
var body: some View {
VStack {
Stepper("\(int)", value: $int)
}
.fixedSize()
.padding(30)
}
}
06:54 The session exposes an AsyncStream
of values, which we can
consume in a task. When we receive a value from this stream, we assign it to the
int
property:
struct ContentView: View {
@StateObject var session = Session<Int>()
@State var int = 0
var body: some View {
VStack {
Stepper("\(int)", value: $int)
}
.fixedSize()
.padding(30)
.task {
for await newValue in session.receiveStream {
int = newValue
}
}
}
}
07:42 We use onChange(of:)
to detect when the value of int
changes
on our end — e.g. by using the stepper — and we send the new value to other
peers via the session:
struct ContentView: View {
@StateObject var session = Session<Int>()
@State var int = 0
var body: some View {
VStack {
Stepper("\(int)", value: $int)
}
.fixedSize()
.padding(30)
.onChange(of: int) { newValue in
try! session.send(newValue)
}
.task {
for await newValue in session.receiveStream {
int = newValue
}
}
}
}
08:22 The stepper's value is now instantly synced between connected
instances of the app. This works as long as we change the value on one side at a
time, but what happens if both apps are being used and they try to send
different values across at the same time? Or what if we go offline and the value
is changed on both sides? We might get unwanted results after reconnecting and
syncing.
09:52 In the current setup, we always overwrite the local value with
whatever we receive. To make the syncing deterministic, we need to think about
how we want values to be merged.
Our First CRDT
10:21 For a data type to be considered a CRDT, it must be able to merge
values in a way that adheres to three principles:
-
Merging A into B should give the same result as merging B into A
(commutativity).
-
Say we have a network issue and we accidentally send a value multiple times.
This should have the same result as merging the value once (idempotence).
-
If we have three instances of our app, it shouldn't matter whether we first
merge A and B and then C, or if we first merge B and C and then A
(associativity).
11:10 One example of a merge strategy for which these three principles
hold is to take the maximum of the values to be merged:
struct ContentView: View {
@StateObject var session = Session<Int>()
@State var int = 0
var body: some View {
VStack {
Stepper("\(int)", value: $int)
}
.fixedSize()
.padding(30)
.onChange(of: int) { newValue in
try! session.send(newValue)
}
.task {
for await newValue in session.receiveStream {
int = max(newValue, int)
}
}
}
}
11:23 For the values 7
and 10
, for example, the result will always
be 10
, no matter how often or in which order we merge. It might not be the
behavior we want, but it's one operation that meets the criteria.
12:15 Let's formalize this behavior in a type:
struct Max: Codable, Equatable {
var value: Int
mutating func merge(_ other: Max) {
value = max(value, other.value)
}
}
12:42 We update the state property int
to be a Max
, and we call the
merge
method when receiving a value from the peer-to-peer session:
struct ContentView: View {
@StateObject var session = Session<Max>()
@State var int = Max(value: 0)
var body: some View {
VStack {
Stepper("\(int.value)", value: $int.value)
}
.fixedSize()
.padding(30)
.onChange(of: int) { newValue in
try! session.send(newValue)
}
.task {
for await newValue in session.receiveStream {
int.merge(newValue)
}
}
}
}
13:33 When we run the iOS and macOS apps, we see that they both
increment their values when we use the plus button of one of the steppers. But
when we use the minus button, the local value decrements, while the other app
keeps the maximum value. For the time being, our apps can get out of sync, but
that's because the UI no longer is a good match for the merge strategy we chose.
14:23 Before continuing, let's clean up our code a bit. We can make
Max
generic over the type of value it holds, as long as this type is
Comparable
. Max
can conditionally conform to Codable
if its Value
type
is Codable
:
struct Max<Value: Comparable>: Equatable {
var value: Value
mutating func merge(_ other: Self) {
value = max(value, other.value)
}
}
extension Max: Codable where Value: Codable { }
Testing
15:35 Next, we want to write some tests to verify that our data type
correctly implements the three CRDT principles. Although we can easily see that
Max
is implemented correctly, the structure for these tests will be the same
for more complex data types, so this is a good time to write down what we expect
from CRDTs.
16:35 Given two Max
values, a
and b
, we want to assert that the
merge operation is commutative, i.e. the result of merging b
into a
is equal
to the result of merging a
into b
:
import XCTest
@testable import BonjourSample
class BonjourSampleTests: XCTestCase {
func testMax() {
let a = Max(value: 1)
let b = Max(value: 2)
XCTAssertEqual(a.merged(b), b.merged(a))
}
}
17:56 To make testing easier, we write a non-mutating merged
method,
which creates a copy of the Max
value we call the method on:
struct Max<Value: Comparable>: Equatable {
var value: Value
mutating func merge(_ other: Self) {
value = max(value, other.value)
}
func merged(_ other: Self) -> Self {
var copy = self
copy.merge(other)
return copy
}
}
18:21 The above assertion should be true for any combination of
integers. So instead of using two hardcoded values, we can run the test a
thousand times with random integers:
fileprivate let testCycles = 1000
extension Int {
static func random() -> Int {
Int.random(in: Int.min..<Int.max)
}
}
class BonjourSampleTests: XCTestCase {
func testMax() {
for _ in 0..<testCycles {
let a = Max(value: Int.random())
let b = Max(value: Int.random())
XCTAssertEqual(a.merged(b), b.merged(a))
}
}
}
20:54 Secondly, we want to test that the merge operation is idempotent:
merging a value with itself results in the same value. This is trivial for
Max
, but not for other operations, such as adding up values or appending an
element to an array. So we want to make sure that merging a
into a
is equal
to a
:
XCTAssertEqual(a.merged(a), a)
22:17 The third property we want to test is associativity: if we add a
third instance, merging a
and b
and then c
should have the same result as
merging b
and c
and then a
:
class BonjourSampleTests: XCTestCase {
func testMax() {
for _ in 0..<testCycles {
let a = Max(value: Int.random())
let b = Max(value: Int.random())
let c = Max(value: Int.random())
XCTAssertEqual(a.merged(b), b.merged(a))
XCTAssertEqual(a.merged(a), a)
XCTAssertEqual((a.merged(b)).merged(c), a.merged(b.merged(c)))
}
}
}
23:37 Again, this makes a lot of sense for Max
, but that's not the
case for any data type or any kind of syncing logic. Next time, we'll look at
more data types that have these properties, and we'll work toward getting our
food-tracking app to sync.
References