CRDTs – Introduction - Swift Talk

CRDTs – Introduction

In this episode24:20

This episode is freely available thanks to the support of our subscribers

Subscribers get exclusive access to new and all previous subscriber-only episodes, video downloads, and 30% discount for team members. Become a Subscriber →

We begin to investigate Conflict-Free Replicated Data Types as a way to sync data across devices.

00:06 Today, we're starting a new series on Conflict-Free Replicated Data Types (CRDTs). We stumbled upon this concept while trying to build an app for food intake, which we wanted to have on both iPhone and Mac, with the ability to sync between the two. Just thinking about how to make this work almost made us want to give up — syncing seems very complicated.

00:56 After doing some research, we found out about CRDTs. They've been around for years, but they aren't used a lot — though Apple's Notes app does use them internally for syncing. Essentially, a CRDT is a data type that defines a way to merge any changes from multiple instances, deterministically and without conflicts.

01:38 In other words, given that we have two (or more) instances of an app that uses a CRDT as its data model, the state of those instances may diverge, but they're always able to merge their changes and get back in sync. This merging is guaranteed to have the same result, regardless of how often or in which order changes are merged.

02:03 At first, this sounded too good to be true. So let's get started with a simple CRDT and explore what we can do with it. Later on, we'll see how simple CRDTs compose into larger CRDTs.

Sample App

02:23 The app we want to eventually build allows runners to enter their food intake:

After eating a serving of fruit, we click the category, and we get two points. As we eat more fruit, the number of points for that category decreases for the selected day. Categories to avoid, such as fried foods, give negative points.

02:56 The app's storage is a dictionary of dictionaries. At the top level, we map days to a dictionary of entries. Each dictionary of entries stores the number of servings (as an integer) per food category.

03:26 All of this works offline: we can enter data without needing to be online. But when we're online, the app automatically syncs to other instances. We demonstrate this by running both the macOS app and the iOS app (in the simulator).

03:54 To make the connection between instances, we implemented a Bonjour service that connects to any peer it finds on the local network — which is definitely not suitable for production. As soon as we launch both apps, we see that they connect and sync. When we take one of the apps offline, enter some food servings on both sides, and then bring them back online, the changes are synced back and forth. Thus, CRDTs give us offline capability, peer-to-peer syncing, and real-time collaboration.

05:13 As with most things, it's easy to start out with a simple data model. And as our data model becomes more complicated, we'll have to invest more time and attention into its adherence to the CRDT rules — but at least the theory isn't that hard.

Stepper Example

05:34 Let's start with a simpler example: a stepper control. We'll use our multi-peer implementation to connect instances of the app and to sync the stepper's value between them:

06:17 First, we create a Session state object to establish the multi-peer connection. Session is generic over the type of value it sends to peers. This can be any Codable type, so we can specify Int to send the stepper's value:

struct ContentView: View {
    @StateObject var session = Session<Int>()
    @State var int = 0
    
    var body: some View {
        VStack {
            Stepper("\(int)", value: $int)
        }
        .fixedSize()
        .padding(30)
    }
}

06:54 The session exposes an AsyncStream of values, which we can consume in a task. When we receive a value from this stream, we assign it to the int property:

struct ContentView: View {
    @StateObject var session = Session<Int>()
    @State var int = 0
    
    var body: some View {
        VStack {
            Stepper("\(int)", value: $int)
        }
        .fixedSize()
        .padding(30)
        .task {
            for await newValue in session.receiveStream {
                int = newValue
            }
        }
    }
}

07:42 We use onChange(of:) to detect when the value of int changes on our end — e.g. by using the stepper — and we send the new value to other peers via the session:

struct ContentView: View {
    @StateObject var session = Session<Int>()
    @State var int = 0
    
    var body: some View {
        VStack {
            Stepper("\(int)", value: $int)
        }
        .fixedSize()
        .padding(30)
        .onChange(of: int) { newValue in
            try! session.send(newValue)
        }
        .task {
            for await newValue in session.receiveStream {
                int = newValue
            }
        }
    }
}

08:22 The stepper's value is now instantly synced between connected instances of the app. This works as long as we change the value on one side at a time, but what happens if both apps are being used and they try to send different values across at the same time? Or what if we go offline and the value is changed on both sides? We might get unwanted results after reconnecting and syncing.

09:52 In the current setup, we always overwrite the local value with whatever we receive. To make the syncing deterministic, we need to think about how we want values to be merged.

Our First CRDT

10:21 For a data type to be considered a CRDT, it must be able to merge values in a way that adheres to three principles:

Merging A into B should give the same result as merging B into A (commutativity).
Say we have a network issue and we accidentally send a value multiple times. This should have the same result as merging the value once (idempotence).
If we have three instances of our app, it shouldn't matter whether we first merge A and B and then C, or if we first merge B and C and then A (associativity).

11:10 One example of a merge strategy for which these three principles hold is to take the maximum of the values to be merged:

struct ContentView: View {
    @StateObject var session = Session<Int>()
    @State var int = 0
    
    var body: some View {
        VStack {
            Stepper("\(int)", value: $int)
        }
        .fixedSize()
        .padding(30)
        .onChange(of: int) { newValue in
            try! session.send(newValue)
        }
        .task {
            for await newValue in session.receiveStream {
                int = max(newValue, int)
            }
        }
    }
}

11:23 For the values 7 and 10, for example, the result will always be 10, no matter how often or in which order we merge. It might not be the behavior we want, but it's one operation that meets the criteria.

12:15 Let's formalize this behavior in a type:

struct Max: Codable, Equatable {
    var value: Int
    
    mutating func merge(_ other: Max) {
        value = max(value, other.value)
    }
}

12:42 We update the state property int to be a Max, and we call the merge method when receiving a value from the peer-to-peer session:

struct ContentView: View {
    @StateObject var session = Session<Max>()
    @State var int = Max(value: 0)
    
    var body: some View {
        VStack {
            Stepper("\(int.value)", value: $int.value)
        }
        .fixedSize()
        .padding(30)
        .onChange(of: int) { newValue in
            try! session.send(newValue)
        }
        .task {
            for await newValue in session.receiveStream {
                int.merge(newValue)
            }
        }
    }
}

13:33 When we run the iOS and macOS apps, we see that they both increment their values when we use the plus button of one of the steppers. But when we use the minus button, the local value decrements, while the other app keeps the maximum value. For the time being, our apps can get out of sync, but that's because the UI no longer is a good match for the merge strategy we chose.

14:23 Before continuing, let's clean up our code a bit. We can make Max generic over the type of value it holds, as long as this type is Comparable. Max can conditionally conform to Codable if its Value type is Codable:

struct Max<Value: Comparable>: Equatable {
    var value: Value
    
    mutating func merge(_ other: Self) {
        value = max(value, other.value)
    }
}

extension Max: Codable where Value: Codable { }

Testing

15:35 Next, we want to write some tests to verify that our data type correctly implements the three CRDT principles. Although we can easily see that Max is implemented correctly, the structure for these tests will be the same for more complex data types, so this is a good time to write down what we expect from CRDTs.

16:35 Given two Max values, a and b, we want to assert that the merge operation is commutative, i.e. the result of merging b into a is equal to the result of merging a into b:

import XCTest
@testable import BonjourSample

class BonjourSampleTests: XCTestCase {
    func testMax() {
        let a = Max(value: 1)
        let b = Max(value: 2)
        XCTAssertEqual(a.merged(b), b.merged(a))
    }
}

17:56 To make testing easier, we write a non-mutating merged method, which creates a copy of the Max value we call the method on:

struct Max<Value: Comparable>: Equatable {
    var value: Value
    
    mutating func merge(_ other: Self) {
        value = max(value, other.value)
    }
    
    func merged(_ other: Self) -> Self {
        var copy = self
        copy.merge(other)
        return copy
    }
}

18:21 The above assertion should be true for any combination of integers. So instead of using two hardcoded values, we can run the test a thousand times with random integers:

fileprivate let testCycles = 1000

extension Int {
    static func random() -> Int {
        Int.random(in: Int.min..<Int.max)
    }
}

class BonjourSampleTests: XCTestCase {
    func testMax() {
        for _ in 0..<testCycles {
            let a = Max(value: Int.random())
            let b = Max(value: Int.random())
            XCTAssertEqual(a.merged(b), b.merged(a))
        }
    }
}

20:54 Secondly, we want to test that the merge operation is idempotent: merging a value with itself results in the same value. This is trivial for Max, but not for other operations, such as adding up values or appending an element to an array. So we want to make sure that merging a into a is equal to a:

XCTAssertEqual(a.merged(a), a)

22:17 The third property we want to test is associativity: if we add a third instance, merging a and b and then c should have the same result as merging b and c and then a:

class BonjourSampleTests: XCTestCase {
    func testMax() {
        for _ in 0..<testCycles {
            let a = Max(value: Int.random())
            let b = Max(value: Int.random())
            let c = Max(value: Int.random())
            XCTAssertEqual(a.merged(b), b.merged(a))
            XCTAssertEqual(a.merged(a), a)
            XCTAssertEqual((a.merged(b)).merged(c), a.merged(b.merged(c)))
        }
    }
}

23:37 Again, this makes a lot of sense for Max, but that's not the case for any data type or any kind of syncing logic. Next time, we'll look at more data types that have these properties, and we'll work toward getting our food-tracking app to sync.