Architecture
Sync
Spacedrive synchronizes libraries by treating SQLite cells as last-write-wins (LWW) CRDTs. Sync metadata about each model is encoded in the Prisma schema using model attributes.
In the cases where LWW CRDTs are used, conflicts are resolved using a Hybrid Logical Clock to determine the ordering of events.
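To make the ordering rule concrete, here is a minimal sketch of a hybrid logical clock timestamp and a last-write-wins cell. The type and field names (`Hlc`, `LwwCell`, `physical_ms`, `counter`, `instance`) are assumptions made for illustration and are not taken from Spacedrive's code.

```rust
/// Hypothetical hybrid logical clock timestamp (field names are assumptions).
/// The derived ordering compares fields top to bottom: physical time first,
/// then the logical counter, then the instance as a final tie-breaker.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Hlc {
    pub physical_ms: u64,
    pub counter: u32,
    pub instance: u32,
}

impl Hlc {
    /// Advance the clock for a local event observed at wall-clock `now_ms`.
    pub fn tick(self, now_ms: u64) -> Hlc {
        if now_ms > self.physical_ms {
            Hlc { physical_ms: now_ms, counter: 0, instance: self.instance }
        } else {
            // Wall clock did not move forward: bump the logical counter.
            Hlc { counter: self.counter + 1, ..self }
        }
    }

    /// Advance the clock after receiving a timestamp from another instance.
    pub fn observe(self, remote: Hlc, now_ms: u64) -> Hlc {
        let physical_ms = now_ms.max(self.physical_ms).max(remote.physical_ms);
        let counter = if physical_ms == self.physical_ms && physical_ms == remote.physical_ms {
            self.counter.max(remote.counter) + 1
        } else if physical_ms == self.physical_ms {
            self.counter + 1
        } else if physical_ms == remote.physical_ms {
            remote.counter + 1
        } else {
            0
        };
        Hlc { physical_ms, counter, instance: self.instance }
    }
}

/// A single cell whose value is replaced only by a strictly newer write.
#[derive(Debug, Clone, PartialEq)]
pub struct LwwCell<T> {
    pub value: T,
    pub written_at: Hlc,
}

impl<T> LwwCell<T> {
    /// Apply an incoming write. Returns true if the write won.
    pub fn apply(&mut self, value: T, at: Hlc) -> bool {
        if at > self.written_at {
            self.value = value;
            self.written_at = at;
            true
        } else {
            false
        }
    }
}
```

Because every instance applies the same total order, two instances that receive the same set of writes end up with the same cell value, regardless of the order in which the writes arrive.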
We would be remiss not to credit Actual Budget with many of the CRDT concepts used in Spacedrive's sync system.
Model Types
All data in a library conforms to one of the following types. Each type uses a different strategy for syncing.
Local
Local records exist entirely outside of the sync system. They don't have Sync IDs and never leave the node they were created on.
Used for Nodes, Statistics, and Sync Events.
@local
Shared
Shared records encompass most data synced in the CRDT fashion. Updates are applied per-field using a last-write-wins strategy.
Used for Objects, Tags, Spaces, and Jobs.
@shared(id?: String, modelId: Int)
- id: Scalar field to override the default Sync ID.
- modelId: Integer to identify the model by. Helps save on bandwidth and storage as opposed to storing model names as strings.
Relation
Relation records are similar to shared records, but represent a many-to-many relation between two records.
The Sync ID is the combination of the item and group Sync IDs.
Used for TagOnFile and FileInSpace.
@relation(item: String, group: String, modelId: Int)
- item: Field that identifies the item that the relation is connecting.
- group: Field that identifies the group that the item should be connected to.
- modelId: Integer to identify the model by. Helps save on bandwidth and storage as opposed to storing model names as strings.
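As a rough illustration of how a relation's Sync ID can be built from the Sync IDs of its two sides, consider the sketch below. The `SyncId` enum, the `relation_sync_id` helper, and the numeric model IDs are hypothetical. The actual encoding is not described here.

```rust
/// Hypothetical Sync ID representation (an assumption for illustration).
/// Local records are absent on purpose: they have no Sync ID at all.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum SyncId {
    /// A shared record, identified by its model and a single ID value.
    Shared { model_id: u16, id: String },
    /// A relation record, identified by the Sync IDs of both sides.
    Relation {
        model_id: u16,
        item: Box<SyncId>,
        group: Box<SyncId>,
    },
}

/// Combine the item and group Sync IDs into the relation's Sync ID.
pub fn relation_sync_id(model_id: u16, item: SyncId, group: SyncId) -> SyncId {
    SyncId::Relation {
        model_id,
        item: Box::new(item),
        group: Box::new(group),
    }
}
```

Two instances that link the same item to the same group derive the same Sync ID, so their operations target the same relation record instead of creating duplicates.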
Sync Actors
The sync system consists of a number of actors that send, receive, and ingest sync operations.
Ingest
The most important of these is the ingest actor. It exists independently of both the core and the cloud sync actors, so many different systems can interact with it, while only one process at a time is responsible for providing it with operations to ingest.
Its work loop goes something like this:
- Wait for a notification that new sync operations are available to ingest
- Request new operations from whatever process holds the communication channel with the ingester
  - This request is accompanied by timestamps of the last operation ingested for each instance of the library, to be used to fetch only the latest necessary sync operations
  - If the communication channel is dropped, the work loop will restart
- Ingest the sync operations in batches distinguished by instance, model, and record id
  - If the batch contains a Delete, all other operations are ignored and the record is deleted
  - If the batch contains a Create, all Update operations are applied on top of it to reduce database load
  - If the batch only contains Update operations, they are applied on top of a fake Create operation to reduce database load
  - The latest timestamp of the instance is updated to the timestamp of the last operation ingested
- Return to the beginning, waiting for a notification of new operations
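The batch rules in the work loop above can be sketched as a single reduction step. This is a simplified model under stated assumptions: `Op`, `OpKind`, `Reduced`, and `reduce_batch` are hypothetical names, timestamps are plain integers rather than hybrid logical clock values, and a Create is assumed to carry no field data of its own.

```rust
use std::collections::BTreeMap;

/// Hypothetical operation kinds. Names mirror the prose, not the real types.
#[derive(Debug, Clone, PartialEq)]
pub enum OpKind {
    Create,
    Update { field: String, value: String },
    Delete,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Op {
    pub timestamp: u64,
    pub kind: OpKind,
}

/// The single database write that a whole batch collapses into.
#[derive(Debug, PartialEq)]
pub enum Reduced {
    Delete,
    /// A real or fake Create with every Update folded in.
    Upsert { fields: BTreeMap<String, String> },
}

/// Collapse one batch (same instance, model, and record id) into one write.
/// Returns the write plus the latest timestamp seen, or None if the batch is empty.
pub fn reduce_batch(ops: &[Op]) -> Option<(Reduced, u64)> {
    let latest = ops.iter().map(|op| op.timestamp).max()?;

    // A Delete anywhere in the batch wins over everything else.
    if ops.iter().any(|op| op.kind == OpKind::Delete) {
        return Some((Reduced::Delete, latest));
    }

    // Otherwise fold Updates in timestamp order so the newest value per field wins.
    let mut sorted: Vec<&Op> = ops.iter().collect();
    sorted.sort_by_key(|op| op.timestamp);

    let mut fields = BTreeMap::new();
    for op in sorted {
        if let OpKind::Update { field, value } = &op.kind {
            fields.insert(field.clone(), value.clone());
        }
    }
    Some((Reduced::Upsert { fields }, latest))
}
```

The returned timestamp is what the ingester would record as the latest timestamp for the instance, so the next request only asks for newer operations.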
Cloud Send/Receive/Ingest
Each of these plays a different role in getting sync operations to and from the cloud, and into the ingest actor.
Send
When new sync operations are created on an instance, the send actor attempts to obtain a lock on cloud sync operations for that particular instance, then uploads the operations in a compressed base64 format.
The lock on cloud sync operations is necessary to prevent multiple processes from attempting to upload operations at the same time, and is implemented as a short-expiry Redis entry.
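The idea behind a short-expiry lock can be sketched with an in-memory stand-in. This is not the Redis-backed implementation: `ExpiringLocks`, its methods, and the per-instance key format are assumptions chosen for illustration, and the caller passes in the current time so the behavior is easy to follow.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// In-memory stand-in for a short-expiry lock store (hypothetical type).
#[derive(Default)]
pub struct ExpiringLocks {
    /// Maps a lock key to the moment its current holder's claim expires.
    held_until: HashMap<String, Instant>,
}

impl ExpiringLocks {
    /// Try to take the lock for `key`. Succeeds if the lock is free or expired.
    pub fn try_acquire(&mut self, key: &str, now: Instant, ttl: Duration) -> bool {
        let held = self
            .held_until
            .get(key)
            .map_or(false, |expiry| *expiry > now);
        if held {
            return false;
        }
        self.held_until.insert(key.to_string(), now + ttl);
        true
    }

    /// Release the lock early, for example once an upload has finished.
    pub fn release(&mut self, key: &str) {
        self.held_until.remove(key);
    }
}
```

The expiry means a process that crashes mid-upload cannot hold the lock forever: once the time-to-live passes, another process can acquire it and continue uploading.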
Receive
Downloads sync operations from the cloud periodically, and stores them in the cloud_crdt_operation table.
Ingest
Reads sync operations from the cloud_crdt_operation table, and sends them to the ingest actor.