Offline Sync Protocol

The mobile app must work fully offline: a tech on a roof with no signal needs to read jobs, edit them, take photos, and walk away confident that everything will land in the cloud the moment they're back in range. Blanket last-write-wins doesn't cut it: if a tech edits a job at 10am and the dispatcher edits the same job at 10:15, both edits matter. We modeled the sync layer after Git, which has handled this kind of divergent-history problem since 2005.

The mental model

Each side — the mobile device and the per-tenant D1 in Cloudflare — keeps an append-only changes table. Every modification to any record is one row. The table never deletes; it only grows.

A sync is one round-trip: the mobile sends "here's my head (the ID of my latest change) and here's everything I have that you might not." The server replies "here's everything I have that you don't." Both sides apply the missing entries to their respective changes tables, then re-materialize the record state from the log.

This is roughly what a git pull followed by a git push does: each side ends up with the entries it was missing.

Mobile · local SQLite
  c_01H...K · set full_name = "Sarah G"
  c_01H...L · set status = "on_site"
  c_01H...M · add note "called back"
  c_01H...N · set fee_paid = 340.00
  head: c_01H...N · last_acked: c_01H...J

POST /sync → (mobile to server)
← response (server to mobile)

Tenant D1 · changes
  c_01H...J · set phone = "208-555..." (same)
  c_01H...P · dispatcher edit · status = "scheduled"
  c_01H...Q · auto: estimate_sent_at = now
  c_01H...R · admin: assigned_tech = "Jake T"
  head: c_01H...R

The exchange: the mobile sends entries K, L, M, N (its unpushed changes) plus its last acknowledged server head (J). The server appends K-N to its log, replies with P, Q, R (everything after the acked head J). After this round-trip both sides hold the same log and re-derive identical record state.
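
The exchange above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names Change, Mobile, server_sync, and mobile_sync are hypothetical, the single-letter IDs stand in for the UUIDv7 values, and the server log is assumed to be kept in arrival order so that "everything after the acked head" is a simple slice.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Change:
    id: str       # stands in for a UUIDv7 change ID
    summary: str  # human-readable description, for illustration only


@dataclass
class Mobile:
    log: list[Change]                                     # every change the device knows
    unpushed: list[Change] = field(default_factory=list)  # not yet sent to the server
    last_server_head: str | None = None                   # last server change ID applied


def server_sync(server_log, last_server_head, uploaded):
    """Server side of one round-trip; server_log is assumed to be in arrival order."""
    ids = [c.id for c in server_log]
    start = ids.index(last_server_head) + 1 if last_server_head else 0
    missing = server_log[start:]  # what the mobile has not seen (copied before appending)
    known = set(ids)
    server_log.extend(c for c in uploaded if c.id not in known)
    new_head = server_log[-1].id if server_log else None
    return missing, new_head


def mobile_sync(mobile, server_log):
    """Mobile side: push unpushed entries, apply whatever comes back."""
    missing, new_head = server_sync(server_log, mobile.last_server_head, list(mobile.unpushed))
    known = {c.id for c in mobile.log}
    mobile.log.extend(c for c in missing if c.id not in known)
    mobile.unpushed.clear()
    mobile.last_server_head = new_head


J, K, L, M, N, P, Q, R = (Change(i, f"change {i}") for i in "JKLMNPQR")
mobile = Mobile(log=[J, K, L, M, N], unpushed=[K, L, M, N], last_server_head="J")
server_log = [J, P, Q, R]

mobile_sync(mobile, server_log)
print(sorted(c.id for c in mobile.log) == sorted(c.id for c in server_log))  # True
```

The two logs end up holding the same set of entries but in different arrival orders, which is why record state is re-derived from the log rather than from the order in which entries happened to arrive.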


The changes table

The append-only log is the single source of truth. Record state is a derived view over it.

| Column | Type | Purpose |
|---|---|---|
| id | TEXT PK | UUIDv7: globally unique, lexicographically sortable by creation time |
| record_id | TEXT FK | Which record this change applies to |
| record_type | TEXT | Denormalized for fast filtering during partial sync |
| op | TEXT | set_field · delete · create · undelete · reorder |
| field_id | TEXT FK NULL | For set_field: which field changed |
| value_json | TEXT NULL | The new value, JSON-encoded so any data type fits |
| actor_user_id | TEXT FK | Who made the change, for audit and conflict UI |
| actor_device_id | TEXT | Which device originated the change; useful for debugging sync |
| client_ts | INTEGER | Time the change happened on the originating device |
| server_ts | INTEGER NULL | Time the server first wrote the change to its log; NULL until synced |
| parent_id | TEXT FK NULL | For ops that depend on a prior change, e.g. an undo references the change it reverses |
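
As a rough illustration of the table and of "record state is a derived view," the sketch below creates the table in an in-memory SQLite database and replays one record's changes. Several things here are assumptions rather than the actual implementation: the NOT NULL constraints, the omission of the foreign keys to records and field definitions, the helper names append_change and materialize, and replaying in id order while ignoring merge tiers and the reorder op.

```python
import json
import sqlite3

# Columns follow the table above; constraints are assumptions for this sketch.
SCHEMA = """
CREATE TABLE changes (
    id              TEXT PRIMARY KEY,
    record_id       TEXT NOT NULL,
    record_type     TEXT NOT NULL,
    op              TEXT NOT NULL,
    field_id        TEXT,
    value_json      TEXT,
    actor_user_id   TEXT NOT NULL,
    actor_device_id TEXT NOT NULL,
    client_ts       INTEGER NOT NULL,
    server_ts       INTEGER,
    parent_id       TEXT REFERENCES changes(id)
);
"""


def append_change(db, change_id, record_id, op, field_id=None, value=None, client_ts=0):
    """Append one row; a repeated id is ignored, so retries add nothing."""
    db.execute(
        "INSERT OR IGNORE INTO changes "
        "(id, record_id, record_type, op, field_id, value_json, "
        " actor_user_id, actor_device_id, client_ts) "
        "VALUES (?, ?, 'job', ?, ?, ?, 'user_demo', 'device_demo', ?)",
        (change_id, record_id, op, field_id,
         None if value is None else json.dumps(value), client_ts),
    )


def materialize(db, record_id):
    """Rebuild one record's state by replaying its changes in id order."""
    state, deleted = {}, False
    rows = db.execute(
        "SELECT op, field_id, value_json FROM changes WHERE record_id = ? ORDER BY id",
        (record_id,),
    )
    for op, field_id, value_json in rows:
        if op == "set_field":
            state[field_id] = json.loads(value_json)
        elif op == "delete":
            deleted = True
        elif op in ("create", "undelete"):
            deleted = False
    return None if deleted else state


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
append_change(db, "c_0001", "job_1", "create")
append_change(db, "c_0002", "job_1", "set_field", "full_name", "Sarah G")
append_change(db, "c_0003", "job_1", "set_field", "status", "on_site")
append_change(db, "c_0003", "job_1", "set_field", "status", "on_site")  # retry: ignored
print(materialize(db, "job_1"))  # {'full_name': 'Sarah G', 'status': 'on_site'}
```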

The sync protocol

A single endpoint handles every sync. The request is the mobile's "here's what I have," the response is the server's "here's what you missed." Both sides apply the result to their local change log, then run the materializer to update the record snapshot tables.

Request · POST /sync

last_server_head — server-issued change ID the mobile has applied up to
device_id — stable per-install identifier
changes — array of unpushed entries with full payloads
page_size — max entries to receive in the response
schema_version — protects against client/server skew

Response · 200 OK

new_changes — entries the mobile hasn't seen yet (paginated)
new_server_head — the latest change ID after this sync
has_more — boolean — true if pagination cut off the list
conflicts — array of changes the server rejected for human review
schema_pull — list of new field_definitions the client should load
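
To make the request and response shapes concrete, here is a minimal in-memory sketch of a handler. The field names come from the lists above; everything else is an assumption made for illustration: the function name handle_sync, a server log kept as a Python list in arrival order, using actor_device_id to avoid echoing a device's own changes back to it, and returning the last ID of the page as new_server_head while has_more is true. Conflict detection and the schema_version check are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SyncRequest:
    last_server_head: str | None
    device_id: str
    changes: list[dict]
    page_size: int
    schema_version: int


@dataclass
class SyncResponse:
    new_changes: list[dict]
    new_server_head: str | None
    has_more: bool
    conflicts: list[dict] = field(default_factory=list)    # not populated in this sketch
    schema_pull: list[dict] = field(default_factory=list)  # not populated in this sketch


def handle_sync(server_log: list[dict], req: SyncRequest) -> SyncResponse:
    ids = [c["id"] for c in server_log]
    # Position just past the entry the client has already applied.
    start = ids.index(req.last_server_head) + 1 if req.last_server_head else 0

    # Append uploaded changes, dropping any id the server already holds.
    known = set(ids)
    for change in req.changes:
        if change["id"] not in known:
            server_log.append(change)
            known.add(change["id"])

    # Assumption: a device never needs its own changes sent back to it.
    pending = [c for c in server_log[start:] if c["actor_device_id"] != req.device_id]
    page = pending[: max(1, req.page_size)]
    has_more = len(pending) > len(page)

    if has_more:
        new_head = page[-1]["id"]  # the next request resumes right after this page
    else:
        new_head = server_log[-1]["id"] if server_log else req.last_server_head
    return SyncResponse(new_changes=page, new_server_head=new_head, has_more=has_more)


def change(change_id: str, device: str) -> dict:
    """Tiny helper for building demo entries."""
    return {"id": change_id, "actor_device_id": device}


log = [change(i, "web") for i in ("J", "P", "Q", "R")]
first = handle_sync(log, SyncRequest("J", "phone", [change("K", "phone")], 2, 1))
print([c["id"] for c in first.new_changes], first.has_more)  # ['P', 'Q'] True
```

Sending the same request again leaves the log unchanged, because the uploaded id is already known to the server.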

Conflict resolution · the three tiers

Most edits don't actually conflict. Two techs editing different fields of the same record never collide; that's the common case and it's free. Of the edits that do touch the same field, we use a three-tier strategy.

| Tier | Field types | Resolution |
|---|---|---|
| Mergeable | text notes · tags · ordered lists · counters | CRDT merge: both edits survive. Two techs adding tags? Both tags applied. Two techs appending to notes? Both lines appear in timestamp order. |
| Last-Writer-Wins | status enums · phone · email · numbers without history | The change with the later client_ts wins. We log the loser as overridden_by so the audit trail shows what was discarded and by whom, never silently. |
| Manual | financial fields · contract amounts · invoice totals · signed documents | The server refuses to auto-merge. Both versions are surfaced in a Conflicts inbox on the web UI. A human Owner or Admin must pick one. The losing version is retained in audit. |

The merge tier is set on the field definition. Each field declares its merge strategy when it's created. Industry-default field bundles ship with sensible tiers (status = LWW, notes = mergeable, invoice_total = manual). Tenants can override for custom fields.
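
A small sketch of how a per-field tier might drive resolution follows. The names FieldEdit, Resolution, and resolve are hypothetical, the tier strings are placeholders, the mergeable case is shown only for tags as a plain set union (the simplest grow-only merge, not necessarily the CRDT actually used), and breaking a client_ts tie by change ID is an assumption the text does not specify.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldEdit:
    change_id: str
    value: Any
    client_ts: int  # e.g. minutes or epoch millis; only the ordering matters here


@dataclass
class Resolution:
    value: Any
    overridden: str | None = None     # change ID of the losing edit, if any
    overridden_by: str | None = None  # change ID of the winning edit, if any
    needs_review: bool = False        # True when a human must choose
    candidates: tuple = ()            # both values, kept for the Conflicts inbox


def resolve(tier: str, current: FieldEdit, incoming: FieldEdit) -> Resolution:
    if tier == "mergeable":
        # Tags only: the union keeps both techs' additions.
        return Resolution(value=sorted(set(current.value) | set(incoming.value)))
    if tier == "lww":
        # Later client_ts wins; ties fall back to change ID (an assumption).
        winner = max(current, incoming, key=lambda e: (e.client_ts, e.change_id))
        loser = incoming if winner is current else current
        return Resolution(value=winner.value,
                          overridden=loser.change_id,
                          overridden_by=winner.change_id)
    if tier == "manual":
        # No automatic winner: surface both values for an Owner or Admin.
        return Resolution(value=None, needs_review=True,
                          candidates=(current.value, incoming.value))
    raise ValueError(f"unknown merge tier: {tier}")


office = FieldEdit("P", "scheduled", client_ts=1000)
roof = FieldEdit("L", "on_site", client_ts=1015)
print(resolve("lww", office, roof).value)  # on_site
```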


Walkthrough · two edits, one record

A roofing job has a status field. The dispatcher in the office changes it to "scheduled" at 10:00. The tech on the roof (offline) marks it "on_site" at 10:15. The tech comes back online at 11:00.

10:00 · Dispatcher: writes change_id=P · status=scheduled · client_ts=10:00 · server_ts=10:00
10:15 · Tech (offline): writes change_id=L · status=on_site · client_ts=10:15
11:00 · Tech reconnects: posts /sync with last_server_head=J + [L]. Server receives L, sees that field=status is LWW, and compares client_ts values: L (10:15) is later than P (10:00; written online, so its client_ts matches its server_ts). L wins.
11:00+1s · Both sides settle: server head = R. Record status = on_site. An overridden_by entry references P so the audit shows the dispatcher's edit was superseded.

If the same scenario had played out on a field_definition with merge tier "Manual" — say final_invoice_amount — the server would have written L but flagged the conflict. Both values would be preserved, the field would visibly show "conflict pending review" on every interface, and an Owner would have to choose. No one's edit disappears silently.


Idempotency, replay safety, and partial sync

A protocol that ships data over a flaky cellular network has to assume that any request may be retried, possibly after it has already partially succeeded. A few invariants keep this safe.

Idempotent writes: Each change carries its UUIDv7 id, generated client-side. If the server already has that id, the duplicate is dropped silently. A retry produces no extra rows.
Chunked uploads: Large change batches are uploaded in fixed-size chunks with explicit acks. The mobile remembers the last chunk acked; if the connection dies mid-upload, it resumes from there.
Pagination on download: If a device has been offline for two weeks, the server might owe it thousands of changes. The response is paginated; the mobile pulls until has_more=false. Each page advances last_server_head so a connection drop mid-pull resumes cleanly.
Schema sync before data: If the server's schema_version is newer than the client's, the mobile first downloads the new field_definitions, applies them locally, then resumes the data sync. A field never references a definition the device doesn't know.
Snapshot bootstrap for new devices: A brand-new device install doesn't replay the full history (which could be years of changes). It downloads a server-side snapshot of current record state, then begins normal sync from a fresh server_head.
Audit is immutable: Even when changes are superseded by LWW or rejected as conflicts, they remain in the changes table forever. The materialized state can be rebuilt from any point in history, which is useful for forensics, undo, and regulatory audit.
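
The download side of these invariants can be sketched as a pull loop that records its progress after every page. This is an illustration, not the app's actual client: pull_until_caught_up, the state dictionary, and the make_flaky_fetch test double (which simulates one dropped connection) are all hypothetical names, and the page shape mirrors the response fields described earlier.

```python
def pull_until_caught_up(fetch_page, state, page_size=100):
    """Pull pages until has_more is false, saving progress after each page.

    state holds "log" (list of change dicts) and "last_server_head".
    If fetch_page raises, state still reflects every page applied so far.
    """
    seen = {c["id"] for c in state["log"]}
    while True:
        page = fetch_page(state["last_server_head"], page_size)
        for change in page["new_changes"]:
            if change["id"] not in seen:  # replayed entries are dropped
                state["log"].append(change)
                seen.add(change["id"])
        state["last_server_head"] = page["new_server_head"]
        if not page["has_more"]:
            return


def make_flaky_fetch(server_ids, fail_on_call):
    """Test double: serves pages from a list and fails once on the given call."""
    calls = {"n": 0}

    def fetch(head, page_size):
        calls["n"] += 1
        if calls["n"] == fail_on_call:
            raise ConnectionError("signal lost")
        start = server_ids.index(head) + 1 if head else 0
        ids = server_ids[start:start + page_size]
        return {
            "new_changes": [{"id": i} for i in ids],
            "new_server_head": ids[-1] if ids else head,
            "has_more": start + len(ids) < len(server_ids),
        }

    return fetch


state = {"log": [], "last_server_head": None}
fetch = make_flaky_fetch(["c1", "c2", "c3", "c4", "c5"], fail_on_call=2)

try:
    pull_until_caught_up(fetch, state, page_size=2)
except ConnectionError:
    pass  # first page was applied; the head already points at c2

pull_until_caught_up(fetch, state, page_size=2)  # resumes after c2
print([c["id"] for c in state["log"]])  # ['c1', 'c2', 'c3', 'c4', 'c5']
```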

Real-time updates · the other half of "sync"

The /sync endpoint is the floor — it works when there's no signal, when there's a delay, when the device has been off for a week. When all parties do have signal, we layer real-time updates on top so the dispatcher's web UI updates the moment a tech checks in.

A Durable Object per tenant maintains a list of subscribed websockets. When the Sync Worker writes a batch of changes to a tenant D1, it also publishes those change IDs through the Durable Object, which fans them out to all active sessions. Each client receives a tiny "you have new changes" hint and runs a normal /sync to pull them. The protocol is the same; the difference is that we don't wait for the next interval.
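
The fan-out can be pictured with a small in-memory stand-in. This is not Durable Object code and makes no claim about the Cloudflare API: TenantHub, its methods, and the hint payload are hypothetical, and each "websocket" is modeled as a plain callable that may raise ConnectionError when the session has gone away.

```python
from __future__ import annotations

from typing import Callable


class TenantHub:
    """In-memory stand-in for the per-tenant fan-out described above."""

    def __init__(self) -> None:
        self._subscribers: dict[str, Callable[[dict], None]] = {}

    def subscribe(self, session_id: str, send: Callable[[dict], None]) -> None:
        self._subscribers[session_id] = send

    def unsubscribe(self, session_id: str) -> None:
        self._subscribers.pop(session_id, None)

    def sessions(self) -> list[str]:
        return list(self._subscribers)

    def publish(self, change_ids: list[str]) -> int:
        """Send a small hint to every session; return how many received it."""
        if not change_ids:
            return 0
        # The hint carries no payload; clients respond by running a normal /sync.
        hint = {"type": "new_changes", "latest_change_id": change_ids[-1]}
        delivered = 0
        for session_id, send in list(self._subscribers.items()):
            try:
                send(hint)
                delivered += 1
            except ConnectionError:
                # A dropped session loses nothing: /sync catches it up later.
                self.unsubscribe(session_id)
        return delivered


def dead_socket(hint: dict) -> None:
    raise ConnectionError("session closed")


hub = TenantHub()
web_inbox: list[dict] = []
phone_inbox: list[dict] = []
hub.subscribe("web", web_inbox.append)
hub.subscribe("stale", dead_socket)
hub.subscribe("phone", phone_inbox.append)

print(hub.publish(["c1", "c2"]))  # 2
```

Because the hint only tells clients to sync, a missed or duplicated hint is harmless; correctness still rests entirely on /sync.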

This is a pattern familiar from collaborative apps such as Linear and Notion: pub/sub layered over an eventually consistent log. It lets the system feel instantaneous when conditions allow without being fragile when conditions don't.

Continue to Teams & RBAC for how access is scoped within a tenant.