Designing an Offline-First Sync Engine for Mobile Apps
A deep dive into building a reliable sync engine that keeps mobile apps functional without connectivity - covering conflict resolution, queue management, and real-world trade-offs.

Users don't care about your server. They care that tapping a button does something - right now, on a subway, on a plane, in a parking garage with one bar of signal. If your app goes blank the moment the network drops, you've already lost.
An offline-first sync engine flips the default assumption: the local database is the source of truth, and the server is a peer that eventually catches up. This sounds simple. It is not.
Why Offline-First Matters
Most mobile apps treat the network as a given. They show a spinner, make a request, and render the response. This falls apart in three common scenarios:
- Flaky connections - elevators, tunnels, rural areas, crowded venues where the cell tower is overloaded
- High latency - emerging markets where a round trip can take 2–5 seconds
- Battery optimization - the OS kills background connections aggressively on both Android and iOS
An offline-first architecture removes the network from the critical path. The user interacts with local data. Sync happens in the background when conditions allow.
Core Architecture
The sync engine sits between your app's data layer and the remote API. It has four responsibilities:
- Local persistence - all reads and writes hit a local database
- Change tracking - mutations are captured as a log of operations
- Sync scheduling - a background process pushes and pulls changes when connectivity is available
- Conflict resolution - when the same record is modified locally and remotely, the engine decides what wins
┌─────────────┐      ┌──────────────┐      ┌─────────────┐
│   App UI    │─────▶│  Local DB    │─────▶│ Sync Queue  │
│             │◀─────│  (SQLite /   │      │  (pending   │
│             │      │   Realm)     │      │    ops)     │
└─────────────┘      └──────────────┘      └──────┬──────┘
                                                  │
                                           ┌──────▼──────┐
                                           │    Sync     │
                                           │   Engine    │
                                           └──────┬──────┘
                                                  │
                                           ┌──────▼──────┐
                                           │ Remote API  │
                                           └─────────────┘
The Operation Log
Every mutation the user makes - create, update, delete - gets written to an append-only operation log before it touches the local database. Each entry includes:
- A unique operation ID
- The entity type and entity ID
- The operation type (create / update / delete)
- A timestamp (logical clock, not wall clock)
- The payload (for creates and updates)
import java.util.UUID

data class SyncOperation(
    val id: String = UUID.randomUUID().toString(),
    val entityType: String,
    val entityId: String,
    val type: OperationType,
    val timestamp: Long,             // logical clock, not wall clock
    val payload: Map<String, Any?>?, // null for deletes
    val status: SyncStatus = SyncStatus.PENDING
)
enum class OperationType { CREATE, UPDATE, DELETE }
enum class SyncStatus { PENDING, IN_FLIGHT, SYNCED, FAILED }
Using a logical clock (a monotonically increasing counter) instead of wall-clock time avoids issues with users changing their device clock and with time-zone drift between devices.
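A minimal sketch of such a clock, assuming it is persisted so the counter survives app restarts; ClockStorage is a hypothetical interface you might back with SharedPreferences or a database row:
interface ClockStorage {
    fun read(): Long
    fun write(value: Long)
}

class LogicalClock(private val storage: ClockStorage) {
    private var counter: Long = storage.read()

    // Every local mutation takes the next tick as its timestamp.
    @Synchronized
    fun tick(): Long {
        counter += 1
        storage.write(counter)
        return counter
    }

    // On receiving a remote operation, jump past its timestamp so later
    // local edits always order after everything already observed.
    @Synchronized
    fun observe(remoteTimestamp: Long) {
        if (remoteTimestamp > counter) {
            counter = remoteTimestamp
            storage.write(counter)
        }
    }
}
This is essentially a Lamport clock: it orders operations consistently with causality rather than with whatever the device's clock happens to say.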
Sync Scheduling
You don't want to sync on every mutation. That defeats the purpose and kills battery. Instead, batch operations and sync when conditions are favorable:
- Network available - use ConnectivityManager on Android or NWPathMonitor on iOS
- Debounce - wait 2–5 seconds after the last write before triggering a sync
- Retry with backoff - if sync fails, retry with exponential backoff (1s, 2s, 4s, 8s, capped at 60s); see the sketch after this list
- Periodic fallback - schedule a WorkManager / BGTaskScheduler job every 15 minutes as a safety net
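The backoff rule translates directly into code. A minimal sketch, where attemptSync is a stand-in for whatever performs one push/pull cycle:
import kotlinx.coroutines.delay

// Exponential backoff: 1s, 2s, 4s, 8s, ... capped at 60s.
suspend fun syncWithBackoff(maxRetries: Int = 8, attemptSync: suspend () -> Boolean) {
    var delayMs = 1_000L
    repeat(maxRetries) {
        if (attemptSync()) return // success - nothing left to retry
        delay(delayMs)
        delayMs = (delayMs * 2).coerceAtMost(60_000L)
    }
    // Still failing after maxRetries: leave the operations PENDING and let
    // the periodic WorkManager / BGTaskScheduler fallback pick them up.
}
The debounce rule is what the scheduler below implements: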
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.Job
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch

class SyncScheduler(
    private val connectivityMonitor: ConnectivityMonitor,
    private val syncEngine: SyncEngine,
    private val scope: CoroutineScope = CoroutineScope(SupervisorJob() + Dispatchers.Default)
) {
    private var debounceJob: Job? = null

    fun onLocalWrite() {
        // Restart the debounce window on every write; only the last write
        // within the window actually triggers a sync.
        debounceJob?.cancel()
        debounceJob = scope.launch {
            delay(3_000)
            if (connectivityMonitor.isConnected()) {
                syncEngine.push()
            }
        }
    }
}
Conflict Resolution
This is where offline-first gets hard. Two devices edit the same record while both are offline. When they come back online, you have a conflict. There are three common strategies:
Last-Write-Wins (LWW)
The operation with the highest timestamp wins. Simple to implement, but you silently discard changes. Acceptable for low-stakes data like user preferences or read receipts.
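With the SyncOperation type from earlier, LWW reduces to a comparison. Ties on equal timestamps must break deterministically - here by operation ID, an arbitrary but stable choice - so every device converges on the same winner:
// Highest logical timestamp wins; ties break on operation ID.
fun resolveLww(local: SyncOperation, remote: SyncOperation): SyncOperation =
    maxOf(local, remote, compareBy({ it.timestamp }, { it.id }))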
Field-Level Merge
Instead of replacing the entire record, merge at the field level. If device A changes the name and device B changes the email, both changes survive. Conflicts only occur when the same field is modified.
fun mergeFields(
    base: Map<String, Any?>,
    local: Map<String, Any?>,
    remote: Map<String, Any?>
): Map<String, Any?> {
    val merged = base.toMutableMap()
    for (key in (local.keys + remote.keys)) {
        val localChanged = local[key] != base[key]
        val remoteChanged = remote[key] != base[key]
        merged[key] = when {
            localChanged && !remoteChanged -> local[key]
            !localChanged && remoteChanged -> remote[key]
            localChanged && remoteChanged -> remote[key] // LWW fallback per field
            else -> base[key]
        }
    }
    return merged
}
Application-Level Resolution
For critical data - financial transactions, collaborative documents, inventory counts - you need domain-specific logic. An inventory system might sum the deltas instead of picking a winner. A collaborative editor might use CRDTs.
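For the inventory case, a sketch of what summing the deltas means; the three-way shape mirrors mergeFields above, and the parameter names are illustrative:
// Three-way merge for a counter: apply both sides' deltas to the shared
// base instead of letting one side's absolute value win.
fun resolveInventory(base: Int, local: Int, remote: Int): Int {
    val localDelta = local - base   // e.g. device A sold 3 units: -3
    val remoteDelta = remote - base // e.g. device B restocked 10 units: +10
    return base + localDelta + remoteDelta
}
If the base count is 50, a sale of 3 on one device and a restock of 10 on the other converge on 57, instead of one device's absolute count silently overwriting the other's.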
Handling Deletes
Deletes are deceptively tricky. If you physically remove a record from the local database and another device hasn't synced yet, the other device will try to re-create it on its next sync.
The solution is tombstones: mark the record as deleted with a deletedAt timestamp instead of removing it. The sync engine propagates the tombstone. All devices converge on the deleted state. Periodically purge tombstones older than 30 days.
data class Entity(
    val id: String,
    val data: Map<String, Any?>,
    val updatedAt: Long,
    val deletedAt: Long? = null // null = alive, non-null = tombstone
)
Ordering Guarantees
Operations on the same entity must be applied in order. Operations on different entities can be applied in any order. This means your sync queue needs per-entity ordering, not global ordering.
A practical approach, sketched in code after the list:
- Group pending operations by entity ID
- For each entity, sort operations by logical timestamp
- Send them to the server in order, waiting for acknowledgment before sending the next
- Operations on different entities can be sent concurrently
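Here is that dispatch loop, assuming a hypothetical SyncApi whose send suspends until the server acknowledges a single operation:
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope

// Hypothetical server API: send suspends until the op is acknowledged.
interface SyncApi {
    suspend fun send(op: SyncOperation)
}

// Sequential within an entity, concurrent across entities.
suspend fun pushPending(pending: List<SyncOperation>, api: SyncApi) = coroutineScope {
    pending
        .groupBy { it.entityId }              // 1. group by entity ID
        .map { (_, ops) ->
            async {
                ops.sortedBy { it.timestamp } // 2. order by logical timestamp
                    .forEach { api.send(it) } // 3. await each ack before the next
            }
        }
        .awaitAll()                           // 4. entities proceed in parallel
}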
Error Recovery
Things will go wrong. The network will drop mid-sync. The server will return a 500. The app will be killed by the OS. Design for these:
- Idempotent operations - every operation should be safe to retry. Use the operation ID as an idempotency key on the server
- Transactional batches - wrap the local DB update and the queue insertion in a single transaction. Either both happen or neither does
- Status tracking - mark operations as IN_FLIGHT during sync so they aren't sent twice. Reset them to PENDING if sync fails
- Dead letter queue - after N retries, move permanently failing operations to a dead letter queue for manual inspection. A sketch combining these rules follows the list
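A sketch tying the status and dead-letter rules together; OperationStore and its methods are hypothetical persistence helpers over the operation log, and SyncApi is the same assumed interface as in the ordering section:
const val MAX_RETRIES = 5 // assumption: tune for your product

// Hypothetical persistence helpers over the operation log.
interface OperationStore {
    fun markStatus(id: String, status: SyncStatus)
    fun retryCount(id: String): Int
    fun moveToDeadLetter(id: String)
}

suspend fun syncOne(op: SyncOperation, api: SyncApi, store: OperationStore) {
    store.markStatus(op.id, SyncStatus.IN_FLIGHT) // guard against double-send
    try {
        // The server treats op.id as an idempotency key, so retrying a
        // request that actually landed is harmless.
        api.send(op)
        store.markStatus(op.id, SyncStatus.SYNCED)
    } catch (e: Exception) {
        if (store.retryCount(op.id) >= MAX_RETRIES) {
            store.moveToDeadLetter(op.id) // park it for manual inspection
        } else {
            store.markStatus(op.id, SyncStatus.PENDING) // eligible for retry
        }
    }
}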
Testing Offline Sync
Unit tests aren't enough. You need integration tests that simulate real network conditions:
- Sync with 100ms latency, 500ms latency, 3s latency
- Sync with intermittent connectivity (connected for 2s, disconnected for 5s, repeat)
- Kill the app mid-sync and verify recovery
- Create conflicting edits on two simulated devices and verify resolution
- Sync a backlog of 1,000+ operations and verify ordering and performance
Android's ConnectivityManager and iOS's NWPathMonitor can be abstracted behind an interface and mocked for deterministic testing.
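For example, a minimal abstraction plus a scriptable fake - the same ConnectivityMonitor interface the scheduler above depends on; names are illustrative:
// Production code wraps ConnectivityManager / NWPathMonitor behind this.
interface ConnectivityMonitor {
    fun isConnected(): Boolean
}

// Deterministic test double: the test flips connectivity on a schedule,
// e.g. online for 2s, offline for 5s, repeat.
class FakeConnectivityMonitor(initiallyConnected: Boolean = true) : ConnectivityMonitor {
    @Volatile private var connected = initiallyConnected
    override fun isConnected(): Boolean = connected
    fun goOnline() { connected = true }
    fun goOffline() { connected = false }
}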
Trade-Offs to Accept
Offline-first is not free. You're trading simplicity for resilience:
- Storage - you need a local database plus an operation log. On resource-constrained devices this matters
- Complexity - conflict resolution logic is domain-specific and hard to get right
- Consistency - you're embracing eventual consistency. The UI might show stale data for seconds or minutes
- Debugging - when something goes wrong, you're debugging distributed systems on a phone
For many apps, this trade-off is worth it. For some - real-time multiplayer games, live auctions - it isn't. Know which category you're in before committing to the architecture.
Closing Thoughts
The best sync engines are invisible. The user edits data, puts their phone in their pocket, and everything just works. Building that experience requires thinking carefully about operation logs, conflict resolution, and failure modes.
Start simple: local persistence, an operation queue, and last-write-wins. Layer in field-level merging and smarter scheduling as your use cases demand it. The architecture should grow with your product, not ahead of it.