Designing an Offline-First Sync Engine for Mobile Apps
A deep dive into building a reliable sync engine that keeps mobile apps functional without connectivity - covering conflict resolution, queue management, and real-world trade-offs.

Users don't care about your server. They care that tapping a button does something - right now, on a subway, on a plane, in a parking garage with one bar of signal. If your app goes blank the moment the network drops, you've already lost.
An offline-first sync engine flips the default assumption: the local database is the source of truth, and the server is a peer that eventually catches up. This sounds simple. It is not.
Why Offline-First Matters
Most mobile apps treat the network as a given. They show a spinner, make a request, and render the response. This falls apart in three common scenarios:
- Flaky connections - elevators, tunnels, rural areas, crowded venues where the cell tower is overloaded
- High latency - emerging markets where a round trip can take 2–5 seconds
- Battery optimization - the OS kills background connections aggressively on both Android and iOS
An offline-first architecture removes the network from the critical path. The user interacts with local data. Sync happens in the background when conditions allow.
Core Architecture
The sync engine sits between your app's data layer and the remote API. It has four responsibilities:
- Local persistence - all reads and writes hit a local database
- Change tracking - mutations are captured as a log of operations
- Sync scheduling - a background process pushes and pulls changes when connectivity is available
- Conflict resolution - when the same record is modified locally and remotely, the engine decides what wins
┌─────────────┐      ┌──────────────┐      ┌─────────────┐
│   App UI    │─────▶│  Local DB    │─────▶│ Sync Queue  │
│             │◀─────│  (SQLite /   │      │  (pending   │
│             │      │   Realm)     │      │    ops)     │
└─────────────┘      └──────────────┘      └──────┬──────┘
                                                  │
                                           ┌──────▼──────┐
                                           │    Sync     │
                                           │   Engine    │
                                           └──────┬──────┘
                                                  │
                                           ┌──────▼──────┐
                                           │ Remote API  │
                                           └─────────────┘
The Operation Log
Every mutation the user makes - create, update, delete - gets written to an append-only operation log before it touches the local database. Each entry includes:
- A unique operation ID
- The entity type and entity ID
- The operation type (create / update / delete)
- A timestamp (logical clock, not wall clock)
- The payload (for creates and updates)
import java.util.UUID

data class SyncOperation(
    val id: String = UUID.randomUUID().toString(),
    val entityType: String,
    val entityId: String,
    val type: OperationType,
    val timestamp: Long,             // logical clock, not wall clock
    val payload: Map<String, Any?>?, // null for deletes
    val status: SyncStatus = SyncStatus.PENDING
)
enum class OperationType { CREATE, UPDATE, DELETE }
enum class SyncStatus { PENDING, IN_FLIGHT, SYNCED, FAILED }
Using a logical clock (a monotonically increasing counter) instead of wall-clock time avoids issues with users changing their device clock and with time-zone drift between devices.
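A minimal sketch of such a clock, assuming it is persisted so the counter survives app restarts; ClockStorage is a hypothetical interface you might back with SharedPreferences or a database row:
interface ClockStorage {
    fun read(): Long
    fun write(value: Long)
}

class LogicalClock(private val storage: ClockStorage) {
    private var counter: Long = storage.read()

    // Every local mutation takes the next tick as its timestamp.
    @Synchronized
    fun tick(): Long {
        counter += 1
        storage.write(counter)
        return counter
    }

    // On receiving a remote operation, jump past its timestamp so later
    // local edits always order after everything already observed.
    @Synchronized
    fun observe(remoteTimestamp: Long) {
        if (remoteTimestamp > counter) {
            counter = remoteTimestamp
            storage.write(counter)
        }
    }
}
This is essentially a Lamport clock: it orders operations consistently with causality rather than with whatever the device's clock happens to say.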
Sync Scheduling
You don't want to sync on every mutation. That defeats the purpose and kills battery. Instead, batch operations and sync when conditions are favorable:
- Network available - use ConnectivityManager on Android or NWPathMonitor on iOS
- Debounce - wait 2–5 seconds after the last write before triggering a sync
- Retry with backoff - if sync fails, retry with exponential backoff (1s, 2s, 4s, 8s, capped at 60s); see the sketch after this list
- Periodic fallback - schedule a WorkManager / BGTaskScheduler job every 15 minutes as a safety net
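The backoff rule translates directly into code. A minimal sketch, where attemptSync is a stand-in for whatever performs one push/pull cycle:
import kotlinx.coroutines.delay

// Exponential backoff: 1s, 2s, 4s, 8s, ... capped at 60s.
suspend fun syncWithBackoff(maxRetries: Int = 8, attemptSync: suspend () -> Boolean) {
    var delayMs = 1_000L
    repeat(maxRetries) {
        if (attemptSync()) return // success - nothing left to retry
        delay(delayMs)
        delayMs = (delayMs * 2).coerceAtMost(60_000L)
    }
    // Still failing after maxRetries: leave the operations PENDING and let
    // the periodic WorkManager / BGTaskScheduler fallback pick them up.
}
The debounce rule is what the scheduler below implements: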
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.Job
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch

class SyncScheduler(
    private val connectivityMonitor: ConnectivityMonitor,
    private val syncEngine: SyncEngine,
    private val scope: CoroutineScope = CoroutineScope(SupervisorJob() + Dispatchers.Default)
) {
    private var debounceJob: Job? = null

    fun onLocalWrite() {
        // Restart the debounce window on every write; only the last write
        // within the window actually triggers a sync.
        debounceJob?.cancel()
        debounceJob = scope.launch {
            delay(3_000)
            if (connectivityMonitor.isConnected()) {
                syncEngine.push()
            }
        }
    }
}
Conflict Resolution
This is where offline-first gets hard. Two devices edit the same record while both are offline. When they come back online, you have a conflict. There are three common strategies:
Last-Write-Wins (LWW)
The operation with the highest timestamp wins. Simple to implement, but you silently discard changes. Acceptable for low-stakes data like user preferences or read receipts.
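With the SyncOperation type from earlier, LWW reduces to a comparison. Ties on equal timestamps must break deterministically - here by operation ID, an arbitrary but stable choice - so every device converges on the same winner:
// Highest logical timestamp wins; ties break on operation ID.
fun resolveLww(local: SyncOperation, remote: SyncOperation): SyncOperation =
    maxOf(local, remote, compareBy({ it.timestamp }, { it.id }))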
Field-Level Merge
Instead of replacing the entire record, merge at the field level. If device A changes the name and device B changes the email, both changes survive. Conflicts only occur when the same field is modified.
fun mergeFields(
    base: Map<String, Any?>,
    local: Map<String, Any?>,
    remote: Map<String, Any?>
): Map<String, Any?> {
    val merged = base.toMutableMap()
    for (key in (local.keys + remote.keys)) {
        val localChanged = local[key] != base[key]
        val remoteChanged = remote[key] != base[key]
        merged[key] = when {
            localChanged && !remoteChanged -> local[key]
            !localChanged && remoteChanged -> remote[key]
            localChanged && remoteChanged -> remote[key] // LWW fallback per field
            else -> base[key]
        }
    }
    return merged
}
Application-Level Resolution
For critical data - financial transactions, collaborative documents, inventory counts - you need domain-specific logic. An inventory system might sum the deltas instead of picking a winner. A collaborative editor might use CRDTs.
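For the inventory case, a sketch of what summing the deltas means; the three-way shape mirrors mergeFields above, and the parameter names are illustrative:
// Three-way merge for a counter: apply both sides' deltas to the shared
// base instead of letting one side's absolute value win.
fun resolveInventory(base: Int, local: Int, remote: Int): Int {
    val localDelta = local - base   // e.g. device A sold 3 units: -3
    val remoteDelta = remote - base // e.g. device B restocked 10 units: +10
    return base + localDelta + remoteDelta
}
If the base count is 50, a sale of 3 on one device and a restock of 10 on the other converge on 57, instead of one device's absolute count silently overwriting the other's.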
Handling Deletes
Deletes are deceptively tricky. If you physically remove a record from the local database and another device hasn't synced yet, the other device will try to re-create it on its next sync.
The solution is tombstones: mark the record as deleted with a deletedAt timestamp instead of removing it. The sync engine propagates the tombstone. All devices converge on the deleted state. Periodically purge tombstones older than 30 days.
data class Entity(
    val id: String,
    val data: Map<String, Any?>,
    val updatedAt: Long,
    val deletedAt: Long? = null // null = alive, non-null = tombstone
)
Ordering Guarantees
Operations on the same entity must be applied in order. Operations on different entities can be applied in any order. This means your sync queue needs per-entity ordering, not global ordering.
A practical approach, sketched in code after the list:
- Group pending operations by entity ID
- For each entity, sort operations by logical timestamp
- Send them to the server in order, waiting for acknowledgment before sending the next
- Operations on different entities can be sent concurrently
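Here is that dispatch loop, assuming a hypothetical SyncApi whose send suspends until the server acknowledges a single operation:
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope

// Hypothetical server API: send suspends until the op is acknowledged.
interface SyncApi {
    suspend fun send(op: SyncOperation)
}

// Sequential within an entity, concurrent across entities.
suspend fun pushPending(pending: List<SyncOperation>, api: SyncApi) = coroutineScope {
    pending
        .groupBy { it.entityId }              // 1. group by entity ID
        .map { (_, ops) ->
            async {
                ops.sortedBy { it.timestamp } // 2. order by logical timestamp
                    .forEach { api.send(it) } // 3. await each ack before the next
            }
        }
        .awaitAll()                           // 4. entities proceed in parallel
}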
Error Recovery
Things will go wrong. The network will drop mid-sync. The server will return a 500. The app will be killed by the OS. Design for these:
- Idempotent operations - every operation should be safe to retry. Use the operation ID as an idempotency key on the server
- Transactional batches - wrap the local DB update and the queue insertion in a single transaction. Either both happen or neither does
- Status tracking - mark operations as IN_FLIGHT during sync so they aren't sent twice. Reset them to PENDING if sync fails
- Dead letter queue - after N retries, move permanently failing operations to a dead letter queue for manual inspection. A sketch combining these rules follows the list
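A sketch tying the status and dead-letter rules together; OperationStore and its methods are hypothetical persistence helpers over the operation log, and SyncApi is the same assumed interface as in the ordering section:
const val MAX_RETRIES = 5 // assumption: tune for your product

// Hypothetical persistence helpers over the operation log.
interface OperationStore {
    fun markStatus(id: String, status: SyncStatus)
    fun retryCount(id: String): Int
    fun moveToDeadLetter(id: String)
}

suspend fun syncOne(op: SyncOperation, api: SyncApi, store: OperationStore) {
    store.markStatus(op.id, SyncStatus.IN_FLIGHT) // guard against double-send
    try {
        // The server treats op.id as an idempotency key, so retrying a
        // request that actually landed is harmless.
        api.send(op)
        store.markStatus(op.id, SyncStatus.SYNCED)
    } catch (e: Exception) {
        if (store.retryCount(op.id) >= MAX_RETRIES) {
            store.moveToDeadLetter(op.id) // park it for manual inspection
        } else {
            store.markStatus(op.id, SyncStatus.PENDING) // eligible for retry
        }
    }
}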
Testing Offline Sync
Unit tests aren't enough. You need integration tests that simulate real network conditions:
- Sync with 100ms latency, 500ms latency, 3s latency
- Sync with intermittent connectivity (connected for 2s, disconnected for 5s, repeat)
- Kill the app mid-sync and verify recovery
- Create conflicting edits on two simulated devices and verify resolution
- Sync a backlog of 1,000+ operations and verify ordering and performance
Android's ConnectivityManager and iOS's NWPathMonitor can be abstracted behind an interface and mocked for deterministic testing.
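For example, a minimal abstraction plus a scriptable fake - the same ConnectivityMonitor interface the scheduler above depends on; names are illustrative:
// Production code wraps ConnectivityManager / NWPathMonitor behind this.
interface ConnectivityMonitor {
    fun isConnected(): Boolean
}

// Deterministic test double: the test flips connectivity on a schedule,
// e.g. online for 2s, offline for 5s, repeat.
class FakeConnectivityMonitor(initiallyConnected: Boolean = true) : ConnectivityMonitor {
    @Volatile private var connected = initiallyConnected
    override fun isConnected(): Boolean = connected
    fun goOnline() { connected = true }
    fun goOffline() { connected = false }
}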
Trade-Offs to Accept
Offline-first is not free. You're trading simplicity for resilience:
- Storage - you need a local database plus an operation log. On resource-constrained devices this matters
- Complexity - conflict resolution logic is domain-specific and hard to get right
- Consistency - you're embracing eventual consistency. The UI might show stale data for seconds or minutes
- Debugging - when something goes wrong, you're debugging distributed systems on a phone
For many apps, this trade-off is worth it. For some - real-time multiplayer games, live auctions - it isn't. Know which category you're in before committing to the architecture.
Closing Thoughts
The best sync engines are invisible. The user edits data, puts their phone in their pocket, and everything just works. Building that experience requires thinking carefully about operation logs, conflict resolution, and failure modes.
Start simple: local persistence, an operation queue, and last-write-wins. Layer in field-level merging and smarter scheduling as your use cases demand it. The architecture should grow with your product, not ahead of it.