Handling Data Conflicts in Offline-First Systems
Strategies for detecting and resolving data conflicts in offline-first mobile systems, covering CRDTs, last-write-wins, operational transforms, and manual resolution.
Offline-first systems let users modify data without a network connection. The moment two devices modify the same record independently, you have a conflict. The resolution strategy you choose defines your data model, your UX, and your operational complexity. This post covers the options and their trade-offs.
Context
Offline-first mobile apps (note-taking, field service, collaborative editing) allow writes to a local database that sync to a server when connectivity returns. When multiple devices or users modify the same entity while offline, the sync process must detect conflicts and resolve them without data loss.
Problem
Design a conflict resolution system that:
- Detects conflicting modifications to the same entity
- Resolves conflicts automatically where possible
- Preserves user intent when automatic resolution is insufficient
- Maintains eventual consistency across all devices
Constraints
| Constraint | Detail |
|---|---|
| Latency | Conflict detection must not add perceptible delay to sync |
| Data integrity | No silent data loss; users must be informed when conflicts affect their work |
| Complexity | Resolution logic must be understandable and debuggable |
| Storage | Conflict metadata adds overhead; must be bounded |
| UX | Manual resolution prompts must be rare (less than 1% of syncs) |
Design
Conflict Detection
A conflict exists when two versions of the same entity diverge from the same base version.
Vector clocks track causality:
data class VectorClock(
    val entries: Map<String, Long> = emptyMap() // deviceId -> counter
) {
    fun increment(deviceId: String): VectorClock {
        val current = entries.getOrDefault(deviceId, 0)
        return VectorClock(entries + (deviceId to current + 1))
    }

    fun merge(other: VectorClock): VectorClock {
        val merged = (entries.keys + other.entries.keys).associateWith { key ->
            maxOf(
                entries.getOrDefault(key, 0),
                other.entries.getOrDefault(key, 0)
            )
        }
        return VectorClock(merged)
    }

    fun conflictsWith(other: VectorClock): Boolean {
        val thisAhead = entries.any { (k, v) -> v > other.entries.getOrDefault(k, 0) }
        val otherAhead = other.entries.any { (k, v) -> v > entries.getOrDefault(k, 0) }
        return thisAhead && otherAhead // Neither dominates
    }
}
When conflictsWith returns true, neither version is a strict successor of the other. A resolution strategy must be applied.
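Conflict detection is one outcome of a four-way comparison between two clocks. The following is a minimal sketch of that comparison, assuming clocks are plain `Map<String, Long>` values; `ClockOrder` and `compareClocks` are illustrative names, not part of the class above.

```kotlin
// Illustrative only: ClockOrder and compareClocks are hypothetical names.
enum class ClockOrder { EQUAL, BEFORE, AFTER, CONCURRENT }

// Compares two vector clocks represented as deviceId -> counter maps.
// A missing entry is treated as a counter of 0.
fun compareClocks(a: Map<String, Long>, b: Map<String, Long>): ClockOrder {
    var aAhead = false
    var bAhead = false
    for (key in a.keys + b.keys) {
        val av = a[key] ?: 0L
        val bv = b[key] ?: 0L
        if (av > bv) aAhead = true
        if (bv > av) bAhead = true
    }
    return when {
        aAhead && bAhead -> ClockOrder.CONCURRENT // true conflict: needs resolution
        aAhead -> ClockOrder.AFTER                // a supersedes b: keep a
        bAhead -> ClockOrder.BEFORE               // b supersedes a: keep b
        else -> ClockOrder.EQUAL                  // same version: nothing to do
    }
}
```

Only the CONCURRENT case needs a resolution strategy. BEFORE and AFTER are plain fast-forwards, which is why most syncs should never reach the resolver.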
Resolution Strategies
1. Last-Write-Wins (LWW)
The version with the latest timestamp wins. Simple, but lossy.
| Aspect | Detail |
|---|---|
| Implementation | Compare modified_at timestamps; keep the latest |
| Clock skew risk | Devices with wrong clocks can overwrite newer data |
| Data loss | The losing write is silently discarded |
| Best for | Low-value data, settings, preferences |
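As a rough sketch of the comparison in the table, the function below picks a winner by timestamp. The `Versioned` type and its field names are assumptions made for illustration, and the `deviceId` tie-break is an addition so that every replica picks the same winner when timestamps are equal.

```kotlin
// Hypothetical record shape; field names are assumptions for illustration.
data class Versioned<T>(val value: T, val modifiedAt: Long, val deviceId: String)

// Last-write-wins: the later timestamp wins. Ties are broken by deviceId so
// the result does not depend on argument order.
fun <T> lastWriteWins(a: Versioned<T>, b: Versioned<T>): Versioned<T> =
    when {
        a.modifiedAt != b.modifiedAt -> if (a.modifiedAt > b.modifiedAt) a else b
        else -> if (a.deviceId >= b.deviceId) a else b
    }
```

The tie-break makes the outcome deterministic, but it does nothing about clock skew: the function trusts whatever `modifiedAt` each device reports.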
2. Field-Level Merge
Instead of replacing the entire entity, merge at the field level. Non-conflicting fields are merged automatically; only fields modified by both sides require resolution.
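// Supporting types, not shown in the original snippet; their shapes are assumed.
data class FieldConflict(val key: String, val local: Any?, val remote: Any?)
data class MergeResult(val merged: Map<String, Any>, val conflicts: List<FieldConflict>)

fun fieldLevelMerge(
    base: Map<String, Any>,
    local: Map<String, Any>,
    remote: Map<String, Any>
): MergeResult {
    val merged = mutableMapOf<String, Any>()
    val conflicts = mutableListOf<FieldConflict>()
    val allKeys = base.keys + local.keys + remote.keys
    for (key in allKeys) {
        val baseVal = base[key]
        val localVal = local[key]
        val remoteVal = remote[key]
        // A null value means the field is absent; `?: continue` drops it from the result.
        when {
            localVal == remoteVal -> merged[key] = localVal ?: continue  // Both sides agree
            localVal == baseVal -> merged[key] = remoteVal ?: continue   // Only remote changed
            remoteVal == baseVal -> merged[key] = localVal ?: continue   // Only local changed
            else -> conflicts.add(FieldConflict(key, localVal, remoteVal)) // Both changed differently
        }
    }
    return MergeResult(merged, conflicts)
}
3. CRDTs (Conflict-Free Replicated Data Types)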
Data structures that mathematically guarantee convergence without coordination.
Common CRDTs for mobile:
- G-Counter: Increment-only counter. Each device maintains its own count; the total is the sum.
- LWW-Register: Single value with a timestamp. Last write wins, but formalized.
- OR-Set (Observed-Remove Set): Add and remove elements without conflicts. Each element tagged with a unique add ID.
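The OR-Set is the least obvious of the three, so here is a minimal sketch of how the unique add tags work. It is an illustration under simplifying assumptions (random UUIDs as tags, no tombstone pruning), and the class and method names are hypothetical.

```kotlin
import java.util.UUID

// Minimal observed-remove set. Names are illustrative, not from a library.
class ORSet<E>(
    private val adds: MutableMap<E, MutableSet<String>> = mutableMapOf(), // element -> add tags
    private val tombstones: MutableSet<String> = mutableSetOf()           // removed add tags
) {
    // Each add is tagged with a globally unique ID.
    fun add(element: E) {
        adds.getOrPut(element) { mutableSetOf() }.add(UUID.randomUUID().toString())
    }

    // Remove tombstones only the tags this replica has observed, so a
    // concurrent add on another device (with a new tag) survives the merge.
    fun remove(element: E) {
        adds[element]?.let { tombstones.addAll(it) }
    }

    // An element is present if at least one of its add tags is not tombstoned.
    fun contains(element: E): Boolean =
        adds[element]?.any { it !in tombstones } ?: false

    // Merge is the union of both tag maps and both tombstone sets.
    fun merge(other: ORSet<E>): ORSet<E> {
        val mergedAdds = mutableMapOf<E, MutableSet<String>>()
        for ((e, tags) in adds) mergedAdds.getOrPut(e) { mutableSetOf() }.addAll(tags)
        for ((e, tags) in other.adds) mergedAdds.getOrPut(e) { mutableSetOf() }.addAll(tags)
        return ORSet(mergedAdds, (tombstones + other.tombstones).toMutableSet())
    }
}
```

If one device removes an item while another re-adds it offline, the merged set still contains the item, because the re-add carries a tag the remover never saw. The G-Counter is simpler by comparison: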
class GCounter(private val counts: MutableMap<String, Long> = mutableMapOf()) {
    // Each device only ever increments its own slot.
    fun increment(deviceId: String) {
        counts[deviceId] = (counts[deviceId] ?: 0L) + 1
    }

    fun value(): Long = counts.values.sum()

    // Merging takes the per-device maximum, so merges are idempotent and order-independent.
    fun merge(other: GCounter): GCounter {
        val merged = mutableMapOf<String, Long>()
        for (key in counts.keys + other.counts.keys) {
            merged[key] = maxOf(counts[key] ?: 0L, other.counts[key] ?: 0L)
        }
        return GCounter(merged)
    }
}
4. Manual Resolution
When automatic strategies are insufficient, present both versions to the user.
Best practices:
- Show a diff highlighting the conflicting fields
- Provide "keep mine", "keep theirs", and "merge manually" options
- Set a timeout: if the user does not resolve within 7 days, apply LWW as a fallback
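The timeout rule can be sketched as a small resolver. Everything here is hypothetical (type names, the millisecond timestamps, and the omission of the "merge manually" path, which would need an editor UI); it only illustrates the seven-day fallback to LWW.

```kotlin
// Hypothetical types for illustration; none of these names come from a library.
data class Version(val payload: String, val modifiedAt: Long)
data class PendingConflict(val local: Version, val remote: Version, val detectedAt: Long)

enum class UserChoice { KEEP_MINE, KEEP_THEIRS }

val FALLBACK_AFTER_MS = 7L * 24 * 60 * 60 * 1000 // 7 days

// Returns the resolved version, or null while the conflict is still waiting on the user.
fun resolve(conflict: PendingConflict, choice: UserChoice?, nowMs: Long): Version? =
    when {
        choice == UserChoice.KEEP_MINE -> conflict.local
        choice == UserChoice.KEEP_THEIRS -> conflict.remote
        nowMs - conflict.detectedAt >= FALLBACK_AFTER_MS ->
            // Timeout reached: fall back to last-write-wins.
            if (conflict.local.modifiedAt >= conflict.remote.modifiedAt) conflict.local
            else conflict.remote
        else -> null
    }
```

Returning null for an unresolved conflict keeps it in the queue, which is what makes queue depth a meaningful metric to monitor.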
Strategy Selection Matrix
| Data Type | Strategy | Rationale |
|---|---|---|
| User settings | LWW | Low conflict cost, latest preference is usually correct |
| Document text | Operational Transform or CRDT | Preserves concurrent edits |
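| Inventory counts | PN-Counter CRDT (two G-Counters: one for increments, one for decrements) | Concurrent adjustments merge deterministically; a plain G-Counter cannot decrease |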
| Form submissions | Field-level merge | Most fields independent |
| Financial records | Manual resolution | Cannot afford silent data loss |
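One way to make the matrix enforceable in code is an exhaustive `when` over the data types, so that adding a new type without choosing a strategy fails to compile. The enum names below are assumptions chosen to mirror the table rows.

```kotlin
// Illustrative mapping of the matrix above; all names are hypothetical.
enum class DataType { USER_SETTINGS, DOCUMENT_TEXT, INVENTORY_COUNT, FORM_SUBMISSION, FINANCIAL_RECORD }
enum class Strategy { LWW, SEQUENCE_CRDT_OR_OT, COUNTER_CRDT, FIELD_LEVEL_MERGE, MANUAL }

// No `else` branch on purpose: the compiler rejects this function if a
// DataType is added without an explicit strategy.
fun strategyFor(type: DataType): Strategy = when (type) {
    DataType.USER_SETTINGS -> Strategy.LWW
    DataType.DOCUMENT_TEXT -> Strategy.SEQUENCE_CRDT_OR_OT
    DataType.INVENTORY_COUNT -> Strategy.COUNTER_CRDT
    DataType.FORM_SUBMISSION -> Strategy.FIELD_LEVEL_MERGE
    DataType.FINANCIAL_RECORD -> Strategy.MANUAL
}
```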
Trade-offs
| Decision | Upside | Downside |
|---|---|---|
| LWW | Simplest implementation, no user intervention | Silent data loss |
| Field-level merge | Reduces manual conflicts by 80-90% | Complex merge logic, must track base version |
| CRDTs | Guaranteed convergence, no coordination | Limited data structure support, higher storage |
| Manual resolution | No data loss | Disrupts user workflow, conflict backlog risk |
| Vector clocks | Accurate causality tracking | Metadata size grows with device count |
Failure Modes
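- Clock skew in LWW: A device whose clock runs 1 hour ahead wins every conflict against writes made during that hour on correctly set devices, even when those writes are newer. Mitigation: use hybrid logical clocks (HLC), which combine physical time with a logical counter.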
- Base version lost: Without the base version, field-level merge cannot distinguish "both changed" from "one changed." Store the last-synced version as the merge base.
- CRDT state bloat: OR-Set tombstones grow unboundedly. Implement garbage collection: after all devices acknowledge a version, prune tombstones.
- Conflict storm: A bug causes rapid conflicting writes. Rate-limit sync attempts and alert when conflict rate exceeds 5% of syncs.
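For the clock skew mitigation above, a hybrid logical clock can be sketched in a few lines. This follows the usual HLC update rules as I understand them; the class names and the injectable `physicalNow` parameter are assumptions for illustration, and the sketch is not thread-safe.

```kotlin
// Ordered by wall time first, then by logical counter.
data class HlcTimestamp(val wallMs: Long, val counter: Int) : Comparable<HlcTimestamp> {
    override fun compareTo(other: HlcTimestamp): Int =
        compareValuesBy(this, other, { it.wallMs }, { it.counter })
}

// Hypothetical minimal HLC; physicalNow is injectable so it can be faked in tests.
class HybridLogicalClock(private val physicalNow: () -> Long = System::currentTimeMillis) {
    private var last = HlcTimestamp(0L, 0)

    // Call for every local write: never goes backwards, even if the wall clock does.
    fun tick(): HlcTimestamp {
        val wall = maxOf(last.wallMs, physicalNow())
        val counter = if (wall == last.wallMs) last.counter + 1 else 0
        last = HlcTimestamp(wall, counter)
        return last
    }

    // Call when a remote timestamp arrives, so later local writes sort after it.
    fun receive(remote: HlcTimestamp): HlcTimestamp {
        val wall = maxOf(last.wallMs, remote.wallMs, physicalNow())
        val counter = when {
            wall == last.wallMs && wall == remote.wallMs -> maxOf(last.counter, remote.counter) + 1
            wall == last.wallMs -> last.counter + 1
            wall == remote.wallMs -> remote.counter + 1
            else -> 0
        }
        last = HlcTimestamp(wall, counter)
        return last
    }
}
```

After receiving a timestamp from a fast clock, local writes still sort after it, so a device that has seen the skewed write can no longer be silently overwritten by it. A production version would typically also reject remote timestamps that are implausibly far ahead of local physical time.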
Scaling Considerations
- Metadata overhead: Vector clocks grow linearly with device count. For systems with many devices per user, use dotted version vectors to compact metadata.
- Conflict resolution at scale: If 1% of syncs produce conflicts and you have 1M daily syncs, that is 10,000 conflicts/day. Manual resolution does not scale. Invest in automatic strategies.
- Sync ordering: Process syncs per-entity, not per-device. This allows parallel sync processing while maintaining per-entity consistency.
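Per-entity processing can be sketched as a partitioning step followed by stable shard assignment. `SyncOp`, `partitionByEntity`, and `shardFor` are hypothetical names, and hashing the entity ID is just one plausible way to route work.

```kotlin
// Hypothetical shape of a queued sync operation.
data class SyncOp(val entityId: String, val deviceId: String, val seq: Long)

// Groups a batch by entity. groupBy preserves encounter order, so each
// entity's operations stay in arrival order within its partition.
fun partitionByEntity(batch: List<SyncOp>): Map<String, List<SyncOp>> =
    batch.groupBy { it.entityId }

// Routes an entity to a fixed worker so its operations are never processed
// concurrently. floorMod keeps the result non-negative for negative hash codes.
fun shardFor(entityId: String, shardCount: Int): Int =
    Math.floorMod(entityId.hashCode(), shardCount)
```

Different entities can then be processed in parallel on different shards, while any single entity's history is applied sequentially.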
Observability
- Track: conflict rate per entity type, resolution strategy distribution (auto vs. manual), resolution latency, data loss rate (for LWW).
- Alert on: conflict rate exceeding baseline by 2x (indicates a bug or sync issue), manual resolution queue depth exceeding threshold.
- Log: both versions of every conflict with their vector clocks, for post-incident analysis.
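The "2x baseline" alert rule can be illustrated with a small in-memory tracker. This is only a sketch: the class name is hypothetical, and a real system would export these counters to its metrics backend per entity type rather than hold them in memory.

```kotlin
// Illustrative in-memory tracker; not thread-safe.
class ConflictRateTracker(private val baselineRate: Double) {
    private var syncs = 0L
    private var conflicts = 0L

    fun recordSync(hadConflict: Boolean) {
        syncs++
        if (hadConflict) conflicts++
    }

    // Fraction of recorded syncs that produced a conflict.
    fun rate(): Double = if (syncs == 0L) 0.0 else conflicts.toDouble() / syncs

    // Mirrors the alert rule: fire when the rate exceeds twice the baseline.
    fun shouldAlert(): Boolean = rate() > 2 * baselineRate
}
```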
Key Takeaways
- Choose your conflict resolution strategy per data type, not globally. Different data has different tolerance for loss.
- Field-level merge eliminates 80-90% of conflicts that whole-entity strategies would flag.
- CRDTs are powerful but limited in the data structures they support. Use them for counters, sets, and simple registers.
- Always store the base version (last-synced state). Without it, three-way merge is impossible.
- Manual resolution is a last resort, not a primary strategy. If users see conflict dialogs regularly, the system has failed.
Final Thoughts
Conflict resolution is a product decision disguised as a technical one. The strategy you choose determines whether users trust your sync system or work around it by avoiding concurrent edits entirely. Get it right, and offline-first becomes a genuine capability. Get it wrong, and it becomes a liability.