Understanding ANRs: Detection, Root Causes, and Fixes
A systematic look at Application Not Responding errors on Android, covering the detection mechanism, common root causes in production, and concrete strategies to fix and prevent them.
Context
An ANR (Application Not Responding) occurs when the main thread of an Android app is blocked for too long. The system displays a dialog asking the user to wait or force-close the app. ANR rates directly impact Play Store ranking, user retention, and crash-free session metrics.
Problem
Google's Android vitals threshold: a user-perceived ANR rate above 0.47% of daily active users triggers a bad behavior warning in the Play Console. ANRs are also harder to debug than crashes: the stack trace captures the state at the moment of detection, not necessarily the moment the blocking began, and root causes are often indirect, intermittent, and device-specific.
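As a rough illustration of how the threshold applies, the sketch below treats the user-perceived ANR rate as the share of daily active users who hit at least one ANR and compares it with the 0.47% figure above. The function and parameter names are hypothetical, and the exact metric definition should be checked against the Play Console documentation.

```kotlin
// Threshold quoted in the text above (0.47%), expressed as a fraction.
const val BAD_BEHAVIOR_THRESHOLD = 0.0047

// Hypothetical helper: share of daily active users who saw at least one ANR.
fun anrRate(usersWithAnr: Long, dailyActiveUsers: Long): Double {
    require(dailyActiveUsers > 0) { "dailyActiveUsers must be positive" }
    require(usersWithAnr in 0L..dailyActiveUsers) { "usersWithAnr out of range" }
    return usersWithAnr.toDouble() / dailyActiveUsers
}

// True when the rate is above the bad behavior threshold.
fun exceedsThreshold(usersWithAnr: Long, dailyActiveUsers: Long): Boolean =
    anrRate(usersWithAnr, dailyActiveUsers) > BAD_BEHAVIOR_THRESHOLD

fun main() {
    // 600 affected users out of 100,000 is 0.6%, above the threshold.
    println(exceedsThreshold(600, 100_000))
    // 300 affected users out of 100,000 is 0.3%, below the threshold.
    println(exceedsThreshold(300, 100_000))
}
```

The margin is small: at 100,000 daily active users, the difference between a healthy app and a flagged one is a few hundred affected users per day.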
Constraints
- The main thread must respond to input events within 5 seconds
- Broadcast receivers must complete onReceive() within 10 seconds (foreground) or 60 seconds (background)
- Services must return from onStartCommand() within 20 seconds (foreground)
- ContentProvider operations block the calling thread and contribute to ANRs when slow
- ANR detection is handled by the system server, not the app process. You cannot intercept or suppress it
Design
How the System Detects ANRs
The system server does not inspect the app's message queue directly. For input events, the InputDispatcher (which runs inside the system server) starts a timer when it dispatches a touch or key press to the app. If the app does not acknowledge the event within the timeout window, the dispatcher notifies ActivityManagerService, which captures the process state and triggers the ANR dialog.
The detection flow:
- System dispatches an input event to the app's InputChannel
- A pending-event timer starts (5 seconds for input events)
- If the app's main thread does not call finishInputEvent() before timeout, the system flags an ANR
- The system captures thread dumps for all threads in the process
- The ANR is recorded in /data/anr/traces.txt and reported to Play Vitals
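The flow above can be mimicked with a toy model in plain Kotlin. This is only a sketch of the timer-and-acknowledgement idea, not the system server's implementation: `dispatchAndAwaitAck` is a hypothetical name, a worker thread stands in for the app's main thread, and the timeouts are scaled down from 5 seconds so the example runs quickly.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.TimeUnit
import kotlin.concurrent.thread

// Toy model: returns true if the simulated app acknowledged the event in time.
fun dispatchAndAwaitAck(handlerWorkMs: Long, timeoutMs: Long): Boolean {
    val acked = CountDownLatch(1)
    // Simulated app main thread: handles the event, then acknowledges it.
    thread(isDaemon = true) {
        Thread.sleep(handlerWorkMs) // stands in for work done while handling the event
        acked.countDown()           // stands in for finishInputEvent()
    }
    // Simulated dispatcher: waits for the acknowledgement up to the timeout.
    return acked.await(timeoutMs, TimeUnit.MILLISECONDS)
}

fun main() {
    // Responsive handler: acknowledges well inside the timeout window.
    println(dispatchAndAwaitAck(handlerWorkMs = 10, timeoutMs = 500))
    // Blocked handler: the timeout expires first, which models the ANR case.
    println(dispatchAndAwaitAck(handlerWorkMs = 1000, timeoutMs = 100))
}
```

The point the model makes is that the timer runs outside the app: nothing the handler does can stop the dispatcher's clock except finishing the event.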
Taxonomy of Root Causes
| Category | Example | Frequency |
|---|---|---|
| Disk I/O on main thread | SharedPreferences commit(), SQLite queries, file reads | Very common |
| Network on main thread | Synchronous HTTP calls, DNS resolution | Common (legacy code) |
| Lock contention | Main thread waiting on a lock held by a background thread | Common |
| Binder IPC | ContentProvider.query(), PackageManager calls, getSystemService() | Common, often overlooked |
| Excessive layout/measure | Deeply nested views, unoptimized RecyclerView | Moderate |
| Broadcast receiver overload | Heavy work in onReceive() | Moderate |
| Deadlock | Two threads holding locks the other needs | Rare but fatal |
| GC pressure | Stop-the-world GC pauses from allocation-heavy code | Device-dependent |
Disk I/O: The Most Common Offender
SharedPreferences.commit() writes to disk synchronously on the calling thread. On low-end devices with slow flash storage, this blocks for 50 to 200ms per call. A single call will not trigger an ANR, but several of them chained in onCreate(), on top of other startup work and I/O contention, eat into the timeout budget quickly.
// Bad: synchronous write on main thread
sharedPrefs.edit().putString("key", "value").commit()
// Good: asynchronous write
sharedPrefs.edit().putString("key", "value").apply()
// Better: use DataStore for structured async persistence
// THEME_KEY was not defined in the original snippet; a string key is assumed here
val THEME_KEY = stringPreferencesKey("theme")
class SettingsRepository(private val dataStore: DataStore<Preferences>) {
    suspend fun saveTheme(theme: String) {
        dataStore.edit { prefs ->
            prefs[THEME_KEY] = theme
        }
    }
}
Lock Contention
A background thread holds a lock while doing I/O. The main thread acquires the same lock for a quick read. On fast devices, the lock is available instantly. On slow devices under load, the background thread holds the lock for 3 to 8 seconds.
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock
// Dangerous pattern
class DataCache {
    private val lock = ReentrantLock()
    private var cache: Map<String, Any> = emptyMap()
    // Called from a background thread; holds the lock during I/O
    fun refresh() {
        lock.withLock {
            cache = fetchFromDatabase() // 500ms to 3s on slow devices
        }
    }
    // Called from the main thread; blocks while refresh() holds the lock
    fun get(key: String): Any? = lock.withLock { cache[key] }
}
// Fix (replaces the class above): use a concurrent data structure or copy-on-write
class DataCache {
    @Volatile
    private var cache: Map<String, Any> = emptyMap()
    fun refresh() {
        val newData = fetchFromDatabase() // slow I/O happens outside any lock
        cache = newData // atomic reference swap, no lock needed
    }
    // Reads see either the old or the new snapshot, never a partial update
    fun get(key: String): Any? = cache[key]
}
Binder IPC Calls
Many Android framework APIs are binder calls in disguise. PackageManager.getPackageInfo() and ContentResolver.query() cross process boundaries, as do most methods on the manager objects returned by Context.getSystemService(). Each binder call can block if the system server or the target process is under load.
import android.content.Context
import android.content.pm.ApplicationInfo
import android.content.pm.PackageManager
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext
// These are all binder calls that can block:
// packageManager.getInstalledApplications(0) - scans all packages
// contentResolver.query(contactsUri, ...) - IPC to contacts provider
// Settings.System.getString(contentResolver, ...) - IPC to settings provider
// Mitigation: move to a background dispatcher
suspend fun getInstalledApps(context: Context): List<ApplicationInfo> =
    withContext(Dispatchers.IO) {
        context.packageManager.getInstalledApplications(PackageManager.GET_META_DATA)
    }
Trade-offs
| Strategy | Benefit | Cost |
|---|---|---|
| Move all I/O off main thread | Eliminates the largest ANR category | Requires async patterns everywhere, increases code complexity |
| Replace commit() with apply() | Removes synchronous disk writes | apply() can still block when the activity pauses or stops, because QueuedWork.waitToFinish() flushes pending writes on the main thread |
| Use StrictMode in debug builds | Catches main-thread violations early | Does not cover all binder IPC or lock contention |
| Background thread for ContentProvider access | Prevents binder-related ANRs | Requires restructuring data access patterns |
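For the StrictMode row, a minimal setup might look like the following. The helper name `enableStrictModeIfDebuggable` is hypothetical, and checking `FLAG_DEBUGGABLE` is just one way to restrict the policy to debug builds; the detectors are the standard `StrictMode.ThreadPolicy.Builder` methods.

```kotlin
import android.content.Context
import android.content.pm.ApplicationInfo
import android.os.StrictMode

// Hypothetical helper, intended to be called early in Application.onCreate().
fun enableStrictModeIfDebuggable(context: Context) {
    // Only install the policy when the app is built as debuggable.
    val debuggable =
        (context.applicationInfo.flags and ApplicationInfo.FLAG_DEBUGGABLE) != 0
    if (!debuggable) return

    StrictMode.setThreadPolicy(
        StrictMode.ThreadPolicy.Builder()
            .detectDiskReads()
            .detectDiskWrites()
            .detectNetwork()
            .detectCustomSlowCalls()
            .penaltyLog() // log violations to logcat instead of crashing
            .build()
    )
}
```

Because the policy is installed on the calling thread, it needs to run on the main thread to catch main-thread violations. As the table notes, it will not flag lock contention or every binder call.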
Failure Modes
| Failure | Detection | Mitigation |
|---|---|---|
| ANR during cold start | Play Vitals "startup" ANR cluster | Profile startup with Macrobenchmark, defer non-critical init |
| ANR from QueuedWork.waitToFinish() | Stack trace shows QueuedWork.waitToFinish() under ActivityThread.handleStopActivity or handlePauseActivity | Clear the QueuedWork pending list via reflection from the Application (risky) or migrate to DataStore |
| ANR from ContentProvider.onCreate() | Stack trace during app startup | Move heavy initialization out of ContentProvider.onCreate() into lazy init |
| ANR on low-RAM devices | Disproportionate ANR rate on Go/entry-level devices | Test on low-end hardware, set device-tier thresholds |
| ANR from third-party SDK initialization | Stack trace points to SDK code in Application.onCreate() | Initialize SDKs lazily (for example via App Startup) or on a background thread |
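The lazy-initialization mitigation can be sketched in plain Kotlin with the `lazy` delegate. `AnalyticsSdk` and `SdkHolder` are hypothetical stand-ins rather than a real SDK, and the counter exists only to make the deferred construction visible.

```kotlin
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for a third-party SDK with an expensive constructor.
class AnalyticsSdk(initCounter: AtomicInteger) {
    init {
        initCounter.incrementAndGet() // a real SDK would do disk or network setup here
    }

    fun track(event: String): String = "tracked:$event"
}

// Holder that defers construction until the SDK is first used.
class SdkHolder(private val initCounter: AtomicInteger = AtomicInteger(0)) {
    // `lazy` is thread-safe by default (LazyThreadSafetyMode.SYNCHRONIZED).
    val analytics: AnalyticsSdk by lazy { AnalyticsSdk(initCounter) }

    val initCount: Int get() = initCounter.get()
}

fun main() {
    val holder = SdkHolder()
    println(holder.initCount) // nothing has been constructed yet
    holder.analytics.track("app_open")
    holder.analytics.track("screen_view")
    println(holder.initCount) // constructed once, on first use
}
```

Lazy initialization only moves the cost to the first access. If that first access happens on the main thread, the blocking work lands there anyway, so it is worth triggering the first use from a background thread.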
Scaling Considerations
- Deferred initialization with App Startup: use androidx.startup to declare dependencies between initializers, control initialization order, and initialize non-critical components lazily instead of at process start
- Background initialization: SDKs like analytics, crash reporting, and A/B testing rarely need main-thread init. Move them to a background coroutine with a timeout
- Watchdog threads: implement a main-thread watchdog that logs slow operations before they become ANRs
import android.os.Handler
import android.os.HandlerThread
import android.os.Looper
import android.util.Log
import java.util.concurrent.atomic.AtomicBoolean
class MainThreadWatchdog(
    private val thresholdMs: Long = 3000L
) {
    private val handler = Handler(Looper.getMainLooper())
    private val watchdogThread = HandlerThread("anr-watchdog").apply { start() }
    private val watchdogHandler = Handler(watchdogThread.looper)
    fun start() {
        scheduleCheck()
    }
    private fun scheduleCheck() {
        // Ping the main thread; the flag flips only once the main looper runs the ping
        val responded = AtomicBoolean(false)
        handler.post { responded.set(true) }
        // Check the flag from the watchdog thread after thresholdMs
        watchdogHandler.postDelayed({
            if (!responded.get()) {
                // Main thread has been blocked for > thresholdMs
                val stackTrace = Looper.getMainLooper().thread.stackTrace
                reportSlowMainThread(stackTrace)
            }
            scheduleCheck()
        }, thresholdMs)
    }
    private fun reportSlowMainThread(stackTrace: Array<StackTraceElement>) {
        Log.w("ANR-Watchdog", "Main thread blocked:\n${stackTrace.joinToString("\n")}")
    }
}
Observability
- Play Vitals: clusters ANRs by stack trace. Review weekly. Focus on top 5 clusters
- Custom ANR watchdog: captures pre-ANR stack traces with more context than system traces
- StrictMode: enable detectDiskReads(), detectDiskWrites(), detectNetwork(), detectCustomSlowCalls() in debug builds
- Perfetto/systrace: capture main thread scheduling to see exactly when and why the main thread was blocked
- ANR rate by device tier: segment ANR rates by RAM, CPU, and Android version to identify device-specific patterns
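Segmenting by device tier is mostly a grouping exercise. The sketch below assumes you can export one row per device per day from your own telemetry. The `DeviceDay` record, the `Tier` enum, and the RAM cut-offs are all illustrative assumptions, not official device classes.

```kotlin
// Hypothetical record: one row per device per day from your own telemetry.
data class DeviceDay(val ramMb: Int, val hadAnr: Boolean)

enum class Tier { LOW, MID, HIGH }

// Tier cut-offs are illustrative assumptions.
fun tierOf(ramMb: Int): Tier = when {
    ramMb <= 2048 -> Tier.LOW
    ramMb <= 6144 -> Tier.MID
    else -> Tier.HIGH
}

// ANR rate per tier: share of device-days with at least one ANR.
fun anrRateByTier(rows: List<DeviceDay>): Map<Tier, Double> =
    rows.groupBy { tierOf(it.ramMb) }
        .mapValues { (_, group) -> group.count { it.hadAnr }.toDouble() / group.size }

fun main() {
    val rows = listOf(
        DeviceDay(2048, true), DeviceDay(1024, false),
        DeviceDay(4096, false), DeviceDay(4096, false),
        DeviceDay(8192, false)
    )
    anrRateByTier(rows).forEach { (tier, rate) -> println("$tier: $rate") }
}
```

The same grouping works for CPU class or Android version by swapping the key function. Tiers with no rows are simply absent from the result.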
Key Takeaways
- ANRs are detected by the system server, not your app. You cannot suppress or delay them
- Disk I/O and lock contention are the top two root causes in production. Audit every SharedPreferences.commit() and synchronized block that touches the main thread
- Binder IPC calls are hidden blockers. PackageManager, ContentResolver, and Settings APIs all cross process boundaries
- apply() is not always safe. QueuedWork.waitToFinish() on activity pause or stop can still block the main thread
- Test on low-end devices. ANR rates on budget hardware are often 5 to 10x higher than on flagships
- A pre-ANR watchdog gives you actionable stack traces before the system reports the ANR
Final Thoughts
ANRs are a systems problem, not a coding mistake. They emerge from the interaction between your code, the Android framework, the kernel scheduler, and the hardware. The most effective strategy is defense in depth: StrictMode in development, a watchdog in production, and continuous monitoring of Play Vitals by device segment. Treat any main-thread blocking operation as a latent ANR and move it off the main thread proactively.