How I Profile Android Apps in Production
Techniques for collecting meaningful performance data from production Android apps without degrading user experience, covering sampling strategies, custom metrics, and real-world pitfalls.
Context
Lab profiling gives you control. Production profiling gives you truth. The performance characteristics of an app running on a developer's Pixel differ wildly from the same app running on a three-year-old Samsung in Lagos with 2GB of free RAM. Production profiling bridges that gap by collecting real performance data from real users on real devices.
Problem
Standard profiling tools (Android Studio Profiler, Perfetto) are designed for local use. They attach to a process, collect detailed traces, and produce gigabytes of output. None of this works in production. You need lightweight, sampling-based instrumentation that runs on every session without users noticing.
The core tension: collect enough data to diagnose issues, but not so much that the instrumentation itself becomes the performance problem.
Constraints
- Instrumentation overhead must stay below 1% of CPU time
- Data collection must not increase ANR rates
- Sampling must produce statistically valid results without capturing every event
- Must work on Android 7+ (API 24+), covering 95%+ of active devices
- Data pipeline must handle millions of events per day without dropping signals
- User privacy must be preserved (no PII in traces)
Related: Event Tracking System Design for Android Applications.
Design
Layer 1: Frame-Level Metrics
The FrameMetrics API provides per-frame timing breakdowns without custom instrumentation.
```kotlin
import android.app.Activity
import android.os.Handler
import android.os.Looper
import android.view.FrameMetrics
import android.view.Window
import kotlin.random.Random

class FrameMetricsCollector(
    private val reporter: MetricsReporter,
    private val sampleRate: Double = 0.1 // 10% of sessions
) {
    // Updated by navigation code so slow frames can be attributed to a screen
    var currentScreenName: String = "unknown"

    private var attached = false

    private val listener = Window.OnFrameMetricsAvailableListener { _, metrics, _ ->
        val totalDuration = metrics.getMetric(FrameMetrics.TOTAL_DURATION)
        val layoutDuration = metrics.getMetric(FrameMetrics.LAYOUT_MEASURE_DURATION)
        val drawDuration = metrics.getMetric(FrameMetrics.DRAW_DURATION)
        if (totalDuration > 16_000_000) { // > 16ms in nanos (the 60Hz frame budget)
            reporter.reportSlowFrame(
                total = totalDuration,
                layout = layoutDuration,
                draw = drawDuration,
                screen = currentScreenName
            )
        }
    }

    fun attach(activity: Activity) {
        // Sampling decision; hoist this to app start if you want exactly one
        // decision per session rather than one per activity
        if (Random.nextDouble() > sampleRate) return
        attached = true
        activity.window.addOnFrameMetricsAvailableListener(
            listener, Handler(Looper.getMainLooper())
        )
    }

    fun detach(activity: Activity) {
        if (!attached) return // removing a never-added listener throws
        attached = false
        activity.window.removeOnFrameMetricsAvailableListener(listener)
    }
}
```

Layer 2: Custom Trace Spans
Wrap critical user journeys in lightweight trace spans. These are not Perfetto traces. They are simple start/stop timers reported to your analytics backend.
See also: Designing a Simple Metrics Collection Service.
```kotlin
import android.os.SystemClock
import android.os.Trace
import java.util.concurrent.ConcurrentHashMap

object ProdTracer {
    private val activeSpans = ConcurrentHashMap<String, Long>()

    fun beginSpan(name: String) {
        activeSpans[name] = SystemClock.elapsedRealtimeNanos()
        Trace.beginSection(name) // also visible in local Perfetto traces; names are capped at 127 chars
    }

    fun endSpan(name: String, metadata: Map<String, String> = emptyMap()) {
        // Idempotent: a second endSpan for the same name is a no-op
        val startNanos = activeSpans.remove(name) ?: return
        Trace.endSection() // must run on the same thread as the matching beginSection
        val durationMs = (SystemClock.elapsedRealtimeNanos() - startNanos) / 1_000_000
        MetricsReporter.report(
            event = "trace_span",
            properties = metadata + mapOf(
                "span_name" to name,
                "duration_ms" to durationMs.toString(),
                "device_class" to DeviceClassifier.classify().name
            )
        )
    }
}

// Usage at call sites
fun loadFeed() {
    ProdTracer.beginSpan("feed_load")
    feedRepository.fetch()
        .onEach { ProdTracer.endSpan("feed_load", mapOf("item_count" to it.size.toString())) }
        .launchIn(viewModelScope)
}
```

Layer 3: Startup Tracing
Cold start is the most critical metric. Measure it with precision.
```kotlin
import android.os.SystemClock

object StartupProfiler {
    // Captured when the object is first touched; reference it from a
    // ContentProvider or the Application's init block so the timestamp
    // lands as close to process start as possible
    val processStartTime: Long = SystemClock.elapsedRealtime()

    private val milestones = mutableListOf<Pair<String, Long>>()

    fun mark(name: String) {
        milestones.add(name to SystemClock.elapsedRealtime())
    }

    fun report() {
        // Each span is the time elapsed since the previous milestone
        val spans = milestones.mapIndexed { index, (name, time) ->
            val prev = if (index == 0) processStartTime else milestones[index - 1].second
            name to (time - prev)
        }
        MetricsReporter.report(
            event = "cold_start_breakdown",
            properties = spans.associate { it.first to it.second.toString() }
        )
    }
}

// Instrumentation points
class MyApp : Application() {
    override fun onCreate() {
        super.onCreate()
        StartupProfiler.mark("app_oncreate")
        initCriticalSdks()
        StartupProfiler.mark("sdks_initialized")
    }
}

class MainActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        StartupProfiler.mark("activity_oncreate")
        setContentView(R.layout.activity_main)
        StartupProfiler.mark("content_set")
        window.decorView.post {
            StartupProfiler.mark("first_frame")
            StartupProfiler.report()
        }
    }
}
```

Layer 4: Sampling Strategy
Not every session should report everything. Use a tiered sampling model.
| Data Type | Sample Rate | Rationale |
|---|---|---|
| Cold start timing | 100% | Critical metric, tiny payload |
| Slow frame counts | 10% of sessions | Moderate payload, needs volume for P95 |
| Full frame breakdown | 1% of sessions | Large payload, detailed diagnosis |
| Network timing | 100% | Paired with server-side data |
| Memory snapshots | 0.1% of sessions | Expensive to collect and transmit |
```kotlin
import kotlin.random.Random

class SamplingConfig(
    private val remoteConfig: RemoteConfig
) {
    fun shouldCollect(tier: MetricsTier): Boolean {
        // Fall back to the conservative default baked into the APK when the
        // remote fetch failed or the key is missing
        val rate = remoteConfig.getDouble("sampling_rate_${tier.name.lowercase()}")
            .takeIf { it > 0.0 } ?: tier.defaultRate
        return Random.nextDouble() < rate
    }
}

enum class MetricsTier(val defaultRate: Double) {
    CRITICAL(1.0),    // 100%
    STANDARD(0.1),    // 10%
    DETAILED(0.01),   // 1%
    DIAGNOSTIC(0.001) // 0.1%
}
```

Layer 5: Device Classification
Aggregate metrics by device class. A P95 frame time that mixes Pixel 8 and Galaxy A03 data is meaningless.
```kotlin
import android.app.ActivityManager
import android.content.Context

enum class DeviceClass { HIGH, MEDIUM, LOW }

object DeviceClassifier {
    // Set once from Application.onCreate before the first classify() call
    lateinit var appContext: Context

    fun classify(): DeviceClass {
        val cores = Runtime.getRuntime().availableProcessors()
        val ramMb = getDeviceRamMb()
        return when {
            cores >= 8 && ramMb >= 6000 -> DeviceClass.HIGH
            cores >= 4 && ramMb >= 3000 -> DeviceClass.MEDIUM
            else -> DeviceClass.LOW
        }
    }

    private fun getDeviceRamMb(): Long {
        val memInfo = ActivityManager.MemoryInfo()
        val am = appContext.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
        am.getMemoryInfo(memInfo)
        return memInfo.totalMem / (1024 * 1024)
    }
}
```

Trade-offs
| Decision | Benefit | Cost |
|---|---|---|
| Sampling vs. full collection | Negligible overhead | May miss rare issues |
| Client-side aggregation | Less data transmitted | Loses granularity |
| Server-side aggregation | Full raw data | Higher bandwidth and storage |
| Device class bucketing | Meaningful comparisons | More complex dashboards |
| Remote-configurable rates | Adjust without release | Adds latency to first config fetch |
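The client-side aggregation trade-off can be made concrete: instead of uploading one event per slow frame, bucket durations into a fixed histogram on-device and upload only the counts. The `FrameHistogram` class and its bucket boundaries below are illustrative assumptions, not part of the collectors above.

```kotlin
// A minimal sketch of client-side aggregation: accumulate frame durations in
// fixed buckets locally, then upload a handful of integers per session.
class FrameHistogram(
    private val bucketUpperBoundsMs: LongArray = longArrayOf(16, 33, 66, 700)
) {
    private val counts = IntArray(bucketUpperBoundsMs.size + 1) // last slot = overflow

    fun record(durationMs: Long) {
        val index = bucketUpperBoundsMs.indexOfFirst { durationMs <= it }
        counts[if (index == -1) counts.size - 1 else index]++
    }

    // Snapshot to attach to a single upload, e.g. {"<=16ms": 5400, "<=33ms": 120, ...}
    fun snapshot(): Map<String, Int> {
        val labels = bucketUpperBoundsMs.map { "<=${it}ms" } +
            ">${bucketUpperBoundsMs.last()}ms"
        return labels.mapIndexed { i, label -> label to counts[i] }.toMap()
    }
}
```

The granularity cost in the table is visible here: once a 40ms frame lands in the `<=66ms` bucket, its exact duration is gone.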
Failure Modes
- Instrumentation bias: the act of measuring changes behavior. Keep overhead under 1%.
- Survivorship bias: users on terrible devices may crash before your metrics report. Cross-reference with crash rates.
- Clock drift: `SystemClock.elapsedRealtime()` is monotonic and safe. Never use `System.currentTimeMillis()` for duration measurements.
- Metric flooding: a bug in sampling logic can send millions of events per minute. Implement client-side rate limiting.
- Stale sampling config: if remote config fetch fails, fall back to conservative defaults baked into the APK.
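The rate-limiting mitigation above can be sketched as a token bucket in front of the reporter. `MetricsRateLimiter`, its limits, and the injectable clock are illustrative assumptions; on-device you would feed it `SystemClock.elapsedRealtime()` rather than wall-clock time, per the clock-drift note above.

```kotlin
// A token-bucket sketch for client-side rate limiting (illustrative, not the
// reporter's real API). Allows short bursts but caps the sustained event rate.
class MetricsRateLimiter(
    private val capacity: Double = 100.0,       // max burst of events
    private val refillPerSecond: Double = 10.0, // sustained events/sec
    private val clockMs: () -> Long = System::currentTimeMillis
) {
    private var tokens = capacity
    private var lastRefillMs = clockMs()

    @Synchronized
    fun tryAcquire(): Boolean {
        val now = clockMs()
        tokens = minOf(capacity, tokens + (now - lastRefillMs) / 1000.0 * refillPerSecond)
        lastRefillMs = now
        if (tokens < 1.0) return false // drop the event instead of flooding the pipeline
        tokens -= 1.0
        return true
    }
}
```

Gate every `MetricsReporter.report` call on `tryAcquire()`; a sampling bug then degrades into dropped metrics rather than a flooded pipeline.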
Observability
- Build dashboards segmented by device class, OS version, and app version
- Track P50, P90, P95, and P99 for every metric. Averages hide problems.
- Set alerts on P95 cold start time per release. A 10% regression on P95 should block rollout.
- Correlate performance metrics with business metrics (conversion, session length) to prioritize fixes.
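The averages-hide-problems point can be made concrete with a small nearest-rank percentile helper (an illustrative sketch, not a backend implementation): with 90 frames at 10ms and 10 frames at 400ms, the mean is 49ms while P95 is 400ms.

```kotlin
import kotlin.math.ceil

// Nearest-rank percentile over raw samples; shows the jank that a mean hides.
fun percentile(samples: List<Long>, p: Double): Long {
    require(samples.isNotEmpty() && p in 0.0..100.0)
    val sorted = samples.sorted()
    val rank = ceil(p / 100.0 * sorted.size).toInt().coerceIn(1, sorted.size)
    return sorted[rank - 1]
}
```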
Key Takeaways
- Production profiling is sampling-based, not trace-based. Design for statistical validity.
- Always segment by device class. Aggregated metrics across hardware tiers are misleading.
- Cold start is the one metric worth collecting at 100% of sessions.
- Use remote config to adjust sampling rates without shipping new code.
- Measure the measurement. If your instrumentation costs more than 1% CPU, it is too heavy.
- Correlate performance data with business outcomes to justify engineering investment.
Further Reading
- Memory Leaks in Android: Patterns I've Seen in Production: Real-world memory leak patterns from production Android apps, covering lifecycle-bound leaks, static references, listener registration, a...
- Debugging Performance Issues in Large Android Apps: A systematic approach to identifying, isolating, and fixing performance bottlenecks in large Android codebases, covering profiling strate...
- Diagnosing Battery Drain in Android Apps: A structured methodology for identifying and fixing battery drain in Android apps, covering wake locks, location updates, background work...
Final Thoughts
The best performance insights come from production, not from profiling sessions on a developer's desk. Build lightweight, sampling-based instrumentation into your app from day one. Treat it as infrastructure, not as a debugging afterthought. The data it produces will drive better decisions than any amount of local profiling.