How I Profile Android Apps in Production
Techniques for collecting meaningful performance data from production Android apps without degrading user experience, covering sampling strategies, custom metrics, and real-world pitfalls.
Context
Lab profiling gives you control. Production profiling gives you truth. The performance characteristics of an app running on a developer's Pixel differ wildly from the same app running on a three-year-old Samsung in Lagos with 2GB of free RAM. Production profiling bridges that gap by collecting real performance data from real users on real devices.
Problem
Standard profiling tools (Android Studio Profiler, Perfetto) are designed for local use. They attach to a process, collect detailed traces, and produce gigabytes of output. None of this works in production. You need lightweight, sampling-based instrumentation that runs on every session without users noticing.
The core tension: collect enough data to diagnose issues, but not so much that the instrumentation itself becomes the performance problem.
Constraints
- Instrumentation overhead must stay below 1% of CPU time
- Data collection must not increase ANR rates
- Sampling must produce statistically valid results without capturing every event
- Must work on Android 7+ (API 24+), covering 95%+ of active devices
- Data pipeline must handle millions of events per day without dropping signals
- User privacy must be preserved (no PII in traces)
Related: Event Tracking System Design for Android Applications.
Design
Layer 1: Frame-Level Metrics
The FrameMetrics API provides per-frame timing breakdowns without custom instrumentation.
```kotlin
import android.app.Activity
import android.os.Handler
import android.os.Looper
import android.view.FrameMetrics
import android.view.Window
import kotlin.random.Random

class FrameMetricsCollector(
    private val reporter: MetricsReporter,
    private val sampleRate: Double = 0.1 // 10% of sessions
) {
    // Updated by navigation code so slow frames can be attributed to a screen
    var currentScreenName: String = "unknown"

    private var attached = false

    private val listener = Window.OnFrameMetricsAvailableListener { _, metrics, _ ->
        val totalDuration = metrics.getMetric(FrameMetrics.TOTAL_DURATION)
        val layoutDuration = metrics.getMetric(FrameMetrics.LAYOUT_MEASURE_DURATION)
        val drawDuration = metrics.getMetric(FrameMetrics.DRAW_DURATION)
        if (totalDuration > 16_000_000) { // > 16ms in nanos (the 60Hz frame budget)
            reporter.reportSlowFrame(
                total = totalDuration,
                layout = layoutDuration,
                draw = drawDuration,
                screen = currentScreenName
            )
        }
    }

    fun attach(activity: Activity) {
        // Sampling decision; hoist this to app start if you want exactly one
        // decision per session rather than one per activity
        if (Random.nextDouble() > sampleRate) return
        attached = true
        activity.window.addOnFrameMetricsAvailableListener(
            listener, Handler(Looper.getMainLooper())
        )
    }

    fun detach(activity: Activity) {
        if (!attached) return // removing a never-added listener throws
        attached = false
        activity.window.removeOnFrameMetricsAvailableListener(listener)
    }
}
```

Layer 2: Custom Trace Spans
Wrap critical user journeys in lightweight trace spans. These are not Perfetto traces. They are simple start/stop timers reported to your analytics backend.
See also: Designing a Simple Metrics Collection Service.
```kotlin
import android.os.SystemClock
import android.os.Trace
import java.util.concurrent.ConcurrentHashMap

object ProdTracer {
    private val activeSpans = ConcurrentHashMap<String, Long>()

    fun beginSpan(name: String) {
        activeSpans[name] = SystemClock.elapsedRealtimeNanos()
        Trace.beginSection(name) // also visible in local Perfetto traces; names are capped at 127 chars
    }

    fun endSpan(name: String, metadata: Map<String, String> = emptyMap()) {
        // Idempotent: a second endSpan for the same name is a no-op
        val startNanos = activeSpans.remove(name) ?: return
        Trace.endSection() // must run on the same thread as the matching beginSection
        val durationMs = (SystemClock.elapsedRealtimeNanos() - startNanos) / 1_000_000
        MetricsReporter.report(
            event = "trace_span",
            properties = metadata + mapOf(
                "span_name" to name,
                "duration_ms" to durationMs.toString(),
                "device_class" to DeviceClassifier.classify().name
            )
        )
    }
}

// Usage at call sites
fun loadFeed() {
    ProdTracer.beginSpan("feed_load")
    feedRepository.fetch()
        .onEach { ProdTracer.endSpan("feed_load", mapOf("item_count" to it.size.toString())) }
        .launchIn(viewModelScope)
}
```

Layer 3: Startup Tracing
Cold start is the most critical metric. Measure it with precision.
```kotlin
import android.os.SystemClock

object StartupProfiler {
    // Captured when the object is first touched; reference it from a
    // ContentProvider or the Application's init block so the timestamp
    // lands as close to process start as possible
    val processStartTime: Long = SystemClock.elapsedRealtime()

    private val milestones = mutableListOf<Pair<String, Long>>()

    fun mark(name: String) {
        milestones.add(name to SystemClock.elapsedRealtime())
    }

    fun report() {
        // Each span is the time elapsed since the previous milestone
        val spans = milestones.mapIndexed { index, (name, time) ->
            val prev = if (index == 0) processStartTime else milestones[index - 1].second
            name to (time - prev)
        }
        MetricsReporter.report(
            event = "cold_start_breakdown",
            properties = spans.associate { it.first to it.second.toString() }
        )
    }
}

// Instrumentation points
class MyApp : Application() {
    override fun onCreate() {
        super.onCreate()
        StartupProfiler.mark("app_oncreate")
        initCriticalSdks()
        StartupProfiler.mark("sdks_initialized")
    }
}

class MainActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        StartupProfiler.mark("activity_oncreate")
        setContentView(R.layout.activity_main)
        StartupProfiler.mark("content_set")
        window.decorView.post {
            StartupProfiler.mark("first_frame")
            StartupProfiler.report()
        }
    }
}
```

Layer 4: Sampling Strategy
Not every session should report everything. Use a tiered sampling model.
| Data Type | Sample Rate | Rationale |
|---|---|---|
| Cold start timing | 100% | Critical metric, tiny payload |
| Slow frame counts | 10% of sessions | Moderate payload, needs volume for P95 |
| Full frame breakdown | 1% of sessions | Large payload, detailed diagnosis |
| Network timing | 100% | Paired with server-side data |
| Memory snapshots | 0.1% of sessions | Expensive to collect and transmit |
```kotlin
import kotlin.random.Random

class SamplingConfig(
    private val remoteConfig: RemoteConfig
) {
    fun shouldCollect(tier: MetricsTier): Boolean {
        // Fall back to the conservative default baked into the APK when the
        // remote fetch failed or the key is missing
        val rate = remoteConfig.getDouble("sampling_rate_${tier.name.lowercase()}")
            .takeIf { it > 0.0 } ?: tier.defaultRate
        return Random.nextDouble() < rate
    }
}

enum class MetricsTier(val defaultRate: Double) {
    CRITICAL(1.0),    // 100%
    STANDARD(0.1),    // 10%
    DETAILED(0.01),   // 1%
    DIAGNOSTIC(0.001) // 0.1%
}
```

Layer 5: Device Classification
Aggregate metrics by device class. A P95 frame time that mixes Pixel 8 and Galaxy A03 data is meaningless.
```kotlin
import android.app.ActivityManager
import android.content.Context

enum class DeviceClass { HIGH, MEDIUM, LOW }

object DeviceClassifier {
    // Set once from Application.onCreate before the first classify() call
    lateinit var appContext: Context

    fun classify(): DeviceClass {
        val cores = Runtime.getRuntime().availableProcessors()
        val ramMb = getDeviceRamMb()
        return when {
            cores >= 8 && ramMb >= 6000 -> DeviceClass.HIGH
            cores >= 4 && ramMb >= 3000 -> DeviceClass.MEDIUM
            else -> DeviceClass.LOW
        }
    }

    private fun getDeviceRamMb(): Long {
        val memInfo = ActivityManager.MemoryInfo()
        val am = appContext.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
        am.getMemoryInfo(memInfo)
        return memInfo.totalMem / (1024 * 1024)
    }
}
```

Trade-offs
| Decision | Benefit | Cost |
|---|---|---|
| Sampling vs. full collection | Negligible overhead | May miss rare issues |
| Client-side aggregation | Less data transmitted | Loses granularity |
| Server-side aggregation | Full raw data | Higher bandwidth and storage |
| Device class bucketing | Meaningful comparisons | More complex dashboards |
| Remote-configurable rates | Adjust without release | Adds latency to first config fetch |
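The client-side aggregation trade-off can be made concrete: instead of uploading one event per slow frame, bucket durations into a fixed histogram on-device and upload only the counts. The `FrameHistogram` class and its bucket boundaries below are illustrative assumptions, not part of the collectors above.

```kotlin
// A minimal sketch of client-side aggregation: accumulate frame durations in
// fixed buckets locally, then upload a handful of integers per session.
class FrameHistogram(
    private val bucketUpperBoundsMs: LongArray = longArrayOf(16, 33, 66, 700)
) {
    private val counts = IntArray(bucketUpperBoundsMs.size + 1) // last slot = overflow

    fun record(durationMs: Long) {
        val index = bucketUpperBoundsMs.indexOfFirst { durationMs <= it }
        counts[if (index == -1) counts.size - 1 else index]++
    }

    // Snapshot to attach to a single upload, e.g. {"<=16ms": 5400, "<=33ms": 120, ...}
    fun snapshot(): Map<String, Int> {
        val labels = bucketUpperBoundsMs.map { "<=${it}ms" } +
            ">${bucketUpperBoundsMs.last()}ms"
        return labels.mapIndexed { i, label -> label to counts[i] }.toMap()
    }
}
```

The granularity cost in the table is visible here: once a 40ms frame lands in the `<=66ms` bucket, its exact duration is gone.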
Failure Modes
- Instrumentation bias: the act of measuring changes behavior. Keep overhead under 1%.
- Survivorship bias: users on terrible devices may crash before your metrics report. Cross-reference with crash rates.
- Clock drift: `SystemClock.elapsedRealtime()` is monotonic and safe. Never use `System.currentTimeMillis()` for duration measurements.
- Metric flooding: a bug in sampling logic can send millions of events per minute. Implement client-side rate limiting.
- Stale sampling config: if remote config fetch fails, fall back to conservative defaults baked into the APK.
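The rate-limiting mitigation above can be sketched as a token bucket in front of the reporter. `MetricsRateLimiter`, its limits, and the injectable clock are illustrative assumptions; on-device you would feed it `SystemClock.elapsedRealtime()` rather than wall-clock time, per the clock-drift note above.

```kotlin
// A token-bucket sketch for client-side rate limiting (illustrative, not the
// reporter's real API). Allows short bursts but caps the sustained event rate.
class MetricsRateLimiter(
    private val capacity: Double = 100.0,       // max burst of events
    private val refillPerSecond: Double = 10.0, // sustained events/sec
    private val clockMs: () -> Long = System::currentTimeMillis
) {
    private var tokens = capacity
    private var lastRefillMs = clockMs()

    @Synchronized
    fun tryAcquire(): Boolean {
        val now = clockMs()
        tokens = minOf(capacity, tokens + (now - lastRefillMs) / 1000.0 * refillPerSecond)
        lastRefillMs = now
        if (tokens < 1.0) return false // drop the event instead of flooding the pipeline
        tokens -= 1.0
        return true
    }
}
```

Gate every `MetricsReporter.report` call on `tryAcquire()`; a sampling bug then degrades into dropped metrics rather than a flooded pipeline.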
Observability
- Build dashboards segmented by device class, OS version, and app version
- Track P50, P90, P95, and P99 for every metric. Averages hide problems.
- Set alerts on P95 cold start time per release. A 10% regression on P95 should block rollout.
- Correlate performance metrics with business metrics (conversion, session length) to prioritize fixes.
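The averages-hide-problems point can be made concrete with a small nearest-rank percentile helper (an illustrative sketch, not a backend implementation): with 90 frames at 10ms and 10 frames at 400ms, the mean is 49ms while P95 is 400ms.

```kotlin
import kotlin.math.ceil

// Nearest-rank percentile over raw samples; shows the jank that a mean hides.
fun percentile(samples: List<Long>, p: Double): Long {
    require(samples.isNotEmpty() && p in 0.0..100.0)
    val sorted = samples.sorted()
    val rank = ceil(p / 100.0 * sorted.size).toInt().coerceIn(1, sorted.size)
    return sorted[rank - 1]
}
```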
Key Takeaways
- Production profiling is sampling-based, not trace-based. Design for statistical validity.
- Always segment by device class. Aggregated metrics across hardware tiers are misleading.
- Cold start is the one metric worth collecting at 100% of sessions.
- Use remote config to adjust sampling rates without shipping new code.
- Measure the measurement. If your instrumentation costs more than 1% CPU, it is too heavy.
- Correlate performance data with business outcomes to justify engineering investment.
Further Reading
- Memory Leaks in Android: Patterns I've Seen in Production: Real-world memory leak patterns from production Android apps, covering lifecycle-bound leaks, static references, listener registration, a...
- Debugging Performance Issues in Large Android Apps: A systematic approach to identifying, isolating, and fixing performance bottlenecks in large Android codebases, covering profiling strate...
- Diagnosing Battery Drain in Android Apps: A structured methodology for identifying and fixing battery drain in Android apps, covering wake locks, location updates, background work...
Final Thoughts
The best performance insights come from production, not from profiling sessions on a developer's desk. Build lightweight, sampling-based instrumentation into your app from day one. Treat it as infrastructure, not as a debugging afterthought. The data it produces will drive better decisions than any amount of local profiling.