How I Profile Android Apps in Production

Dhruval Dhameliya·December 29, 2025·7 min read

Techniques for collecting meaningful performance data from production Android apps without degrading user experience, covering sampling strategies, custom metrics, and real-world pitfalls.

Context

Lab profiling gives you control. Production profiling gives you truth. The performance characteristics of an app running on a developer's Pixel differ wildly from the same app running on a three-year-old Samsung in Lagos with 2GB of free RAM. Production profiling bridges that gap by collecting real performance data from real users on real devices.

Problem

Standard profiling tools (Android Studio Profiler, Perfetto) are designed for local use. They attach to a process, collect detailed traces, and produce gigabytes of output. None of this works in production. You need lightweight, sampling-based instrumentation that runs on every session without users noticing.

The core tension: collect enough data to diagnose issues, but not so much that the instrumentation itself becomes the performance problem.

Constraints

  • Instrumentation overhead must stay below 1% of CPU time
  • Data collection must not increase ANR rates
  • Sampling must produce statistically valid results without capturing every event
  • Must work on Android 7+ (API 24+), covering 95%+ of active devices
  • Data pipeline must handle millions of events per day without dropping signals
  • User privacy must be preserved (no PII in traces)

Related: Event Tracking System Design for Android Applications.

Design

Layer 1: Frame-Level Metrics

The FrameMetrics API provides per-frame timing breakdowns without custom instrumentation.

class FrameMetricsCollector(
    private val reporter: MetricsReporter,
    private val sampleRate: Double = 0.1 // 10% of sessions
) {
    // Set by navigation code; used to attribute slow frames to a screen.
    var currentScreenName: String = "unknown"
 
    // Deliver callbacks on a background thread so the listener itself
    // never competes with UI work on the main thread.
    private val handlerThread = HandlerThread("frame-metrics").apply { start() }
    private val handler = Handler(handlerThread.looper)
 
    private val listener = Window.OnFrameMetricsAvailableListener { _, metrics, _ ->
        val totalDuration = metrics.getMetric(FrameMetrics.TOTAL_DURATION)
        val layoutDuration = metrics.getMetric(FrameMetrics.LAYOUT_MEASURE_DURATION)
        val drawDuration = metrics.getMetric(FrameMetrics.DRAW_DURATION)
 
        if (totalDuration > 16_000_000) { // > 16ms in nanos: missed a 60Hz frame
            reporter.reportSlowFrame(
                total = totalDuration,
                layout = layoutDuration,
                draw = drawDuration,
                screen = currentScreenName
            )
        }
    }
 
    fun attach(activity: Activity) {
        if (Random.nextDouble() > sampleRate) return
        activity.window.addOnFrameMetricsAvailableListener(listener, handler)
    }
 
    fun detach(activity: Activity) {
        activity.window.removeOnFrameMetricsAvailableListener(listener)
    }
}
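One way to wire the collector in without touching every screen is a lifecycle-callbacks registration; a minimal sketch (class and wiring names here are mine, not from the article):

```kotlin
// Hypothetical wiring: register once in the Application and let lifecycle
// callbacks handle attach/detach, so individual Activities carry no
// profiling code.
class ProfilingLifecycleCallbacks(
    private val collector: FrameMetricsCollector
) : Application.ActivityLifecycleCallbacks {

    override fun onActivityResumed(activity: Activity) = collector.attach(activity)
    override fun onActivityPaused(activity: Activity) = collector.detach(activity)

    // Remaining callbacks are intentionally no-ops.
    override fun onActivityCreated(activity: Activity, savedInstanceState: Bundle?) {}
    override fun onActivityStarted(activity: Activity) {}
    override fun onActivityStopped(activity: Activity) {}
    override fun onActivitySaveInstanceState(activity: Activity, outState: Bundle) {}
    override fun onActivityDestroyed(activity: Activity) {}
}

// In Application.onCreate():
// registerActivityLifecycleCallbacks(ProfilingLifecycleCallbacks(collector))
```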

Layer 2: Custom Trace Spans

Wrap critical user journeys in lightweight trace spans. These are not Perfetto traces. They are simple start/stop timers reported to your analytics backend.

See also: Designing a Simple Metrics Collection Service.

object ProdTracer {
    private val activeSpans = ConcurrentHashMap<String, Long>()
 
    fun beginSpan(name: String) {
        activeSpans[name] = SystemClock.elapsedRealtimeNanos()
        Trace.beginSection(name) // also visible in local Perfetto traces
    }
 
    fun endSpan(name: String, metadata: Map<String, String> = emptyMap()) {
        // Ignore unmatched ends (e.g., an emission arriving before beginSpan ran).
        val startNanos = activeSpans.remove(name) ?: return
        // Caveat: Trace sections are per-thread and stack-based, so begin/end
        // must run on the same thread and nest correctly for local traces.
        Trace.endSection()
        val durationMs = (SystemClock.elapsedRealtimeNanos() - startNanos) / 1_000_000
 
        MetricsReporter.report(
            event = "trace_span",
            properties = metadata + mapOf(
                "span_name" to name,
                "duration_ms" to durationMs.toString(),
                "device_class" to DeviceClassifier.classify().name
            )
        )
    }
}
 
// Usage at call sites. Note: endSpan fires on the first emission only,
// because the span key is removed once the duration is recorded.
fun loadFeed() {
    ProdTracer.beginSpan("feed_load")
    feedRepository.fetch()
        .onEach { ProdTracer.endSpan("feed_load", mapOf("item_count" to it.size.toString())) }
        .launchIn(viewModelScope)
}

Layer 3: Startup Tracing

Cold start is the most critical metric. Measure it with precision.

// Reference this object from a ContentProvider or the Application's init
// block so processStartTime is captured as early as possible.
object StartupProfiler {
    val processStartTime: Long = SystemClock.elapsedRealtime()
 
    private val milestones = mutableListOf<Pair<String, Long>>()
 
    fun mark(name: String) {
        milestones.add(name to SystemClock.elapsedRealtime())
    }
 
    fun report() {
        // Each span is the delta from the previous milestone (or process start).
        val spans = milestones.mapIndexed { index, (name, time) ->
            val prev = if (index == 0) processStartTime else milestones[index - 1].second
            name to (time - prev)
        }
        MetricsReporter.report(
            event = "cold_start_breakdown",
            properties = spans.associate { it.first to it.second.toString() }
        )
    }
}
 
// Instrumentation points
class MyApp : Application() {
    override fun onCreate() {
        super.onCreate()
        StartupProfiler.mark("app_oncreate")
        initCriticalSdks()
        StartupProfiler.mark("sdks_initialized")
    }
}
 
class MainActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        StartupProfiler.mark("activity_oncreate")
        setContentView(R.layout.activity_main)
        StartupProfiler.mark("content_set")
 
        // Posting to the decor view runs after the first frame's traversal —
        // a common, if approximate, proxy for time-to-first-frame.
        window.decorView.post {
            StartupProfiler.mark("first_frame")
            StartupProfiler.report()
        }
    }
}
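With the marks above, a cold start report might look like this (all values in milliseconds and purely illustrative):

```json
{
  "event": "cold_start_breakdown",
  "properties": {
    "app_oncreate": "180",
    "sdks_initialized": "95",
    "activity_oncreate": "60",
    "content_set": "140",
    "first_frame": "85"
  }
}
```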

Layer 4: Sampling Strategy

Not every session should report everything. Use a tiered sampling model.

| Data Type            | Sample Rate      | Rationale                              |
|----------------------|------------------|----------------------------------------|
| Cold start timing    | 100%             | Critical metric, tiny payload          |
| Slow frame counts    | 10% of sessions  | Moderate payload, needs volume for P95 |
| Full frame breakdown | 1% of sessions   | Large payload, detailed diagnosis      |
| Network timing       | 100%             | Paired with server-side data           |
| Memory snapshots     | 0.1% of sessions | Expensive to collect and transmit      |

class SamplingConfig(
    private val remoteConfig: RemoteConfig
) {
    fun shouldCollect(tier: MetricsTier): Boolean {
        val rate = remoteConfig.getDouble("sampling_rate_${tier.name.lowercase()}")
        return Random.nextDouble() < rate
    }
}
 
enum class MetricsTier {
    CRITICAL,   // 100%
    STANDARD,   // 10%
    DETAILED,   // 1%
    DIAGNOSTIC  // 0.1%
}
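One caveat with calling Random.nextDouble() per event: a single session can be split across tiers, which breaks "% of sessions" semantics. A deterministic alternative (a sketch of my own, not from the original) hashes a stable session id so every event in a session lands in the same bucket:

```kotlin
// Deterministic session-level sampling: the same sessionId always maps to
// the same bucket, so a session is either fully in or fully out of a tier.
fun shouldSample(sessionId: String, rate: Double): Boolean {
    require(rate in 0.0..1.0) { "rate must be in [0, 1]" }
    // mod() (unlike %) always returns a non-negative bucket in [0, 10_000)
    val bucket = sessionId.hashCode().mod(10_000)
    return bucket < (rate * 10_000).toInt()
}
```

This also makes sampling reproducible when debugging: given a session id, you can tell whether its events should have been collected.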

Layer 5: Device Classification

Aggregate metrics by device class. A P95 frame time that mixes Pixel 8 and Galaxy A03 data is meaningless.

enum class DeviceClass { HIGH, MEDIUM, LOW }
 
object DeviceClassifier {
    // Cache the result: core count and total RAM do not change mid-session,
    // and classify() may be called on every reported event.
    private var cached: DeviceClass? = null
 
    fun classify(): DeviceClass = cached ?: compute().also { cached = it }
 
    private fun compute(): DeviceClass {
        val cores = Runtime.getRuntime().availableProcessors()
        val ramMb = getDeviceRamMb()
        return when {
            cores >= 8 && ramMb >= 6000 -> DeviceClass.HIGH
            cores >= 4 && ramMb >= 3000 -> DeviceClass.MEDIUM
            else -> DeviceClass.LOW
        }
    }
 
    private fun getDeviceRamMb(): Long {
        val memInfo = ActivityManager.MemoryInfo()
        // appContext: an application Context held by your Application class or DI graph
        val am = appContext.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
        am.getMemoryInfo(memInfo)
        return memInfo.totalMem / (1024 * 1024)
    }
}

Trade-offs

| Decision                     | Benefit                | Cost                               |
|------------------------------|------------------------|------------------------------------|
| Sampling vs. full collection | Negligible overhead    | May miss rare issues               |
| Client-side aggregation      | Less data transmitted  | Loses granularity                  |
| Server-side aggregation      | Full raw data          | Higher bandwidth and storage       |
| Device class bucketing       | Meaningful comparisons | More complex dashboards            |
| Remote-configurable rates    | Adjust without release | Adds latency to first config fetch |

Failure Modes

  • Instrumentation bias: the act of measuring changes behavior. Keep overhead under 1%.
  • Survivorship bias: users on terrible devices may crash before your metrics report. Cross-reference with crash rates.
  • Clock drift: SystemClock.elapsedRealtime() is monotonic and safe. Never use System.currentTimeMillis() for duration measurements.
  • Metric flooding: a bug in sampling logic can send millions of events per minute. Implement client-side rate limiting.
  • Stale sampling config: if remote config fetch fails, fall back to conservative defaults baked into the APK.
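The rate limiting mentioned above can be a simple token bucket. A minimal sketch with an injectable clock so the logic is testable (in production you would pass SystemClock::elapsedRealtime; names are illustrative):

```kotlin
// Token-bucket limiter guarding against metric flooding: refills at a
// steady rate, allows short bursts up to burstCapacity, then drops events.
class EventRateLimiter(
    private val maxEventsPerSecond: Int,
    private val burstCapacity: Int = maxEventsPerSecond,
    private val nowMs: () -> Long
) {
    private var tokens = burstCapacity.toDouble()
    private var lastRefillMs = nowMs()

    @Synchronized
    fun tryAcquire(): Boolean {
        val now = nowMs()
        // Refill proportionally to elapsed time, capped at burst capacity.
        tokens = minOf(
            burstCapacity.toDouble(),
            tokens + (now - lastRefillMs) / 1000.0 * maxEventsPerSecond
        )
        lastRefillMs = now
        if (tokens < 1.0) return false
        tokens -= 1.0
        return true
    }
}
```

Dropping events when the bucket is empty (rather than queueing them) is the right choice here: under flooding, the events are almost certainly duplicates of the same pathological signal.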

Observability

  • Build dashboards segmented by device class, OS version, and app version
  • Track P50, P90, P95, and P99 for every metric. Averages hide problems.
  • Set alerts on P95 cold start time per release. A 10% regression on P95 should block rollout.
  • Correlate performance metrics with business metrics (conversion, session length) to prioritize fixes.
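Wherever the percentiles are computed, the nearest-rank method over a window of duration samples is the simplest correct option; a sketch (function name is mine, not from the article):

```kotlin
import kotlin.math.ceil

// Nearest-rank percentile over a window of samples. Fine for small
// client-side windows; for server-side aggregation at scale, a streaming
// sketch such as t-digest is a better fit.
fun percentile(samples: List<Long>, p: Double): Long {
    require(samples.isNotEmpty()) { "need at least one sample" }
    require(p > 0.0 && p <= 100.0) { "p must be in (0, 100]" }
    val sorted = samples.sorted()
    val rank = ceil(p / 100.0 * sorted.size).toInt()
    return sorted[rank - 1]
}
```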

Key Takeaways

  • Production profiling is sampling-based, not trace-based. Design for statistical validity.
  • Always segment by device class. Aggregated metrics across hardware tiers are misleading.
  • Cold start is the one metric worth collecting at 100% of sessions.
  • Use remote config to adjust sampling rates without shipping new code.
  • Measure the measurement. If your instrumentation costs more than 1% CPU, it is too heavy.
  • Correlate performance data with business outcomes to justify engineering investment.

Final Thoughts

The best performance insights come from production, not from profiling sessions on a developer's desk. Build lightweight, sampling-based instrumentation into your app from day one. Treat it as infrastructure, not as a debugging afterthought. The data it produces will drive better decisions than any amount of local profiling.
