Load Testing Mobile Backends With Realistic Traffic
Designing load tests that replicate mobile traffic patterns including bursty connections, mixed network conditions, and session-based workflows.
Context
A mobile backend serving 80,000 DAU needed load testing before a product launch expected to double traffic. Standard load tests (uniform request rate, single endpoint) did not reveal the bottlenecks that appeared in production. I redesigned the load testing approach to replicate mobile-specific traffic patterns.
See also: Mobile Analytics Pipeline: From App Event to Dashboard.
Problem
Mobile traffic differs from web traffic in critical ways: request bursts on app open, session-based workflows (not independent requests), variable payload sizes, and mixed network conditions causing connection pooling issues. Load tests that ignore these patterns produce misleading results.
Constraints
- Load testing tool: k6 (supports scenarios, custom protocols, and browser-like behavior)
- Target: 160,000 simulated DAU (2x current)
- Backend: Node.js API on AWS ECS, Postgres, Redis, S3
- Test duration: 30 minutes sustained, with 5-minute ramp-up
- Budget: run tests on dedicated infrastructure to avoid affecting production
- Must test both API performance and downstream service behavior (database, cache, storage)
Related: Debugging Performance Issues in Large Android Apps.
Design
Traffic Pattern Modeling
I analyzed 7 days of production logs to extract realistic traffic patterns:
| Metric | Production Value |
|---|---|
| Peak requests/min | 45,000 |
| Average session duration | 4.2 minutes |
| Requests per session | 12-18 |
| App open burst (first 3 seconds) | 5-7 requests |
| Background sync interval | Every 30 seconds |
| Concurrent active sessions at peak | 8,000 |
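A quick back-of-the-envelope check ties these numbers together. This sketch uses the midpoint of the 12-18 requests/session range (my assumption) to derive the steady-state request rate implied by the table:

```javascript
// Back-of-the-envelope check: steady-state requests/min implied by the
// session metrics above. 15 requests/session is the midpoint of 12-18.
const concurrentSessions = 8000;
const requestsPerSession = 15;
const sessionMinutes = 4.2;

// Each session emits requestsPerSession requests over sessionMinutes.
const steadyStateReqPerMin =
  concurrentSessions * (requestsPerSession / sessionMinutes);

console.log(Math.round(steadyStateReqPerMin)); // ≈ 28,571 req/min
```

The gap between this roughly 28.5k/min steady-state figure and the 45k/min observed peak is filled by bursty traffic — exactly the component a flat-rate test never generates.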
Scenario Design
Instead of a flat request rate, I designed user journey scenarios:
Scenario 1: App Open (40% of traffic)
1. POST /auth/refresh (token refresh)
2. GET /user/profile
3. GET /feed?limit=20 (parallel)
4. GET /notifications/unread (parallel)
5. GET /config/remote (parallel)
-- 3-second burst, then idle
Scenario 2: Browse and Interact (35% of traffic)
1. GET /feed?limit=20&offset=N (scroll, every 2-3 seconds)
2. POST /events/track (analytics, per scroll)
3. GET /content/:id (tap on item)
4. POST /content/:id/like (occasional)
-- 2-4 minute session
Scenario 3: Background Sync (25% of traffic)
1. POST /sync/push (upload local changes)
2. GET /sync/pull?since=TIMESTAMP
-- Every 30 seconds while app is backgrounded
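As a sketch of how a journey becomes k6 code, Scenario 1 might look like this as an exec function. This runs under the k6 runtime (not Node); `BASE_URL`, the missing auth payloads, and the 30-second idle are assumptions for illustration — the endpoint paths follow the list above:

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

const BASE_URL = 'https://api.example.com'; // assumed test target

export function appOpenScenario() {
  // Steps 1-2: sequential, as a real client issues them on cold start
  http.post(`${BASE_URL}/auth/refresh`);
  http.get(`${BASE_URL}/user/profile`);

  // Steps 3-5: http.batch fires these in parallel, reproducing the
  // 5-7 request burst in the first seconds after app open
  http.batch([
    ['GET', `${BASE_URL}/feed?limit=20`],
    ['GET', `${BASE_URL}/notifications/unread`],
    ['GET', `${BASE_URL}/config/remote`],
  ]);

  // Idle before the next iteration, so each VU behaves like one user
  sleep(30);
}
```

The key design choice is `http.batch`: issuing the burst requests concurrently, rather than in a loop, is what makes the test hit the server the way thousands of simultaneous app opens do.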
k6 Implementation
```javascript
export const options = {
  scenarios: {
    app_open: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '5m', target: 3200 }, // 40% of 8,000
        { duration: '25m', target: 3200 },
        { duration: '2m', target: 0 },
      ],
      exec: 'appOpenScenario',
    },
    browse: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '5m', target: 2800 }, // 35% of 8,000
        { duration: '25m', target: 2800 },
        { duration: '2m', target: 0 },
      ],
      exec: 'browseScenario',
    },
    background_sync: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '5m', target: 2000 }, // 25% of 8,000
        { duration: '25m', target: 2000 },
        { duration: '2m', target: 0 },
      ],
      exec: 'backgroundSyncScenario',
    },
  },
};
```

Network Condition Simulation
Mobile clients operate on varying networks. I distributed virtual users across simulated conditions:
| Network | % of VUs | Simulated RTT | Bandwidth |
|---|---|---|---|
| 4G | 50% | 50ms | 10Mbps |
| 3G | 30% | 150ms | 1.5Mbps |
| Poor 3G | 15% | 300ms | 400Kbps |
| Edge/2G | 5% | 500ms | 100Kbps |
Slow clients hold connections longer, which directly impacts connection pool utilization on the server.
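A minimal way to realize this distribution is to assign each VU a network profile by weighted random choice at startup. The profile objects below mirror the table; the field names are mine:

```javascript
// Network profiles from the table above: weight is the fraction of VUs,
// rttMs the added round-trip latency, kbps the bandwidth cap.
const NETWORK_PROFILES = [
  { name: '4g',     weight: 0.50, rttMs: 50,  kbps: 10000 },
  { name: '3g',     weight: 0.30, rttMs: 150, kbps: 1500 },
  { name: 'poor3g', weight: 0.15, rttMs: 300, kbps: 400 },
  { name: 'edge',   weight: 0.05, rttMs: 500, kbps: 100 },
];

// Weighted random choice: walk the cumulative weights until one covers
// the random draw. Each VU calls this once and keeps its profile.
function pickProfile(rand = Math.random()) {
  let acc = 0;
  for (const p of NETWORK_PROFILES) {
    acc += p.weight;
    if (rand < acc) return p;
  }
  return NETWORK_PROFILES[NETWORK_PROFILES.length - 1];
}
```

The chosen profile then parameterizes whatever shaping layer sits in front of the API (a proxy, in this setup), so half of the VUs get 4G-like latency while the long tail gets Edge-like latency.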
Trade-offs
Load Test Design Approaches
| Approach | Realism | Setup Effort | Insight Quality |
|---|---|---|---|
| Flat rate, single endpoint | Low | Low | Finds basic throughput ceiling |
| Flat rate, mixed endpoints | Medium | Low | Finds per-endpoint bottlenecks |
| Scenario-based, uniform network | High | Medium | Finds workflow bottlenecks |
| Scenario-based, mixed network (this approach) | Very high | High | Finds production-realistic bottlenecks |
Results: What We Found
| Bottleneck | Discovered At | Cause |
|---|---|---|
| Auth token refresh storm | 2x load | 3,200 concurrent token refreshes on "app open" saturated the auth service |
| Connection pool exhaustion | 1.5x load | Slow clients (poor 3G) holding connections for 2-3 seconds consumed all pool slots |
| Feed query timeout | 1.8x load | Feed aggregation query exceeded 5s under concurrent load |
| Redis memory spike | 1.7x load | Session cache growing faster than TTL eviction during ramp-up |
| S3 request throttling | 2x load | Image URL generation hitting S3 API rate limits |
Three of these five bottlenecks would not have been found with a flat-rate, single-endpoint load test.
Bottleneck Resolution
| Bottleneck | Fix | Impact |
|---|---|---|
| Auth refresh storm | Stagger token expiration (jitter) | Spread 3,200 refreshes over 30 seconds |
| Connection pool | Increase pool size + add PgBouncer | Handled 2x slow clients |
| Feed query | Add query-level caching (60s TTL) | Reduced DB load by 85% |
| Redis memory | Reduce session TTL from 24h to 4h | Cut peak memory by 60% |
| S3 throttling | Pre-sign URLs in batch, cache for 1h | Eliminated S3 API calls per request |
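The jitter fix for the refresh storm is essentially a one-liner on the token issuer. A sketch, assuming a uniform jitter window; the 3600s base TTL is illustrative:

```javascript
// Instead of a fixed token TTL, add uniform random jitter so clients that
// authenticated at the same moment do not all refresh at the same instant.
function jitteredTtlSeconds(baseTtl, jitterWindow) {
  // Uniform jitter in [0, jitterWindow): e.g. base 3600s, window 30s
  return baseTtl + Math.floor(Math.random() * jitterWindow);
}
```

With a 30-second window, a burst of simultaneous app opens spreads its refresh calls across that window instead of landing on the auth service in one spike.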
Failure Modes
Load test infrastructure itself becoming the bottleneck: k6 running on undersized instances can max out CPU before generating target load. Monitor k6 process metrics (CPU, memory, network) alongside target metrics. I used 4 k6 instances on c5.2xlarge to generate 8,000 VUs.
Unrealistic data distribution: If all VUs request the same feed items or user profiles, caching makes the test artificially easy. Use a realistic data distribution (Zipfian for content, uniform for user IDs) to stress both cache and database.
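A Zipfian sampler is only a few lines. This sketch precomputes a cumulative distribution and samples by inverse transform; the exponent s = 1.0 is an assumption:

```javascript
// Zipfian sampler over n content IDs: rank 1 is the most popular item,
// with probability proportional to 1 / rank^s.
function makeZipfSampler(n, s = 1.0) {
  const weights = [];
  let total = 0;
  for (let rank = 1; rank <= n; rank++) {
    const w = 1 / Math.pow(rank, s);
    weights.push(w);
    total += w;
  }
  // Cumulative distribution for inverse-transform sampling
  const cdf = [];
  let acc = 0;
  for (const w of weights) {
    acc += w / total;
    cdf.push(acc);
  }
  return function sample() {
    const r = Math.random();
    // Linear scan is fine for illustration; use binary search for large n
    for (let i = 0; i < n; i++) {
      if (r <= cdf[i]) return i + 1; // content ID = popularity rank
    }
    return n;
  };
}
```

With this distribution the cache serves the hot head of the catalog while the long tail still produces misses that reach the database — which is the behavior the test needs to stress.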
Missing warm-up effects: Production databases have warm buffer caches. A fresh test environment has cold caches, making initial results pessimistic. Run a 5-minute warm-up at 50% load before measuring.
Network simulation limitations: k6 does not natively simulate variable network conditions. I used a proxy layer (toxiproxy) between k6 and the API to add latency and bandwidth limits. This adds infrastructure complexity but produces more realistic results.
Scaling Considerations
- Scale load test infrastructure proportionally. A rule of thumb: 1 k6 instance per 2,000 VUs for HTTP tests.
- Run load tests in the same region as the production environment to minimize test infrastructure network noise.
- Store load test results (k6 outputs to InfluxDB, Prometheus, or JSON) for historical comparison across releases.
- Automate load tests in CI for release candidates. A performance regression gate prevents deploying slower code.
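For the CI regression gate, k6's built-in thresholds are enough. A config sketch — the error-rate budget is an assumption, while the 300ms p95 matches the launch target:

```javascript
// k6 threshold configuration used as a CI regression gate.
export const options = {
  thresholds: {
    http_req_duration: ['p(95)<300'], // p95 latency budget in ms
    http_req_failed: ['rate<0.01'],   // assumed 1% error budget
  },
};
```

k6 exits with a non-zero status when any threshold fails, so the CI job fails on a performance regression without extra scripting.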
Observability
During load tests, monitor:
- API layer: request rate, error rate, p50/p95/p99 latency per endpoint, active connections
- Database: active connections, query duration, lock waits, replication lag
- Cache: hit rate, memory usage, eviction rate, connection count
- Infrastructure: CPU, memory, network I/O, disk I/O per service
- k6 metrics: VU count, iteration rate, request rate, failed requests, data sent/received
The most useful view: a dashboard overlaying k6 VU count with API p95 latency and database connection count. This shows exactly when and why latency degrades.
Key Takeaways
- Model load tests as user journeys, not independent requests. Mobile users follow predictable session patterns that create correlated load.
- Simulate network conditions. Slow clients hold server resources longer, which is the primary difference between mobile and web load patterns.
- The "app open" burst is the most dangerous traffic pattern. Thousands of users opening the app simultaneously (after a push notification, for example) creates a thundering herd on auth and feed endpoints.
- Three of five bottlenecks found were invisible to flat-rate testing. Scenario-based testing with mixed network conditions is worth the setup effort.
- Run load tests regularly, not just before launches. Performance regressions are easier to fix when caught early.
Further Reading
- How I'd Design a Mobile Configuration System at Scale: Designing a configuration system for mobile apps at scale, covering config delivery, caching layers, override hierarchies, and safe rollo...
- Benchmarking Database Writes Under Load: Measured write throughput and latency for Postgres under increasing concurrency, comparing single inserts, batch inserts, COPY, and async...
- What Breaks First When Traffic Scales: A catalog of components that fail first under increasing traffic, ordered by how commonly they become bottlenecks in web applications.
Final Thoughts
After fixing the five bottlenecks identified in load testing, the backend handled 2.2x current load with p95 latency under 300ms. The launch proceeded without incident, traffic peaked at 1.8x, and no new bottlenecks emerged. The total load testing effort was 3 days: 1 day for scenario design and implementation, 1 day for test execution and analysis, and 1 day for fixes. That investment prevented what would have been a degraded launch experience for 80,000 users.