Load Testing Mobile Backends With Realistic Traffic
Designing load tests that replicate mobile traffic patterns including bursty connections, mixed network conditions, and session-based workflows.
Context
A mobile backend serving 80,000 DAU needed load testing before a product launch expected to double traffic. Standard load tests (uniform request rate, single endpoint) did not reveal the bottlenecks that appeared in production. I redesigned the load testing approach to replicate mobile-specific traffic patterns.
See also: Mobile Analytics Pipeline: From App Event to Dashboard.
Problem
Mobile traffic differs from web traffic in critical ways: request bursts on app open, session-based workflows (not independent requests), variable payload sizes, and mixed network conditions causing connection pooling issues. Load tests that ignore these patterns produce misleading results.
Constraints
- Load testing tool: k6 (supports scenarios, custom protocols, and browser-like behavior)
- Target: 160,000 simulated DAU (2x current)
- Backend: Node.js API on AWS ECS, Postgres, Redis, S3
- Test duration: 30 minutes sustained, with 5-minute ramp-up
- Budget: run tests on dedicated infrastructure to avoid affecting production
- Must test both API performance and downstream service behavior (database, cache, storage)
Related: Debugging Performance Issues in Large Android Apps.
Design
Traffic Pattern Modeling
I analyzed 7 days of production logs to extract realistic traffic patterns:
| Metric | Production Value |
|---|---|
| Peak requests/min | 45,000 |
| Average session duration | 4.2 minutes |
| Requests per session | 12-18 |
| App open burst (first 3 seconds) | 5-7 requests |
| Background sync interval | Every 30 seconds |
| Concurrent active sessions at peak | 8,000 |
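A quick back-of-the-envelope check ties these numbers together. This sketch uses the midpoint of the 12-18 requests/session range (my assumption) to derive the steady-state request rate implied by the table:

```javascript
// Back-of-the-envelope check: steady-state requests/min implied by the
// session metrics above. 15 requests/session is the midpoint of 12-18.
const concurrentSessions = 8000;
const requestsPerSession = 15;
const sessionMinutes = 4.2;

// Each session emits requestsPerSession requests over sessionMinutes.
const steadyStateReqPerMin =
  concurrentSessions * (requestsPerSession / sessionMinutes);

console.log(Math.round(steadyStateReqPerMin)); // ≈ 28,571 req/min
```

The gap between this roughly 28.5k/min steady-state figure and the 45k/min observed peak is filled by bursty traffic — exactly the component a flat-rate test never generates.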
Scenario Design
Instead of a flat request rate, I designed user journey scenarios:
Scenario 1: App Open (40% of traffic)
1. POST /auth/refresh (token refresh)
2. GET /user/profile
3. GET /feed?limit=20 (parallel)
4. GET /notifications/unread (parallel)
5. GET /config/remote (parallel)
-- 3-second burst, then idle
Scenario 2: Browse and Interact (35% of traffic)
1. GET /feed?limit=20&offset=N (scroll, every 2-3 seconds)
2. POST /events/track (analytics, per scroll)
3. GET /content/:id (tap on item)
4. POST /content/:id/like (occasional)
-- 2-4 minute session
Scenario 3: Background Sync (25% of traffic)
1. POST /sync/push (upload local changes)
2. GET /sync/pull?since=TIMESTAMP
-- Every 30 seconds while app is backgrounded
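As a sketch of how a journey becomes k6 code, Scenario 1 might look like this as an exec function. This runs under the k6 runtime (not Node); `BASE_URL`, the missing auth payloads, and the 30-second idle are assumptions for illustration — the endpoint paths follow the list above:

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

const BASE_URL = 'https://api.example.com'; // assumed test target

export function appOpenScenario() {
  // Steps 1-2: sequential, as a real client issues them on cold start
  http.post(`${BASE_URL}/auth/refresh`);
  http.get(`${BASE_URL}/user/profile`);

  // Steps 3-5: http.batch fires these in parallel, reproducing the
  // 5-7 request burst in the first seconds after app open
  http.batch([
    ['GET', `${BASE_URL}/feed?limit=20`],
    ['GET', `${BASE_URL}/notifications/unread`],
    ['GET', `${BASE_URL}/config/remote`],
  ]);

  // Idle before the next iteration, so each VU behaves like one user
  sleep(30);
}
```

The key design choice is `http.batch`: issuing the burst requests concurrently, rather than in a loop, is what makes the test hit the server the way thousands of simultaneous app opens do.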
k6 Implementation
```javascript
export const options = {
  scenarios: {
    app_open: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '5m', target: 3200 }, // 40% of 8,000
        { duration: '25m', target: 3200 },
        { duration: '2m', target: 0 },
      ],
      exec: 'appOpenScenario',
    },
    browse: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '5m', target: 2800 }, // 35% of 8,000
        { duration: '25m', target: 2800 },
        { duration: '2m', target: 0 },
      ],
      exec: 'browseScenario',
    },
    background_sync: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '5m', target: 2000 }, // 25% of 8,000
        { duration: '25m', target: 2000 },
        { duration: '2m', target: 0 },
      ],
      exec: 'backgroundSyncScenario',
    },
  },
};
```

Network Condition Simulation
Mobile clients operate on varying networks. I distributed virtual users across simulated conditions:
| Network | % of VUs | Simulated RTT | Bandwidth |
|---|---|---|---|
| 4G | 50% | 50ms | 10Mbps |
| 3G | 30% | 150ms | 1.5Mbps |
| Poor 3G | 15% | 300ms | 400Kbps |
| Edge/2G | 5% | 500ms | 100Kbps |
Slow clients hold connections longer, which directly impacts connection pool utilization on the server.
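A minimal way to realize this distribution is to assign each VU a network profile by weighted random choice at startup. The profile objects below mirror the table; the field names are mine:

```javascript
// Network profiles from the table above: weight is the fraction of VUs,
// rttMs the added round-trip latency, kbps the bandwidth cap.
const NETWORK_PROFILES = [
  { name: '4g',     weight: 0.50, rttMs: 50,  kbps: 10000 },
  { name: '3g',     weight: 0.30, rttMs: 150, kbps: 1500 },
  { name: 'poor3g', weight: 0.15, rttMs: 300, kbps: 400 },
  { name: 'edge',   weight: 0.05, rttMs: 500, kbps: 100 },
];

// Weighted random choice: walk the cumulative weights until one covers
// the random draw. Each VU calls this once and keeps its profile.
function pickProfile(rand = Math.random()) {
  let acc = 0;
  for (const p of NETWORK_PROFILES) {
    acc += p.weight;
    if (rand < acc) return p;
  }
  return NETWORK_PROFILES[NETWORK_PROFILES.length - 1];
}
```

The chosen profile then parameterizes whatever shaping layer sits in front of the API (a proxy, in this setup), so half of the VUs get 4G-like latency while the long tail gets Edge-like latency.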
Trade-offs
Load Test Design Approaches
| Approach | Realism | Setup Effort | Insight Quality |
|---|---|---|---|
| Flat rate, single endpoint | Low | Low | Finds basic throughput ceiling |
| Flat rate, mixed endpoints | Medium | Low | Finds per-endpoint bottlenecks |
| Scenario-based, uniform network | High | Medium | Finds workflow bottlenecks |
| Scenario-based, mixed network (this approach) | Very high | High | Finds production-realistic bottlenecks |
Results: What We Found
| Bottleneck | Discovered At | Cause |
|---|---|---|
| Auth token refresh storm | 2x load | 3,200 concurrent token refreshes on "app open" saturated the auth service |
| Connection pool exhaustion | 1.5x load | Slow clients (poor 3G) holding connections for 2-3 seconds consumed all pool slots |
| Feed query timeout | 1.8x load | Feed aggregation query exceeded 5s under concurrent load |
| Redis memory spike | 1.7x load | Session cache growing faster than TTL eviction during ramp-up |
| S3 request throttling | 2x load | Image URL generation hitting S3 API rate limits |
Three of these five bottlenecks would not have been found with a flat-rate, single-endpoint load test.
Bottleneck Resolution
| Bottleneck | Fix | Impact |
|---|---|---|
| Auth refresh storm | Stagger token expiration (jitter) | Spread 3,200 refreshes over 30 seconds |
| Connection pool | Increase pool size + add PgBouncer | Handled 2x slow clients |
| Feed query | Add query-level caching (60s TTL) | Reduced DB load by 85% |
| Redis memory | Reduce session TTL from 24h to 4h | Cut peak memory by 60% |
| S3 throttling | Pre-sign URLs in batch, cache for 1h | Eliminated S3 API calls per request |
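The jitter fix for the refresh storm is essentially a one-liner on the token issuer. A sketch, assuming a uniform jitter window; the 3600s base TTL is illustrative:

```javascript
// Instead of a fixed token TTL, add uniform random jitter so clients that
// authenticated at the same moment do not all refresh at the same instant.
function jitteredTtlSeconds(baseTtl, jitterWindow) {
  // Uniform jitter in [0, jitterWindow): e.g. base 3600s, window 30s
  return baseTtl + Math.floor(Math.random() * jitterWindow);
}
```

With a 30-second window, a burst of simultaneous app opens spreads its refresh calls across that window instead of landing on the auth service in one spike.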
Failure Modes
Load test infrastructure itself becoming the bottleneck: k6 running on undersized instances can max out CPU before generating target load. Monitor k6 process metrics (CPU, memory, network) alongside target metrics. I used 4 k6 instances on c5.2xlarge to generate 8,000 VUs.
Unrealistic data distribution: If all VUs request the same feed items or user profiles, caching makes the test artificially easy. Use a realistic data distribution (Zipfian for content, uniform for user IDs) to stress both cache and database.
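A Zipfian sampler is only a few lines. This sketch precomputes a cumulative distribution and samples by inverse transform; the exponent s = 1.0 is an assumption:

```javascript
// Zipfian sampler over n content IDs: rank 1 is the most popular item,
// with probability proportional to 1 / rank^s.
function makeZipfSampler(n, s = 1.0) {
  const weights = [];
  let total = 0;
  for (let rank = 1; rank <= n; rank++) {
    const w = 1 / Math.pow(rank, s);
    weights.push(w);
    total += w;
  }
  // Cumulative distribution for inverse-transform sampling
  const cdf = [];
  let acc = 0;
  for (const w of weights) {
    acc += w / total;
    cdf.push(acc);
  }
  return function sample() {
    const r = Math.random();
    // Linear scan is fine for illustration; use binary search for large n
    for (let i = 0; i < n; i++) {
      if (r <= cdf[i]) return i + 1; // content ID = popularity rank
    }
    return n;
  };
}
```

With this distribution the cache serves the hot head of the catalog while the long tail still produces misses that reach the database — which is the behavior the test needs to stress.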
Missing warm-up effects: Production databases have warm buffer caches. A fresh test environment has cold caches, making initial results pessimistic. Run a 5-minute warm-up at 50% load before measuring.
Network simulation limitations: k6 does not natively simulate variable network conditions. I used a proxy layer (toxiproxy) between k6 and the API to add latency and bandwidth limits. This adds infrastructure complexity but produces more realistic results.
Scaling Considerations
- Scale load test infrastructure proportionally. A rule of thumb: 1 k6 instance per 2,000 VUs for HTTP tests.
- Run load tests in the same region as the production environment to minimize test infrastructure network noise.
- Store load test results (k6 outputs to InfluxDB, Prometheus, or JSON) for historical comparison across releases.
- Automate load tests in CI for release candidates. A performance regression gate prevents deploying slower code.
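For the CI regression gate, k6's built-in thresholds are enough. A config sketch — the error-rate budget is an assumption, while the 300ms p95 matches the launch target:

```javascript
// k6 threshold configuration used as a CI regression gate.
export const options = {
  thresholds: {
    http_req_duration: ['p(95)<300'], // p95 latency budget in ms
    http_req_failed: ['rate<0.01'],   // assumed 1% error budget
  },
};
```

k6 exits with a non-zero status when any threshold fails, so the CI job fails on a performance regression without extra scripting.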
Observability
During load tests, monitor:
- API layer: request rate, error rate, p50/p95/p99 latency per endpoint, active connections
- Database: active connections, query duration, lock waits, replication lag
- Cache: hit rate, memory usage, eviction rate, connection count
- Infrastructure: CPU, memory, network I/O, disk I/O per service
- k6 metrics: VU count, iteration rate, request rate, failed requests, data sent/received
The most useful view: a dashboard overlaying k6 VU count with API p95 latency and database connection count. This shows exactly when and why latency degrades.
Key Takeaways
- Model load tests as user journeys, not independent requests. Mobile users follow predictable session patterns that create correlated load.
- Simulate network conditions. Slow clients hold server resources longer, which is the primary difference between mobile and web load patterns.
- The "app open" burst is the most dangerous traffic pattern. Thousands of users opening the app simultaneously (after a push notification, for example) creates a thundering herd on auth and feed endpoints.
- Three of five bottlenecks found were invisible to flat-rate testing. Scenario-based testing with mixed network conditions is worth the setup effort.
- Run load tests regularly, not just before launches. Performance regressions are easier to fix when caught early.
Further Reading
- How I'd Design a Mobile Configuration System at Scale: Designing a configuration system for mobile apps at scale, covering config delivery, caching layers, override hierarchies, and safe rollo...
- Benchmarking Database Writes Under Load: Measured write throughput and latency for Postgres under increasing concurrency, comparing single inserts, batch inserts, COPY, and async...
- What Breaks First When Traffic Scales: A catalog of components that fail first under increasing traffic, ordered by how commonly they become bottlenecks in web applications.
Final Thoughts
After fixing the five bottlenecks identified in load testing, the backend handled 2.2x current load with p95 latency under 300ms. The launch proceeded without incident, traffic peaked at 1.8x, and no new bottlenecks emerged. The total load testing effort was 3 days: 1 day for scenario design and implementation, 1 day for test execution and analysis, and 1 day for fixes. That investment prevented what would have been a degraded launch experience for 80,000 users.