Load Testing Mobile Backends With Realistic Traffic

Dhruval Dhameliya·July 29, 2025·8 min read

Designing load tests that replicate mobile traffic patterns including bursty connections, mixed network conditions, and session-based workflows.

Context

A mobile backend serving 80,000 DAU needed load testing before a product launch expected to double traffic. Standard load tests (uniform request rate, single endpoint) did not reveal the bottlenecks that appeared in production. I redesigned the load testing approach to replicate mobile-specific traffic patterns.


Problem

Mobile traffic differs from web traffic in critical ways: request bursts on app open, session-based workflows (not independent requests), variable payload sizes, and mixed network conditions causing connection pooling issues. Load tests that ignore these patterns produce misleading results.

Constraints

  • Load testing tool: k6 (supports scenarios, custom protocols, and browser-like behavior)
  • Target: 160,000 simulated DAU (2x current)
  • Backend: Node.js API on AWS ECS, Postgres, Redis, S3
  • Test duration: 30 minutes sustained, with 5-minute ramp-up
  • Budget: run tests on dedicated infrastructure to avoid affecting production
  • Must test both API performance and downstream service behavior (database, cache, storage)


Design

Traffic Pattern Modeling

I analyzed 7 days of production logs to extract realistic traffic patterns:

Metric                                  Production Value
Peak requests/min                       45,000
Average session duration                4.2 minutes
Requests per session                    12-18
App open burst (first 3 seconds)        5-7 requests
Background sync interval                Every 30 seconds
Concurrent active sessions at peak      8,000

Scenario Design

Instead of a flat request rate, I designed user journey scenarios:

Scenario 1: App Open (40% of traffic)

1. POST /auth/refresh (token refresh)
2. GET /user/profile
3. GET /feed?limit=20 (parallel)
4. GET /notifications/unread (parallel)
5. GET /config/remote (parallel)
-- 3-second burst, then idle

Scenario 2: Browse and Interact (35% of traffic)

1. GET /feed?limit=20&offset=N (scroll, every 2-3 seconds)
2. POST /events/track (analytics, per scroll)
3. GET /content/:id (tap on item)
4. POST /content/:id/like (occasional)
-- 2-4 minute session

Scenario 3: Background Sync (25% of traffic)

1. POST /sync/push (upload local changes)
2. GET /sync/pull?since=TIMESTAMP
-- Every 30 seconds while app is backgrounded
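The parallel burst in Scenario 1 maps naturally onto k6's `http.batch()`. A minimal sketch of how the batch could be assembled as plain data (the `buildAppOpenBatch` helper and `baseUrl` parameter are illustrative names, not from the actual test script):

```javascript
// Steps 3-5 of the app-open scenario fire in parallel after the sequential
// auth refresh and profile fetch. In a k6 script, the returned tuples would
// be passed directly to http.batch() to issue the requests concurrently.
function buildAppOpenBatch(baseUrl) {
  return [
    ['GET', `${baseUrl}/feed?limit=20`],
    ['GET', `${baseUrl}/notifications/unread`],
    ['GET', `${baseUrl}/config/remote`],
  ];
}
```

Together with the two sequential calls before it, this reproduces the 5-7 request burst in the first 3 seconds of app open.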

k6 Implementation

export const options = {
  scenarios: {
    app_open: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '5m', target: 3200 },  // 40% of 8,000
        { duration: '25m', target: 3200 },
        { duration: '2m', target: 0 },
      ],
      exec: 'appOpenScenario',
    },
    browse: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '5m', target: 2800 },  // 35% of 8,000
        { duration: '25m', target: 2800 },
        { duration: '2m', target: 0 },
      ],
      exec: 'browseScenario',
    },
    background_sync: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '5m', target: 2000 },  // 25% of 8,000
        { duration: '25m', target: 2000 },
        { duration: '2m', target: 0 },
      ],
      exec: 'backgroundSyncScenario',
    },
  },
};

Network Condition Simulation

Mobile clients operate on varying networks. I distributed virtual users across simulated conditions:

Network    % of VUs    Simulated RTT    Bandwidth
4G         50%         50ms             10Mbps
3G         30%         150ms            1.5Mbps
Poor 3G    15%         300ms            400Kbps
Edge/2G    5%          500ms            100Kbps

Slow clients hold connections longer, which directly impacts connection pool utilization on the server.
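One way to distribute VUs across these conditions is to map a uniform random draw onto the cumulative shares from the table. A minimal sketch (profile names and the `pickNetworkProfile` helper are my own, assumed for illustration):

```javascript
// Network profiles matching the distribution table above.
const NETWORK_PROFILES = [
  { name: '4g',     share: 0.50, rttMs: 50,  kbps: 10000 },
  { name: '3g',     share: 0.30, rttMs: 150, kbps: 1500 },
  { name: 'poor3g', share: 0.15, rttMs: 300, kbps: 400 },
  { name: 'edge',   share: 0.05, rttMs: 500, kbps: 100 },
];

// u is a uniform draw in [0, 1); walk the cumulative distribution.
function pickNetworkProfile(u) {
  let cumulative = 0;
  for (const profile of NETWORK_PROFILES) {
    cumulative += profile.share;
    if (u < cumulative) return profile;
  }
  return NETWORK_PROFILES[NETWORK_PROFILES.length - 1];
}
```

In a k6 script, each VU could call `pickNetworkProfile(Math.random())` once at startup and then route its traffic through a proxy listener configured for that profile.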

Trade-offs

Load Test Design Approaches

Approach                                Realism      Setup Effort    Insight Quality
Flat rate, single endpoint              Low          Low             Finds basic throughput ceiling
Flat rate, mixed endpoints              Medium       Low             Finds per-endpoint bottlenecks
Scenario-based, uniform network         High         Medium          Finds workflow bottlenecks
Scenario-based, mixed network (this)    Very high    High            Finds production-realistic bottlenecks

Results: What We Found

Bottleneck                    Discovered At    Cause
Auth token refresh storm      2x load          3,200 concurrent token refreshes on "app open" saturated the auth service
Connection pool exhaustion    1.5x load        Slow clients (poor 3G) holding connections for 2-3 seconds consumed all pool slots
Feed query timeout            1.8x load        Feed aggregation query exceeded 5s under concurrent load
Redis memory spike            1.7x load        Session cache growing faster than TTL eviction during ramp-up
S3 request throttling         2x load          Image URL generation hitting S3 API rate limits

Three of these five bottlenecks would not have been found with a flat-rate, single-endpoint load test.

Bottleneck Resolution

Bottleneck            Fix                                     Impact
Auth refresh storm    Stagger token expiration (jitter)       Spread 3,200 refreshes over 30 seconds
Connection pool       Increase pool size + add PgBouncer      Handled 2x slow clients
Feed query            Add query-level caching (60s TTL)       Reduced DB load by 85%
Redis memory          Reduce session TTL from 24h to 4h       Cut peak memory by 60%
S3 throttling         Pre-sign URLs in batch, cache for 1h    Eliminated S3 API calls per request
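The first fix, jittered token expiration, is a one-line change worth spelling out. A minimal sketch (the `jitteredExpiry` helper name is mine; the 30-second window matches the table above):

```javascript
// Spread token expirations over a jitter window so that clients which
// authenticated at the same moment do not all refresh in lockstep later.
function jitteredExpiry(baseTtlSeconds, jitterWindowSeconds, rand = Math.random) {
  // Each issued token gets the base TTL plus a uniform offset in [0, window).
  return baseTtlSeconds + Math.floor(rand() * jitterWindowSeconds);
}
```

With a 30-second window, the 3,200 simultaneous refreshes from an app-open burst arrive spread across the window instead of in a single spike.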

Failure Modes

Load test infrastructure itself becoming the bottleneck: k6 running on undersized instances can max out CPU before generating target load. Monitor k6 process metrics (CPU, memory, network) alongside target metrics. I used 4 k6 instances on c5.2xlarge to generate 8,000 VUs.
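The sizing arithmetic here (8,000 VUs across 4 instances, roughly 2,000 VUs each) can be captured in a trivial helper, useful when scaling the generator fleet for a new target. The function name and default are illustrative:

```javascript
// Estimate how many k6 generator instances a target VU count needs,
// assuming ~2,000 VUs per instance as observed in this test setup.
function k6InstancesNeeded(targetVus, vusPerInstance = 2000) {
  return Math.ceil(targetVus / vusPerInstance);
}
```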

Unrealistic data distribution: If all VUs request the same feed items or user profiles, caching makes the test artificially easy. Use a realistic data distribution (Zipfian for content, uniform for user IDs) to stress both cache and database.
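A Zipfian draw over content IDs can be implemented with a precomputed cumulative distribution. A minimal inverse-CDF sketch (function names are illustrative):

```javascript
// Zipfian sampler for content IDs: rank r gets weight 1/r^s, so a handful of
// "hot" items dominate requests, as in real feeds. Build the CDF once.
function buildZipfCdf(nItems, s = 1.0) {
  const weights = [];
  let total = 0;
  for (let rank = 1; rank <= nItems; rank++) {
    const w = 1 / Math.pow(rank, s);
    weights.push(w);
    total += w;
  }
  let cumulative = 0;
  return weights.map(w => (cumulative += w / total));
}

// u is uniform in [0, 1); returns the 1-based item rank.
function sampleZipf(cdf, u) {
  for (let i = 0; i < cdf.length; i++) {
    if (u < cdf[i]) return i + 1;
  }
  return cdf.length;
}
```

In the test script, each feed or content request draws its ID via `sampleZipf`, while user IDs stay uniform, so the cache absorbs hot content but the database still sees a realistic long tail.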

Missing warm-up effects: Production databases have warm buffer caches. A fresh test environment has cold caches, making initial results pessimistic. Run a 5-minute warm-up at 50% load before measuring.

Network simulation limitations: k6 does not natively simulate variable network conditions. I used a proxy layer (toxiproxy) between k6 and the API to add latency and bandwidth limits. This adds infrastructure complexity but produces more realistic results.
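toxiproxy is driven over an HTTP API: you create a proxy per network profile, then attach toxics to it. A sketch of building the latency-toxic payload that would be POSTed to `/proxies/<name>/toxics`; the field names follow the toxiproxy API as I used it, but treat the exact shape as an assumption to verify against your toxiproxy version:

```javascript
// Build the JSON body for toxiproxy's "create toxic" endpoint.
// Assumed field names: type, stream, toxicity, attributes.{latency, jitter}.
function latencyToxic(latencyMs, jitterMs = 0) {
  return {
    type: 'latency',       // delay injected on the proxied stream
    stream: 'downstream',  // apply on the server-to-client direction
    toxicity: 1.0,         // affect 100% of connections through this proxy
    attributes: { latency: latencyMs, jitter: jitterMs },
  };
}
```

One proxy per row of the network table, each with its own latency and bandwidth toxics, lets the k6 VUs point at different proxy ports instead of the API directly.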

Scaling Considerations

  • Scale load test infrastructure proportionally. A rule of thumb: 1 k6 instance per 2,000 VUs for HTTP tests.
  • Run load tests in the same region as the production environment to minimize test infrastructure network noise.
  • Store load test results (k6 outputs to InfluxDB, Prometheus, or JSON) for historical comparison across releases.
  • Automate load tests in CI for release candidates. A performance regression gate prevents deploying slower code.
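A regression gate falls out of k6's built-in thresholds: when a threshold is breached, the run exits non-zero and CI fails the release candidate. A sketch (the 300ms p95 matches the target elsewhere in this post; the 1% error budget is an illustrative choice; in a real k6 script this would be `export const options`):

```javascript
// k6 threshold config: a breached threshold fails the test run, so CI can
// block the deploy. Metric names http_req_duration and http_req_failed are
// k6 built-ins.
const options = {
  thresholds: {
    http_req_duration: ['p(95)<300'],  // p95 latency must stay under 300ms
    http_req_failed: ['rate<0.01'],    // under 1% of requests may fail
  },
};
```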

Observability

During load tests, monitor:

  • API layer: request rate, error rate, p50/p95/p99 latency per endpoint, active connections
  • Database: active connections, query duration, lock waits, replication lag
  • Cache: hit rate, memory usage, eviction rate, connection count
  • Infrastructure: CPU, memory, network I/O, disk I/O per service
  • k6 metrics: VU count, iteration rate, request rate, failed requests, data sent/received

The most useful view: a dashboard overlaying k6 VU count with API p95 latency and database connection count. This shows exactly when and why latency degrades.

Key Takeaways

  • Model load tests as user journeys, not independent requests. Mobile users follow predictable session patterns that create correlated load.
  • Simulate network conditions. Slow clients hold server resources longer, which is the primary difference between mobile and web load patterns.
  • The "app open" burst is the most dangerous traffic pattern. Thousands of users opening the app simultaneously (after a push notification, for example) creates a thundering herd on auth and feed endpoints.
  • Three of five bottlenecks found were invisible to flat-rate testing. Scenario-based testing with mixed network conditions is worth the setup effort.
  • Run load tests regularly, not just before launches. Performance regressions are easier to fix when caught early.

Final Thoughts

After fixing the five bottlenecks identified in load testing, the backend handled 2.2x current load with p95 latency under 300ms. The launch proceeded without incident, traffic peaked at 1.8x, and no new bottlenecks emerged. The total load testing effort was 3 days: 1 day for scenario design and implementation, 1 day for test execution and analysis, and 1 day for fixes. That investment prevented what would have been a degraded launch experience for 80,000 users.
