Designing APIs With Mobile Constraints in Mind

Dhruval Dhameliya·August 10, 2025·7 min read

How to design backend APIs that account for mobile-specific constraints: bandwidth, latency, battery, intermittent connectivity, and long-lived client versions.

APIs designed for web frontends often perform poorly on mobile. They assume fast, reliable connections, unlimited bandwidth, and clients that can be updated instantly. None of these hold for mobile. This post covers how to design APIs that work well under mobile constraints from the start.

Context

Mobile clients differ from web clients in fundamental ways: they operate on variable-quality networks, they consume battery with every network request, they cannot be force-updated, and they have limited memory for caching large responses. An API that ignores these constraints forces the mobile team to build complex workarounds on the client.

Problem

Design APIs that:

  • Minimize bandwidth usage and request count
  • Tolerate high latency and intermittent connectivity
  • Support long-lived client versions without breaking changes
  • Return data in shapes that match mobile UI requirements

Constraints

| Constraint | Detail |
|---|---|
| Bandwidth | Metered connections, 50 Kbps to 100 Mbps range |
| Latency | 50 ms (WiFi) to 5000 ms (2G) RTT |
| Battery | Each request activates the radio, costing battery even for small payloads |
| Client versions | 20+ app versions in production simultaneously |
| Memory | Responses must be parseable without loading the entire payload into memory on low-end devices |

Design

Response Design: Aggregate, Do Not Scatter

The single worst pattern for mobile is requiring multiple sequential API calls to render one screen.

Bad: N+1 calls to render a profile

GET /users/123           -> user data
GET /users/123/posts     -> post list
GET /users/123/followers -> follower count
GET /users/123/badges    -> achievement badges

Good: Single aggregated endpoint

GET /users/123/profile

{
    "user": { ... },
    "recent_posts": [ ... ],
    "follower_count": 1234,
    "badges": [ ... ]
}

This is the Backend for Frontend (BFF) pattern. A mobile-specific BFF aggregates data from multiple services into a single response shaped for the screen.
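As a sketch, a BFF profile endpoint fans out to the upstream services in parallel and assembles the screen's payload. The fetch_* functions below are hypothetical stand-ins for real RPC or HTTP calls:

```python
import concurrent.futures

# Hypothetical upstream fetchers; in a real BFF these would be RPC or
# HTTP calls to the user, posts, social-graph, and badges services.
def fetch_user(user_id):
    return {"id": user_id, "name": "Ada"}

def fetch_posts(user_id):
    return [{"id": 1, "title": "Hello"}]

def fetch_follower_count(user_id):
    return 1234

def fetch_badges(user_id):
    return [{"id": "early_adopter"}]

def get_profile(user_id):
    """Aggregate four upstream calls into one screen-shaped response."""
    # The calls are independent, so issue them in parallel: the client
    # waits for the slowest upstream, not for the sum of all four.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        user = pool.submit(fetch_user, user_id)
        posts = pool.submit(fetch_posts, user_id)
        followers = pool.submit(fetch_follower_count, user_id)
        badges = pool.submit(fetch_badges, user_id)
        return {
            "user": user.result(),
            "recent_posts": posts.result(),
            "follower_count": followers.result(),
            "badges": badges.result(),
        }
```

One server-side fan-out still costs far less than four mobile round trips, because inter-service latency inside a datacenter is typically sub-millisecond.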

Field Selection

Let clients request only the fields they need:

GET /users/123/profile?fields=user.name,user.avatar_url,follower_count

This reduces payload size by 50-80% for screens that only need a subset of the data.

def extract_nested(obj, field_path):
    for key in field_path.split("."):  # walk dotted paths like "user.name"
        obj = obj[key]
    return obj

def set_nested(obj, field_path, value):
    *parents, leaf = field_path.split(".")
    for key in parents:
        obj = obj.setdefault(key, {})
    obj[leaf] = value

def field_selection(full_response, requested_fields):
    if not requested_fields:
        return full_response  # default: return everything
    filtered = {}
    for path in requested_fields:
        set_nested(filtered, path, extract_nested(full_response, path))
    return filtered

Pagination

Mobile pagination must be cursor-based, not offset-based:

| Approach | Behavior on Mobile |
|---|---|
| Offset (?page=3&limit=20) | Insertions cause duplicates or skipped items while paginating |
| Cursor (?after=abc123&limit=20) | Stable regardless of insertions or deletions |

GET /feed?after=cursor_abc123&limit=20

Response:
{
    "items": [...],
    "cursors": {
        "next": "cursor_def456",
        "has_more": true
    }
}
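One way to implement opaque cursors is to encode the sort key of the last item returned. The sketch below assumes a feed ordered newest-first by (created_at, id); the base64-JSON encoding is illustrative, not a required format:

```python
import base64
import json

def encode_cursor(item):
    # The cursor is an opaque token naming a position in the sort
    # order, not an offset count.
    key = {"created_at": item["created_at"], "id": item["id"]}
    return base64.urlsafe_b64encode(json.dumps(key).encode()).decode()

def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))

def paginate(items, after=None, limit=20):
    """Return up to `limit` items strictly after the cursor position.

    `items` is assumed sorted newest-first by (created_at, id).
    """
    if after:
        pos = decode_cursor(after)
        items = [i for i in items
                 if (i["created_at"], i["id"]) < (pos["created_at"], pos["id"])]
    page = items[:limit]
    return {
        "items": page,
        "cursors": {
            "next": encode_cursor(page[-1]) if page else None,
            "has_more": len(items) > limit,
        },
    }
```

Because the cursor names a position rather than a count, newly inserted items shift the feed without duplicating or skipping anything on the next page.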

Compression and Payload Optimization

| Technique | Bandwidth Savings | Trade-off |
|---|---|---|
| Gzip response compression | 60-80% | CPU on client for decompression |
| JSON field shortening (e.g., fn instead of first_name) | 10-20% | Readability loss, debugging difficulty |
| Protocol Buffers | 30-50% over JSON | Schema management, less human-readable |
| Delta responses (send only changes) | 70-90% for updates | Complex client-side merge logic |

Recommendation: Use gzip compression on all responses (essentially free). Consider Protocol Buffers for high-frequency endpoints. Avoid field shortening; the readability cost outweighs the bandwidth savings.
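A quick sketch of what gzip buys on a repetitive JSON payload. The payload below is fabricated for illustration; real savings depend on response structure:

```python
import gzip
import json

# Fabricated, repetitive payload -- JSON key names repeat per item,
# which is exactly the redundancy gzip's dictionary coding exploits.
payload = json.dumps([
    {"first_name": "Ada", "last_name": "Lovelace", "follower_count": 1234}
    for _ in range(100)
]).encode()

# Level 6 balances CPU cost against compression ratio.
compressed = gzip.compress(payload, compresslevel=6)
ratio = len(compressed) / len(payload)
```

On payloads like this the compressed size is a small fraction of the original, which is why compression should be on by default rather than opt-in.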

Conditional Requests

Avoid re-downloading unchanged data:

Client sends: GET /feed
              If-None-Match: "etag_abc123"

Server responds: 304 Not Modified (empty body)

For collections that change partially:

Client sends: GET /feed
              If-Modified-Since: Sun, 26 Oct 2025 12:00:00 GMT

Server responds: 200 OK
{
    "items": [/* only items modified since the given timestamp */],
    "deleted_ids": ["id1", "id2"],
    "last_modified": "2025-10-27T08:30:00Z"
}
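Server-side, conditional request handling is a small amount of code. The sketch below derives the ETag by hashing the serialized body; the handler name and the (status, body, headers) return shape are illustrative:

```python
import hashlib
import json

def make_etag(body: bytes) -> str:
    # Content-derived ETag: identical bodies always produce the same
    # tag, so unchanged data yields a 304 no matter which server
    # instance handles the request.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def handle_feed(if_none_match=None):
    body = json.dumps({"items": [1, 2, 3]}).encode()
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, None, {"ETag": etag}  # empty body: no bytes re-sent
    return 200, body, {"ETag": etag}
```

The client stores the ETag alongside its cached copy and replays it in If-None-Match on the next request.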

Error Response Design

Mobile error responses must be actionable by the client, not just human-readable:

{
    "error": {
        "code": "INSUFFICIENT_FUNDS",
        "message": "Your account balance is too low for this purchase.",
        "details": {
            "required_amount": 29.99,
            "available_amount": 15.50,
            "currency": "USD"
        },
        "actions": [
            {
                "type": "deep_link",
                "label": "Add funds",
                "uri": "app://wallet/add-funds"
            }
        ],
        "retry_eligible": false
    }
}

The code field enables client-side branching. The actions field allows the server to guide the user without requiring a client update.
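On the client, branching on code might look like the following hypothetical handler (the function and its return shape are illustrative, not part of any SDK):

```python
def handle_error(error):
    # Branch on the machine-readable code; fall back to the server's
    # human-readable message for codes this app version doesn't know.
    code = error.get("code")
    if code == "INSUFFICIENT_FUNDS":
        details = error["details"]
        shortfall = details["required_amount"] - details["available_amount"]
        prompt = f"You need {shortfall:.2f} {details['currency']} more."
    else:
        prompt = error.get("message", "Something went wrong.")
    return {
        "prompt": prompt,
        # Server-suggested actions render as buttons without app changes.
        "actions": [a["label"] for a in error.get("actions", [])],
        "can_retry": error.get("retry_eligible", False),
    }
```

The else branch is the forward-compatibility story: a two-year-old client that has never seen a new code still shows something sensible.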

Request Batching

For clients that need to make many small requests, provide a batch endpoint:

POST /batch

{
    "requests": [
        {"method": "GET", "path": "/users/123"},
        {"method": "GET", "path": "/users/456"},
        {"method": "GET", "path": "/products/789"}
    ]
}

Response:
{
    "responses": [
        {"status": 200, "body": {...}},
        {"status": 200, "body": {...}},
        {"status": 404, "body": {...}}
    ]
}

This turns N network round trips into one, saving battery and reducing total latency.
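A minimal server-side dispatcher for such a batch endpoint might look like this. The route table is illustrative; a real implementation would reuse the framework's router and cap the number of sub-requests per batch:

```python
def handle_batch(batch, routes):
    # Each sub-request is executed and reported independently: one
    # failing entry must not fail the whole batch.
    responses = []
    for req in batch["requests"]:
        handler = routes.get((req["method"], req["path"]))
        if handler is None:
            responses.append({"status": 404, "body": {"error": "not_found"}})
            continue
        try:
            responses.append({"status": 200, "body": handler()})
        except Exception as exc:
            responses.append({"status": 500, "body": {"error": str(exc)}})
    return {"responses": responses}
```

Note the per-entry status codes: this is the "complex error handling per sub-request" trade-off, pushed onto the client by design.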

Timeouts and Deadlines

APIs should accept a client-specified deadline:

GET /search?q=shoes
X-Request-Deadline: 3000  // milliseconds

The server uses this to short-circuit expensive operations. If the search cannot complete in 3 seconds, return partial results rather than timing out.

Related: Comparing Search Implementations: Client vs Server.

import time

def handle_search(query, deadline_ms, search_shards, expected_total):
    start = time.monotonic()
    results = []

    for shard in search_shards:
        # Spend at most 80% of the deadline searching; reserve the
        # rest for serializing and sending the response.
        if (time.monotonic() - start) * 1000 > deadline_ms * 0.8:
            break  # return what we have
        results.extend(shard.search(query))

    return {
        "results": results,
        "is_partial": len(results) < expected_total,
        "total_available": expected_total,
    }

Trade-offs

| Decision | Upside | Downside |
|---|---|---|
| BFF pattern | Optimized responses per screen | Another service to maintain |
| Field selection | Smaller payloads | Caching complexity (different field sets = different cache keys) |
| Cursor pagination | Stable under concurrent writes | Cannot jump to an arbitrary page |
| Batch endpoint | Fewer round trips | Complex error handling per sub-request |
| Client deadline header | Server respects client timeout | Server must support partial responses |

Failure Modes

  • BFF becomes a bottleneck: All mobile traffic funnels through one service. Mitigation: deploy BFF per feature team, not as a monolith.
  • Field selection cache miss storm: Every unique field combination is a separate cache entry. Mitigation: define standard field sets ("minimal", "standard", "full") and cache those.
  • Batch request partial failure: 2 of 3 sub-requests succeed. Client must handle per-request status codes within the batch response.
  • Stale ETag after server-side data migration: ETags computed from old data format invalidate all client caches simultaneously. Mitigation: version ETags separately from data format.
  • Overly aggressive compression: A highly compressed response on a low-end device causes high CPU usage during decompression. Mitigation: use gzip level 6 (balanced), not level 9 (maximum).
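The standard-field-set mitigation can be as small as a named preset table. The set names and contents below are illustrative, as is the choice to fall back to the full response for unknown selections:

```python
# Named field-set presets: clients request ?fields=standard instead of
# arbitrary combinations, so the cache sees a handful of keys at most.
FIELD_SETS = {
    "minimal": ["user.name", "user.avatar_url"],
    "standard": ["user.name", "user.avatar_url", "follower_count"],
    "full": None,  # None means "return every field"
}

def resolve_fields(fields_param):
    if fields_param in FIELD_SETS:
        return FIELD_SETS[fields_param]
    # Unknown ad-hoc selections fall back to the full response rather
    # than polluting the cache with one-off entries.
    return None
```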

See also: Building a Minimal Feature Flag Service.

Scaling Considerations

  • BFF services should be stateless and horizontally scalable. They are pure aggregation layers.
  • Cache aggregated responses at the BFF level with short TTLs (30-60 seconds). This absorbs repeated requests for the same screen.
  • For global deployment, BFF instances should be co-located with backend services to minimize inter-service latency.

Observability

  • Track: response size distribution, cache hit rate for conditional requests (304 vs 200), field selection usage patterns, batch endpoint usage, client deadline misses.
  • Alert on: average response size exceeding target per endpoint, 304 rate dropping (indicates caching issues), batch endpoint error rate.
  • Log: field selection patterns to identify common sets worth standardizing.

Key Takeaways

  • Design one-request-per-screen endpoints. Multiple sequential calls are the primary cause of slow mobile experiences.
  • Support conditional requests (ETags, If-Modified-Since) on every endpoint that returns data the client caches.
  • Use cursor-based pagination. Offset pagination breaks under concurrent writes, which are common in feed-style UIs.
  • Make error responses machine-actionable. Include error codes, structured details, and suggested actions.
  • Accept client deadlines. A partial response in 3 seconds is more useful than a complete response in 15 seconds.

Final Thoughts

A mobile-friendly API is not a dumbed-down API. It is an API that respects the physical constraints of the device and network it serves. Every byte over the wire costs battery and data plan. Every extra round trip costs seconds the user will not wait. Design accordingly.
