How I'd Design a Mobile Configuration System at Scale

Dhruval Dhameliya·November 2, 2025·7 min read

Designing a configuration system for mobile apps at scale, covering config delivery, caching layers, override hierarchies, and safe rollout of config changes.

Configuration systems control timeouts, feature thresholds, UI copy, endpoint URLs, and hundreds of other runtime parameters. At scale, a config change can affect millions of devices simultaneously. This post covers how to design a system that makes config changes safe, fast, and observable.

Context

A mobile configuration system delivers key-value pairs (or structured config objects) to devices at runtime. It replaces hardcoded values with dynamically controllable ones. At scale (10M+ devices), the system must handle high read throughput, fast propagation of changes, safe rollout, and graceful failure.

Problem

Design a configuration system that:

  • Delivers config to millions of devices with low latency
  • Supports hierarchical overrides (global, platform, app version, user segment)
  • Rolls out config changes safely with canary and percentage-based releases
  • Fails safely when the config service is unreachable

Constraints

Constraint | Detail
--- | ---
Fetch latency | Config available within 200ms of app cold start
Propagation | Config changes reach 95% of active devices within 15 minutes
Payload size | Full config payload under 100KB compressed
Reliability | Must function with no network, stale cache, or corrupted cache
Consistency | Config should not change mid-session (pin per session)

Design

Config Data Model

ConfigEntry {
    key: String           // "api_timeout_ms", "max_retry_count"
    value: Any            // 5000, 3
    type: ValueType       // INT, STRING, BOOLEAN, JSON
    metadata: {
        description: String
        owner: String     // Team or individual
        last_modified: Timestamp
        version: Int
    }
    overrides: List<Override>
}

Override {
    condition: Condition   // platform=android, app_version>=5.0, country=IN
    value: Any
    priority: Int          // Higher priority overrides win
}
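
The model leaves Condition abstract. Below is a minimal sketch of how the example conditions in the comment (platform=android, app_version>=5.0, country=IN) might be represented and matched. It assumes the client knows its platform, app version, country, user ID, and segment memberships. DeviceContext, Condition, its subclasses, and compareVersions are illustrative names, not a fixed schema. They do line up with the condition.matches(context) call used by the resolver below.

```kotlin
// Hypothetical device attributes that conditions are evaluated against.
data class DeviceContext(
    val platform: String,               // "android" or "ios"
    val appVersion: String,             // dotted version, e.g. "5.1.0"
    val country: String,                // ISO 3166-1 alpha-2, e.g. "IN"
    val userId: String,
    val segments: Set<String> = emptySet()
)

// Compares dotted versions numerically, so "5.10" > "5.9"; missing parts count as 0.
fun compareVersions(a: String, b: String): Int {
    val pa = a.split(".").map { it.toIntOrNull() ?: 0 }
    val pb = b.split(".").map { it.toIntOrNull() ?: 0 }
    for (i in 0 until maxOf(pa.size, pb.size)) {
        val diff = pa.getOrElse(i) { 0 }.compareTo(pb.getOrElse(i) { 0 })
        if (diff != 0) return diff
    }
    return 0
}

sealed interface Condition {
    fun matches(ctx: DeviceContext): Boolean

    data class Platform(val name: String) : Condition {
        override fun matches(ctx: DeviceContext) = ctx.platform.equals(name, ignoreCase = true)
    }

    // Half-open range [min, maxExclusive): "v5.0-5.2" becomes min = "5.0", maxExclusive = "5.3".
    data class AppVersionRange(val min: String, val maxExclusive: String? = null) : Condition {
        override fun matches(ctx: DeviceContext) =
            compareVersions(ctx.appVersion, min) >= 0 &&
                (maxExclusive == null || compareVersions(ctx.appVersion, maxExclusive) < 0)
    }

    data class Country(val code: String) : Condition {
        override fun matches(ctx: DeviceContext) = ctx.country.equals(code, ignoreCase = true)
    }

    data class Segment(val name: String) : Condition {
        override fun matches(ctx: DeviceContext) = name in ctx.segments
    }

    data class User(val id: String) : Condition {
        override fun matches(ctx: DeviceContext) = ctx.userId == id
    }

    // Every sub-condition must match, e.g. platform=android AND country=IN.
    data class All(val conditions: List<Condition>) : Condition {
        override fun matches(ctx: DeviceContext) = conditions.all { it.matches(ctx) }
    }
}
```

Comparing versions numerically instead of as strings matters here. A naive string comparison would rank "5.10" below "5.9".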

Override Hierarchy

Overrides are evaluated in priority order. The most specific matching override wins:

Priority | Level | Example
--- | --- | ---
1 (lowest) | Global default | api_timeout_ms = 5000
2 | Platform | Android: api_timeout_ms = 7000
3 | App version range | v5.0-5.2: api_timeout_ms = 10000
4 | Country/Region | India: api_timeout_ms = 15000
5 | User segment | Beta users: api_timeout_ms = 3000
6 (highest) | Individual user | User 12345: api_timeout_ms = 2000

class ConfigResolver(private val context: DeviceContext) {
    fun resolve(entry: ConfigEntry): Any {
        val applicableOverrides = entry.overrides
            .filter { it.condition.matches(context) }
            .sortedByDescending { it.priority }
 
        return applicableOverrides.firstOrNull()?.value ?: entry.value
    }
}
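
The following self-contained sketch encodes the api_timeout_ms table above and resolves it for a few devices, so you can watch the hierarchy play out. It simplifies the model in several ways. Conditions are plain predicates, and app versions are Doubles (real code should compare versions component-wise). Device, Rule, Entry, and resolve are hypothetical names. The sketch uses maxByOrNull instead of a full sort, because only the winning override is needed.

```kotlin
// Simplified stand-ins for the data model; conditions are plain predicates here.
data class Device(
    val platform: String,
    val appVersion: Double,   // simplification: real versions need component-wise comparison
    val country: String,
    val segments: Set<String>,
    val userId: String
)

data class Rule(val priority: Int, val value: Any, val condition: (Device) -> Boolean)

data class Entry(val key: String, val defaultValue: Any, val overrides: List<Rule>)

// The highest-priority matching override wins; otherwise the global default applies.
fun resolve(entry: Entry, device: Device): Any =
    entry.overrides
        .filter { it.condition(device) }
        .maxByOrNull { it.priority }
        ?.value ?: entry.defaultValue

// The api_timeout_ms example from the hierarchy table.
val apiTimeout = Entry(
    key = "api_timeout_ms",
    defaultValue = 5000,
    overrides = listOf(
        Rule(2, 7000) { it.platform == "android" },
        Rule(3, 10000) { it.appVersion >= 5.0 && it.appVersion < 5.3 },
        Rule(4, 15000) { it.country == "IN" },
        Rule(5, 3000) { "beta" in it.segments },
        Rule(6, 2000) { it.userId == "12345" }
    )
)
```

Consider an Android device in India running 5.1. It matches the platform, version, and country overrides, and the priority-4 country value (15000) wins. If the same device joins the beta segment, it resolves to 3000 instead.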

Server Architecture

Admin UI -> Config Service -> PostgreSQL (source of truth)
                |
                v
            Config Compiler -> CDN (compiled config JSON per platform)
                               |
                               v
                          Mobile Client

The Config Compiler runs on every config change. It evaluates all entries and overrides, produces one JSON payload per platform, and pushes the payloads to the CDN. For simple cases, such as platform-level overrides, the compiler resolves the effective value ahead of time, so the client does no override evaluation at runtime. User-segment and individual-user overrides depend on who is signed in, so the client receives those override rules and evaluates them locally.
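
Here is a minimal sketch of the compile step. It covers only the split described above: platform overrides are folded into a per-platform baseline, and segment and user overrides are shipped as rules. App-version and country overrides are omitted for brevity. The sketch assumes client-side rules always outrank platform overrides, which holds for the priority table in this post. Audience, Override, ConfigEntry, CompiledPayload, and compile are simplified, hypothetical types, and serialization to JSON is left out.

```kotlin
// Hypothetical audience types. Platform overrides can be resolved on the server;
// segment and user overrides depend on who is signed in, so they ship as rules.
sealed interface Audience {
    data class Platform(val name: String) : Audience
    data class Segment(val name: String) : Audience
    data class User(val id: String) : Audience
}

data class Override(val audience: Audience, val value: Any, val priority: Int)
data class ConfigEntry(val key: String, val defaultValue: Any, val overrides: List<Override>)

// One compiled payload per platform (would be serialized and pushed to the CDN).
data class CompiledPayload(
    val platform: String,
    val values: Map<String, Any>,                 // effective values; no client evaluation
    val clientRules: Map<String, List<Override>>  // overrides the client evaluates locally
)

fun compile(entries: List<ConfigEntry>, platform: String): CompiledPayload {
    val values = mutableMapOf<String, Any>()
    val rules = mutableMapOf<String, List<Override>>()
    for (entry in entries) {
        // Fold this platform's overrides into the baseline value.
        val platformValue = entry.overrides
            .filter { (it.audience as? Audience.Platform)?.name == platform }
            .maxByOrNull { it.priority }
            ?.value
        values[entry.key] = platformValue ?: entry.defaultValue
        // Everything that needs per-user context is shipped as a rule.
        val clientSide = entry.overrides.filter { it.audience !is Audience.Platform }
        if (clientSide.isNotEmpty()) rules[entry.key] = clientSide
    }
    return CompiledPayload(platform, values, rules)
}
```

Keys without user-dependent overrides cost the client nothing beyond a map lookup. Only keys with segment or user rules carry extra payload.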

Client-Side Architecture

import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.launch
import kotlin.coroutines.cancellation.CancellationException

class ConfigManager(
    private val diskCache: ConfigDiskCache,
    private val fetcher: ConfigFetcher,
    private val defaults: Map<String, Any>,
    private val scope: CoroutineScope   // app-lifetime scope for the background fetch
) {
    @Volatile private var sessionConfig: Map<String, Any>? = null

    suspend fun initialize() {
        // 1. Load from disk cache (fast, survives process death)
        val cachedConfig = diskCache.load()

        // 2. Pin for this session
        sessionConfig = cachedConfig ?: defaults

        // 3. Fetch latest in background (for next session) without blocking startup
        scope.launch { fetchAndCache() }
    }

    fun getString(key: String, default: String): String {
        return (sessionConfig?.get(key) as? String) ?: (defaults[key] as? String) ?: default
    }

    fun getInt(key: String, default: Int): Int {
        return (sessionConfig?.get(key) as? Int) ?: (defaults[key] as? Int) ?: default
    }

    private suspend fun fetchAndCache() {
        try {
            val latest = fetcher.fetch()
            diskCache.save(latest)
            // Will be used in next session
        } catch (e: CancellationException) {
            throw e // don't swallow coroutine cancellation
        } catch (e: Exception) {
            // Silently fail; current session uses cached config
        }
    }
}
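
Session pinning is easy to get subtly wrong. Below is a stripped-down, synchronous sketch of the same idea, with no coroutines and no real disk I/O. FakeDiskCache and PinnedConfig are hypothetical stand-ins. The session snapshot is taken once. A fetch that completes mid-session only updates storage, so the new values take effect at the next cold start.

```kotlin
// In-memory stand-in for the disk cache (hypothetical; real code persists to a file).
class FakeDiskCache {
    var stored: Map<String, Any>? = null
}

// Simplified, synchronous version of ConfigManager that illustrates session pinning.
class PinnedConfig(private val disk: FakeDiskCache, private val defaults: Map<String, Any>) {
    // Snapshot taken once when the session starts; later fetches never change it.
    private val session: Map<String, Any> = disk.stored ?: defaults

    // A value of the wrong type (e.g. "7000" instead of 7000) falls back to the default.
    fun getInt(key: String, default: Int): Int =
        (session[key] as? Int) ?: (defaults[key] as? Int) ?: default

    // Called when a background fetch completes: persist only, leave `session` alone.
    fun onFetched(latest: Map<String, Any>) {
        disk.stored = latest
    }
}
```

In this sketch, a session that receives a new value keeps reading the old one. The next PinnedConfig built from the same cache sees the new value.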

Safe Rollout


Config changes are rolled out progressively:

  1. Internal: Deploy to internal employees only (user segment override).
  2. Canary (1%): Deploy to 1% of users via percentage-based targeting.
  3. Gradual ramp: 5% -> 25% -> 50% -> 100%, with 24 hours between each stage.
  4. Full rollout: Remove targeting, make the new value the global default.

rollout_config_change(key, new_value, stages):
    for stage in stages:
        apply_override(key, new_value, targeting=stage.targeting)
        invalidate_cdn_cache()
        wait(stage.bake_time)
        if guardrails_breached(key):
            rollback(key)
            alert_owner(key)
            return ROLLED_BACK
    promote_to_default(key, new_value)   // also removes the temporary targeting overrides
    return SUCCESS
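
The post does not say how percentage-based targeting chooses users. One common approach, sketched here as an assumption, hashes a stable user ID together with the config key into 10,000 buckets and admits users whose bucket falls below the current percentage. Ramping from 1% to 5% to 25% then only adds users, and nobody already exposed drops out. Salting with the config key keeps separate rollouts from always landing on the same users. CRC32 from java.util.zip is used because it is a standard checksum that other platforms can reproduce; any stable hash would do. rolloutBucket and inRollout are hypothetical names.

```kotlin
import java.util.zip.CRC32

// Maps (configKey, userId) to a stable bucket in [0, 10000), i.e. 0.01% granularity.
fun rolloutBucket(configKey: String, userId: String): Int {
    val crc = CRC32()
    crc.update("$configKey:$userId".toByteArray(Charsets.UTF_8))
    return (crc.value % 10_000).toInt()
}

// A user is in the rollout if their bucket is below percent * 100.
// Because buckets never change, raising the percentage only adds users.
fun inRollout(configKey: String, userId: String, percent: Double): Boolean =
    rolloutBucket(configKey, userId) < (percent * 100).toInt()
```

Rolling back is just a matter of lowering the percentage (or removing the override). The users who leave are exactly those whose buckets are above the new threshold.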

Validation

Every config change passes through validation before deployment:

  • Type check: New value matches declared type.
  • Range check: Numeric values within declared min/max bounds.
  • Dependency check: If config A depends on config B, validate consistency.
  • Diff review: Changes to critical configs (endpoint URLs, auth params) require two-person approval.
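
Here is a minimal sketch of the type, range, and two-person checks, assuming each key declares a schema when it is created. KeySchema, ProposedChange, and validate are hypothetical. The dependency check is left out because its rules are specific to each system. The function returns a list of errors, so every problem is reported together rather than one at a time.

```kotlin
// Hypothetical per-key schema declared when the config key is created.
enum class ValueType { INT, STRING, BOOLEAN }

data class KeySchema(
    val type: ValueType,
    val min: Long? = null,
    val max: Long? = null,
    val critical: Boolean = false   // e.g. endpoint URLs, auth params
)

data class ProposedChange(val key: String, val newValue: Any, val approvers: Set<String>)

// Returns human-readable errors; an empty list means the change may proceed.
fun validate(change: ProposedChange, schema: KeySchema): List<String> {
    val errors = mutableListOf<String>()
    // Type check against the declared type.
    val typeOk = when (schema.type) {
        ValueType.INT -> change.newValue is Int || change.newValue is Long
        ValueType.STRING -> change.newValue is String
        ValueType.BOOLEAN -> change.newValue is Boolean
    }
    if (!typeOk) {
        errors += "${change.key}: expected ${schema.type}, got ${change.newValue::class.simpleName}"
    }
    // Range check, only meaningful once the type is known to be numeric.
    if (typeOk && schema.type == ValueType.INT) {
        val n = (change.newValue as Number).toLong()
        if (schema.min != null && n < schema.min) errors += "${change.key}: $n below min ${schema.min}"
        if (schema.max != null && n > schema.max) errors += "${change.key}: $n above max ${schema.max}"
    }
    // Two-person approval for critical keys.
    if (schema.critical && change.approvers.size < 2) {
        errors += "${change.key}: critical config needs two approvers"
    }
    return errors
}
```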

Trade-offs

Decision | Upside | Downside
--- | --- | ---
CDN delivery | Low latency, high availability, scales to any device count | Propagation delay (CDN TTL)
Session pinning | Consistent behavior within a session | Urgent changes delayed until next session
Server-side compilation | Simple client logic | Compiler must run on every change, adds latency to admin workflow
Hierarchical overrides | Flexible targeting | Complex evaluation, harder to reason about effective value
Percentage-based rollout | Safe, progressive | Slower time-to-full-rollout

Failure Modes

  • CDN outage: Client falls back to disk cache. If disk cache is corrupted, falls back to compiled-in defaults.
  • Config compiler crash: CDN continues serving the last successfully compiled config. Alert the on-call team.
  • Invalid config deployed: A string value set for an integer config causes parse errors. Mitigation: type validation at write time, and client-side type coercion with fallback to default.
  • Config key collision: Two teams use the same key for different purposes. Mitigation: namespace keys by team (e.g., payments.api_timeout_ms).
  • Stale disk cache: App installed months ago has ancient config. Mitigation: include a min_config_version check; if the cached version is too old, block on a network fetch with a timeout before falling back to defaults.
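
The corrupted-cache and stale-cache cases both come down to a decision at cold start. Below is a minimal sketch of that decision. It assumes the cache file stores a config version and a CRC32 checksum next to the payload. CachedConfig, LoadDecision, decide, and the checksum itself are assumptions; the post only specifies the fallback order and the min_config_version check.

```kotlin
import java.util.zip.CRC32

// Hypothetical on-disk envelope: payload plus metadata written alongside it.
data class CachedConfig(val version: Int, val checksum: Long, val payload: String)

fun crcOf(payload: String): Long =
    CRC32().apply { update(payload.toByteArray(Charsets.UTF_8)) }.value

sealed interface LoadDecision {
    data class UseCache(val payload: String) : LoadDecision
    object BlockOnNetwork : LoadDecision   // cache too old: fetch with a timeout, then defaults
    object UseDefaults : LoadDecision      // cache missing or corrupted
}

// Decides what the session should start from, given whatever was read from disk.
fun decide(cached: CachedConfig?, minConfigVersion: Int): LoadDecision = when {
    cached == null -> LoadDecision.UseDefaults
    crcOf(cached.payload) != cached.checksum -> LoadDecision.UseDefaults
    cached.version < minConfigVersion -> LoadDecision.BlockOnNetwork
    else -> LoadDecision.UseCache(cached.payload)
}
```

The checksum comparison runs before the version check, so a file whose bytes are damaged can never be mistaken for a usable one.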

Scaling Considerations

  • Payload size: At 1,000+ config keys, the payload can exceed the 100KB budget. Split config into groups fetched on demand: core configs at startup, feature-specific configs when the feature is first accessed.
  • CDN invalidation at scale: Invalidating the CDN cache across all edge nodes takes seconds to minutes. For urgent changes (kill switches), maintain a lightweight sidecar endpoint that bypasses the CDN.
  • Multi-region: Config service deployed per region. Changes propagate via async replication. Accept eventual consistency across regions (config changes take up to 5 minutes to propagate globally).
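
Here is a minimal sketch of on-demand config groups. The fetchGroup function parameter is a hypothetical stand-in for a CDN request for one group's payload. Grouping by the team namespace (payments, search, and so on) is one natural split, though the post does not prescribe one. GroupedConfig is a hypothetical name, and the class is not thread-safe; a real client would guard the map or use a concurrent one.

```kotlin
// Hypothetical loader. fetchGroup stands in for a CDN request for one group's payload.
// Not thread-safe: a real client would synchronize access to `loaded`.
class GroupedConfig(
    private val coreGroups: Set<String>,
    private val fetchGroup: (String) -> Map<String, Any>
) {
    private val loaded = mutableMapOf<String, Map<String, Any>>()

    // Called at startup: only the core groups sit on the critical path.
    fun loadCore() {
        for (group in coreGroups) loaded.getOrPut(group) { fetchGroup(group) }
    }

    // Feature groups are fetched the first time the feature reads a value, then cached.
    fun get(group: String, key: String): Any? =
        loaded.getOrPut(group) { fetchGroup(group) }[key]
}
```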


Observability

  • Track: config fetch success rate, cache hit rate, config version distribution across devices, time-to-propagation for each config change.
  • Alert on: fetch failure rate exceeding 5%, config version lagging by more than 2 versions for more than 10% of devices, critical config rollback triggered.
  • Audit log: every config change with who, what, when, and the rollout stage.
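
The version-lag alert above can be computed from the config version distribution. Below is a minimal sketch, assuming clients report their pinned config version and the backend aggregates a count of active devices per version. versionLagAlert and its input shape are hypothetical; the thresholds (more than 2 versions behind, more than 10% of devices) come from the alert rule above.

```kotlin
// Fires when more than maxFraction of devices are more than maxLag versions behind.
// `reported` maps config version -> number of active devices on it (hypothetical input).
fun versionLagAlert(
    reported: Map<Int, Int>,
    latestVersion: Int,
    maxLag: Int = 2,
    maxFraction: Double = 0.10
): Boolean {
    val total = reported.values.sum()
    if (total == 0) return false
    val lagging = reported.filterKeys { latestVersion - it > maxLag }.values.sum()
    return lagging.toDouble() / total > maxFraction
}
```

Take 1,000 devices where 150 are on version 38 and the latest is 42. That is 15% of devices four versions behind, so the alert fires. If the 150 were on version 40 instead, they would be exactly two versions behind, which does not count.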

Key Takeaways

  • Session-pin config values. Mid-session changes cause inconsistent behavior and are nearly impossible to debug.
  • Use hierarchical overrides for flexibility, but namespace keys and document ownership to prevent chaos.
  • Roll out config changes progressively with guardrail checks between stages. A bad config change is as dangerous as a bad code deploy.
  • Always have three fallback layers: network fetch, disk cache, compiled-in defaults. The app must function even if the config system is completely unreachable.
  • Validate every config change before deployment. Type checks, range checks, and dependency checks catch many common config errors before they reach devices.


Final Thoughts

A configuration system is a remote control for your application. It is powerful and dangerous in equal measure. The guardrails around how config changes are proposed, validated, rolled out, and monitored matter more than the delivery mechanism itself.
