Building a Minimal Feature Flag Service
Designing a feature flag system with percentage rollouts, user targeting, and kill switches using Postgres and an in-memory cache.
Context
I needed feature flags for a product serving 50,000 daily active users. Requirements: boolean flags, percentage-based rollouts, user-level targeting, and sub-millisecond evaluation latency. Third-party services (LaunchDarkly, Statsig) cost $200+/month at this scale. I built a minimal alternative.
Problem
Feature flags seem simple (check a boolean), but the complexity surfaces in rollout percentages (deterministic assignment), targeting rules (user attributes), cache invalidation (flag changes must propagate quickly), and auditability (who changed what, when).
Constraints
- Evaluation latency: under 1ms per flag check (flags are checked on every request)
- Flag update propagation: under 30 seconds from change to full propagation
- Storage: Postgres (existing infrastructure)
- No additional services (no Redis, no dedicated flag server)
- Must support at least 100 flags with 50,000 DAU
- Audit trail for all flag changes
Design
Schema
CREATE TABLE feature_flags (
    id SERIAL PRIMARY KEY,
    key TEXT UNIQUE NOT NULL,
    enabled BOOLEAN NOT NULL DEFAULT false,
    rollout_percentage INTEGER NOT NULL DEFAULT 100
        CHECK (rollout_percentage BETWEEN 0 AND 100),
    targeting_rules JSONB DEFAULT '[]',
    description TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE flag_audit_log (
    id BIGSERIAL PRIMARY KEY,
    flag_key TEXT NOT NULL,
    action TEXT NOT NULL,
    old_value JSONB,
    new_value JSONB,
    changed_by TEXT NOT NULL,
    changed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Targeting Rules Schema
[
    {
        "attribute": "user_id",
        "operator": "in",
        "values": ["user_123", "user_456"]
    },
    {
        "attribute": "country",
        "operator": "equals",
        "values": ["US"]
    }
]
Rules are evaluated with AND logic: every rule must match for the flag to be on. If any rule does not match, the flag is off for that user. In the example above, a user must both appear in the user_id list and have country US; matching only one rule is not enough.
Evaluation Logic
function evaluateFlag(
  flag: FeatureFlag,
  context: { userId: string; attributes: Record<string, string> }
): boolean {
  if (!flag.enabled) return false;

  // Check targeting rules (AND logic: every rule must pass)
  for (const rule of flag.targetingRules) {
    const value = rule.attribute === 'user_id'
      ? context.userId
      : context.attributes[rule.attribute];
    // 'equals' behaves like 'in' with a single-element values array
    if (rule.operator === 'in' && !rule.values.includes(value)) return false;
    if (rule.operator === 'equals' && !rule.values.includes(value)) return false;
    if (rule.operator === 'not_in' && rule.values.includes(value)) return false;
  }

  // Percentage rollout (deterministic by user ID)
  if (flag.rolloutPercentage < 100) {
    // >>> 0 coerces to unsigned: many JS crc32 implementations return a
    // signed 32-bit value, and a negative hash would make bucket negative
    const hash = crc32(context.userId + flag.key) >>> 0;
    const bucket = hash % 100;
    return bucket < flag.rolloutPercentage;
  }
  return true;
}
The CRC32 hash of userId + flagKey ensures deterministic assignment: the same user always gets the same flag value for a given flag. Different flags use different hash inputs, so rollout populations are independent.
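Both properties are easy to sanity-check. The sketch below uses a minimal inline CRC32 (a stand-in for whatever CRC32 library the service actually uses; any stable 32-bit hash gives the same guarantees) and mirrors the bucketing rule above:

```typescript
// Minimal bitwise CRC32 (IEEE polynomial) so the demo is self-contained.
// Illustrative stand-in for the real crc32 dependency.
function crc32(str: string): number {
  let crc = 0xffffffff;
  for (let i = 0; i < str.length; i++) {
    crc ^= str.charCodeAt(i);
    for (let j = 0; j < 8; j++) {
      crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
    }
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Same bucketing rule as evaluateFlag: hash(userId + flagKey) mod 100.
function bucketFor(userId: string, flagKey: string): number {
  return crc32(userId + flagKey) % 100;
}

const a = bucketFor('user_123', 'new-checkout');
const b = bucketFor('user_123', 'new-checkout');
const c = bucketFor('user_123', 'dark-mode');
console.log(a === b); // always true: assignment is deterministic
console.log(a, c);    // usually different: flags bucket independently
```

Because the flag key is part of the hash input, turning a second flag on does not enroll the same 10% of users as the first one.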
Caching Strategy
Flags are loaded from Postgres into an in-memory cache on application startup. A background polling loop refreshes the cache every 15 seconds.
class FlagCache {
  private flags: Map<string, FeatureFlag> = new Map();
  private pollInterval?: NodeJS.Timeout;

  async start() {
    await this.refresh();
    this.pollInterval = setInterval(() => this.refresh(), 15_000);
  }

  private async refresh() {
    try {
      const rows = await db.query('SELECT * FROM feature_flags');
      const newFlags = new Map<string, FeatureFlag>();
      for (const row of rows) {
        newFlags.set(row.key, row);
      }
      this.flags = newFlags; // Atomic swap
    } catch (err) {
      // Keep serving the last known snapshot; retry on the next interval
      console.error('flag refresh failed', err);
    }
  }

  evaluate(key: string, context: EvalContext): boolean {
    const flag = this.flags.get(key);
    if (!flag) return false; // Unknown flags default to off
    return evaluateFlag(flag, context);
  }
}
The atomic swap (replacing the entire Map reference) avoids read-write races. Readers always see a consistent snapshot of all flags.
Trade-offs
| Aspect | This Design | LaunchDarkly | Environment Variables |
|---|---|---|---|
| Evaluation latency | under 0.1ms (in-memory) | ~1ms (SDK cache) | under 0.01ms |
| Update propagation | 15-30s (polling) | under 1s (streaming) | Requires redeploy |
| Percentage rollouts | Yes | Yes | No |
| User targeting | Basic (attributes) | Advanced (segments) | No |
| Audit trail | Yes (DB table) | Yes (built-in) | Git history only |
| Cost | $0 (existing DB) | $200+/month | $0 |
| Operational overhead | Medium | Low | None |
The main gap compared to LaunchDarkly is propagation speed. Polling every 15 seconds means flag changes take up to 30 seconds to propagate (worst case: the change happens right after a poll). That delay is acceptable for kill switches during incidents and a non-issue for A/B test changes.
Failure Modes
See also: Failure Modes I Actively Design For.
Database unavailable during cache refresh: If Postgres is down when the polling loop runs, the cache retains its last known state. Flags continue to evaluate correctly using stale data. The refresh should log the failure and retry on the next interval. It should not clear the cache.
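This stale-on-failure behavior is worth making explicit and testable. A minimal sketch, with the database query injected as a function (the names here are illustrative, not the production code):

```typescript
// Stale-on-failure cache sketch: a failed refresh keeps the last snapshot.
// `loadFlags` is an injected stand-in for the Postgres query (assumption),
// which makes the behavior testable without a database.
type FeatureFlag = { key: string; enabled: boolean };

class ResilientFlagCache {
  private flags = new Map<string, FeatureFlag>();

  constructor(private loadFlags: () => Promise<FeatureFlag[]>) {}

  async refresh(): Promise<void> {
    try {
      const rows = await this.loadFlags();
      const next = new Map<string, FeatureFlag>();
      for (const row of rows) next.set(row.key, row);
      this.flags = next; // atomic swap, as in the main cache
    } catch (err) {
      // Serve the last known snapshot; the next interval will retry.
      console.error('flag refresh failed, serving stale flags', err);
    }
  }

  isEnabled(key: string): boolean {
    return this.flags.get(key)?.enabled ?? false;
  }
}
```

The key point is that the catch branch does nothing to `this.flags`: stale flags are strictly better than no flags during a database outage.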
Flag key typo in code: Evaluating a non-existent flag key returns false (default off). This is safe for feature gates but dangerous for flags that control critical paths. Mitigation: a build-time check that compares flag keys in code against the database.
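One possible shape for that build-time check, assuming flag keys appear as string literals in `evaluate(...)` calls (the regex and helper names below are hypothetical, not from the article's code):

```typescript
// Hypothetical build-time check: extract flag keys referenced in source
// and diff them against the keys present in the database.
function extractFlagKeys(source: string): Set<string> {
  const keys = new Set<string>();
  // Matches calls like flagCache.evaluate('some-key', ...)
  const re = /\.evaluate\(\s*['"]([^'"]+)['"]/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) keys.add(m[1]);
  return keys;
}

// Keys used in code but absent from the database are likely typos.
function findUnknownKeys(usedKeys: Set<string>, dbKeys: Set<string>): string[] {
  return [...usedKeys].filter((k) => !dbKeys.has(k));
}
```

In CI, feed `extractFlagKeys` the concatenated source tree and `dbKeys` the result of `SELECT key FROM feature_flags`, and fail the build if `findUnknownKeys` returns anything. Dynamically constructed keys will evade the regex, which is a reason to keep flag keys as literals.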
Bucket skew in percentage rollouts: CRC32 produces 32-bit hashes, and with 50,000 users spread over 100 buckets the distribution is close to uniform. For very small rollout percentages (1%), though, the actual population can deviate by 10-20% from the target: a 1% rollout of 50,000 users may enroll 400-600 users instead of exactly 500. This is acceptable for feature rollouts but not for precise A/B testing.
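That deviation is easy to measure empirically. The sketch below uses synthetic sequential user IDs, an assumed flag key, and an inline CRC32 to count how many of 50,000 users land in a 1% rollout:

```typescript
// Simulation of 1% rollout enrollment with CRC32 bucketing.
// Synthetic IDs and flag key are illustrative; crc32 is a minimal inline
// version of the standard IEEE CRC32.
function crc32(str: string): number {
  let crc = 0xffffffff;
  for (let i = 0; i < str.length; i++) {
    crc ^= str.charCodeAt(i);
    for (let j = 0; j < 8; j++) crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
  }
  return (crc ^ 0xffffffff) >>> 0;
}

const flagKey = 'new-checkout';
let enrolled = 0;
for (let i = 0; i < 50_000; i++) {
  const bucket = crc32(`user_${i}` + flagKey) % 100;
  if (bucket < 1) enrolled++; // 1% rollout: only bucket 0 qualifies
}
console.log(`1% rollout enrolled ${enrolled} of 50,000 users (target: 500)`);
```

Re-running with different flag keys shows the count wander around the target, which is the skew described above; if you need exact population sizes, use explicit assignment tables rather than hashing.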
Cache inconsistency across instances: If you run 10 application instances, each polling independently, there is a window where some instances have the new flag value and others do not. Maximum inconsistency window: one poll interval (15 seconds). For most flag changes, this is acceptable.
Scaling Considerations
- The polling query (SELECT * FROM feature_flags) returns all flags. At 100 flags, this is negligible. At 10,000 flags, consider polling only flags that changed since the last poll (using updated_at > $lastPollTimestamp).
- For multi-region deployments, each region should have its own cache. Flag changes propagate through the database, which handles replication.
- If evaluation latency becomes a concern (it will not at 100 flags), pre-compute flag evaluations per user segment and cache the results.
- Replace polling with Postgres LISTEN/NOTIFY for faster propagation. A trigger on the feature_flags table can notify all connected application instances within milliseconds.
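The delta-polling idea from the first bullet can be sketched as follows. Here queryChanged stands in for a `SELECT ... WHERE updated_at > $1` query and is an assumption, not the article's production code:

```typescript
// Delta polling sketch: fetch only rows changed since the last successful
// poll and merge them into the existing map via copy-on-write + atomic swap.
type Flag = { key: string; enabled: boolean; updatedAt: Date };

class DeltaPoller {
  private flags = new Map<string, Flag>();
  private lastPoll = new Date(0); // epoch: first poll fetches everything

  constructor(private queryChanged: (since: Date) => Promise<Flag[]>) {}

  async poll(now = new Date()): Promise<void> {
    const changed = await this.queryChanged(this.lastPoll);
    if (changed.length > 0) {
      const next = new Map(this.flags); // copy, then atomic swap
      for (const f of changed) next.set(f.key, f);
      this.flags = next;
    }
    this.lastPoll = now;
  }

  get(key: string): Flag | undefined {
    return this.flags.get(key);
  }
}
```

Two caveats with this shape: deleted flags never appear in the delta, so you need soft deletes or an occasional full refresh, and using the application clock for lastPoll risks clock skew against the database; tracking the maximum updated_at returned by the query is safer.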
Observability
Related: Mobile Analytics Pipeline: From App Event to Dashboard.
- Log every flag evaluation result to an analytics pipeline (sampled at 1% to avoid volume issues)
- Track flag evaluation distribution: for a 50% rollout, verify that approximately 50% of users see the feature
- Alert on flag refresh failures (database unreachable)
- Dashboard showing active flags, rollout percentages, and last-changed timestamps
- Audit log queries for incident review: "what flags changed in the last hour?"
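The 1% sampled logging from the first bullet might look like this; the event shape and sink are illustrative, not from the production code:

```typescript
// Sampled evaluation logging: emit roughly sampleRate of all evaluations.
// The sink is injected so the sampler works with any analytics client.
function makeEvalLogger(sampleRate: number, sink: (line: string) => void) {
  return (flagKey: string, userId: string, result: boolean) => {
    if (Math.random() >= sampleRate) return; // drop ~99% of events at 0.01
    sink(JSON.stringify({ event: 'flag_eval', flagKey, userId, result, ts: Date.now() }));
  };
}

// Wire-up with a 1% sample; console stands in for the analytics pipeline.
const logEvaluation = makeEvalLogger(0.01, console.log);
```

Random sampling is fine for verifying rollout distributions in aggregate; if you instead need every evaluation for a specific user during debugging, sample deterministically by user ID rather than with Math.random.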
Key Takeaways
- In-memory evaluation with database-backed storage is the simplest architecture that supports percentage rollouts and targeting.
- Deterministic hashing (user ID + flag key) is essential for consistent user experiences across requests.
- Atomic cache swap eliminates the need for read-write locks.
- Unknown flag keys should default to false (off). This is a safety decision, not a convenience one.
- The cost gap between a custom solution and a SaaS product closes when you factor in maintenance time. Build this only if you need fewer than 200 flags and can accept 15-30 second propagation delays.
Further Reading
- Designing a Feature Flag and Remote Config System: Architecture and trade-offs for building a feature flag and remote configuration system that handles targeting, rollout, and consistency ...
- Designing an Experimentation Platform for Mobile Apps: System design for a mobile experimentation platform covering assignment, exposure tracking, metric collection, statistical analysis, and ...
- Building a Simple Search Index: Designing an inverted index from scratch with tokenization, ranking, and query parsing, then comparing it against Postgres full-text search.
Final Thoughts
This service runs in production with 85 flags, handles 50,000 DAU, and has added zero infrastructure cost. The entire implementation is under 300 lines of TypeScript. The main operational overhead is the audit log review during incidents ("did someone change a flag?"), which pays for itself the first time it answers that question in 10 seconds instead of 30 minutes.