Building a Minimal Feature Flag Service
Designing a feature flag system with percentage rollouts, user targeting, and kill switches using Postgres and an in-memory cache.
Context
I needed feature flags for a product serving 50,000 daily active users. Requirements: boolean flags, percentage-based rollouts, user-level targeting, and sub-millisecond evaluation latency. Third-party services (LaunchDarkly, Statsig) cost $200+/month at this scale. I built a minimal alternative.
Problem
Feature flags seem simple (check a boolean), but the complexity surfaces in rollout percentages (deterministic assignment), targeting rules (user attributes), cache invalidation (flag changes must propagate quickly), and auditability (who changed what, when).
Constraints
- Evaluation latency: under 1ms per flag check (flags are checked on every request)
- Flag update propagation: under 30 seconds from change to full propagation
- Storage: Postgres (existing infrastructure)
- No additional services (no Redis, no dedicated flag server)
- Must support at least 100 flags with 50,000 DAU
- Audit trail for all flag changes
Design
Schema
CREATE TABLE feature_flags (
    id SERIAL PRIMARY KEY,
    key TEXT UNIQUE NOT NULL,
    enabled BOOLEAN NOT NULL DEFAULT false,
    rollout_percentage INTEGER NOT NULL DEFAULT 100
        CHECK (rollout_percentage BETWEEN 0 AND 100),
    targeting_rules JSONB DEFAULT '[]',
    description TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE flag_audit_log (
    id BIGSERIAL PRIMARY KEY,
    flag_key TEXT NOT NULL,
    action TEXT NOT NULL,
    old_value JSONB,
    new_value JSONB,
    changed_by TEXT NOT NULL,
    changed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Targeting Rules Schema
[
    {
        "attribute": "user_id",
        "operator": "in",
        "values": ["user_123", "user_456"]
    },
    {
        "attribute": "country",
        "operator": "equals",
        "values": ["US"]
    }
]
Rules are evaluated with AND logic: every rule must match for the flag to be on. If any rule does not match, the flag is off for that user. In the example above, a user must both appear in the user_id list and have country US; matching only one rule is not enough.
Evaluation Logic
function evaluateFlag(
  flag: FeatureFlag,
  context: { userId: string; attributes: Record<string, string> }
): boolean {
  if (!flag.enabled) return false;

  // Check targeting rules (AND logic: every rule must pass)
  for (const rule of flag.targetingRules) {
    const value = rule.attribute === 'user_id'
      ? context.userId
      : context.attributes[rule.attribute];
    // 'equals' behaves like 'in' with a single-element values array
    if (rule.operator === 'in' && !rule.values.includes(value)) return false;
    if (rule.operator === 'equals' && !rule.values.includes(value)) return false;
    if (rule.operator === 'not_in' && rule.values.includes(value)) return false;
  }

  // Percentage rollout (deterministic by user ID)
  if (flag.rolloutPercentage < 100) {
    // >>> 0 coerces to unsigned: many JS crc32 implementations return a
    // signed 32-bit value, and a negative hash would make bucket negative
    const hash = crc32(context.userId + flag.key) >>> 0;
    const bucket = hash % 100;
    return bucket < flag.rolloutPercentage;
  }
  return true;
}
The CRC32 hash of userId + flagKey ensures deterministic assignment: the same user always gets the same flag value for a given flag. Different flags use different hash inputs, so rollout populations are independent.
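Both properties are easy to sanity-check. The sketch below uses a minimal inline CRC32 (a stand-in for whatever CRC32 library the service actually uses; any stable 32-bit hash gives the same guarantees) and mirrors the bucketing rule above:

```typescript
// Minimal bitwise CRC32 (IEEE polynomial) so the demo is self-contained.
// Illustrative stand-in for the real crc32 dependency.
function crc32(str: string): number {
  let crc = 0xffffffff;
  for (let i = 0; i < str.length; i++) {
    crc ^= str.charCodeAt(i);
    for (let j = 0; j < 8; j++) {
      crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
    }
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Same bucketing rule as evaluateFlag: hash(userId + flagKey) mod 100.
function bucketFor(userId: string, flagKey: string): number {
  return crc32(userId + flagKey) % 100;
}

const a = bucketFor('user_123', 'new-checkout');
const b = bucketFor('user_123', 'new-checkout');
const c = bucketFor('user_123', 'dark-mode');
console.log(a === b); // always true: assignment is deterministic
console.log(a, c);    // usually different: flags bucket independently
```

Because the flag key is part of the hash input, turning a second flag on does not enroll the same 10% of users as the first one.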
Caching Strategy
Flags are loaded from Postgres into an in-memory cache on application startup. A background polling loop refreshes the cache every 15 seconds.
class FlagCache {
  private flags: Map<string, FeatureFlag> = new Map();
  private pollInterval?: NodeJS.Timeout;

  async start() {
    await this.refresh();
    this.pollInterval = setInterval(() => this.refresh(), 15_000);
  }

  private async refresh() {
    try {
      const rows = await db.query('SELECT * FROM feature_flags');
      const newFlags = new Map<string, FeatureFlag>();
      for (const row of rows) {
        newFlags.set(row.key, row);
      }
      this.flags = newFlags; // Atomic swap
    } catch (err) {
      // Keep serving the last known snapshot; retry on the next interval
      console.error('flag refresh failed', err);
    }
  }

  evaluate(key: string, context: EvalContext): boolean {
    const flag = this.flags.get(key);
    if (!flag) return false; // Unknown flags default to off
    return evaluateFlag(flag, context);
  }
}
The atomic swap (replacing the entire Map reference) avoids read-write races. Readers always see a consistent snapshot of all flags.
Trade-offs
| Aspect | This Design | LaunchDarkly | Environment Variables |
|---|---|---|---|
| Evaluation latency | under 0.1ms (in-memory) | ~1ms (SDK cache) | under 0.01ms |
| Update propagation | 15-30s (polling) | under 1s (streaming) | Requires redeploy |
| Percentage rollouts | Yes | Yes | No |
| User targeting | Basic (attributes) | Advanced (segments) | No |
| Audit trail | Yes (DB table) | Yes (built-in) | Git history only |
| Cost | $0 (existing DB) | $200+/month | $0 |
| Operational overhead | Medium | Low | None |
The main gap compared to LaunchDarkly is propagation speed. Polling every 15 seconds means flag changes take up to 30 seconds to propagate (worst case: the change happens right after a poll). That delay is acceptable for kill switches during incidents and a non-issue for A/B test changes.
Failure Modes
See also: Failure Modes I Actively Design For.
Database unavailable during cache refresh: If Postgres is down when the polling loop runs, the cache retains its last known state. Flags continue to evaluate correctly using stale data. The refresh should log the failure and retry on the next interval. It should not clear the cache.
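This stale-on-failure behavior is worth making explicit and testable. A minimal sketch, with the database query injected as a function (the names here are illustrative, not the production code):

```typescript
// Stale-on-failure cache sketch: a failed refresh keeps the last snapshot.
// `loadFlags` is an injected stand-in for the Postgres query (assumption),
// which makes the behavior testable without a database.
type FeatureFlag = { key: string; enabled: boolean };

class ResilientFlagCache {
  private flags = new Map<string, FeatureFlag>();

  constructor(private loadFlags: () => Promise<FeatureFlag[]>) {}

  async refresh(): Promise<void> {
    try {
      const rows = await this.loadFlags();
      const next = new Map<string, FeatureFlag>();
      for (const row of rows) next.set(row.key, row);
      this.flags = next; // atomic swap, as in the main cache
    } catch (err) {
      // Serve the last known snapshot; the next interval will retry.
      console.error('flag refresh failed, serving stale flags', err);
    }
  }

  isEnabled(key: string): boolean {
    return this.flags.get(key)?.enabled ?? false;
  }
}
```

The key point is that the catch branch does nothing to `this.flags`: stale flags are strictly better than no flags during a database outage.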
Flag key typo in code: Evaluating a non-existent flag key returns false (default off). This is safe for feature gates but dangerous for flags that control critical paths. Mitigation: a build-time check that compares flag keys in code against the database.
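One possible shape for that build-time check, assuming flag keys appear as string literals in `evaluate(...)` calls (the regex and helper names below are hypothetical, not from the article's code):

```typescript
// Hypothetical build-time check: extract flag keys referenced in source
// and diff them against the keys present in the database.
function extractFlagKeys(source: string): Set<string> {
  const keys = new Set<string>();
  // Matches calls like flagCache.evaluate('some-key', ...)
  const re = /\.evaluate\(\s*['"]([^'"]+)['"]/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) keys.add(m[1]);
  return keys;
}

// Keys used in code but absent from the database are likely typos.
function findUnknownKeys(usedKeys: Set<string>, dbKeys: Set<string>): string[] {
  return [...usedKeys].filter((k) => !dbKeys.has(k));
}
```

In CI, feed `extractFlagKeys` the concatenated source tree and `dbKeys` the result of `SELECT key FROM feature_flags`, and fail the build if `findUnknownKeys` returns anything. Dynamically constructed keys will evade the regex, which is a reason to keep flag keys as literals.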
Bucket skew in percentage rollouts: CRC32 produces 32-bit hashes, and with 50,000 users spread over 100 buckets the distribution is close to uniform. For very small rollout percentages (1%), though, the actual population can deviate by 10-20% from the target: a 1% rollout of 50,000 users may enroll 400-600 users instead of exactly 500. This is acceptable for feature rollouts but not for precise A/B testing.
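That deviation is easy to measure empirically. The sketch below uses synthetic sequential user IDs, an assumed flag key, and an inline CRC32 to count how many of 50,000 users land in a 1% rollout:

```typescript
// Simulation of 1% rollout enrollment with CRC32 bucketing.
// Synthetic IDs and flag key are illustrative; crc32 is a minimal inline
// version of the standard IEEE CRC32.
function crc32(str: string): number {
  let crc = 0xffffffff;
  for (let i = 0; i < str.length; i++) {
    crc ^= str.charCodeAt(i);
    for (let j = 0; j < 8; j++) crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
  }
  return (crc ^ 0xffffffff) >>> 0;
}

const flagKey = 'new-checkout';
let enrolled = 0;
for (let i = 0; i < 50_000; i++) {
  const bucket = crc32(`user_${i}` + flagKey) % 100;
  if (bucket < 1) enrolled++; // 1% rollout: only bucket 0 qualifies
}
console.log(`1% rollout enrolled ${enrolled} of 50,000 users (target: 500)`);
```

Re-running with different flag keys shows the count wander around the target, which is the skew described above; if you need exact population sizes, use explicit assignment tables rather than hashing.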
Cache inconsistency across instances: If you run 10 application instances, each polling independently, there is a window where some instances have the new flag value and others do not. Maximum inconsistency window: one poll interval (15 seconds). For most flag changes, this is acceptable.
Scaling Considerations
- The polling query (SELECT * FROM feature_flags) returns all flags. At 100 flags, this is negligible. At 10,000 flags, consider polling only flags that changed since the last poll (using updated_at > $lastPollTimestamp).
- For multi-region deployments, each region should have its own cache. Flag changes propagate through the database, which handles replication.
- If evaluation latency becomes a concern (it will not at 100 flags), pre-compute flag evaluations per user segment and cache the results.
- Replace polling with Postgres LISTEN/NOTIFY for faster propagation. A trigger on the feature_flags table can notify all connected application instances within milliseconds.
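The delta-polling idea from the first bullet can be sketched as follows. Here queryChanged stands in for a `SELECT ... WHERE updated_at > $1` query and is an assumption, not the article's production code:

```typescript
// Delta polling sketch: fetch only rows changed since the last successful
// poll and merge them into the existing map via copy-on-write + atomic swap.
type Flag = { key: string; enabled: boolean; updatedAt: Date };

class DeltaPoller {
  private flags = new Map<string, Flag>();
  private lastPoll = new Date(0); // epoch: first poll fetches everything

  constructor(private queryChanged: (since: Date) => Promise<Flag[]>) {}

  async poll(now = new Date()): Promise<void> {
    const changed = await this.queryChanged(this.lastPoll);
    if (changed.length > 0) {
      const next = new Map(this.flags); // copy, then atomic swap
      for (const f of changed) next.set(f.key, f);
      this.flags = next;
    }
    this.lastPoll = now;
  }

  get(key: string): Flag | undefined {
    return this.flags.get(key);
  }
}
```

Two caveats with this shape: deleted flags never appear in the delta, so you need soft deletes or an occasional full refresh, and using the application clock for lastPoll risks clock skew against the database; tracking the maximum updated_at returned by the query is safer.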
Observability
Related: Mobile Analytics Pipeline: From App Event to Dashboard.
- Log every flag evaluation result to an analytics pipeline (sampled at 1% to avoid volume issues)
- Track flag evaluation distribution: for a 50% rollout, verify that approximately 50% of users see the feature
- Alert on flag refresh failures (database unreachable)
- Dashboard showing active flags, rollout percentages, and last-changed timestamps
- Audit log queries for incident review: "what flags changed in the last hour?"
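The 1% sampled logging from the first bullet might look like this; the event shape and sink are illustrative, not from the production code:

```typescript
// Sampled evaluation logging: emit roughly sampleRate of all evaluations.
// The sink is injected so the sampler works with any analytics client.
function makeEvalLogger(sampleRate: number, sink: (line: string) => void) {
  return (flagKey: string, userId: string, result: boolean) => {
    if (Math.random() >= sampleRate) return; // drop ~99% of events at 0.01
    sink(JSON.stringify({ event: 'flag_eval', flagKey, userId, result, ts: Date.now() }));
  };
}

// Wire-up with a 1% sample; console stands in for the analytics pipeline.
const logEvaluation = makeEvalLogger(0.01, console.log);
```

Random sampling is fine for verifying rollout distributions in aggregate; if you instead need every evaluation for a specific user during debugging, sample deterministically by user ID rather than with Math.random.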
Key Takeaways
- In-memory evaluation with database-backed storage is the simplest architecture that supports percentage rollouts and targeting.
- Deterministic hashing (user ID + flag key) is essential for consistent user experiences across requests.
- Atomic cache swap eliminates the need for read-write locks.
- Unknown flag keys should default to false (off). This is a safety decision, not a convenience one.
- The cost gap between a custom solution and a SaaS product closes when you factor in maintenance time. Build this only if you need fewer than 200 flags and can accept 15-30 second propagation delays.
Further Reading
- Designing a Feature Flag and Remote Config System: Architecture and trade-offs for building a feature flag and remote configuration system that handles targeting, rollout, and consistency ...
- Designing an Experimentation Platform for Mobile Apps: System design for a mobile experimentation platform covering assignment, exposure tracking, metric collection, statistical analysis, and ...
- Building a Simple Search Index: Designing an inverted index from scratch with tokenization, ranking, and query parsing, then comparing it against Postgres full-text search.
Final Thoughts
This service runs in production with 85 flags, handles 50,000 DAU, and has added zero infrastructure cost. The entire implementation is under 300 lines of TypeScript. The main operational overhead is the audit log review during incidents ("did someone change a flag?"), which pays for itself the first time it answers that question in 10 seconds instead of 30 minutes.