How I'd Design a Scalable Notification System

Dhruval Dhameliya·January 16, 2026·6 min read

System design for a multi-channel notification system covering delivery guarantees, rate limiting, user preferences, and failure handling at scale.

Notification systems appear simple until you face duplicate sends, channel coordination, user preference management, and delivery tracking across millions of users. This post covers a design that handles these concerns without collapsing under scale.


Context

A notification system serves multiple internal teams (marketing, product, transactional alerts) and delivers across multiple channels (push, email, SMS, in-app). Each channel has different latency expectations, cost profiles, and delivery guarantees. The system must prioritize user experience while giving internal teams self-service capabilities.

Problem

Design a notification system that:

  • Supports push, email, SMS, and in-app channels
  • Respects user preferences and quiet hours
  • Prevents duplicate and excessive notifications
  • Scales to millions of recipients per campaign
  • Provides delivery tracking and analytics

Constraints

Constraint      Detail
Throughput      10M notifications/hour for campaign sends
Latency         Transactional notifications delivered within 30 seconds
Deduplication   No user receives the same notification twice within 24 hours
Rate limiting   Max 5 push notifications per user per day
Cost            SMS costs $0.01-0.05 per message; must optimize channel selection

Design

High-Level Architecture

Notification Request -> API Gateway -> Notification Service
                                            |
                                    Priority Queue (Kafka)
                                            |
                                    Orchestrator Service
                                     /      |       \
                              Push Worker  Email Worker  SMS Worker
                                     \      |       /
                                    Delivery Tracker

Data Model

NotificationRequest {
    id: UUID
    recipient_id: String
    template_id: String
    channels: List<Channel>  // PUSH, EMAIL, SMS, IN_APP
    priority: Priority      // CRITICAL, HIGH, NORMAL, LOW
    payload: Map<String, Any>
    scheduled_at: Timestamp?
    idempotency_key: String
}

UserPreferences {
    user_id: String
    channels_enabled: Set<Channel>
    quiet_hours: TimeRange?
    frequency_cap: Map<Channel, Int>  // max per day
    timezone: String
}

Request Flow

  1. Intake: The API validates the request, checks the idempotency key against Redis, and writes to Kafka. Topic partitioning is by recipient_id to ensure per-user ordering.
  2. Orchestration: The orchestrator consumes from Kafka, resolves user preferences, applies rate limits, selects the channel, and dispatches to channel-specific workers.
  3. Channel workers: Each worker handles provider-specific logic (FCM for push, SES for email, Twilio for SMS). Workers are independently scalable.
  4. Delivery tracking: Workers report delivery status (sent, delivered, failed, bounced) back to the delivery tracker, which updates the notification state machine.
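The intake step above can be sketched as follows. This is a minimal, illustrative version: Redis and Kafka are stubbed with in-memory structures so the flow is runnable, and names like `Intake` and `accept` are assumptions, not the real service's API.

```python
import hashlib

class Intake:
    """Intake: reject duplicate idempotency keys, then enqueue by recipient.

    A set stands in for Redis (SET NX with a 24h TTL); a list of lists
    stands in for a partitioned Kafka topic.
    """

    def __init__(self, num_partitions: int = 8):
        self.seen_keys: set[str] = set()
        self.partitions: list[list[dict]] = [[] for _ in range(num_partitions)]

    def accept(self, request: dict) -> bool:
        key = request["idempotency_key"]
        if key in self.seen_keys:  # duplicate request: drop at the edge
            return False
        self.seen_keys.add(key)
        # Partition by recipient_id so one user's notifications stay ordered.
        digest = hashlib.md5(request["recipient_id"].encode()).hexdigest()
        part = int(digest, 16) % len(self.partitions)
        self.partitions[part].append(request)
        return True
```

Because partitioning hashes only `recipient_id`, all of a user's notifications land on the same partition and are consumed in order.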

User Preference Resolution

resolve_channel(request, preferences):
    candidates = request.channels AND preferences.channels_enabled
    if candidates is empty:
        return IN_APP  // fallback: always deliver in-app

    for channel in candidates (ordered by priority):
        if rate_limit_check(request.recipient_id, channel) == UNDER_LIMIT:
            if not in_quiet_hours(preferences):
                return channel

    return IN_APP  // fallback if all channels exhausted

Rate Limiting

Rate limits operate at two levels:

Level                  Mechanism                          Limit
Per-user per-channel   Sliding window counter in Redis    5 push/day, 3 SMS/day
Per-sender (team)      Token bucket per sender ID         1000 notifications/minute
Global                 Circuit breaker on provider APIs   Based on provider SLA

rate_limit_check(user_id, channel, preferences):
    key = "ratelimit:{user_id}:{channel}:{date}"
    count = redis.incr(key)
    if count == 1:
        redis.expire(key, 86400)
    return count <= preferences.frequency_cap[channel]

Note that the date-keyed counter above is a fixed daily window. A true sliding window tracks individual send timestamps and evicts those older than 24 hours, at the cost of more Redis memory per user.

Priority Queuing

Kafka topics are segmented by priority:

  • notifications.critical: Processed immediately (OTP, security alerts)
  • notifications.high: Processed within 1 minute
  • notifications.normal: Processed within 5 minutes
  • notifications.low: Processed during off-peak hours

Critical and high-priority topics have dedicated consumer groups with more partitions and workers.
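Routing a request to its topic is then a lookup at intake time. A sketch, with the fallback behavior as an assumption of mine rather than something the design mandates:

```python
PRIORITY_TOPICS = {
    "CRITICAL": "notifications.critical",
    "HIGH": "notifications.high",
    "NORMAL": "notifications.normal",
    "LOW": "notifications.low",
}

def topic_for(priority: str) -> str:
    # Unknown priorities fall back to the normal queue rather than failing.
    return PRIORITY_TOPICS.get(priority, "notifications.normal")
```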

Deduplication

Two layers:

  1. Idempotency key: Checked at intake against a Redis set with 24-hour TTL. Rejects duplicate requests.
  2. Content hash: Before sending, hash (recipient_id, template_id, payload_hash). If the same content was sent in the last 24 hours, skip.
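The content hash in layer 2 can be computed like this. Serializing the payload with sorted keys makes the fingerprint independent of dict ordering; the function name is illustrative, and in production the result would be checked against a Redis set with a 24-hour TTL.

```python
import hashlib
import json

def content_fingerprint(recipient_id: str, template_id: str, payload: dict) -> str:
    """Stable hash of (recipient_id, template_id, payload_hash) for pre-send dedup."""
    payload_hash = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()
    ).hexdigest()
    return hashlib.sha256(
        f"{recipient_id}:{template_id}:{payload_hash}".encode()
    ).hexdigest()
```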

Trade-offs

Decision                     Upside                                      Downside
Kafka per-user partitioning  Per-user ordering guaranteed                Hot partitions for power users
In-app as fallback channel   Notifications never fully lost              In-app may not be noticed
Priority-based topics        Clear SLA per priority level                Topic proliferation, more operational overhead
Redis for rate limits        Fast, atomic operations                     Data loss on Redis failure (temporary over-sending)
Template-based content       Consistent messaging, easier localization   Less flexibility for ad-hoc notifications

Failure Modes

  • Push provider outage (FCM/APNs): Circuit breaker trips after 5 consecutive failures. Notifications queued for retry. If outage exceeds 1 hour, fall back to email or in-app.
  • Redis failure: Rate limits fail open (allow sending) for critical priority, fail closed (block sending) for low priority. Prevents spam while preserving critical alerts.
  • Kafka consumer lag: Auto-scale consumer instances based on lag metrics. Alert when lag exceeds 5 minutes for critical topics.
  • Duplicate sends: Even with deduplication, provider-level retries can cause duplicates. Accept this as an at-least-once guarantee and document it. Idempotent rendering on the client mitigates UX impact.
  • Template rendering failure: Send a generic fallback message rather than failing silently. Log the failure for template owners.
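The circuit breaker from the first failure mode can be sketched as a small state machine. The threshold of 5 matches the push-provider policy above; the cooldown value and class shape are assumptions for illustration.

```python
class CircuitBreaker:
    """Trip after N consecutive failures; allow a probe after a cooldown."""

    def __init__(self, threshold: int = 5, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: float | None = None  # None means the circuit is closed

    def allow(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: after the cooldown, let a probe request through.
        return now - self.opened_at >= self.cooldown

    def record(self, success: bool, now: float) -> None:
        if success:
            self.failures = 0
            self.opened_at = None  # close the circuit again
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now  # trip: stop sending to this provider
```

While the breaker is open, notifications stay queued in Kafka; nothing is lost, they are just not dispatched to the failing provider.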

Scaling Considerations

  • Campaign sends: Batch recipients into chunks of 10,000. Process chunks in parallel across workers. Stagger sending over a time window to avoid provider rate limits.
  • Multi-region: Deploy notification workers in each region. Route notifications based on the user's region to reduce latency and comply with data residency requirements.
  • Channel worker scaling: Scale each channel independently. Push workers may need 10x the capacity of SMS workers based on volume distribution.
  • Database: Use a time-series database (TimescaleDB or ClickHouse) for delivery tracking. Partition by date, retain detailed records for 30 days, aggregates for 1 year.
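The campaign chunking and staggering above can be sketched as a planning step. The chunk size matches the 10,000 figure from the list; the one-hour window and the function shape are illustrative assumptions.

```python
def plan_campaign(recipients: list[str], chunk_size: int = 10_000,
                  window_seconds: int = 3600) -> list[tuple[float, list[str]]]:
    """Split recipients into chunks and spread send offsets across a window.

    Returns (offset_seconds, chunk) pairs for workers to pick up in
    parallel, keeping the aggregate send rate under provider limits.
    """
    chunks = [recipients[i:i + chunk_size]
              for i in range(0, len(recipients), chunk_size)]
    if not chunks:
        return []
    spacing = window_seconds / len(chunks)
    return [(i * spacing, chunk) for i, chunk in enumerate(chunks)]
```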

Observability

  • Delivery funnel: Track each notification through states: received, processed, sent, delivered, opened, failed. Visualize as a funnel per channel.
  • Latency: Measure time from request to send per priority level. Alert when p95 exceeds SLA.
  • Rate limit hits: Track how often users hit rate limits per channel. High rates indicate over-notification.
  • Provider health: Monitor error rates, latency, and throughput per provider. Dashboard showing FCM, SES, Twilio health side by side.

Key Takeaways

  • Separate intake, orchestration, and delivery into independent services. They scale differently.
  • Rate limit at the user level, not just the system level. Users care about their own notification volume.
  • Always have a fallback channel. In-app is the safest default.
  • Priority queuing prevents low-priority campaigns from starving time-sensitive transactional notifications.
  • Deduplication must happen at multiple layers: request intake and pre-send content check.

Final Thoughts

The best notification system is one users do not notice, in the sense that every notification they receive is relevant, timely, and delivered through the right channel. Over-notification is a product problem, but the system must enforce guardrails that product teams cannot bypass.
