How I'd Design a Scalable Notification System

Dhruval Dhameliya·January 16, 2026·6 min read

System design for a multi-channel notification system covering delivery guarantees, rate limiting, user preferences, and failure handling at scale.

Notification systems appear simple until you face duplicate sends, channel coordination, user preference management, and delivery tracking across millions of users. This post covers a design that handles these concerns without collapsing under scale.


Context

A notification system serves multiple internal teams (marketing, product, transactional alerts) and delivers across multiple channels (push, email, SMS, in-app). Each channel has different latency expectations, cost profiles, and delivery guarantees. The system must prioritize user experience while giving internal teams self-service capabilities.

Problem

Design a notification system that:

  • Supports push, email, SMS, and in-app channels
  • Respects user preferences and quiet hours
  • Prevents duplicate and excessive notifications
  • Scales to millions of recipients per campaign
  • Provides delivery tracking and analytics

Constraints

Constraint      Detail
Throughput      10M notifications/hour for campaign sends
Latency         Transactional notifications delivered within 30 seconds
Deduplication   No user receives the same notification twice within 24 hours
Rate limiting   Max 5 push notifications per user per day
Cost            SMS costs $0.01-0.05 per message; must optimize channel selection

Design

High-Level Architecture

Notification Request -> API Gateway -> Notification Service
                                            |
                                    Priority Queue (Kafka)
                                            |
                                    Orchestrator Service
                                     /      |       \
                              Push Worker  Email Worker  SMS Worker
                                     \      |       /
                                    Delivery Tracker

Data Model

NotificationRequest {
    id: UUID
    recipient_id: String
    template_id: String
    channels: List<Channel>  // PUSH, EMAIL, SMS, IN_APP
    priority: Priority      // CRITICAL, HIGH, NORMAL, LOW
    payload: Map<String, Any>
    scheduled_at: Timestamp?
    idempotency_key: String
}

UserPreferences {
    user_id: String
    channels_enabled: Set<Channel>
    quiet_hours: TimeRange?
    frequency_cap: Map<Channel, Int>  // max per day
    timezone: String
}

Request Flow

  1. Intake: The API validates the request, checks the idempotency key against Redis, and writes to Kafka. Topic partitioning is by recipient_id to ensure per-user ordering.
  2. Orchestration: The orchestrator consumes from Kafka, resolves user preferences, applies rate limits, selects the channel, and dispatches to channel-specific workers.
  3. Channel workers: Each worker handles provider-specific logic (FCM for push, SES for email, Twilio for SMS). Workers are independently scalable.
  4. Delivery tracking: Workers report delivery status (sent, delivered, failed, bounced) back to the delivery tracker, which updates the notification state machine.
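The intake step above can be sketched as follows. This is a minimal, illustrative version: Redis and Kafka are stubbed with in-memory structures so the flow is runnable, and names like `Intake` and `accept` are assumptions, not the real service's API.

```python
import hashlib

class Intake:
    """Intake: reject duplicate idempotency keys, then enqueue by recipient.

    A set stands in for Redis (SET NX with a 24h TTL); a list of lists
    stands in for a partitioned Kafka topic.
    """

    def __init__(self, num_partitions: int = 8):
        self.seen_keys: set[str] = set()
        self.partitions: list[list[dict]] = [[] for _ in range(num_partitions)]

    def accept(self, request: dict) -> bool:
        key = request["idempotency_key"]
        if key in self.seen_keys:  # duplicate request: drop at the edge
            return False
        self.seen_keys.add(key)
        # Partition by recipient_id so one user's notifications stay ordered.
        digest = hashlib.md5(request["recipient_id"].encode()).hexdigest()
        part = int(digest, 16) % len(self.partitions)
        self.partitions[part].append(request)
        return True
```

Because partitioning hashes only `recipient_id`, all of a user's notifications land on the same partition and are consumed in order.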

User Preference Resolution

resolve_channel(request, preferences):
    candidates = request.channels AND preferences.channels_enabled
    if candidates is empty:
        return IN_APP  // fallback: always deliver in-app

    for channel in candidates (ordered by priority):
        if rate_limit_check(request.recipient_id, channel) == UNDER_LIMIT:
            if not in_quiet_hours(preferences):
                return channel

    return IN_APP  // fallback if all channels exhausted

Rate Limiting

Rate limits operate at two levels:

Level                  Mechanism                          Limit
Per-user per-channel   Sliding window counter in Redis    5 push/day, 3 SMS/day
Per-sender (team)      Token bucket per sender ID         1000 notifications/minute
Global                 Circuit breaker on provider APIs   Based on provider SLA

rate_limit_check(user_id, channel, preferences):
    key = "ratelimit:{user_id}:{channel}:{date}"
    count = redis.incr(key)
    if count == 1:
        redis.expire(key, 86400)
    return count <= preferences.frequency_cap[channel]

Note that the date-keyed counter above is a fixed daily window. A true sliding window tracks individual send timestamps and evicts those older than 24 hours, at the cost of more Redis memory per user.

Priority Queuing

Kafka topics are segmented by priority:

  • notifications.critical: Processed immediately (OTP, security alerts)
  • notifications.high: Processed within 1 minute
  • notifications.normal: Processed within 5 minutes
  • notifications.low: Processed during off-peak hours

Critical and high-priority topics have dedicated consumer groups with more partitions and workers.
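Routing a request to its topic is then a lookup at intake time. A sketch, with the fallback behavior as an assumption of mine rather than something the design mandates:

```python
PRIORITY_TOPICS = {
    "CRITICAL": "notifications.critical",
    "HIGH": "notifications.high",
    "NORMAL": "notifications.normal",
    "LOW": "notifications.low",
}

def topic_for(priority: str) -> str:
    # Unknown priorities fall back to the normal queue rather than failing.
    return PRIORITY_TOPICS.get(priority, "notifications.normal")
```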

Deduplication

Two layers:

  1. Idempotency key: Checked at intake against a Redis set with 24-hour TTL. Rejects duplicate requests.
  2. Content hash: Before sending, hash (recipient_id, template_id, payload_hash). If the same content was sent in the last 24 hours, skip.
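The content hash in layer 2 can be computed like this. Serializing the payload with sorted keys makes the fingerprint independent of dict ordering; the function name is illustrative, and in production the result would be checked against a Redis set with a 24-hour TTL.

```python
import hashlib
import json

def content_fingerprint(recipient_id: str, template_id: str, payload: dict) -> str:
    """Stable hash of (recipient_id, template_id, payload_hash) for pre-send dedup."""
    payload_hash = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()
    ).hexdigest()
    return hashlib.sha256(
        f"{recipient_id}:{template_id}:{payload_hash}".encode()
    ).hexdigest()
```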

Trade-offs

Decision                     Upside                                      Downside
Kafka per-user partitioning  Per-user ordering guaranteed                Hot partitions for power users
In-app as fallback channel   Notifications never fully lost              In-app may not be noticed
Priority-based topics        Clear SLA per priority level                Topic proliferation, more operational overhead
Redis for rate limits        Fast, atomic operations                     Data loss on Redis failure (temporary over-sending)
Template-based content       Consistent messaging, easier localization   Less flexibility for ad-hoc notifications

Failure Modes

  • Push provider outage (FCM/APNs): Circuit breaker trips after 5 consecutive failures. Notifications queued for retry. If outage exceeds 1 hour, fall back to email or in-app.
  • Redis failure: Rate limits fail open (allow sending) for critical priority, fail closed (block sending) for low priority. Prevents spam while preserving critical alerts.
  • Kafka consumer lag: Auto-scale consumer instances based on lag metrics. Alert when lag exceeds 5 minutes for critical topics.
  • Duplicate sends: Even with deduplication, provider-level retries can cause duplicates. Accept this as an at-least-once guarantee and document it. Idempotent rendering on the client mitigates UX impact.
  • Template rendering failure: Send a generic fallback message rather than failing silently. Log the failure for template owners.
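The circuit breaker from the first failure mode can be sketched as a small state machine. The threshold of 5 matches the push-provider policy above; the cooldown value and class shape are assumptions for illustration.

```python
class CircuitBreaker:
    """Trip after N consecutive failures; allow a probe after a cooldown."""

    def __init__(self, threshold: int = 5, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: float | None = None  # None means the circuit is closed

    def allow(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: after the cooldown, let a probe request through.
        return now - self.opened_at >= self.cooldown

    def record(self, success: bool, now: float) -> None:
        if success:
            self.failures = 0
            self.opened_at = None  # close the circuit again
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now  # trip: stop sending to this provider
```

While the breaker is open, notifications stay queued in Kafka; nothing is lost, they are just not dispatched to the failing provider.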

Scaling Considerations

  • Campaign sends: Batch recipients into chunks of 10,000. Process chunks in parallel across workers. Stagger sending over a time window to avoid provider rate limits.
  • Multi-region: Deploy notification workers in each region. Route notifications based on the user's region to reduce latency and comply with data residency requirements.
  • Channel worker scaling: Scale each channel independently. Push workers may need 10x the capacity of SMS workers based on volume distribution.
  • Database: Use a time-series database (TimescaleDB or ClickHouse) for delivery tracking. Partition by date, retain detailed records for 30 days, aggregates for 1 year.
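The campaign chunking and staggering above can be sketched as a planning step. The chunk size matches the 10,000 figure from the list; the one-hour window and the function shape are illustrative assumptions.

```python
def plan_campaign(recipients: list[str], chunk_size: int = 10_000,
                  window_seconds: int = 3600) -> list[tuple[float, list[str]]]:
    """Split recipients into chunks and spread send offsets across a window.

    Returns (offset_seconds, chunk) pairs for workers to pick up in
    parallel, keeping the aggregate send rate under provider limits.
    """
    chunks = [recipients[i:i + chunk_size]
              for i in range(0, len(recipients), chunk_size)]
    if not chunks:
        return []
    spacing = window_seconds / len(chunks)
    return [(i * spacing, chunk) for i, chunk in enumerate(chunks)]
```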

Observability

  • Delivery funnel: Track each notification through states: received, processed, sent, delivered, opened, failed. Visualize as a funnel per channel.
  • Latency: Measure time from request to send per priority level. Alert when p95 exceeds SLA.
  • Rate limit hits: Track how often users hit rate limits per channel. High rates indicate over-notification.
  • Provider health: Monitor error rates, latency, and throughput per provider. Dashboard showing FCM, SES, Twilio health side by side.

Key Takeaways

  • Separate intake, orchestration, and delivery into independent services. They scale differently.
  • Rate limit at the user level, not just the system level. Users care about their own notification volume.
  • Always have a fallback channel. In-app is the safest default.
  • Priority queuing prevents low-priority campaigns from starving time-sensitive transactional notifications.
  • Deduplication must happen at multiple layers: request intake and pre-send content check.

Final Thoughts

The best notification system is one users do not notice, in the sense that every notification they receive is relevant, timely, and delivered through the right channel. Over-notification is a product problem, but the system must enforce guardrails that product teams cannot bypass.
