How I'd Design a Scalable Notification System
System design for a multi-channel notification system covering delivery guarantees, rate limiting, user preferences, and failure handling at scale.
Notification systems appear simple until you face duplicate sends, channel coordination, user preference management, and delivery tracking across millions of users. This post covers a design that handles these concerns without collapsing under scale.
Context
A notification system serves multiple internal teams (marketing, product, transactional alerts) and delivers across multiple channels (push, email, SMS, in-app). Each channel has different latency expectations, cost profiles, and delivery guarantees. The system must prioritize user experience while giving internal teams self-service capabilities.
Problem
Design a notification system that:
- Supports push, email, SMS, and in-app channels
- Respects user preferences and quiet hours
- Prevents duplicate and excessive notifications
- Scales to millions of recipients per campaign
- Provides delivery tracking and analytics
Constraints
| Constraint | Detail |
|---|---|
| Throughput | 10M notifications/hour for campaign sends |
| Latency | Transactional notifications delivered within 30 seconds |
| Deduplication | No user receives the same notification twice within 24 hours |
| Rate limiting | Max 5 push notifications per user per day |
| Cost | SMS costs $0.01-0.05 per message; must optimize channel selection |
Design
High-Level Architecture
```
Notification Request -> API Gateway -> Notification Service
                                             |
                                   Priority Queue (Kafka)
                                             |
                                    Orchestrator Service
                                    /        |        \
                          Push Worker  Email Worker  SMS Worker
                                    \        |        /
                                      Delivery Tracker
```
Data Model
```
NotificationRequest {
  id: UUID
  recipient_id: String
  template_id: String
  channels: List<Channel>           // PUSH, EMAIL, SMS, IN_APP
  priority: Priority                // CRITICAL, HIGH, NORMAL, LOW
  payload: Map<String, Any>
  scheduled_at: Timestamp?
  idempotency_key: String
}

UserPreferences {
  user_id: String
  channels_enabled: Set<Channel>
  quiet_hours: TimeRange?
  frequency_cap: Map<Channel, Int>  // max per day
  timezone: String
}
```
Request Flow
- Intake: The API validates the request, checks the idempotency key against Redis, and writes to Kafka. Topics are partitioned by `recipient_id` to ensure per-user ordering.
- Orchestration: The orchestrator consumes from Kafka, resolves user preferences, applies rate limits, selects the channel, and dispatches to channel-specific workers.
- Channel workers: Each worker handles provider-specific logic (FCM for push, SES for email, Twilio for SMS). Workers are independently scalable.
- Delivery tracking: Workers report delivery status (sent, delivered, failed, bounced) back to the delivery tracker, which updates the notification state machine.
User Preference Resolution
```
resolve_channel(request, preferences):
  candidates = request.channels AND preferences.channels_enabled
  if candidates is empty:
    return IN_APP                    // fallback: always deliver in-app
  for channel in candidates (ordered by priority):
    if rate_limit_check(user_id, channel) == UNDER_LIMIT:
      if not in_quiet_hours(preferences):
        return channel
  return IN_APP                      // fallback if all channels exhausted
```
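The pseudocode above translates to a small pure function. In this sketch, channels are plain strings, and the rate-limit and quiet-hours checks are passed in as parameters (`under_limit`, `in_quiet_hours`) so the selection logic stays testable; those parameter names are mine, not part of the design.

```python
IN_APP = "in_app"


def resolve_channel(requested, enabled, under_limit, in_quiet_hours):
    """Pick the first enabled, under-limit channel; fall back to in-app.

    requested: channels in request priority order.
    enabled: channels the user has opted into.
    under_limit: callable(channel) -> bool, the per-channel rate-limit check.
    in_quiet_hours: whether the user is currently inside quiet hours.
    """
    # Intersect requested channels with enabled ones, preserving request order.
    candidates = [c for c in requested if c in enabled]
    if not candidates:
        return IN_APP  # fallback: always deliver in-app
    for channel in candidates:
        if under_limit(channel) and not in_quiet_hours:
            return channel
    return IN_APP  # all candidates rate-limited or user is in quiet hours
```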
Rate Limiting
Rate limits operate at three levels:
| Level | Mechanism | Limit |
|---|---|---|
| Per-user per-channel | Daily counter in Redis | 5 push/day, 3 SMS/day |
| Per-sender (team) | Token bucket per sender ID | 1000 notifications/minute |
| Global | Circuit breaker on provider APIs | Based on provider SLA |
```
rate_limit_check(user_id, channel):
  key = "ratelimit:{user_id}:{channel}:{date}"
  count = redis.incr(key)
  if count == 1:
    redis.expire(key, 86400)         // first hit: expire after 24 hours
  return count <= preferences.frequency_cap[channel]
```
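A runnable version of this fixed-window check, with a minimal in-memory stand-in for the two Redis commands it needs (`INCR`, `EXPIRE`). In production this would be a real Redis client such as redis-py; the `FakeRedis` class here is purely for illustration.

```python
import time


class FakeRedis:
    """In-memory stand-in for the two Redis commands the check uses."""

    def __init__(self):
        self.store = {}  # key -> (count, expires_at or None)

    def incr(self, key):
        count, exp = self.store.get(key, (0, None))
        if exp is not None and time.time() > exp:
            count, exp = 0, None  # window expired: start a fresh count
        count += 1
        self.store[key] = (count, exp)
        return count

    def expire(self, key, ttl):
        count, _ = self.store.get(key, (0, None))
        self.store[key] = (count, time.time() + ttl)


def rate_limit_check(redis, user_id, channel, cap, date):
    """True while the user is under `cap` sends on `channel` for `date`."""
    key = f"ratelimit:{user_id}:{channel}:{date}"
    count = redis.incr(key)
    if count == 1:
        redis.expire(key, 86400)  # first hit of the day: set the 24h window
    return count <= cap
```

Note the INCR-then-EXPIRE pair is not atomic; a crash between the two calls can leave a counter without a TTL, which a Lua script or `SET NX EX` pattern would avoid.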
Priority Queuing
Kafka topics are segmented by priority:
- `notifications.critical`: Processed immediately (OTP, security alerts)
- `notifications.high`: Processed within 1 minute
- `notifications.normal`: Processed within 5 minutes
- `notifications.low`: Processed during off-peak hours
Critical and high-priority topics have dedicated consumer groups with more partitions and workers.
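Routing a request to its topic is then a lookup. A trivial sketch; the topic names follow the list above, while falling back to the normal lane for an unrecognized priority is my assumption, not part of the design.

```python
PRIORITY_TOPICS = {
    "CRITICAL": "notifications.critical",
    "HIGH": "notifications.high",
    "NORMAL": "notifications.normal",
    "LOW": "notifications.low",
}


def topic_for(priority):
    # Route unknown priorities to the normal lane rather than dropping them.
    return PRIORITY_TOPICS.get(priority, "notifications.normal")
```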
Deduplication
Two layers:
- Idempotency key: Checked at intake against Redis (a `SET NX` write with a 24-hour TTL). Rejects duplicate requests.
- Content hash: Before sending, hash `(recipient_id, template_id, payload_hash)`. If the same content was sent in the last 24 hours, skip.
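The content-hash layer can be sketched as a key derivation plus a TTL cache. The key format and the in-memory `DedupCache` (standing in for Redis keys with a 24-hour TTL) are illustrative choices; canonicalizing the payload with sorted-key JSON is one way to make the hash insensitive to dict ordering.

```python
import hashlib
import json
import time


def content_key(recipient_id, template_id, payload):
    """Derive a stable dedup key for (recipient, template, payload)."""
    # Canonicalize the payload so dict ordering doesn't change the hash.
    payload_hash = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()
    ).hexdigest()
    return f"dedup:{recipient_id}:{template_id}:{payload_hash}"


class DedupCache:
    """In-memory stand-in for Redis keys with a 24-hour TTL."""

    def __init__(self, ttl=86400):
        self.ttl = ttl
        self.seen = {}  # key -> expiry timestamp

    def should_send(self, key, now=None):
        now = time.time() if now is None else now
        exp = self.seen.get(key)
        if exp is not None and now < exp:
            return False  # same content sent within the window: skip
        self.seen[key] = now + self.ttl
        return True
```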
Trade-offs
| Decision | Upside | Downside |
|---|---|---|
| Kafka per-user partitioning | Per-user ordering guaranteed | Hot partitions for power users |
| In-app as fallback channel | Notifications never fully lost | In-app may not be noticed |
| Priority-based topics | Clear SLA per priority level | Topic proliferation, more operational overhead |
| Redis for rate limits | Fast, atomic operations | Data loss on Redis failure (temporary over-sending) |
| Template-based content | Consistent messaging, easier localization | Less flexibility for ad-hoc notifications |
Failure Modes
- Push provider outage (FCM/APNs): Circuit breaker trips after 5 consecutive failures. Notifications queued for retry. If outage exceeds 1 hour, fall back to email or in-app.
- Redis failure: Rate limits fail open (allow sending) for critical priority, fail closed (block sending) for low priority. Prevents spam while preserving critical alerts.
- Kafka consumer lag: Auto-scale consumer instances based on lag metrics. Alert when lag exceeds 5 minutes for critical topics.
- Duplicate sends: Even with deduplication, provider-level retries can cause duplicates. Accept this as an at-least-once guarantee and document it. Idempotent rendering on the client mitigates UX impact.
- Template rendering failure: Send a generic fallback message rather than failing silently. Log the failure for template owners.
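The push-provider failure mode above describes a breaker that opens after 5 consecutive failures. A minimal sketch of that behavior, with an added cooldown after which one probe request is allowed through (the cooldown value and half-open probe are my assumptions, not specified in the design):

```python
import time


class CircuitBreaker:
    """Open after N consecutive failures; allow a probe after a cooldown."""

    def __init__(self, threshold=5, cooldown=60.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # None while the breaker is closed

    def allow(self, now=None):
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        now = time.time() if now is None else now
        # Open: block until the cooldown elapses, then allow a probe.
        return now - self.opened_at >= self.cooldown

    def record_success(self):
        self.failures = 0
        self.opened_at = None  # close the breaker again

    def record_failure(self, now=None):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.time() if now is None else now
```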
Scaling Considerations
- Campaign sends: Batch recipients into chunks of 10,000. Process chunks in parallel across workers. Stagger sending over a time window to avoid provider rate limits.
- Multi-region: Deploy notification workers in each region. Route notifications based on user's region to reduce latency and comply with data residency requirements.
- Channel worker scaling: Scale each channel independently. Push workers may need 10x the capacity of SMS workers based on volume distribution.
- Database: Use a time-series database (TimescaleDB or ClickHouse) for delivery tracking. Partition by date, retain detailed records for 30 days, aggregates for 1 year.
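The campaign-batching step above reduces to slicing the recipient list into fixed-size chunks that workers consume in parallel. A minimal sketch using a generator (the 10,000 default matches the chunk size above):

```python
def chunk_recipients(recipients, size=10_000):
    """Yield fixed-size chunks so workers can fan out a campaign in parallel."""
    for i in range(0, len(recipients), size):
        yield recipients[i:i + size]
```

Each chunk becomes an independent unit of work, which also makes staggering straightforward: schedule chunk `k` at `start + k * interval` to stay under provider rate limits.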
Observability
- Delivery funnel: Track each notification through states: received, processed, sent, delivered, opened, failed. Visualize as a funnel per channel.
- Latency: Measure time from request to send per priority level. Alert when p95 exceeds SLA.
- Rate limit hits: Track how often users hit rate limits per channel. High rates indicate over-notification.
- Provider health: Monitor error rates, latency, and throughput per provider. Dashboard showing FCM, SES, Twilio health side by side.
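The delivery-funnel states above imply a small state machine the tracker can enforce so out-of-order provider callbacks don't corrupt a notification's status. The specific transition table here is an assumption based on the funnel order listed; the design does not pin it down.

```python
# Assumed legal transitions between the funnel states listed above.
TRANSITIONS = {
    "received": {"processed", "failed"},
    "processed": {"sent", "failed"},
    "sent": {"delivered", "failed"},
    "delivered": {"opened"},
    "opened": set(),   # terminal
    "failed": set(),   # terminal
}


def advance(state, event):
    """Return the next state, rejecting transitions the funnel doesn't allow."""
    if event not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {event}")
    return event
```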
Key Takeaways
- Separate intake, orchestration, and delivery into independent services. They scale differently.
- Rate limit at the user level, not just the system level. Users care about their own notification volume.
- Always have a fallback channel. In-app is the safest default.
- Priority queuing prevents low-priority campaigns from starving time-sensitive transactional notifications.
- Deduplication must happen at multiple layers: request intake and pre-send content check.
Further Reading
- How I'd Design a Mobile Configuration System at Scale
- Event Tracking System Design for Android Applications
- What I Look for in System Designs
Final Thoughts
The best notification system is one users do not notice, in the sense that every notification they receive is relevant, timely, and delivered through the right channel. Over-notification is a product problem, but the system must enforce guardrails that product teams cannot bypass.