Problem Statement
Notification logic scattered across business services becomes a maintenance problem: duplicate delivery code, inconsistent retry handling, no central audit trail of what was sent to whom and when, and difficulty adding new channels or respecting user communication preferences. Centralising notifications into a dedicated engine removes this complexity from application code while providing reliability, personalisation, and observability.
Key Challenges:
- Reliable delivery across third-party channels (SMS, email) with variable uptime
- User preference management: channel, frequency, and opt-out handling
- Preventing duplicate delivery on retries
- Template rendering supporting personalisation tokens and conditional content
- Scheduling future notifications and cancelling them if the triggering condition resolves
System Architecture
Business services publish notification events to a message queue. The engine consumes events, resolves user preferences, renders templates, and dispatches to the appropriate channel via provider APIs. Background workers handle retries for failed deliveries. A delivery log records every dispatch attempt and outcome.
Event-Driven Intake
Services publish typed notification events (e.g., OrderConfirmed, PaymentFailed, AppointmentReminder) to a queue. The engine consumes events and resolves recipient identity, preferences, and the appropriate template without the publishing service knowing delivery details.
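A typed event envelope might look like the following minimal sketch. The class name `NotificationEvent`, its fields, and the JSON encoding are assumptions made for illustration; the source does not specify the schema or the queue technology. The envelope carries no channel or template information, so delivery details stay with the engine.

```python
from dataclasses import dataclass, field, asdict
import json
import uuid


@dataclass(frozen=True)
class NotificationEvent:
    """Hypothetical envelope a business service publishes to the queue."""
    event_type: str    # e.g. "OrderConfirmed", "PaymentFailed"
    recipient_id: str  # the engine resolves this to channels and addresses
    payload: dict      # data later used as template tokens
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def serialize(event: NotificationEvent) -> str:
    """Encode the event as a JSON string suitable for a queue message body."""
    return json.dumps(asdict(event), sort_keys=True)


def deserialize(raw: str) -> NotificationEvent:
    """Decode a queue message back into a typed event."""
    return NotificationEvent(**json.loads(raw))
```

A publisher would build `NotificationEvent("OrderConfirmed", "user-42", {"order_id": "A1"})` and hand the serialized string to whatever queue client is in use.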
Preference Resolution
A per-user preference store defines allowed channels, quiet hours, frequency limits, and opt-out status for each notification category. The engine filters delivery channels against these preferences before dispatch, so user choices are respected without burdening the publishing service.
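The channel filter can be sketched roughly as below. The `Preferences` shape, the function names, and the choice to return no channels during quiet hours (leaving the caller to defer the notification) are all assumptions for illustration, not the engine's actual data model. Frequency limits are left out of this sketch.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import List, Optional, Set


@dataclass
class Preferences:
    """Hypothetical per-user preference record."""
    allowed_channels: Set[str]
    opted_out_categories: Set[str] = field(default_factory=set)
    quiet_start: Optional[time] = None  # local time; None means no quiet hours
    quiet_end: Optional[time] = None


def in_quiet_hours(prefs: Preferences, now: time) -> bool:
    if prefs.quiet_start is None or prefs.quiet_end is None:
        return False
    if prefs.quiet_start <= prefs.quiet_end:
        return prefs.quiet_start <= now < prefs.quiet_end
    # The window wraps past midnight, e.g. 22:00 to 07:00.
    return now >= prefs.quiet_start or now < prefs.quiet_end


def resolve_channels(prefs: Preferences, category: str,
                     candidates: List[str], now: time) -> List[str]:
    """Return the candidate channels this user permits right now."""
    if category in prefs.opted_out_categories:
        return []  # opt-out always wins
    if in_quiet_hours(prefs, now):
        return []  # assumption: the caller defers until quiet hours end
    return [c for c in candidates if c in prefs.allowed_channels]
```

For a user who allows email and SMS, has opted out of "marketing", and sets quiet hours from 22:00 to 07:00, an "orders" notification at midday resolves to email and SMS, while the same notification at 23:30 resolves to nothing.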
Template Engine
Notification templates support personalisation tokens, conditional blocks, and localisation. Templates are stored and versioned separately from code, allowing content updates without deployment. Previews are available for review before publishing.
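As a minimal sketch of versioned, localised rendering, the example below uses Python's standard `string.Template` for token substitution. The in-memory `TEMPLATES` store, the template name, and the fallback-to-English rule are assumptions for illustration. Conditional blocks are not covered here; they would need a fuller templating library.

```python
from string import Template
from typing import Dict, List, Optional, Tuple

# Hypothetical store: (template name, locale) -> versions, oldest first.
TEMPLATES: Dict[Tuple[str, str], List[str]] = {
    ("order_confirmed", "en"): [
        "Hi $name, your order $order_id is confirmed.",
        "Hi $name, order $order_id is confirmed and on its way.",
    ],
    ("order_confirmed", "fr"): [
        "Bonjour $name, votre commande $order_id est confirmée.",
    ],
}


def render(name: str, locale: str, tokens: Dict[str, str],
           version: Optional[int] = None, default_locale: str = "en") -> str:
    """Render the requested (1-based) version, or the latest if none is given."""
    versions = TEMPLATES.get((name, locale)) or TEMPLATES[(name, default_locale)]
    body = versions[-1] if version is None else versions[version - 1]
    # substitute() raises KeyError if a token is missing, which surfaces
    # incomplete event payloads instead of sending a half-filled message.
    return Template(body).substitute(tokens)
```

Because content lives in the store and not in code, publishing a new version is a data change. Pinning `version=1` reproduces the earlier wording, which is one way a preview or rollback could work.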
Delivery & Retry
Each delivery attempt is logged with the provider response. Failed deliveries are retried with exponential backoff up to configurable limits. Idempotency keys prevent duplicate messages on retry. Persistent failure triggers an alert for investigation.
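The retry loop can be sketched as follows. The function names, the doubling schedule starting at one second, and the 60-second cap are illustrative assumptions; the source only states that backoff is exponential and the limits are configurable. The `send` callable stands in for a provider API call, and jitter is omitted for brevity.

```python
import time
from typing import Callable, List, Tuple


def backoff_delays(max_attempts: int, base_seconds: float = 1.0,
                   cap_seconds: float = 60.0) -> List[float]:
    """Delay before each retry: base * 2**n, capped at cap_seconds."""
    return [min(cap_seconds, base_seconds * 2 ** n)
            for n in range(max_attempts - 1)]


def deliver_with_retry(send: Callable[[], str], max_attempts: int = 4,
                       sleep: Callable[[float], None] = time.sleep
                       ) -> Tuple[bool, List[Tuple[int, str, str]]]:
    """Call send() until it succeeds or attempts run out.

    Returns (delivered, attempts), where each attempt is
    (attempt number, "ok" or "failed", provider response or error text).
    """
    attempts: List[Tuple[int, str, str]] = []
    delays = backoff_delays(max_attempts)
    for attempt in range(1, max_attempts + 1):
        try:
            response = send()
        except Exception as exc:  # a real engine would catch provider errors only
            attempts.append((attempt, "failed", str(exc)))
            if attempt < max_attempts:
                sleep(delays[attempt - 1])
            continue
        attempts.append((attempt, "ok", response))
        return True, attempts
    return False, attempts  # persistent failure: the caller raises an alert
```

The returned attempt list is what would be written to the delivery log, one row per attempt.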
Key Engineering Challenges
Duplicate Delivery on Retry
Challenge: Retrying a failed delivery without idempotency guarantees sends the same message multiple times, frustrating recipients.
Solution: An idempotency key, derived from the event ID and recipient channel, is stored in the delivery log. Before dispatching, the engine checks whether that combination has already been delivered successfully and skips the send if it has.
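A minimal sketch of the check, under a few assumptions: the key is a SHA-256 hash of the event ID, recipient ID, and channel (the exact inputs and format are not given in the source), and an in-memory set stands in for the delivery log. All names here are hypothetical.

```python
import hashlib
from typing import Callable, Set


def idempotency_key(event_id: str, recipient_id: str, channel: str) -> str:
    """Stable key for one (event, recipient, channel) combination."""
    raw = f"{event_id}:{recipient_id}:{channel}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class DeliveryLog:
    """In-memory stand-in for the persistent delivery log."""

    def __init__(self) -> None:
        self._delivered: Set[str] = set()

    def already_delivered(self, key: str) -> bool:
        return key in self._delivered

    def mark_delivered(self, key: str) -> None:
        self._delivered.add(key)


def dispatch_once(log: DeliveryLog, event_id: str, recipient_id: str,
                  channel: str, send: Callable[[], None]) -> bool:
    """Send only if this combination has not already been delivered."""
    key = idempotency_key(event_id, recipient_id, channel)
    if log.already_delivered(key):
        return False  # retry or replay of an already-delivered message
    send()            # if this raises, the key is not recorded and a retry may send
    log.mark_delivered(key)
    return True
```

With concurrent workers, the check and the write would need to be atomic, for example through a unique constraint on the key in the log's backing store; the sketch does not address that.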
Provider API Reliability
Challenge: SMS and email providers have variable availability; a provider outage should not permanently drop notifications.
Solution: Each provider is wrapped in its own circuit breaker, with fallback to an alternate provider where one is configured. Failed events remain in the queue until the provider recovers or the maximum retry window expires.
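One way to sketch a per-provider breaker with ordered fallback is shown below. The threshold of three consecutive failures, the 30-second cool-down, and all class and function names are assumptions for illustration. The clock is injectable so the behaviour can be exercised without waiting.

```python
import time
from typing import Callable, List, Optional, Tuple


class CircuitBreaker:
    """Opens after consecutive failures; allows a trial call after a cool-down."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: once the cool-down has passed, let a trial request through.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


Provider = Tuple[str, CircuitBreaker, Callable[[str], None]]


def send_with_fallback(providers: List[Provider], message: str) -> Optional[str]:
    """Try providers in order, skipping any whose breaker is open.

    Returns the name of the provider that accepted the message, or None,
    in which case the caller leaves the event in the queue for a later retry.
    """
    for name, breaker, send in providers:
        if not breaker.allow():
            continue
        try:
            send(message)
        except Exception:
            breaker.record_failure()
            continue
        breaker.record_success()
        return name
    return None
```

Once the primary's breaker opens, calls go straight to the fallback without paying for a failing request each time, and the primary is probed again only after the cool-down.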
Scheduled Notification Cancellation
Challenge: An appointment reminder scheduled 24 hours ahead should be cancelled if the appointment is cancelled before the reminder fires.
Solution: Scheduled notifications are stored as pending jobs with a cancellation key. Publishing a cancellation event with the same key removes the pending job before it executes.
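The pending-job store might be sketched like this, using an in-memory heap in place of whatever persistent job store the engine uses. The `Scheduler` class, its method names, and the numeric timestamps are hypothetical. Cancellation removes the job's entry from a lookup table; the stale heap entry is skipped when it comes due.

```python
import heapq
from typing import Dict, List, Tuple


class Scheduler:
    """Pending notifications keyed by a cancellation key."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []    # (fire_at, seq, key)
        self._pending: Dict[str, Tuple[int, str]] = {}   # key -> (seq, payload)
        self._seq = 0

    def schedule(self, fire_at: float, cancellation_key: str, payload: str) -> None:
        """Register a job; rescheduling the same key replaces the earlier job."""
        self._pending[cancellation_key] = (self._seq, payload)
        heapq.heappush(self._heap, (fire_at, self._seq, cancellation_key))
        self._seq += 1

    def cancel(self, cancellation_key: str) -> bool:
        """Remove a pending job; returns False if nothing was pending."""
        return self._pending.pop(cancellation_key, None) is not None

    def due(self, now: float) -> List[str]:
        """Pop and return the payloads of all live jobs with fire_at <= now."""
        fired: List[str] = []
        while self._heap and self._heap[0][0] <= now:
            _, seq, key = heapq.heappop(self._heap)
            entry = self._pending.get(key)
            # Skip entries that were cancelled or replaced by a later schedule.
            if entry is not None and entry[0] == seq:
                fired.append(entry[1])
                del self._pending[key]
        return fired
```

For the appointment example, the reminder could be scheduled under a key such as the appointment ID, and the appointment-cancelled event would call `cancel` with the same key.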
Frequency Capping
Challenge: System events can generate bursts of notifications that overwhelm users with messages in a short period.
Solution: Per-user, per-category frequency counters in Redis enforce a maximum message rate. Notifications that exceed the cap within the capping window are either dropped or deferred to the next allowed window.
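The capping logic can be illustrated with a fixed-window counter. This sketch keeps counts in a Python dictionary purely so it runs on its own; it stands in for the Redis counters and is not the engine's implementation. The class name, the fixed-window choice, and the limits in the usage note are assumptions. In Redis, a counter of this kind is commonly built from `INCR` on a per-window key plus `EXPIRE` to clean it up.

```python
import time
from typing import Callable, Dict, Tuple


class FrequencyCap:
    """Fixed-window counter per (user, category)."""

    def __init__(self, max_per_window: int, window_seconds: int,
                 clock: Callable[[], float] = time.time) -> None:
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self.clock = clock
        # (user_id, category, window index) -> messages sent in that window.
        # Old windows are never purged here; Redis key expiry would handle that.
        self._counts: Dict[Tuple[str, str, int], int] = {}

    def allow(self, user_id: str, category: str) -> bool:
        """Count the message and return True if it fits under the cap."""
        window = int(self.clock() // self.window_seconds)
        key = (user_id, category, window)
        count = self._counts.get(key, 0)
        if count >= self.max_per_window:
            return False  # the caller drops or defers the notification
        self._counts[key] = count + 1
        return True
```

With `FrequencyCap(max_per_window=2, window_seconds=60)`, a third "alerts" message to the same user inside one minute is refused, while a message in a different category, or in the next window, is allowed.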
Solutions Implemented
- Event-Driven Architecture: Decoupled intake via message queue, separating notification concerns entirely from business service code.
- Preference Engine: Per-user opt-out, channel selection, quiet hours, and frequency caps enforced before every delivery.
- Versioned Templates: Personalised, localised templates stored separately from code with preview capability and A/B variant support.
- Idempotent Delivery: Delivery log with idempotency keys preventing duplicate sends on retry or event replay.
- Delivery Audit Trail: Complete log of every dispatch attempt, provider response, and final outcome per notification event.
Outcome & Impact
- Across all channels
- With idempotency enforcement
- SMS, Email, In-App
- User opt-outs always respected