Data Engineering 2024

Real-Time Analytics & Monitoring Dashboard

Built a streaming analytics platform that collects application and user activity events, processes them in real time, and visualises operational metrics through interactive dashboards. The system aggregates logs, calculates performance indicators, and triggers alerts when anomalies occur — enabling administrators to understand system health and usage behaviour without waiting on delayed batch reports.

Technology Stack:
Python, WebSockets, Redis Streams, TimescaleDB, Docker, Charting

Problem Statement

Operations teams managing distributed applications often rely on batch reports that lag reality by hours. By the time a report reveals an issue, users have already experienced degraded performance or data loss. What was needed was a live monitoring layer that surfaces system health, user behaviour, and anomalies in seconds rather than hours, without requiring a complex data warehouse setup.

Key Challenges:

  • Ingesting high-frequency event streams without dropping data under load
  • Low-latency aggregation suitable for real-time display
  • Efficient time-series storage with long retention at reasonable cost
  • Anomaly detection without excessive false positives
  • Live dashboard delivery to multiple browser clients simultaneously

System Architecture

Events are emitted by applications and pushed into Redis Streams as the ingestion buffer. Stream consumers aggregate and compute metrics, persisting time-series data to TimescaleDB. A WebSocket server pushes metric updates to connected dashboards in real time. Alerting rules evaluate incoming metrics and dispatch notifications when thresholds are breached.

Event Ingestion

Applications publish structured events to Redis Streams via a lightweight SDK. The buffer absorbs traffic spikes and decouples producers from consumers, preventing backpressure from reaching application services.
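As a rough illustration of the publishing step, the sketch below appends one event to a stream with approximate length-based trimming. It assumes `client` is a redis-py `redis.Redis` instance (only its `xadd` method is used); the function name, stream name, field names, and the default `maxlen` are hypothetical and not taken from the project's SDK.

```python
import json
import time


def publish_event(client, stream, event_type, payload, maxlen=100_000):
    """Append one structured event to a Redis stream and return its entry ID.

    Assumption: `client` is a redis-py `redis.Redis` instance, or any object
    exposing a compatible `xadd`. Field names here are illustrative only.
    """
    fields = {
        "type": event_type,
        "ts": f"{time.time():.3f}",  # producer-side timestamp in seconds
        "payload": json.dumps(payload, sort_keys=True),
    }
    # approximate=True maps to XADD ... MAXLEN ~ n, letting Redis trim lazily,
    # which is cheaper than enforcing an exact length on every write.
    return client.xadd(stream, fields, maxlen=maxlen, approximate=True)
```

Because the function only needs an object with an `xadd` method, producers can be exercised in tests with a stub client instead of a live Redis server.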

Stream Processing

Consumer groups read events, compute windowed aggregates (per-minute and per-hour counts, averages, and percentiles), and write time-series records to TimescaleDB, where continuous aggregates keep historical queries efficient.
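A windowed aggregate of this kind can be sketched in plain Python as follows. This is a minimal in-memory version using fixed (tumbling) windows and a nearest-rank percentile; the function names, the `(timestamp, value)` input shape, and the choice of p95 are assumptions made for illustration.

```python
import math
from collections import defaultdict


def percentile(sorted_values, q):
    """Nearest-rank percentile of a sorted, non-empty list (0 < q <= 100)."""
    rank = max(1, math.ceil(q / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def aggregate_windows(events, window_seconds=60):
    """Group (timestamp, value) pairs into fixed windows of `window_seconds`.

    Returns {window_start: {"count": ..., "avg": ..., "p95": ...}}.
    """
    buckets = defaultdict(list)
    for ts, value in events:
        # Floor the timestamp to the start of its window
        window_start = int(ts // window_seconds) * window_seconds
        buckets[window_start].append(value)

    result = {}
    for window_start, values in sorted(buckets.items()):
        values.sort()
        result[window_start] = {
            "count": len(values),
            "avg": sum(values) / len(values),
            "p95": percentile(values, 95),
        }
    return result
```

Passing `window_seconds=3600` yields the per-hour variant from the same function.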

WebSocket Delivery

A Python WebSocket server maintains connections with dashboard clients and broadcasts metric updates as they are computed, delivering sub-second latency between event occurrence and dashboard display.

Alerting Engine

A rule-based evaluator checks incoming metrics against configurable thresholds. Anomaly detection uses rolling z-score analysis to catch unexpected spikes. Alerts are dispatched via email, SMS, or webhook.
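The rolling z-score check can be illustrated with a short sketch. It scores each new value against the mean and standard deviation of a fixed-size window of recent values; the class name, window size, threshold of 3.0, and minimum sample count are assumed defaults, not the project's tuned settings.

```python
import math
from collections import deque


class RollingZScore:
    """Score each value against a rolling window of the values before it."""

    def __init__(self, window=60, threshold=3.0, min_samples=10):
        self.values = deque(maxlen=window)  # oldest values fall off automatically
        self.threshold = threshold
        self.min_samples = min_samples

    def update(self, value):
        """Return (z, is_anomaly) for `value`, then add it to the window."""
        z, is_anomaly = 0.0, False
        if len(self.values) >= self.min_samples:
            mean = sum(self.values) / len(self.values)
            variance = sum((v - mean) ** 2 for v in self.values) / len(self.values)
            std = math.sqrt(variance)
            if std > 0:  # a flat baseline has no meaningful z-score
                z = (value - mean) / std
                is_anomaly = abs(z) > self.threshold
        self.values.append(value)
        return z, is_anomaly
```

Note that in this sketch an anomalous value still enters the window and shifts the baseline; whether to exclude flagged values is a design choice left open here.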

Key Engineering Challenges

High-Frequency Ingestion

Challenge: Applications generating thousands of events per second can overwhelm a synchronous storage layer.

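Solution: Redis Streams serves as an ordered ingestion buffer, with consumer groups tracking which entries each consumer has acknowledged. Entries that were delivered but not acknowledged remain pending and can be re-read or claimed after a consumer restart, so no events are skipped. This gives at-least-once delivery rather than exactly-once, so processing should be idempotent to tolerate redelivery. Durability across a Redis restart depends on the configured persistence (AOF or RDB).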
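A consumer that acknowledges an entry only after processing it might look like the sketch below. Redis consumer groups redeliver unacknowledged entries, so the handler should tolerate seeing the same entry twice. The code assumes `client` is a redis-py `redis.Redis` instance returning the default list-shaped reply from `xreadgroup`; `process_batch`, the `handler` callback, and the batch parameters are hypothetical names.

```python
def process_batch(client, stream, group, consumer, handler, count=100, block_ms=1000):
    """Read one batch for this consumer and ack only entries that were handled.

    Assumption: `client` is a redis-py `redis.Redis` instance and the consumer
    group already exists. Returns the list of acknowledged entry IDs.
    """
    # ">" requests entries that have not yet been delivered to this group
    response = client.xreadgroup(group, consumer, {stream: ">"}, count=count, block=block_ms)
    acked = []
    for _stream_name, entries in response or []:
        for entry_id, fields in entries:
            try:
                handler(entry_id, fields)
            except Exception:
                # Leave the entry unacknowledged so it stays in the pending
                # list and can be retried or claimed by another consumer.
                continue
            client.xack(stream, group, entry_id)
            acked.append(entry_id)
    return acked
```

Entries left pending by a failed handler or a crashed consumer can later be inspected with `XPENDING` and reassigned with `XAUTOCLAIM`; that recovery loop is omitted here.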

Time-Series Storage at Scale

Challenge: Raw event storage grows unbounded; querying large tables for aggregates becomes slow.

Solution: TimescaleDB with automatic partitioning by time (hypertables) and continuous aggregate materialised views precomputing common metric rollups for fast dashboard queries.
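For illustration, the hypertable and a one-minute continuous aggregate could be set up with statements like the ones below, run through any DB-API cursor (for example psycopg). The table name, column names, view name, and refresh intervals are all assumptions, and the exact `create_hypertable` call form varies between TimescaleDB versions, so treat this as a sketch rather than the project's schema.

```python
SCHEMA_STATEMENTS = (
    # Raw events table; the columns are illustrative
    """
    CREATE TABLE IF NOT EXISTS events (
        time   TIMESTAMPTZ      NOT NULL,
        metric TEXT             NOT NULL,
        value  DOUBLE PRECISION NOT NULL
    )
    """,
    # Convert it into a hypertable partitioned on the time column
    "SELECT create_hypertable('events', 'time')",
    # Continuous aggregate: a per-minute rollup maintained by TimescaleDB.
    # WITH NO DATA defers the initial materialisation to the refresh policy.
    """
    CREATE MATERIALIZED VIEW events_1m
    WITH (timescaledb.continuous) AS
    SELECT time_bucket('1 minute', time) AS bucket,
           metric,
           count(*)   AS event_count,
           avg(value) AS avg_value
    FROM events
    GROUP BY bucket, metric
    WITH NO DATA
    """,
    # Keep the most recent hour of buckets refreshed once a minute
    """
    SELECT add_continuous_aggregate_policy('events_1m',
        start_offset      => INTERVAL '1 hour',
        end_offset        => INTERVAL '1 minute',
        schedule_interval => INTERVAL '1 minute')
    """,
)


def apply_schema(cursor, statements=SCHEMA_STATEMENTS):
    """Execute each statement on a DB-API cursor; returns how many were run."""
    for statement in statements:
        cursor.execute(statement)
    return len(statements)
```

Dashboard queries would then read from `events_1m` instead of scanning the raw `events` table; hourly and daily rollups follow the same pattern with a different bucket width.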

Simultaneous Dashboard Clients

Challenge: Broadcasting live updates to many connected clients efficiently without duplicating computation.

Solution: A single metric computation path writes to a shared publish channel; the WebSocket server fans each update out to all subscribed clients, so metric computation cost stays constant regardless of client count.
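The fan-out pattern can be sketched with asyncio queues, one per connected client, keyed by metric name. This is a simplified in-process stand-in: the `MetricHub` class, the drop-oldest policy for slow clients, and the metric names are assumptions, and the actual WebSocket send is left out.

```python
import asyncio
from collections import defaultdict


class MetricHub:
    """Fan one computed metric update out to every subscribed client queue."""

    def __init__(self, queue_size=100):
        self.queue_size = queue_size
        self.subscribers = defaultdict(set)  # metric name -> set of client queues

    def subscribe(self, metric):
        queue = asyncio.Queue(maxsize=self.queue_size)
        self.subscribers[metric].add(queue)
        return queue

    def unsubscribe(self, metric, queue):
        self.subscribers[metric].discard(queue)

    def publish(self, metric, value):
        """Deliver to subscribers of `metric`; returns the number of clients reached."""
        delivered = 0
        for queue in self.subscribers.get(metric, ()):
            if queue.full():
                queue.get_nowait()  # slow client: drop its oldest pending update
            queue.put_nowait((metric, value))
            delivered += 1
        return delivered


async def demo():
    hub = MetricHub()
    a = hub.subscribe("latency_ms")
    b = hub.subscribe("latency_ms")
    c = hub.subscribe("error_rate")
    reached = hub.publish("latency_ms", 42.0)  # computed once, delivered twice
    return reached, await a.get(), await b.get(), c.empty()
```

In a real server each connection handler would loop on `await queue.get()` and forward the update over its socket, calling `unsubscribe` when the client disconnects.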

Alert Noise Reduction

Challenge: Threshold-based alerts trigger excessively on normal traffic variation, leading to alert fatigue.

Solution: Rolling z-score anomaly detection is combined with a sustained-breach requirement (alert only after N consecutive threshold violations), which sharply reduces the false-positive rate.
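The sustained-breach requirement can be expressed as a small state machine, sketched below. It fires only after N consecutive breaches and, as one assumed way to avoid flapping, clears only after N consecutive normal readings; the class name and the symmetric clear rule are illustrative choices rather than the project's exact logic.

```python
class SustainedBreachGate:
    """Fire after `required` consecutive breaches; resolve after `required`
    consecutive normal readings."""

    def __init__(self, required=3):
        self.required = required
        self.breach_streak = 0
        self.normal_streak = 0
        self.firing = False

    def observe(self, breached):
        """Feed one evaluation result; return "fire", "resolve", or None."""
        if breached:
            self.breach_streak += 1
            self.normal_streak = 0
        else:
            self.normal_streak += 1
            self.breach_streak = 0

        if not self.firing and self.breach_streak >= self.required:
            self.firing = True
            return "fire"
        if self.firing and self.normal_streak >= self.required:
            self.firing = False
            return "resolve"
        return None
```

Feeding the gate the boolean output of a threshold or z-score check means a single isolated spike never produces a notification, and an alert that is already firing does not re-notify on every subsequent breach.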

Solutions Implemented

  • Redis Streams Ingestion: Durable ordered event buffer with consumer group processing, replay capability, and automatic trimming to manage memory.
  • TimescaleDB Continuous Aggregates: Precomputed rollups for 1-minute, 1-hour, and 1-day windows enabling instant historical chart queries.
  • WebSocket Broadcast Server: Async Python server managing concurrent client connections with per-metric subscription filtering.
  • Anomaly Detection: Rolling statistical analysis flagging values deviating significantly from recent baselines, with hysteresis to prevent flapping.
  • Interactive Dashboards: Configurable metric panels with time range selectors, drill-down capability, and exportable reports for management review.

Outcome & Impact

  • <1s Event-to-Dashboard Latency: From emission to display
  • 10K+ Events/Second: Sustained ingestion capacity
  • 70% Faster Incident Detection: Vs. previous batch reporting
  • 90% Alert Noise Reduction: After anomaly detection tuning