Problem Statement
As a system grows from a monolith into distributed services, cross-cutting concerns — authentication, authorisation, rate limiting, logging, routing — get duplicated across every service. Each team reimplements them slightly differently, creating inconsistencies and security gaps. A centralised API gateway resolves this by handling all these concerns in one place, presenting a unified surface to external clients while hiding internal service topology.
Key Challenges:
- Transparent proxying with minimal added latency
- Rate limiting that is accurate across multiple gateway instances
- Token validation without adding a synchronous call to the auth service on every request
- Granular routing rules supporting path, header, and tenant-based dispatch
- Structured logging capturing enough context for debugging without storing sensitive data
System Architecture
The gateway is a FastAPI application sitting in front of all internal services. Incoming requests pass through a middleware stack: JWT validation → rate limit check → routing → proxying → response logging. Redis holds rate limit counters and a token revocation list. All request/response metadata is logged to a structured store for monitoring.
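The middleware ordering above can be sketched as a chain of async handlers. This is a minimal pure-Python model of the pipeline, not the actual gateway code — the `Request` type and stage functions are illustrative stand-ins for the real FastAPI middleware:

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class Request:
    path: str
    token: str
    trace: list = field(default_factory=list)  # records stage order for illustration

# Each stage is a stub; real stages would consult Redis, routing tables, etc.
async def validate_jwt(req): req.trace.append("jwt")
async def check_rate_limit(req): req.trace.append("rate")
async def route(req): req.trace.append("route")
async def proxy(req):
    req.trace.append("proxy")
    return {"status": 200}
async def log_response(req, resp): req.trace.append("log")

async def handle(req: Request):
    # Stages run strictly in the documented order:
    # JWT validation -> rate limit -> routing -> proxying -> response logging
    await validate_jwt(req)
    await check_rate_limit(req)
    await route(req)
    resp = await proxy(req)
    await log_response(req, resp)
    return resp

req = Request(path="/orders", token="t")
resp = asyncio.run(handle(req))
print(req.trace)  # ['jwt', 'rate', 'route', 'proxy', 'log']
```

Keeping the order fixed matters: rejecting an invalid token before the rate-limit check means unauthenticated traffic never consumes counter capacity.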
JWT Validation
Access tokens are validated locally using the public key — no round-trip to the auth service. A Redis-backed revocation list is checked for explicitly invalidated tokens, providing security without per-request auth service calls.
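The local-validation flow can be sketched as follows. This is a simplified model: signature verification is deliberately stubbed out (in practice the gateway verifies the RS256 signature against the auth service's public key with a JWT library before trusting any claims), and an in-memory set stands in for the Redis revocation list; `make_token` is a test helper, not a real issuer:

```python
import base64
import json

def b64url_decode(seg: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(seg + "=" * (-len(seg) % 4))

# Stand-in for a Redis SISMEMBER lookup on the revocation list (keyed by jti).
REVOKED = {"jti-123"}

def validate_token(token: str, now: float) -> dict:
    """Decode claims, check expiry, then check the revocation list.
    Signature verification is elided in this sketch."""
    header_b64, payload_b64, _sig = token.split(".")
    claims = json.loads(b64url_decode(payload_b64))
    if claims["exp"] <= now:
        raise PermissionError("token expired")
    if claims.get("jti") in REVOKED:  # Redis lookup in production
        raise PermissionError("token revoked")
    return claims

def make_token(claims: dict) -> str:
    """Test helper producing an unsigned token in JWT wire format."""
    enc = lambda d: base64.urlsafe_b64encode(json.dumps(d).encode()).decode().rstrip("=")
    return f'{enc({"alg": "RS256"})}.{enc(claims)}.sig'

good = make_token({"sub": "alice", "jti": "jti-999", "exp": 2000})
print(validate_token(good, now=1000)["sub"])  # alice
```

Because both checks are local (signature math plus one Redis read), the auth service stays entirely off the request hot path.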
Rate Limiting
Sliding window counters in Redis track request counts per client (by IP, API key, or user ID) per time window. Limits are configurable per route and per client tier, with burst allowances for legitimate traffic spikes.
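A sliding window of this kind can be modeled in a few lines. In the real gateway the per-client timestamps live in a Redis sorted set (ZADD to record a hit, ZREMRANGEBYSCORE to expire old ones); here an in-memory dict and an injected clock stand in so the algorithm itself is visible:

```python
from collections import defaultdict

class SlidingWindowLimiter:
    """In-memory model of the Redis-backed limiter: one sorted set of
    request timestamps per client key."""

    def __init__(self, limit: int, window_s: float, burst: int = 0):
        self.limit, self.window_s, self.burst = limit, window_s, burst
        self.hits = defaultdict(list)

    def allow(self, key: str, now: float) -> bool:
        window_start = now - self.window_s
        # Drop timestamps that fell out of the window (ZREMRANGEBYSCORE in Redis).
        self.hits[key] = [t for t in self.hits[key] if t > window_start]
        if len(self.hits[key]) >= self.limit + self.burst:
            return False
        self.hits[key].append(now)  # ZADD in Redis
        return True

limiter = SlidingWindowLimiter(limit=3, window_s=10.0)
results = [limiter.allow("client-a", now=t) for t in (0, 1, 2, 3)]
print(results)  # [True, True, True, False]
```

The `key` is whatever client identity applies (IP, API key, or user ID), and `limit`, `window_s`, and `burst` come from the per-route, per-tier configuration.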
Request Routing
A rule-based router matches incoming requests to upstream services by path prefix, HTTP method, header values, and tenant context. It supports load balancing across multiple upstream instances and circuit breaking on repeated failures.
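First-match rule dispatch of this kind can be sketched as below. The rule shape and upstream names are illustrative, not the gateway's actual configuration schema:

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    upstream: str
    path_prefix: str = ""
    methods: tuple = ()  # empty = any method
    headers: dict = field(default_factory=dict)  # required header values (e.g. tenant)

def match(rules, method, path, headers):
    """Return the upstream of the first rule whose path prefix, method,
    and required headers all match, or None if nothing matches."""
    for r in rules:
        if not path.startswith(r.path_prefix):
            continue
        if r.methods and method not in r.methods:
            continue
        if any(headers.get(k) != v for k, v in r.headers.items()):
            continue
        return r.upstream
    return None

rules = [
    Rule("billing-v2", "/billing", headers={"x-tenant": "acme"}),  # tenant override
    Rule("billing-v1", "/billing"),                                # default
    Rule("orders", "/orders", methods=("GET", "POST")),
]
print(match(rules, "GET", "/billing/invoices", {"x-tenant": "acme"}))  # billing-v2
print(match(rules, "GET", "/billing/invoices", {}))                    # billing-v1
```

Ordering rules most-specific-first lets a tenant-scoped rule shadow the default route for the same path prefix.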
Observability
Structured request logs capture method, path, client identity, upstream service, latency, status code, and rate limit state. Aggregated metrics expose request rates, error rates, and latency percentiles for dashboards and alerting.
Key Engineering Challenges
Distributed Rate Limit Accuracy
Challenge: Rate limits enforced locally on each gateway instance are inaccurate when multiple instances share the same client traffic.
Solution: Redis atomic increments with expiry serve as the shared counter backend — all gateway instances read and write the same counters, ensuring accurate global rate enforcement regardless of instance count.
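The shared-counter primitive can be modeled as below. In Redis this is an INCR on a per-client key, with EXPIRE set on the window's first increment so counters reset automatically; the in-memory class is a single-process model of those commands (Redis's INCR being atomic is what keeps the count accurate when many gateway instances hit the same key):

```python
class FixedWindowCounter:
    """Model of the shared Redis counter: every gateway instance issues
    INCR on the same key; the first increment in a window sets EXPIRE,
    so the counter resets when the window ends."""

    def __init__(self, limit: int, window_s: int):
        self.limit, self.window_s = limit, window_s
        self.store = {}  # key -> (count, expiry_time)

    def incr(self, key: str, now: float) -> int:
        count, expiry = self.store.get(key, (0, now + self.window_s))
        if now >= expiry:  # key expired: start a fresh window
            count, expiry = 0, now + self.window_s
        self.store[key] = (count + 1, expiry)
        return count + 1

    def allow(self, key: str, now: float) -> bool:
        return self.incr(key, now) <= self.limit

c = FixedWindowCounter(limit=2, window_s=60)
print([c.allow("ip:10.0.0.1", now=t) for t in (0, 1, 2)])  # [True, True, False]
print(c.allow("ip:10.0.0.1", now=61))                      # True: new window
```

This shows the fixed-window form of the primitive for clarity; the gateway's sliding-window variant builds on the same atomic shared state.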
Token Revocation Without Auth Service Calls
Challenge: JWTs are validated locally for performance, but a compromised token must be revocable before its natural expiry.
Solution: Short token lifetimes (15 minutes) combined with a Redis revocation list for explicitly invalidated tokens, balancing performance with security without per-request auth service calls.
Circuit Breaking Upstream Failures
Challenge: A failing upstream service causes the gateway to queue requests, exhausting connections and degrading performance for other routes.
Solution: Per-upstream circuit breaker tracking error rates over a sliding window. Open circuits return 503 immediately with a Retry-After header, protecting the gateway from connection exhaustion.
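The breaker's state machine can be sketched as follows. For brevity this model trips on a consecutive-failure count rather than the sliding-window error rate described above, and the thresholds and injected clock are illustrative:

```python
class CircuitBreaker:
    """Per-upstream breaker with closed -> open -> half-open transitions."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None  # None = closed

    def allow_request(self, now: float) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        if now - self.opened_at >= self.cooldown_s:
            return True  # half-open: let a probe request through
        return False     # open: gateway returns 503 + Retry-After immediately

    def record_success(self) -> None:
        self.failures, self.opened_at = 0, None  # probe succeeded: close

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now  # trip open

cb = CircuitBreaker(failure_threshold=3, cooldown_s=30)
for t in (0, 1, 2):
    cb.record_failure(now=t)
print(cb.allow_request(now=5))   # False: circuit open, fail fast
print(cb.allow_request(now=40))  # True: cooldown elapsed, half-open probe
```

Failing fast while open is the point: the gateway spends no connection or queue slot on an upstream already known to be down.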
Latency Overhead
Challenge: Every request passes through the gateway, making added latency directly visible to end users.
Solution: Async FastAPI with connection pooling to upstream services, Redis pipelining for rate limit operations, and response streaming to avoid buffering — keeping gateway overhead under 5ms p99.
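The streaming piece of this can be shown in isolation. A pure-asyncio sketch with a fake upstream body standing in for the real pooled HTTP client — the point is that each chunk is forwarded to the client as it arrives, so the gateway never holds a full response in memory:

```python
import asyncio

async def upstream_body():
    """Fake upstream response body arriving in chunks."""
    for chunk in (b"part1-", b"part2-", b"part3"):
        await asyncio.sleep(0)  # yield control, as a real socket read would
        yield chunk

async def stream_response(body, send):
    """Forward each chunk to the client as it arrives,
    instead of buffering the whole body in gateway memory."""
    async for chunk in body:
        await send(chunk)

async def main():
    received = []

    async def send(chunk):  # stand-in for the client connection
        received.append(chunk)

    await stream_response(upstream_body(), send)
    return received

chunks = asyncio.run(main())
print(chunks)  # [b'part1-', b'part2-', b'part3']
```

Combined with pooled upstream connections (no per-request TCP/TLS handshake) and pipelined Redis calls, this keeps per-request gateway work to a few dictionary lookups and one counter round-trip.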
Solutions Implemented
- Local JWT Validation: Public-key signature verification with Redis revocation list check — no auth service dependency on the hot path.
- Redis Sliding Window Rate Limiting: Atomic counter operations ensuring accurate global rate enforcement across gateway instances.
- Configurable Routing Rules: Path, method, header, and tenant-based request dispatch with per-upstream load balancing and health checking.
- Circuit Breaker: Per-upstream failure tracking with automatic open/half-open/closed state transitions protecting system stability.
- Structured Request Logging: Complete request context captured per call with sensitive field redaction and async write to prevent logging from adding latency.
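The redaction step can be sketched as a recursive pass over the log record before it is written. The field names below are illustrative, not the gateway's actual schema:

```python
SENSITIVE_KEYS = {"authorization", "cookie", "x-api-key", "password"}

def redact(record: dict) -> dict:
    """Return a copy of the log record with sensitive field values
    replaced, recursing into nested dicts such as headers."""
    out = {}
    for key, value in record.items():
        if key.lower() in SENSITIVE_KEYS:
            out[key] = "[REDACTED]"
        elif isinstance(value, dict):
            out[key] = redact(value)
        else:
            out[key] = value
    return out

log = redact({
    "method": "GET", "path": "/orders", "status": 200,
    "headers": {"Authorization": "Bearer abc123", "Accept": "application/json"},
})
print(log["headers"]["Authorization"])  # [REDACTED]
```

Redacting by field name rather than by value pattern keeps the check cheap enough to run on every request before the async log write.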
Outcome & Impact
- Under 5ms p99 added latency per request
- One enforcement point for authentication, rate limiting, and routing
- Full structured logs for every request, with sensitive fields redacted
- Rate limits accurate across N gateway instances