Problem Statement
As a system grows from a monolith into distributed services, cross-cutting concerns — authentication, authorisation, rate limiting, logging, routing — get duplicated across every service. Each team reimplements them slightly differently, creating inconsistencies and security gaps. A centralised API gateway resolves this by handling all these concerns in one place, presenting a unified surface to external clients while hiding internal service topology.
Key Challenges:
- Transparent proxying with minimal added latency
- Rate limiting that is accurate across multiple gateway instances
- Token validation without adding a synchronous call to the auth service on every request
- Granular routing rules supporting path, header, and tenant-based dispatch
- Structured logging capturing enough context for debugging without storing sensitive data
System Architecture
The gateway is a FastAPI application sitting in front of all internal services. Incoming requests pass through a middleware stack: JWT validation → rate limit check → routing → proxying → response logging. Redis holds rate limit counters and a token revocation list. All request/response metadata is logged to a structured store for monitoring.
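The middleware ordering above can be sketched as a chain of async handlers. This is a minimal pure-Python model of the pipeline, not the actual gateway code — the `Request` type and stage functions are illustrative stand-ins for the real FastAPI middleware:

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class Request:
    path: str
    token: str
    trace: list = field(default_factory=list)  # records stage order for illustration

# Each stage is a stub; real stages would consult Redis, routing tables, etc.
async def validate_jwt(req): req.trace.append("jwt")
async def check_rate_limit(req): req.trace.append("rate")
async def route(req): req.trace.append("route")
async def proxy(req):
    req.trace.append("proxy")
    return {"status": 200}
async def log_response(req, resp): req.trace.append("log")

async def handle(req: Request):
    # Stages run strictly in the documented order:
    # JWT validation -> rate limit -> routing -> proxying -> response logging
    await validate_jwt(req)
    await check_rate_limit(req)
    await route(req)
    resp = await proxy(req)
    await log_response(req, resp)
    return resp

req = Request(path="/orders", token="t")
resp = asyncio.run(handle(req))
print(req.trace)  # ['jwt', 'rate', 'route', 'proxy', 'log']
```

Keeping the order fixed matters: rejecting an invalid token before the rate-limit check means unauthenticated traffic never consumes counter capacity.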
JWT Validation
Access tokens are validated locally using the public key — no round-trip to the auth service. A Redis-backed revocation list is checked for explicitly invalidated tokens, providing security without per-request auth service calls.
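The local-validation flow can be sketched as follows. This is a simplified model: signature verification is deliberately stubbed out (in practice the gateway verifies the RS256 signature against the auth service's public key with a JWT library before trusting any claims), and an in-memory set stands in for the Redis revocation list; `make_token` is a test helper, not a real issuer:

```python
import base64
import json

def b64url_decode(seg: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(seg + "=" * (-len(seg) % 4))

# Stand-in for a Redis SISMEMBER lookup on the revocation list (keyed by jti).
REVOKED = {"jti-123"}

def validate_token(token: str, now: float) -> dict:
    """Decode claims, check expiry, then check the revocation list.
    Signature verification is elided in this sketch."""
    header_b64, payload_b64, _sig = token.split(".")
    claims = json.loads(b64url_decode(payload_b64))
    if claims["exp"] <= now:
        raise PermissionError("token expired")
    if claims.get("jti") in REVOKED:  # Redis lookup in production
        raise PermissionError("token revoked")
    return claims

def make_token(claims: dict) -> str:
    """Test helper producing an unsigned token in JWT wire format."""
    enc = lambda d: base64.urlsafe_b64encode(json.dumps(d).encode()).decode().rstrip("=")
    return f'{enc({"alg": "RS256"})}.{enc(claims)}.sig'

good = make_token({"sub": "alice", "jti": "jti-999", "exp": 2000})
print(validate_token(good, now=1000)["sub"])  # alice
```

Because both checks are local (signature math plus one Redis read), the auth service stays entirely off the request hot path.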
Rate Limiting
Sliding window counters in Redis track request counts per client (by IP, API key, or user ID) per time window. Limits are configurable per route and per client tier, with burst allowances for legitimate traffic spikes.
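A sliding window of this kind can be modeled in a few lines. In the real gateway the per-client timestamps live in a Redis sorted set (ZADD to record a hit, ZREMRANGEBYSCORE to expire old ones); here an in-memory dict and an injected clock stand in so the algorithm itself is visible:

```python
from collections import defaultdict

class SlidingWindowLimiter:
    """In-memory model of the Redis-backed limiter: one sorted set of
    request timestamps per client key."""

    def __init__(self, limit: int, window_s: float, burst: int = 0):
        self.limit, self.window_s, self.burst = limit, window_s, burst
        self.hits = defaultdict(list)

    def allow(self, key: str, now: float) -> bool:
        window_start = now - self.window_s
        # Drop timestamps that fell out of the window (ZREMRANGEBYSCORE in Redis).
        self.hits[key] = [t for t in self.hits[key] if t > window_start]
        if len(self.hits[key]) >= self.limit + self.burst:
            return False
        self.hits[key].append(now)  # ZADD in Redis
        return True

limiter = SlidingWindowLimiter(limit=3, window_s=10.0)
results = [limiter.allow("client-a", now=t) for t in (0, 1, 2, 3)]
print(results)  # [True, True, True, False]
```

The `key` is whatever client identity applies (IP, API key, or user ID), and `limit`, `window_s`, and `burst` come from the per-route, per-tier configuration.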
Request Routing
A rule-based router matches incoming requests to upstream services by path prefix, HTTP method, header values, and tenant context. It supports load balancing across multiple upstream instances and circuit breaking on repeated failures.
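First-match rule dispatch of this kind can be sketched as below. The rule shape and upstream names are illustrative, not the gateway's actual configuration schema:

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    upstream: str
    path_prefix: str = ""
    methods: tuple = ()  # empty = any method
    headers: dict = field(default_factory=dict)  # required header values (e.g. tenant)

def match(rules, method, path, headers):
    """Return the upstream of the first rule whose path prefix, method,
    and required headers all match, or None if nothing matches."""
    for r in rules:
        if not path.startswith(r.path_prefix):
            continue
        if r.methods and method not in r.methods:
            continue
        if any(headers.get(k) != v for k, v in r.headers.items()):
            continue
        return r.upstream
    return None

rules = [
    Rule("billing-v2", "/billing", headers={"x-tenant": "acme"}),  # tenant override
    Rule("billing-v1", "/billing"),                                # default
    Rule("orders", "/orders", methods=("GET", "POST")),
]
print(match(rules, "GET", "/billing/invoices", {"x-tenant": "acme"}))  # billing-v2
print(match(rules, "GET", "/billing/invoices", {}))                    # billing-v1
```

Ordering rules most-specific-first lets a tenant-scoped rule shadow the default route for the same path prefix.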
Observability
Structured request logs capture method, path, client identity, upstream service, latency, status code, and rate limit state. Aggregated metrics expose request rates, error rates, and latency percentiles for dashboards and alerting.
Key Engineering Challenges
Distributed Rate Limit Accuracy
Challenge: Rate limits enforced locally on each gateway instance are inaccurate when multiple instances share the same client traffic.
Solution: Redis atomic increments with expiry serve as the shared counter backend — all gateway instances read and write the same counters, ensuring accurate global rate enforcement regardless of instance count.
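The shared-counter primitive can be modeled as below. In Redis this is an INCR on a per-client key, with EXPIRE set on the window's first increment so counters reset automatically; the in-memory class is a single-process model of those commands (Redis's INCR being atomic is what keeps the count accurate when many gateway instances hit the same key):

```python
class FixedWindowCounter:
    """Model of the shared Redis counter: every gateway instance issues
    INCR on the same key; the first increment in a window sets EXPIRE,
    so the counter resets when the window ends."""

    def __init__(self, limit: int, window_s: int):
        self.limit, self.window_s = limit, window_s
        self.store = {}  # key -> (count, expiry_time)

    def incr(self, key: str, now: float) -> int:
        count, expiry = self.store.get(key, (0, now + self.window_s))
        if now >= expiry:  # key expired: start a fresh window
            count, expiry = 0, now + self.window_s
        self.store[key] = (count + 1, expiry)
        return count + 1

    def allow(self, key: str, now: float) -> bool:
        return self.incr(key, now) <= self.limit

c = FixedWindowCounter(limit=2, window_s=60)
print([c.allow("ip:10.0.0.1", now=t) for t in (0, 1, 2)])  # [True, True, False]
print(c.allow("ip:10.0.0.1", now=61))                      # True: new window
```

This shows the fixed-window form of the primitive for clarity; the gateway's sliding-window variant builds on the same atomic shared state.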
Token Revocation Without Auth Service Calls
Challenge: JWTs are validated locally for performance, but a compromised token must be revocable before its natural expiry.
Solution: Short token lifetimes (15 minutes) combined with a Redis revocation list for explicitly invalidated tokens, balancing performance with security without per-request auth service calls.
Circuit Breaking Upstream Failures
Challenge: A failing upstream service causes the gateway to queue requests, exhausting connections and degrading performance for other routes.
Solution: Per-upstream circuit breaker tracking error rates over a sliding window. Open circuits return 503 immediately with a Retry-After header, protecting the gateway from connection exhaustion.
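The breaker's state machine can be sketched as follows. For brevity this model trips on a consecutive-failure count rather than the sliding-window error rate described above, and the thresholds and injected clock are illustrative:

```python
class CircuitBreaker:
    """Per-upstream breaker with closed -> open -> half-open transitions."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None  # None = closed

    def allow_request(self, now: float) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        if now - self.opened_at >= self.cooldown_s:
            return True  # half-open: let a probe request through
        return False     # open: gateway returns 503 + Retry-After immediately

    def record_success(self) -> None:
        self.failures, self.opened_at = 0, None  # probe succeeded: close

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now  # trip open

cb = CircuitBreaker(failure_threshold=3, cooldown_s=30)
for t in (0, 1, 2):
    cb.record_failure(now=t)
print(cb.allow_request(now=5))   # False: circuit open, fail fast
print(cb.allow_request(now=40))  # True: cooldown elapsed, half-open probe
```

Failing fast while open is the point: the gateway spends no connection or queue slot on an upstream already known to be down.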
Latency Overhead
Challenge: Every request passes through the gateway, making added latency directly visible to end users.
Solution: Async FastAPI with connection pooling to upstream services, Redis pipelining for rate limit operations, and response streaming to avoid buffering — keeping gateway overhead under 5ms p99.
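The streaming piece of this can be shown in isolation. A pure-asyncio sketch with a fake upstream body standing in for the real pooled HTTP client — the point is that each chunk is forwarded to the client as it arrives, so the gateway never holds a full response in memory:

```python
import asyncio

async def upstream_body():
    """Fake upstream response body arriving in chunks."""
    for chunk in (b"part1-", b"part2-", b"part3"):
        await asyncio.sleep(0)  # yield control, as a real socket read would
        yield chunk

async def stream_response(body, send):
    """Forward each chunk to the client as it arrives,
    instead of buffering the whole body in gateway memory."""
    async for chunk in body:
        await send(chunk)

async def main():
    received = []

    async def send(chunk):  # stand-in for the client connection
        received.append(chunk)

    await stream_response(upstream_body(), send)
    return received

chunks = asyncio.run(main())
print(chunks)  # [b'part1-', b'part2-', b'part3']
```

Combined with pooled upstream connections (no per-request TCP/TLS handshake) and pipelined Redis calls, this keeps per-request gateway work to a few dictionary lookups and one counter round-trip.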
Solutions Implemented
- Local JWT Validation: Public-key signature verification with Redis revocation list check — no auth service dependency on the hot path.
- Redis Sliding Window Rate Limiting: Atomic counter operations ensuring accurate global rate enforcement across gateway instances.
- Configurable Routing Rules: Path, method, header, and tenant-based request dispatch with per-upstream load balancing and health checking.
- Circuit Breaker: Per-upstream failure tracking with automatic open/half-open/closed state transitions protecting system stability.
- Structured Request Logging: Complete request context captured per call with sensitive field redaction and async write to prevent logging from adding latency.
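The redaction step can be sketched as a recursive pass over the log record before it is written. The field names below are illustrative, not the gateway's actual schema:

```python
SENSITIVE_KEYS = {"authorization", "cookie", "x-api-key", "password"}

def redact(record: dict) -> dict:
    """Return a copy of the log record with sensitive field values
    replaced, recursing into nested dicts such as headers."""
    out = {}
    for key, value in record.items():
        if key.lower() in SENSITIVE_KEYS:
            out[key] = "[REDACTED]"
        elif isinstance(value, dict):
            out[key] = redact(value)
        else:
            out[key] = value
    return out

log = redact({
    "method": "GET", "path": "/orders", "status": 200,
    "headers": {"Authorization": "Bearer abc123", "Accept": "application/json"},
})
print(log["headers"]["Authorization"])  # [REDACTED]
```

Redacting by field name rather than by value pattern keeps the check cheap enough to run on every request before the async log write.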
Outcome & Impact
- Under 5ms p99 added latency per request
- One enforcement point for authentication, rate limiting, and routing
- Full structured logs for every request, with sensitive fields redacted
- Rate limits accurate across N gateway instances