When we set out to build CDNZero, we had one non-negotiable constraint: P95 latency below 50ms for any user, anywhere in the world. Here's how we made that happen.
Our architecture uses three caching tiers. The first is the edge layer — 200+ points of presence using anycast routing so users always hit the closest node. Each edge node runs an in-memory LRU cache for hot objects and an SSD-backed cache for the long tail.
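To make the two-level edge cache concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not our production code: the class name `TwoLevelCache`, the capacities, and the use of a second in-process dictionary as a stand-in for the SSD-backed cache are all hypothetical.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Small in-memory LRU in front of a larger, slower cache (illustrative)."""

    def __init__(self, memory_capacity, disk_capacity):
        self.memory = OrderedDict()  # hot objects, most recently used last
        self.disk = OrderedDict()    # stand-in for the SSD-backed long-tail cache
        self.memory_capacity = memory_capacity
        self.disk_capacity = disk_capacity

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)  # mark as recently used
            return self.memory[key]
        if key in self.disk:
            value = self.disk.pop(key)
            self.put(key, value)          # promote back into memory
            return value
        return None                       # miss: caller goes to the next tier

    def put(self, key, value):
        self.disk.pop(key, None)          # keep each key in one level only
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.memory_capacity:
            old_key, old_value = self.memory.popitem(last=False)
            self._demote(old_key, old_value)

    def _demote(self, key, value):
        # Objects evicted from memory fall through to the larger level.
        self.disk[key] = value
        self.disk.move_to_end(key)
        while len(self.disk) > self.disk_capacity:
            self.disk.popitem(last=False)
```

In this sketch an object evicted from memory is demoted rather than dropped, so a later request for it is still a local hit, just a slower one.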
The second tier is our regional shield layer. When an edge node has a cache miss, it doesn't go directly to origin. Instead, it checks a regional shield (one per continent) that acts as a shared cache for the edges in that region. Because many edges funnel their misses through one shield, duplicate origin requests collapse into a single request during traffic spikes.
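Collapsing duplicate requests is often called request coalescing. The following is a minimal sketch of the idea in Python, assuming a thread-per-request model. The class `CoalescingFetcher` and the `fetch_from_origin` callback are hypothetical names for illustration, and a real shield would also cache the result, which this sketch leaves out.

```python
import threading


class CoalescingFetcher:
    """Lets concurrent requests for the same key share one upstream fetch."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # hypothetical upstream callback
        self._lock = threading.Lock()
        self._in_flight = {}             # key -> shared state for that fetch

    def get(self, key):
        with self._lock:
            entry = self._in_flight.get(key)
            leader = entry is None
            if leader:
                # First request for this key: it will perform the fetch.
                entry = {"done": threading.Event(), "value": None, "error": None}
                self._in_flight[key] = entry

        if leader:
            try:
                entry["value"] = self._fetch(key)
            except Exception as exc:
                entry["error"] = exc
            finally:
                with self._lock:
                    del self._in_flight[key]
                entry["done"].set()      # wake up every waiting follower
        else:
            entry["done"].wait()         # follower: reuse the leader's result

        if entry["error"] is not None:
            raise entry["error"]
        return entry["value"]
```

If five requests for the same object arrive while the first fetch is still in progress, the upstream sees one request and all five callers receive the same response.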
The third tier is the origin shield, a single logical layer that sits in front of your origin server. It ensures that even during a global cache purge, your origin receives only one request per object rather than simultaneous requests from all 200+ edge locations.
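The way the three tiers chain together can be sketched in a few lines of Python. This is a simplified model, not our implementation: `CacheTier`, the three regional shields, and the four edges per shield are made-up values, and requests are issued one after another so concurrency is ignored.

```python
class CacheTier:
    """One caching layer that asks its parent on a miss (illustrative)."""

    def __init__(self, fetch_from_parent):
        self.store = {}
        self.fetch_from_parent = fetch_from_parent

    def get(self, key):
        if key not in self.store:
            self.store[key] = self.fetch_from_parent(key)
        return self.store[key]

    def purge(self):
        self.store.clear()


origin_requests = []


def origin(key):
    # Stand-in for the customer's origin server; records every request it sees.
    origin_requests.append(key)
    return f"content of {key}"


# Hypothetical topology: 1 origin shield, 3 regional shields, 4 edges per shield.
origin_shield = CacheTier(origin)
regional_shields = [CacheTier(origin_shield.get) for _ in range(3)]
edges = [CacheTier(shield.get) for shield in regional_shields for _ in range(4)]

# Simulate a global purge, then have every edge request the same object.
for tier in [origin_shield, *regional_shields, *edges]:
    tier.purge()
for edge in edges:
    edge.get("/app.js")

print(len(origin_requests))  # 1
```

Twelve edges ask for the object, three regional shields each forward one miss, and the origin shield turns those into a single origin request.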
Routing is handled via anycast BGP with latency-based DNS fallback. When a user makes a request, they're routed to the closest healthy edge node automatically. If a node goes down, traffic shifts in under 5 seconds without any DNS propagation delay.
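Anycast selection happens in the network through BGP route announcements, so there is no application code to show for it. The DNS fallback decision, though, can be modeled roughly as "pick the healthy node with the lowest measured latency". The sketch below assumes exactly that. `EdgeNode`, `pick_edge`, the node names, and the latency figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EdgeNode:
    name: str
    latency_ms: float   # measured latency from the user's resolver (assumed input)
    healthy: bool = True


def pick_edge(nodes):
    """Return the healthy node with the lowest measured latency."""
    candidates = [node for node in nodes if node.healthy]
    if not candidates:
        raise RuntimeError("no healthy edge nodes available")
    return min(candidates, key=lambda node: node.latency_ms)


nodes = [
    EdgeNode("fra-1", 9.0),
    EdgeNode("ams-1", 12.0),
    EdgeNode("lhr-1", 15.0),
]
print(pick_edge(nodes).name)  # fra-1

nodes[0].healthy = False      # simulate the closest node going down
print(pick_edge(nodes).name)  # ams-1
```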
We also invest heavily in connection reuse. Each edge maintains persistent HTTP/2 connections to the shield layer, and the shield maintains connections to your origin. This eliminates TCP and TLS handshake overhead for cache misses.
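A back-of-the-envelope calculation shows why this matters. A new TCP connection costs one round trip before any data can be sent, and a full TLS 1.3 handshake adds one more (TLS 1.2 adds two). The function below is a rough model with made-up inputs: the 30 ms round-trip time and 20 ms origin processing time are assumptions for illustration, not measurements.

```python
def miss_latency_ms(rtt_ms, origin_time_ms, reuse_connection, tls_handshake_rtts=1):
    """Estimate upstream fetch time for a cache miss.

    tls_handshake_rtts: 1 for a full TLS 1.3 handshake, 2 for TLS 1.2.
    """
    # A reused connection skips both the TCP and the TLS handshake.
    handshake_rtts = 0 if reuse_connection else 1 + tls_handshake_rtts
    # One more round trip carries the request and the response.
    return handshake_rtts * rtt_ms + rtt_ms + origin_time_ms


print(miss_latency_ms(30, 20, reuse_connection=True))    # 50
print(miss_latency_ms(30, 20, reuse_connection=False))   # 110
print(miss_latency_ms(30, 20, reuse_connection=False, tls_handshake_rtts=2))  # 140
```

Under these assumptions, a warm connection more than halves the time spent on a miss, and the saving grows with the distance between shield and origin.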
The result: our global P50 is 12ms, P95 is 38ms, and P99 is 47ms. Even cache misses (origin fetches) typically resolve in under 100ms because of connection reuse and regional proximity.
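For readers less familiar with percentile notation, P95 is the value that 95% of requests come in at or under. Here is a minimal sketch using the nearest-rank method. The sample latencies are invented for the example, and monitoring systems commonly use interpolation or histogram approximations instead.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Invented latencies in milliseconds, for illustration only.
latencies_ms = [12, 9, 47, 38, 15, 11, 10, 14, 13, 16]
print(percentile(latencies_ms, 50))  # 13
print(percentile(latencies_ms, 90))  # 38
```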
These numbers are measured from real user traffic, not synthetic tests. We publish live latency data on our status page broken down by region.