October 11, 2025
TL;DR: The worst UX moment is the blank page before the first byte. In Next.js, cold starts and database latency are primary causes of slow TTFB - especially after idle time, deploys, or on rarely hit SSR routes. Aim for <100ms TTFB (truly instant) via static/cached content, tolerate 100-300ms for dynamic SSR, and treat 300ms+ as needing fixes. Solutions span the deployment spectrum: bare metal servers (Ryzen 9950X3D) for consistent sub-100ms performance, serverless platforms (Vercel/AWS/Cloudflare) with warming strategies, and architectural patterns (SSG/ISR, edge caching, co-located databases).
Time to First Byte (TTFB) measures how long the browser waits for the first byte of data from your server after a request. If it's slow, users stare at a blank page and may assume the site is hung or something is wrong with their connection. Because nothing can render or stream until the first byte lands, long TTFB is uniquely harmful-often worse than a slow LCP.
Let's be honest about what "fast" actually means to users:
The often-cited target of "200-800ms TTFB" is misleading. At 800ms, you're approaching a full second before anything renders-users will absolutely notice this as slow. While Google's Web Vitals threshold is 800ms, this is a minimum to avoid search ranking penalties-not a UX target. Google's own products deliver sub-100ms TTFB, and that's the real standard for perceived instant performance. Aim for <100ms for truly instant experiences (achievable with static/cached content), tolerate 100-300ms for dynamic SSR, and treat anything over 300ms as needing investigation and improvement.
Source: https://web.dev/articles/ttfb
Cold-start time lives inside TTFB between request queueing and the first byte (instance boot => app init => SSR render). Network distance contributes earlier (DNS/TCP/TLS).
A cold start is the extra boot time that occurs when your platform spins up a new instance after idle time or a fresh deploy. The platform allocates resources, loads your app, initializes the runtime, and establishes connections - and all of that accrues entirely inside TTFB.
You'll notice cold-start TTFB spikes most on SSR pages and API routes that execute on demand. The risk is higher for infrequently accessed routes or apps with sparse traffic (first request after idle), and right after deployments (warm instances are replaced). Sudden traffic spikes can also outpace warm capacity, making many users hit cold boots at once.
| Cause (symptom) | Primary fixes |
| --- | --- |
| Idle time / scale-to-zero leading to cold boots on first hit | Bare metal/VPS (no cold starts), keep functions warm (scale-to-one), predictive warming, serve cached/SSG responses first |
| Deploys / new code paths invalidating warm instances | Rolling deployments, pre-warm critical routes, bytecode caching, or use persistent servers (bare metal/VPS) |
| Sparse or rarely accessed SSR routes | Convert to SSG/ISR, add caching, consider Edge Runtime for lightweight logic |
| Heavy server work on the request path (slow DB/API/compute) | Cache & batch, move work off the critical path, tighten queries, precompute where feasible |
| Database latency (slow queries, distant DB) | Co-locate database with app server (1-5ms within same region), use read replicas in each region, connection pooling, query optimization |
| Network distance (users far from origin) | CDN + multi-region routing, cacheable responses, edge-served content |
| Sudden traffic spikes with many first-time requests | Instance concurrency (reuse), predictive scaling, warm-up bursts, or use bare metal with PM2 clustering |
| Large/slow JS compile on cold boot | Bytecode caching, smaller bundles, avoid unnecessary startup imports |
Before you optimize cold starts, check your database latency. Database round-trips are often the primary cause of slow TTFB, not cold starts.
If your database is 50ms away from your app server and your page makes 3 queries:
- 50ms x 3 queries = 150ms just for data fetching
- Add 50-100ms for SSR rendering
- Result: 200-250ms baseline TTFB, before any cold start overhead
The requirement: Your database should be 1-5ms ping from your application server within the same region, ideally in the same datacenter or VPC.
The fix is straightforward but requires architectural discipline. Deploy your app and database in the same cloud region or datacenter so they can communicate over the local network. For multi-region applications, deploy read replicas in each region where you run app servers-this ensures every request hits a local database regardless of where the user connects. Use connection poolers like PgBouncer or RDS Proxy to reduce connection establishment overhead, which can add 10-50ms per request without pooling. When data freshness requirements allow, cache database-heavy queries at the edge using a CDN or edge compute platform. Finally, use database query analysis tools to eliminate N+1 queries and unnecessary data fetching-often you're pulling entire objects when you only need a few fields.
Example: An app in us-east-1 querying a database in eu-west-1 will add ~75ms to every request. Move the database (or add a read replica) to us-east-1 to reduce this to 1-5ms.
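As a minimal sketch - assuming Postgres with the `pg` client, and a placeholder table and fields - a module-scoped connection pool pointed at a same-region database (or a pooler like PgBouncer/RDS Proxy) avoids paying connection setup on every request and fetches only the columns you render:

```js
// lib/db.js — minimal sketch; assumes DATABASE_URL points at a Postgres
// instance (ideally via a same-region pooler such as PgBouncer or RDS Proxy).
const { Pool } = require("pg");

// Module scope: the pool survives across requests on a warm instance,
// so most requests skip the 10-50ms connection-establishment cost.
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 10,                   // cap concurrent connections per instance
  idleTimeoutMillis: 30000,  // release idle connections
});

async function getPostSummaries() {
  // Select only the fields actually rendered, not entire rows.
  const { rows } = await pool.query(
    "SELECT id, title, published_at FROM posts ORDER BY published_at DESC LIMIT 20"
  );
  return rows;
}

module.exports = { getPostSummaries };
```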
The architecture decision fundamentally shapes your TTFB profile. Next.js is heavily single-threaded, which changes the cost-benefit analysis of different hosting approaches.
When to choose: You want predictable, consistently fast TTFB with zero cold starts and full control.
Why it works for Next.js: Next.js is heavily single-threaded, which means it benefits massively from high single-core CPU performance rather than core count. Modern CPUs like the AMD Ryzen 9950X3D deliver 2-5x faster server-side rendering than the older EPYC and Xeon processors typically used in serverless platforms-not just during cold starts, but for every single request. The raw compute power of a high-frequency CPU with large L3 cache dramatically outperforms the shared, virtualized infrastructure of serverless platforms. Your process stays running continuously, so you never pay any cold start penalty, and performance is predictable since you're not competing with noisy neighbors.
Hosting options: You have several paths depending on your control and budget requirements. Hetzner offers dedicated servers with AMD Ryzen and EPYC processors at excellent price points with strong single-core performance. OVH provides bare metal servers with high-frequency CPUs. For maximum control over hardware selection, self-hosted colocation lets you spec your exact CPU and memory configuration. If you prefer cloud infrastructure with less hands-on management, AWS EC2 c7i instances or GCP C3 instances offer high-frequency Intel and AMD chips optimized for compute workloads.
Setup: The typical architecture uses PM2 to run multiple Next.js instances in cluster mode, utilizing all CPU cores effectively. Alternatively, Docker makes deployment simpler while maintaining bare-metal performance-containerization overhead is negligible for CPU-bound workloads like Next.js SSR. Place a reverse proxy like Nginx or Caddy in front to handle static asset caching, SSL termination, and connection management. Configure systemd (or Docker restart policies) to ensure your application automatically restarts on failure. For redundancy, add a load balancer to distribute traffic across multiple servers, though for many applications a single high-performance server with proper monitoring is sufficient.
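For illustration, a minimal PM2 ecosystem file for cluster mode might look like the following; the app name, port, and script path are assumptions rather than a prescribed setup:

```js
// ecosystem.config.js — minimal sketch; name, port, and script path are placeholders.
module.exports = {
  apps: [
    {
      name: "next-app",
      script: "node_modules/next/dist/bin/next",
      args: "start -p 3000",
      exec_mode: "cluster", // PM2 cluster mode load-balances across workers
      instances: "max",     // spawn one worker per CPU core
      env: { NODE_ENV: "production" },
    },
  ],
};
```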
Cost advantage: Bare metal is dramatically cheaper than serverless for high-traffic applications. Around $100/month for dedicated servers (more for additional RAM) can handle what would cost significantly more on serverless platforms at high traffic volumes. A $100/month Hetzner dedicated server includes unlimited traffic at 1Gbps (or many terabytes with fair use), whereas Vercel charges $0.15/GB ($150/TB) beyond the 1TB included in Pro plans. At 10TB/month traffic, Vercel costs $1,370/month versus Hetzner's flat $100-nearly 14x more expensive for bandwidth-heavy applications. For deployment simplicity, platforms like uncloud.run provide managed bare metal with git-push deployment, giving you serverless-like developer experience with bare metal economics and performance.
Trade-offs: The responsibility shifts to you. You manage infrastructure, handle OS updates and security patches, and scale by provisioning additional servers rather than letting a platform handle it automatically. You won't get automatic multi-region routing out of the box-use Cloudflare or a CDN for global distribution. The exchange is control, performance, and dramatically lower costs for operational responsibility.
When to choose: You want automatic scaling, zero infrastructure management, and can tolerate occasional cold starts.
Platform approaches: Modern serverless platforms have developed sophisticated strategies to minimize cold starts. Vercel's Fluid Compute keeps at least one instance warm per function on Pro and Enterprise plans, handles multiple requests per instance concurrently, and predictively warms functions based on traffic patterns. Their bytecode caching delivers up to 27% faster cold starts, and rolling deployments avoid cold start storms during updates.
AWS Lambda offers SnapStart for Java (and more recently Python and .NET), which snapshots initialized functions for near-instant cold starts. Provisioned concurrency guarantees warm instances for critical functions, while Lambda@Edge runs at CloudFront's edge locations. CloudFront's global CDN caches responses close to users.
Cloudflare Workers uses V8 isolates with theoretical <5ms startup time, though real-world cold start latency is typically 200-500ms due to network overhead, script loading, and initialization. While better than traditional Lambda cold starts (500-3000ms), this still means many requests experience noticeable delays-highlighting the advantage of bare metal servers that eliminate cold starts entirely. Their global edge network spans 330+ cities. Pages Functions provide a hybrid approach for Next.js apps, with KV and Durable Objects for stateful edge storage.
Netlify offers Edge Functions using the Deno runtime, keeps function instances warm between requests, and supports Next.js ISR for background revalidation.
Trade-offs: Cold starts still occur, even if modern platforms have reduced them to less than 1% of requests. You're running on shared infrastructure, which means occasional noisy neighbor problems where another tenant's workload affects your performance. Advanced features like predictive warming and bytecode caching create platform lock-in-switching providers means re-architecting. Cost can scale unpredictably with traffic since you're billed per request or compute time rather than fixed server costs.
Regardless of hosting approach, these architectural patterns improve TTFB:
Use Next.js Static Site Generation (SSG) or Incremental Static Regeneration (ISR) whenever possible. Static pages can be served from CDN edge locations, delivering TTFB of 20-50ms globally.
Next.js offers a rendering spectrum. SSG pre-renders pages at build time using `generateStaticParams` in the App Router or `getStaticProps` in the Pages Router - this is the fastest option since pages are just static HTML served from CDN. ISR extends this with background revalidation via the `revalidate` option, serving stale content immediately while regenerating in the background.
For dynamic server-rendered pages, aggressive CDN caching with `Cache-Control` headers is critical. Pages that don't change often-especially for non-logged-in users-can be cached for hours or even days. Use `stale-while-revalidate` to serve cached content instantly while fetching fresh content in the background: `Cache-Control: s-maxage=3600, stale-while-revalidate=86400` caches for 1 hour but serves stale content for up to 24 hours while revalidating. This turns every SSR request after the first into a <50ms CDN hit. Note: `stale-while-revalidate` requires CDNs like Cloudflare, CloudFront, or Fastly; Vercel's CDN doesn't currently support this directive.
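As a sketch of setting that header from a Next.js Route Handler (the route path and payload here are illustrative, and your CDN must honor the directive):

```js
// app/api/products/route.js — illustrative Route Handler; path and payload are assumptions.
import { NextResponse } from "next/server";

export async function GET() {
  // Illustrative payload; in practice this would come from your database or API.
  const data = { generatedAt: new Date().toISOString() };

  return NextResponse.json(data, {
    headers: {
      // CDN caches for 1 hour and serves stale copies for up to 24 hours
      // while revalidating in the background.
      "Cache-Control": "s-maxage=3600, stale-while-revalidate=86400",
    },
  });
}
```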
Example: A blog post that updates hourly can use ISR with 3600s revalidation, delivering <50ms TTFB from CDN.
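A rough App Router sketch of that pattern - the route, API URLs, and fields are placeholders:

```js
// app/blog/[slug]/page.js — minimal ISR sketch; route and data source are assumptions.
export const revalidate = 3600; // regenerate in the background at most once per hour

// Pre-render the known slugs at build time (SSG).
export async function generateStaticParams() {
  const posts = await fetch("https://example.com/api/posts").then((r) => r.json());
  return posts.map((post) => ({ slug: post.slug }));
}

export default async function BlogPost({ params }) {
  const post = await fetch(`https://example.com/api/posts/${params.slug}`, {
    next: { revalidate: 3600 }, // per-fetch revalidation window
  }).then((r) => r.json());

  return (
    <article>
      <h1>{post.title}</h1>
      <div dangerouslySetInnerHTML={{ __html: post.html }} />
    </article>
  );
}
```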
A note on HTML size: While not strictly TTFB, avoid massive DOM downloads on SSG/ISR pages. A 2MB HTML file takes 16 seconds to download on a slow 3G connection (1Mbps)-or longer in real-world conditions due to network overhead, packet loss, and connection establishment time. Users on slow connections will perceive this as slow TTFB even though the first byte arrived quickly-the page stays blank until enough HTML downloads to render. Keep server-rendered HTML under 100KB when possible by lazy-loading content, deferring non-critical elements, and avoiding inline data dumps.
If a page must be server-rendered on each request, ruthlessly minimize work on the critical path. Use `Promise.all()` to fetch from multiple data sources concurrently rather than sequentially-three 50ms queries take 50ms in parallel instead of 150ms serial. Eliminate N+1 queries by using proper joins or batching, add database indexes for common lookups, and use `select` to fetch only the fields you actually render rather than pulling entire objects. Move analytics, logging, and non-critical API calls off the request path entirely-fire these into background jobs after sending the response. Cache expensive computations in Redis or Memcached with reasonable TTLs rather than recomputing on every request.
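For example, assuming Prisma with hypothetical `user`, `order`, and `notification` models, parallel queries plus field selection might look like this:

```js
// lib/dashboard.js — minimal sketch; the Prisma models and fields are assumptions.
import { PrismaClient } from "@prisma/client";

// In a real app, reuse a single PrismaClient instance across the process.
const prisma = new PrismaClient();

export async function getDashboardData(userId) {
  // Three independent queries run concurrently: ~50ms total instead of ~150ms serial.
  const [user, recentOrders, unreadCount] = await Promise.all([
    prisma.user.findUnique({
      where: { id: userId },
      select: { id: true, name: true, avatarUrl: true }, // only fields we render
    }),
    prisma.order.findMany({
      where: { userId },
      select: { id: true, total: true, createdAt: true },
      orderBy: { createdAt: "desc" },
      take: 10,
    }),
    prisma.notification.count({ where: { userId, read: false } }),
  ]);

  return { user, recentOrders, unreadCount };
}
```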
High TTFB is often caused by network latency rather than cold starts. A user in Tokyo connecting to a us-east-1 server faces 150-200ms of unavoidable network latency before any computation even starts.
For truly global performance, deploy your origin servers in multiple regions (e.g., US West, US East, Europe, APAC) and use geo-routing to direct users to their nearest origin. This reduces the base latency before CDN caching even comes into play.
Use a CDN like Cloudflare, AWS CloudFront, or Fastly to cache static assets and responses globally. CDN TTFB of 20-50ms applies to cached content served from nearby edge locations. First requests or cache misses will fetch from your nearest origin server, making origin placement critical for global applications. For lightweight server-side logic like auth checks or redirects, edge compute platforms (Cloudflare Workers, Lambda@Edge) run in hundreds of cities worldwide, delivering <50ms TTFB globally.
On serverless platforms, the scale-to-zero problem is the primary cause of cold starts. Vercel's Pro and Enterprise plans automatically keep at least one function instance warm, eliminating cold starts for most traffic. AWS Lambda offers Provisioned Concurrency to guarantee warm instances for critical functions, or you can schedule warm-up invocations every few minutes to keep functions hot. Cloudflare Workers largely sidesteps this with V8 isolates, though as noted earlier real-world requests can still see added latency from script loading and network overhead. For platforms without built-in warming, implement self-warming by pinging critical endpoints every 5-10 minutes-weigh the cost of these invocations against the user experience gain.
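A self-warming job can be as simple as a small script run on a schedule (cron, GitHub Actions, or any always-on box); the URLs below are placeholders:

```js
// warm.js — minimal self-warming sketch; the routes below are placeholders.
const ROUTES = [
  "https://example.com/",
  "https://example.com/dashboard",
  "https://example.com/api/search",
];

async function warm() {
  await Promise.all(
    ROUTES.map(async (url) => {
      const start = Date.now();
      try {
        const res = await fetch(url, { headers: { "x-warmup": "1" } });
        console.log(`${url} -> ${res.status} in ${Date.now() - start}ms`);
      } catch (err) {
        console.error(`${url} failed:`, err);
      }
    })
  );
}

// Run every 5-10 minutes from a scheduler, e.g. `*/5 * * * * node warm.js`.
warm();
```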
The Edge Runtime (a lightweight runtime that exposes only a subset of Node.js APIs) has significantly faster cold starts, typically under 50ms.
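A minimal sketch of opting an App Router route into it - the path is illustrative, and the geolocation header shown is Vercel-specific:

```js
// app/api/geo/route.js — runs on the Edge Runtime; path and payload are illustrative.
export const runtime = "edge";

export async function GET(request) {
  // Keep edge routes lightweight: no Node-specific APIs or heavy dependencies.
  const country = request.headers.get("x-vercel-ip-country") ?? "unknown";
  return Response.json({ country });
}
```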
Cold starts include JavaScript compilation overhead (50-200ms), plus runtime initialization, network setup, and connection establishment, totaling 200-1000ms+ depending on configuration. Platforms like Vercel cache compiled bytecode between invocations, skipping this step entirely on subsequent cold starts and reducing them by up to 27%. Keep your server bundle small through dynamic imports for code-split routes, aggressive tree-shaking to remove unused code, and pruning unnecessary dependencies from package.json. Avoid importing heavy libraries at module scope if they're only needed for specific routes-defer loading them until actually required rather than paying the startup cost on every cold start.
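One pattern is deferring heavy, route-specific dependencies with a dynamic `import()` so they load only when the route actually runs, not during cold boot; the PDF module below is just an illustrative stand-in for any large dependency:

```js
// app/api/export/route.js — sketch; "./pdf-generator" is a hypothetical heavy module.
export async function POST(request) {
  const data = await request.json();

  // Loaded on demand instead of at module scope, so cold boots skip this cost.
  const { generatePdf } = await import("./pdf-generator");

  const pdf = await generatePdf(data);
  return new Response(pdf, {
    headers: { "Content-Type": "application/pdf" },
  });
}
```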
Deployments replace warm instances with cold ones, causing a cold start storm where every user hits uninitialized functions simultaneously. Mitigate this with rolling or gradual releases. Vercel automatically rolls out deployments gradually across instances, avoiding the storm. AWS Lambda supports gradual deployment with traffic shifting-route 10% of traffic to the new version, then 50%, then 100%, giving each batch time to warm up. Blue-green deployments keep the old version running until the new version is fully warmed and validated, then switch traffic over instantly.
Set explicit SLOs per route and region - for example, p95 TTFB under 100ms for static or cached routes and under 300ms for dynamic SSR routes, measured separately in each region you serve.
Alerting strategy: Use multi-window, multi-burn-rate alerts following Google SRE best practices: combine fast detection windows (1 hour + 5 minutes for urgent issues) with slower windows (6 hours + 36 minutes, 3 days + 6 hours) to catch both sudden spikes and gradual degradation while avoiding false positives. Exclude bots and low-volume routes from alerts to reduce noise. Track TTFB alongside cache hit rate, database query time, and deploy events so you can quickly determine whether a spike is from cold starts, database issues, cache misses, or a bad deployment. Segment metrics by route, region, and cache status to pinpoint the root cause-a spike in one region but not others suggests network or infrastructure issues in that region.
Example alert: "p95 TTFB for /dashboard exceeded 500ms in us-east-1 for 15 minutes (cache hit rate dropped from 80% to 20% after deploy)"
The blank page before the first byte is the worst UX moment. While you're improving server paths, mitigate the perceived cost: stream a minimal shell, show a lightweight header or skeleton from the edge, or render a text-first placeholder so users never face a pure blank.
Slow TTFB from cold starts and database latency isn't just a technical hiccup - it's a trust problem that hits before content even appears.
The deployment decision matters: Choose bare metal or VPS (AMD Ryzen 9950X3D) when you want consistent sub-100ms TTFB with zero cold starts, full control over your infrastructure, and dramatically lower costs for high-traffic applications (10TB+/month bandwidth, sustained compute). For high-traffic applications, bare metal is 10-100x cheaper than serverless. However, for low-traffic sites or development environments, serverless pay-per-use pricing may be more economical. The raw single-core performance delivers 2-5x faster rendering on every request, not just cold starts. Choose serverless (Vercel, AWS Lambda, Cloudflare Workers) when operational simplicity matters more than cost-you pay a premium for zero infrastructure management, automatic global scaling, and can tolerate occasional cold starts for ~1% of requests. Regardless of your choice, co-locate your database within 1-5ms ping of your application servers, use SSG/ISR aggressively to avoid server rendering entirely, and leverage CDN caching to turn dynamic responses into <50ms edge hits.
Accept what you cannot control: Some high TTFB is unavoidable. A user on a slow mobile network (3G, satellite, congested public WiFi) will experience 300-1000ms+ TTFB regardless of your server's performance-network latency and packet loss dominate the physics of their connection. Focus your optimization efforts on the median experience while accepting that the long tail of p99 will include users with fundamentally slow network conditions that no amount of server optimization can fix.
Treat TTFB as a first-class SLI: Alert when p95 drifts above 300ms and trace spikes to their cause (cold starts vs. database latency vs. network distance). A fast FCP or LCP starts with a fast first byte; keep TTFB under 100ms and the entire load experience feels instant and reliable.
Download our service guide to understand how we can help you optimise your site speed.