Measuring Cold Starts Across Different Architectures

Dhruval Dhameliya · November 11, 2025 · 7 min read

Cold start latency measurements across AWS Lambda, Vercel Functions, Cloudflare Workers, and containerized deployments with concrete numbers.

Context

Cold starts are the latency tax for serverless. I measured cold start durations across four deployment targets using identical application logic: a JSON API that reads from a Postgres database, processes the result, and returns a response. The goal was to produce actionable numbers, not vendor comparisons.

Problem

Cold start documentation from cloud providers is vague. "Typically under 1 second" is not useful for capacity planning. Teams need percentile distributions under realistic conditions to make informed architecture decisions.

Constraints

  • Application: Node.js 20 runtime, ~2MB bundle size after tree-shaking
  • Database: Neon Postgres (serverless), single region
  • Test methodology: 1,000 cold start invocations per platform over 48 hours
  • Cold start trigger: wait 15+ minutes between invocations to ensure function eviction
  • Measurement: timestamp difference between invocation start and first response byte

Design

Test Setup

Each deployment ran the same handler:

import { Pool } from 'pg';

export async function handler(event) {
  const start = performance.now();
  // A fresh pool per invocation, so connection setup is part of the measurement
  const pool = new Pool({ connectionString: process.env.DATABASE_URL });
  const result = await pool.query('SELECT id, title FROM posts LIMIT 10');
  await pool.end();
  const duration = performance.now() - start;

  return {
    statusCode: 200,
    headers: { 'Server-Timing': `handler;dur=${duration}` },
    body: JSON.stringify(result.rows),
  };
}

Platforms tested:

| Platform | Runtime | Region | Memory |
| --- | --- | --- | --- |
| AWS Lambda | Node.js 20 | us-east-1 | 512MB |
| Vercel Functions | Node.js 20 | us-east-1 | 1024MB |
| Cloudflare Workers | V8 isolate | Global (anycast) | 128MB |
| Cloud Run (container) | Node.js 20 | us-east-1 | 512MB |

Measurement Methodology

For each platform, I deployed a cron job that:

  1. Waited 20 minutes (ensuring eviction)
  2. Sent a single request and recorded full response time
  3. Extracted Server-Timing header for handler-only duration
  4. Computed cold start overhead as: total response time minus handler duration

This isolated the platform initialization from the application logic.
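
The extraction and subtraction steps can be sketched as below. `measureColdStart` and `parseHandlerDuration` are illustrative names, and the global `fetch` and `performance` of Node 18+ are assumed:

```javascript
// Parse the handler-only duration out of a Server-Timing header value
function parseHandlerDuration(serverTiming) {
  const match = /handler;dur=([\d.]+)/.exec(serverTiming ?? '');
  return match ? parseFloat(match[1]) : null;
}

// One measurement cycle: request, total wall time, cold start overhead
async function measureColdStart(url) {
  const t0 = performance.now();
  const res = await fetch(url);
  await res.text();
  const total = performance.now() - t0;
  const handlerMs = parseHandlerDuration(res.headers.get('server-timing')) ?? 0;
  // Cold start overhead = everything that is not application logic
  return { total, handlerMs, overhead: total - handlerMs };
}
```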

Trade-offs

Cold Start Latency (ms)

| Platform | p50 | p75 | p95 | p99 | Max |
| --- | --- | --- | --- | --- | --- |
| AWS Lambda | 680 | 820 | 1,100 | 1,450 | 2,200 |
| Vercel Functions | 720 | 880 | 1,200 | 1,600 | 2,500 |
| Cloudflare Workers | 12 | 18 | 28 | 45 | 80 |
| Cloud Run (min instances: 0) | 2,400 | 3,100 | 4,200 | 5,800 | 8,000 |
| Cloud Run (min instances: 1) | 8 | 12 | 22 | 35 | 60 |

Key findings:

  • Cloudflare Workers cold-start more than 50x faster than Lambda (12ms vs. 680ms at p50). V8 isolates initialize in milliseconds; full Node.js runtimes take hundreds of milliseconds.
  • Cloud Run with zero minimum instances is the worst performer. Container pull, runtime initialization, and dependency loading add up to 2-5 seconds.
  • Cloud Run with minimum instances eliminates cold starts entirely but keeps a container running (cost: ~$5-15/month per idle instance).
  • Lambda and Vercel are comparable. Vercel Functions run on AWS Lambda under the hood, so this is expected.

Cold Start Breakdown (Lambda, p50)

| Phase | Duration (ms) |
| --- | --- |
| Runtime initialization | 180 |
| Module loading | 220 |
| Database connection setup | 240 |
| Handler execution | 40 |
| Total | 680 |

Database connection setup is 35% of the cold start. Using a connection pooler (Neon's built-in proxy) reduced this to 80ms, bringing the total cold start p50 to 520ms.
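
Switching to the pooler is a connection-string change. Assuming Neon's usual host layout (the pooled endpoint adds a `-pooler` suffix to the endpoint hostname), the rewrite can be sketched as:

```javascript
// Derive Neon's pooled endpoint from a direct connection string by inserting
// -pooler before the first dot of the host (assumption: standard Neon layout,
// no dots in the userinfo portion of the URL).
function toPooledUrl(connectionString) {
  return connectionString.replace(/@([^.@/]+)\./, '@$1-pooler.');
}

// toPooledUrl('postgres://user:pass@ep-abc-123.us-east-1.aws.neon.tech/db')
//   → 'postgres://user:pass@ep-abc-123-pooler.us-east-1.aws.neon.tech/db'
```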

Bundle Size Impact on Cold Start

| Bundle Size | Lambda p50 Cold Start |
| --- | --- |
| 500KB | 450ms |
| 2MB | 680ms |
| 5MB | 980ms |
| 10MB | 1,400ms |
| 25MB | 2,200ms |

Bundle size has a strong, roughly monotonic relationship with cold start duration on Lambda: going from 500KB to 25MB nearly quintuples the p50, with each additional megabyte adding on the order of 50-150ms (the marginal cost shrinks as bundles grow).
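
One way to keep megabytes out of the cold-start path is to lazy-load heavy dependencies with a dynamic import. A sketch, where `pdf-renderer` is a hypothetical stand-in for any large, rarely-used module:

```javascript
let renderPdf; // cached across warm invocations

async function handler(event) {
  if (event && event.wantsPdf) {
    // Loaded only on the code path that needs it, once per container
    renderPdf ??= (await import('pdf-renderer')).default; // hypothetical heavy module
    return { statusCode: 200, body: await renderPdf(event.payload) };
  }
  return { statusCode: 200, body: 'fast path' };
}
```

The fast path never pays the import cost, so the dependency no longer counts against the cold start of every request.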

Failure Modes

Provisioned concurrency exhaustion on Lambda: If traffic exceeds provisioned concurrency, excess requests hit cold starts. There is no gradual degradation. Requests either get a warm instance or a full cold start.

Cloudflare Workers CPU time limits: Workers have a 10ms CPU time limit on the free plan, 30ms on paid. Complex initialization logic can exceed this, causing the worker to be killed. This is not a cold start issue per se, but it constrains what you can do during initialization.

Cloud Run scale-to-zero race condition: When traffic arrives after a period of inactivity, the first request waits for container startup. If multiple requests arrive simultaneously, Cloud Run may start multiple containers, leading to over-provisioning followed by scale-down, followed by more cold starts.

Database connection pool during cold start: If 50 Lambda functions cold-start simultaneously, each opens a new database connection. Without a connection pooler, this can exhaust the database connection limit instantly.
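
The standard mitigation is to hoist the expensive setup to module scope so it runs once per container rather than once per request; with pg you would hoist `new Pool({ connectionString, max: 1 })` the same way. A runnable sketch with a stub client standing in for the pool:

```javascript
let connectionsOpened = 0;

// Stand-in for expensive setup such as creating a pg Pool
function connect() {
  connectionsOpened += 1; // each call represents one real database connection
  return { query: async (sql) => [{ sql }] };
}

// Module scope: runs during cold start, reused by every warm invocation
const client = connect();

async function handler() {
  return client.query('SELECT id, title FROM posts LIMIT 10');
}
```

With one connection per function instance, 50 simultaneous cold starts open 50 connections instead of 50 × pool size.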

Scaling Considerations

  • Lambda provisioned concurrency costs $0.015 per GB-hour. For 10 instances at 512MB, that is ~$55/month. Compare this against the user experience cost of cold starts.
  • Cloudflare Workers scale effectively to thousands of concurrent requests without cold start penalties, but the V8 isolate environment limits available APIs (no native Node.js modules, no file system access).
  • Cloud Run with minimum instances is the safest option for container-based deployments. The cost of one idle container is typically negligible compared to the latency cost of cold starts.
  • For latency-critical paths (authentication, payment processing), cold starts are unacceptable. Use provisioned concurrency or always-on instances for these routes.
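
The provisioned-concurrency math can be made explicit. The rate is the $0.015/GB-hour figure quoted above, and 730 hours per month is an assumed average:

```javascript
// Back-of-envelope monthly cost of Lambda provisioned concurrency
function provisionedCostPerMonth(instances, memoryMb, ratePerGbHour = 0.015) {
  const memoryGb = memoryMb / 1024;
  const hoursPerMonth = 730; // average hours in a month
  return instances * memoryGb * hoursPerMonth * ratePerGbHour;
}

// 10 instances at 512MB → ≈ $54.75/month, matching the ~$55 figure above
```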

Observability

  • Lambda: Enable X-Ray tracing and filter for Initialization segment duration
  • Vercel: Parse x-vercel-id header and function logs for COLD_START markers
  • Cloudflare Workers: Use wrangler tail and filter for initialization events
  • Cloud Run: Monitor container/startup_latencies metric in Cloud Monitoring
  • Cross-platform: Instrument the handler with Server-Timing headers and track cold start ratio (cold starts / total invocations)

Target cold start ratio: under 1% for production workloads. If cold starts exceed 5% of requests, either increase provisioned concurrency or reconsider the architecture.
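
The cold start ratio can be tracked from inside the handler itself with a module-scope flag; a sketch, with the metric emission left as a comment:

```javascript
let coldStart = true; // true only until the first invocation in this container

async function handler() {
  const kind = coldStart ? 'cold' : 'warm';
  coldStart = false;
  // Emit as a structured log line and aggregate downstream, e.g.:
  // console.log(JSON.stringify({ metric: 'invocation', kind }));
  return { statusCode: 200, headers: { 'X-Cold-Start': kind }, body: 'ok' };
}
```

Counting `cold` invocations against the total gives the ratio directly, with no platform-specific tracing required.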

Key Takeaways

  • V8 isolates (Cloudflare Workers) have effectively zero cold starts. If your application fits within the Workers runtime constraints, cold starts are a non-issue.
  • Lambda cold starts are dominated by module loading and database connection setup. Minimize bundle size and use connection poolers.
  • Cloud Run with zero minimum instances is unsuitable for latency-sensitive workloads. Container startup adds 2-5 seconds.
  • Bundle size directly impacts cold start duration. Every megabyte matters. Tree-shake aggressively and lazy-load non-critical dependencies.
  • Provisioned concurrency is a cost decision, not a technical one. The math is: cost of provisioned instances vs. cost of user-facing latency.

Final Thoughts

The numbers tell a clear story. For latency-critical serverless applications, either use V8 isolates (if the runtime constraints fit) or budget for provisioned concurrency. Hoping that cold starts will not affect users is not a strategy. Measure your specific workload, compute the cold start ratio, and make the cost trade-off explicit.
