Measuring Cold Starts Across Different Architectures

Context

Cold starts are the latency tax for serverless. I measured cold start durations across four deployment targets using identical application logic: a JSON API that reads from a Postgres database, processes the result, and returns a response. The goal was to produce actionable numbers, not vendor comparisons.

Problem

Cold start documentation from cloud providers is vague. "Typically under 1 second" is not useful for capacity planning. Teams need percentile distributions under realistic conditions to make informed architecture decisions.

Constraints

Application: Node.js 20 runtime, ~2MB bundle size after tree-shaking
Database: Neon Postgres (serverless), single region
Test methodology: 1,000 cold start invocations per platform over 48 hours
Cold start trigger: wait 15+ minutes between invocations to ensure function eviction
Measurement: timestamp difference between invocation start and first response byte

Design

Test Setup

Each deployment ran the same handler:

export async function handler(event) {
  const start = performance.now();
  const client = new Pool({ connectionString: process.env.DATABASE_URL });
  const result = await client.query('SELECT id, title FROM posts LIMIT 10');
  await client.end();
  const duration = performance.now() - start;
 
  return {
    statusCode: 200,
    headers: { 'Server-Timing': `handler;dur=${duration}` },
    body: JSON.stringify(result.rows),
  };
}

Platforms tested:

Platform	Runtime	Region	Memory
AWS Lambda	Node.js 20	us-east-1	512MB
Vercel Functions	Node.js 20	us-east-1	1024MB
Cloudflare Workers	V8 isolate	Global (anycast)	128MB
Cloud Run (container)	Node.js 20	us-east-1	512MB

Measurement Methodology

For each platform, I deployed a cron job that:

Waited 20 minutes (ensuring eviction)
Sent a single request and recorded full response time
Extracted Server-Timing header for handler-only duration
Computed cold start overhead as: total response time minus handler duration

This isolated the platform initialization from the application logic.

Trade-offs

Cold Start Latency (ms)

Platform	p50	p75	p95	p99	Max
AWS Lambda	680	820	1,100	1,450	2,200
Vercel Functions	720	880	1,200	1,600	2,500
Cloudflare Workers	12	18	28	45	80
Cloud Run (min instances: 0)	2,400	3,100	4,200	5,800	8,000
Cloud Run (min instances: 1)	8	12	22	35	60

Key findings:

Cloudflare Workers are 50x faster on cold start than Lambda. V8 isolates initialize in milliseconds. Full Node.js runtimes take hundreds of milliseconds.
Cloud Run with zero minimum instances is the worst performer. Container pull, runtime initialization, and dependency loading add up to 2-5 seconds.
Cloud Run with minimum instances eliminates cold starts entirely but keeps a container running (cost: ~$5-15/month per idle instance).
Lambda and Vercel are comparable. Vercel Functions run on AWS Lambda under the hood, so this is expected.

Cold Start Breakdown (Lambda, p50)

Phase	Duration (ms)
Runtime initialization	180
Module loading	220
Database connection setup	240
Handler execution	40
Total	680

Database connection setup is 35% of the cold start. Using a connection pooler (Neon's built-in proxy) reduced this to 80ms, bringing the total cold start p50 to 520ms.

Bundle Size Impact on Cold Start

Bundle Size	Lambda p50 Cold Start
500KB	450ms
2MB	680ms
5MB	980ms
10MB	1,400ms
25MB	2,200ms

Bundle size has a near-linear relationship with cold start duration on Lambda. Every 1MB adds approximately 50-60ms.

Failure Modes

Provisioned concurrency exhaustion on Lambda: If traffic exceeds provisioned concurrency, excess requests hit cold starts. There is no gradual degradation. Requests either get a warm instance or a full cold start.

Cloudflare Workers CPU time limits: Workers have a 10ms CPU time limit on the free plan, 30ms on paid. Complex initialization logic can exceed this, causing the worker to be killed. This is not a cold start issue per se, but it constrains what you can do during initialization.

Cloud Run scale-to-zero race condition: When traffic arrives after a period of inactivity, the first request waits for container startup. If multiple requests arrive simultaneously, Cloud Run may start multiple containers, leading to over-provisioning followed by scale-down, followed by more cold starts.

Database connection pool during cold start: If 50 Lambda functions cold-start simultaneously, each opens a new database connection. Without a connection pooler, this can exhaust the database connection limit instantly.

Scaling Considerations

Lambda provisioned concurrency costs $0.015 per GB-hour. For 10 instances at 512MB, that is ~$55/month. Compare this against the user experience cost of cold starts.
Cloudflare Workers scale effectively to thousands of concurrent requests without cold start penalties, but the V8 isolate environment limits available APIs (no native Node.js modules, no file system access).
Cloud Run with minimum instances is the safest option for container-based deployments. The cost of one idle container is typically negligible compared to the latency cost of cold starts.
For latency-critical paths (authentication, payment processing), cold starts are unacceptable. Use provisioned concurrency or always-on instances for these routes.

Observability

Lambda: Enable X-Ray tracing and filter for Initialization segment duration
Vercel: Parse x-vercel-id header and function logs for COLD_START markers
Cloudflare Workers: Use wrangler tail and filter for initialization events
Cloud Run: Monitor container/startup_latencies metric in Cloud Monitoring
Cross-platform: instrument the handler with Server-Timing headers and track cold start ratio (cold starts / total invocations)

Target cold start ratio: under 1% for production workloads. If cold starts exceed 5% of requests, either increase provisioned concurrency or reconsider the architecture.

Key Takeaways

V8 isolates (Cloudflare Workers) have effectively zero cold starts. If your application fits within the Workers runtime constraints, cold starts are a non-issue.
Lambda cold starts are dominated by module loading and database connection setup. Minimize bundle size and use connection poolers.
Cloud Run with zero minimum instances is unsuitable for latency-sensitive workloads. Container startup adds 2-5 seconds.
Bundle size directly impacts cold start duration. Every megabyte matters. Tree-shake aggressively and lazy-load non-critical dependencies.
Provisioned concurrency is a cost decision, not a technical one. The math is: cost of provisioned instances vs. cost of user-facing latency.

Final Thoughts

The numbers tell a clear story. For latency-critical serverless applications, either use V8 isolates (if the runtime constraints fit) or budget for provisioned concurrency. Hoping that cold starts will not affect users is not a strategy. Measure your specific workload, compute the cold start ratio, and make the cost trade-off explicitly.