Measuring Cold Starts Across Different Architectures
Cold start latency measurements across AWS Lambda, Vercel Functions, Cloudflare Workers, and containerized deployments with concrete numbers.
Context
Cold starts are the latency tax for serverless. I measured cold start durations across four deployment targets using identical application logic: a JSON API that reads from a Postgres database, processes the result, and returns a response. The goal was to produce actionable numbers, not vendor comparisons.
Problem
Cold start documentation from cloud providers is vague. "Typically under 1 second" is not useful for capacity planning. Teams need percentile distributions under realistic conditions to make informed architecture decisions.
Constraints
- Application: Node.js 20 runtime, ~2MB bundle size after tree-shaking
- Database: Neon Postgres (serverless), single region
- Test methodology: 1,000 cold start invocations per platform over 48 hours
- Cold start trigger: wait 15+ minutes between invocations to ensure function eviction
- Measurement: timestamp difference between invocation start and first response byte
Design
Test Setup
Each deployment ran the same handler:
export async function handler(event) {
const start = performance.now();
const client = new Pool({ connectionString: process.env.DATABASE_URL });
const result = await client.query('SELECT id, title FROM posts LIMIT 10');
await client.end();
const duration = performance.now() - start;
return {
statusCode: 200,
headers: { 'Server-Timing': `handler;dur=${duration}` },
body: JSON.stringify(result.rows),
};
}Platforms tested:
See also: Event Tracking System Design for Android Applications.
| Platform | Runtime | Region | Memory |
|---|---|---|---|
| AWS Lambda | Node.js 20 | us-east-1 | 512MB |
| Vercel Functions | Node.js 20 | us-east-1 | 1024MB |
| Cloudflare Workers | V8 isolate | Global (anycast) | 128MB |
| Cloud Run (container) | Node.js 20 | us-east-1 | 512MB |
Measurement Methodology
For each platform, I deployed a cron job that:
- Waited 20 minutes (ensuring eviction)
- Sent a single request and recorded full response time
- Extracted
Server-Timingheader for handler-only duration - Computed cold start overhead as: total response time minus handler duration
This isolated the platform initialization from the application logic.
Trade-offs
Cold Start Latency (ms)
| Platform | p50 | p75 | p95 | p99 | Max |
|---|---|---|---|---|---|
| AWS Lambda | 680 | 820 | 1,100 | 1,450 | 2,200 |
| Vercel Functions | 720 | 880 | 1,200 | 1,600 | 2,500 |
| Cloudflare Workers | 12 | 18 | 28 | 45 | 80 |
| Cloud Run (min instances: 0) | 2,400 | 3,100 | 4,200 | 5,800 | 8,000 |
| Cloud Run (min instances: 1) | 8 | 12 | 22 | 35 | 60 |
Key findings:
- Cloudflare Workers are 50x faster on cold start than Lambda. V8 isolates initialize in milliseconds. Full Node.js runtimes take hundreds of milliseconds.
- Cloud Run with zero minimum instances is the worst performer. Container pull, runtime initialization, and dependency loading add up to 2-5 seconds.
- Cloud Run with minimum instances eliminates cold starts entirely but keeps a container running (cost: ~$5-15/month per idle instance).
- Lambda and Vercel are comparable. Vercel Functions run on AWS Lambda under the hood, so this is expected.
Cold Start Breakdown (Lambda, p50)
| Phase | Duration (ms) |
|---|---|
| Runtime initialization | 180 |
| Module loading | 220 |
| Database connection setup | 240 |
| Handler execution | 40 |
| Total | 680 |
Database connection setup is 35% of the cold start. Using a connection pooler (Neon's built-in proxy) reduced this to 80ms, bringing the total cold start p50 to 520ms.
Bundle Size Impact on Cold Start
| Bundle Size | Lambda p50 Cold Start |
|---|---|
| 500KB | 450ms |
| 2MB | 680ms |
| 5MB | 980ms |
| 10MB | 1,400ms |
| 25MB | 2,200ms |
Bundle size has a near-linear relationship with cold start duration on Lambda. Every 1MB adds approximately 50-60ms.
Related: Failure Modes I Actively Design For.
Failure Modes
Provisioned concurrency exhaustion on Lambda: If traffic exceeds provisioned concurrency, excess requests hit cold starts. There is no gradual degradation. Requests either get a warm instance or a full cold start.
Cloudflare Workers CPU time limits: Workers have a 10ms CPU time limit on the free plan, 30ms on paid. Complex initialization logic can exceed this, causing the worker to be killed. This is not a cold start issue per se, but it constrains what you can do during initialization.
Cloud Run scale-to-zero race condition: When traffic arrives after a period of inactivity, the first request waits for container startup. If multiple requests arrive simultaneously, Cloud Run may start multiple containers, leading to over-provisioning followed by scale-down, followed by more cold starts.
Database connection pool during cold start: If 50 Lambda functions cold-start simultaneously, each opens a new database connection. Without a connection pooler, this can exhaust the database connection limit instantly.
Scaling Considerations
- Lambda provisioned concurrency costs $0.015 per GB-hour. For 10 instances at 512MB, that is ~$55/month. Compare this against the user experience cost of cold starts.
- Cloudflare Workers scale effectively to thousands of concurrent requests without cold start penalties, but the V8 isolate environment limits available APIs (no native Node.js modules, no file system access).
- Cloud Run with minimum instances is the safest option for container-based deployments. The cost of one idle container is typically negligible compared to the latency cost of cold starts.
- For latency-critical paths (authentication, payment processing), cold starts are unacceptable. Use provisioned concurrency or always-on instances for these routes.
Observability
- Lambda: Enable X-Ray tracing and filter for
Initializationsegment duration - Vercel: Parse
x-vercel-idheader and function logs forCOLD_STARTmarkers - Cloudflare Workers: Use
wrangler tailand filter for initialization events - Cloud Run: Monitor
container/startup_latenciesmetric in Cloud Monitoring - Cross-platform: instrument the handler with
Server-Timingheaders and track cold start ratio (cold starts / total invocations)
Target cold start ratio: under 1% for production workloads. If cold starts exceed 5% of requests, either increase provisioned concurrency or reconsider the architecture.
Key Takeaways
- V8 isolates (Cloudflare Workers) have effectively zero cold starts. If your application fits within the Workers runtime constraints, cold starts are a non-issue.
- Lambda cold starts are dominated by module loading and database connection setup. Minimize bundle size and use connection poolers.
- Cloud Run with zero minimum instances is unsuitable for latency-sensitive workloads. Container startup adds 2-5 seconds.
- Bundle size directly impacts cold start duration. Every megabyte matters. Tree-shake aggressively and lazy-load non-critical dependencies.
- Provisioned concurrency is a cost decision, not a technical one. The math is: cost of provisioned instances vs. cost of user-facing latency.
Further Reading
- Measuring the Cost of Abstractions: Benchmarking the runtime overhead of ORMs, validation libraries, middleware chains, and framework abstractions with concrete performance ...
- SSR vs SSG vs ISR in Next.js: What I Measured: Concrete latency, TTFB, and cache-hit measurements across SSR, SSG, and ISR rendering strategies in Next.js under realistic traffic.
- Measuring and Reducing Jank in Compose Apps: A systematic approach to identifying, measuring, and eliminating frame drops in Jetpack Compose applications, with concrete patterns and ...
Final Thoughts
The numbers tell a clear story. For latency-critical serverless applications, either use V8 isolates (if the runtime constraints fit) or budget for provisioned concurrency. Hoping that cold starts will not affect users is not a strategy. Measure your specific workload, compute the cold start ratio, and make the cost trade-off explicitly.
Recommended
Designing an Offline-First Sync Engine for Mobile Apps
A deep dive into building a reliable sync engine that keeps mobile apps functional without connectivity, covering conflict resolution, queue management, and real-world trade-offs.
Jetpack Compose Recomposition: A Deep Dive
A detailed look at how Compose recomposition works under the hood, what triggers it, how the slot table tracks state, and how to control it in production apps.
Event Tracking System Design for Android Applications
A systems-level breakdown of designing an event tracking system for Android, covering batching, schema enforcement, local persistence, and delivery guarantees.