Benchmarking Database Writes Under Load

Dhruval Dhameliya · September 24, 2025 · 7 min read

Measured write throughput and latency for Postgres under increasing concurrency, comparing single inserts, batch inserts, COPY, and async writes.

Context

I needed to determine the maximum sustained write throughput for a Postgres 15 instance handling event ingestion. The workload: inserting rows into a table with 8 columns, 2 indexes, and a JSONB field. I tested four write strategies under increasing concurrency to find where each breaks.

Related: Event Tracking System Design for Android Applications.

See also: Designing Event Schemas That Survive Product Changes.

Problem

Write throughput numbers in documentation and blog posts rarely match production conditions: they are measured against tables with no indexes, no constraints, and no concurrent readers. Real workloads have all three. I wanted numbers that reflect a realistic schema under realistic contention.

Constraints

  • Database: Postgres 15 on Neon (4 vCPU, 16GB RAM)
  • Table schema: 8 columns (BIGSERIAL, 3 TEXT, 2 TIMESTAMPTZ, 1 INTEGER, 1 JSONB)
  • Indexes: 2 (primary key + composite index on two TEXT columns)
  • Foreign keys: 1 (referencing a 100K-row lookup table)
  • Concurrent readers: 10 read queries/second against the same table during the test
  • Test duration: 5 minutes per configuration, with 1-minute warmup

Design

Table Schema

CREATE TABLE events (
  id BIGSERIAL PRIMARY KEY,
  event_type TEXT NOT NULL,
  source TEXT NOT NULL,
  payload JSONB NOT NULL DEFAULT '{}',
  user_id TEXT NOT NULL,
  severity INTEGER NOT NULL DEFAULT 0,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  processed_at TIMESTAMPTZ,
  FOREIGN KEY (event_type) REFERENCES event_types(name)
);
 
CREATE INDEX idx_events_source_type ON events (source, event_type);

Write Strategies Tested

1. Single INSERT: One row per statement, one statement per transaction.

INSERT INTO events (event_type, source, payload, user_id, severity)
VALUES ($1, $2, $3, $4, $5);

2. Batch INSERT: Multiple rows per statement, one statement per transaction.

INSERT INTO events (event_type, source, payload, user_id, severity)
VALUES ($1, $2, $3, $4, $5),
       ($6, $7, $8, $9, $10),
       ...

3. COPY: Postgres COPY protocol for bulk loading.

COPY events (event_type, source, payload, user_id, severity)
FROM STDIN WITH (FORMAT csv);

4. Async writes with unlogged buffer: Write to an unlogged table, then flush to the main table periodically.
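
A minimal sketch of this buffer-and-flush pattern, assuming a buffer table cloned from events (the table name and LIKE options here are illustrative, not taken from the benchmark code):

```sql
-- Unlogged tables skip WAL, so inserts into the buffer are cheap,
-- but buffered rows are lost on a crash.
CREATE UNLOGGED TABLE events_buffer (LIKE events INCLUDING DEFAULTS);

-- Periodic flush: move rows atomically. DELETE ... RETURNING only
-- touches rows visible to this snapshot, so concurrent inserts into
-- the buffer during the flush are not lost.
WITH moved AS (
  DELETE FROM events_buffer RETURNING *
)
INSERT INTO events SELECT * FROM moved;
```

The trade-off is durability: any rows sitting in the unlogged buffer when the server crashes are gone, so this only fits workloads that tolerate losing a few seconds of events.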

Concurrency Levels

Each strategy was tested at 1, 5, 10, 25, 50, and 100 concurrent writers.

Trade-offs

Throughput Results (rows/second, sustained over 5 minutes)

| Concurrency | Single INSERT | Batch (100 rows) | Batch (1000 rows) | COPY |
|---|---|---|---|---|
| 1 | 850 | 12,000 | 28,000 | 45,000 |
| 5 | 3,200 | 48,000 | 95,000 | 140,000 |
| 10 | 5,100 | 72,000 | 130,000 | 180,000 |
| 25 | 6,800 | 85,000 | 145,000 | 195,000 |
| 50 | 5,400 | 68,000 | 120,000 | 170,000 |
| 100 | 3,100 | 42,000 | 80,000 | 110,000 |

Latency Results (ms per write operation, p95)

| Concurrency | Single INSERT | Batch (100 rows) | Batch (1000 rows) | COPY |
|---|---|---|---|---|
| 1 | 1.2 | 8 | 35 | 22 |
| 5 | 2.8 | 12 | 52 | 38 |
| 10 | 5.5 | 22 | 85 | 60 |
| 25 | 15 | 45 | 180 | 130 |
| 50 | 35 | 82 | 420 | 290 |
| 100 | 85 | 180 | 850 | 550 |

Key observations:

  • Peak throughput occurs at 25 concurrent writers, not at maximum concurrency. Beyond 25, lock contention and I/O scheduling overhead reduce throughput.
  • Batch inserts at 100 rows are the sweet spot for most applications. The 12x improvement over single inserts comes with manageable latency (45ms p95 at 25 concurrency).
  • COPY outperforms batch INSERT by 35-50% but requires a different protocol and does not support RETURNING clauses or trigger execution.
  • At 100 concurrent writers, everything degrades. Single inserts drop to 3,100 rows/second, below their throughput at just 5 writers (3,200 rows/second). Lock contention dominates.

Index Overhead

I re-ran the batch-100 test with and without indexes:

| Configuration | Throughput at 10 concurrency |
|---|---|
| No indexes (heap only) | 145,000 rows/s |
| Primary key only | 110,000 rows/s |
| PK + 1 composite index (benchmark schema) | 72,000 rows/s |
| PK + 2 indexes | 52,000 rows/s |
| PK + 2 indexes + FK constraint | 48,000 rows/s |

Each additional index reduces write throughput by 20-35%. The foreign key constraint adds another 8% overhead due to the lookup validation.

WAL Impact

Write-Ahead Log (WAL) generation directly correlates with write throughput:

| Strategy | WAL generated/minute at peak |
|---|---|
| Single INSERT | 120MB |
| Batch (100) | 380MB |
| Batch (1000) | 520MB |
| COPY | 680MB |

High WAL generation affects replication lag. If you have a read replica, sustained high writes will increase replication delay proportionally.
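
Lag is visible from the primary via the standard pg_stat_replication view; a query along these lines works on Postgres 10 and later:

```sql
-- Per-replica lag as seen from the primary
SELECT client_addr, state, write_lag, flush_lag, replay_lag
FROM pg_stat_replication;
```

replay_lag is the number that matters for stale reads: it measures how far behind the replica is in applying WAL, not just receiving it.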

Failure Modes

Transaction ID wraparound: Sustained high write rates consume transaction IDs faster. At 100,000 writes/second, autovacuum must keep pace or the database will eventually refuse writes to prevent wraparound. Monitor age(datfrozenxid).
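
The age check is a standard catalog query:

```sql
-- Transaction ID age per database; alert well before the
-- ~2 billion hard limit (a common threshold is 1 billion)
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```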

Checkpoint spikes: Postgres checkpoints flush dirty buffers to disk. During a checkpoint, write latency spikes 2-5x. The default checkpoint interval (checkpoint_timeout) is 5 minutes, and since Postgres 14 the default checkpoint_completion_target is already 0.9, spreading checkpoint I/O across most of the interval. For sustained write workloads, verify that target has not been lowered, and raise max_wal_size if checkpoints are being triggered by WAL volume rather than the timer.
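
If checkpoints are firing on WAL volume rather than the timer, raising max_wal_size makes them less frequent. A sketch (the 4GB value is illustrative, and managed services such as Neon may not expose ALTER SYSTEM):

```sql
ALTER SYSTEM SET max_wal_size = '4GB';  -- illustrative value, sized to your WAL rate
SELECT pg_reload_conf();                -- applies without a restart
```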

Connection pool starvation during batch writes: A batch write holding a connection for 200ms at 50 concurrency means 50 connections are occupied for 200ms each. If the pool has 50 connections, no read queries can execute during that window.

Replication lag cascade: High WAL generation causes replicas to fall behind. If a read replica serves read queries, stale reads become staler. And if the replica falls so far behind that the primary has already recycled the WAL it needs (no replication slot, wal_keep_size exhausted), it loses the stream entirely and must be rebuilt from a base backup.

Scaling Considerations

  • Vertical scaling (more CPU, RAM) provides linear improvement up to I/O saturation. Beyond that, disk throughput is the ceiling.
  • Horizontal write scaling requires sharding. Postgres does not natively support multi-writer configurations.
  • For sustained write rates above 100,000 rows/second, consider append-only tables without indexes, with periodic index rebuilds during off-peak hours.
  • Partitioning by time reduces index size and improves insert performance for time-series data.
  • If WAL generation exceeds disk throughput, writes will stall. Monitor pg_stat_wal for WAL write latency.
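
The time-partitioning suggestion above can be sketched against a variant of the benchmark schema (table and partition names are hypothetical; note that on a partitioned table the partition key must be part of any primary key):

```sql
CREATE TABLE events_part (
  id BIGSERIAL,
  event_type TEXT NOT NULL,
  source TEXT NOT NULL,
  payload JSONB NOT NULL DEFAULT '{}',
  user_id TEXT NOT NULL,
  severity INTEGER NOT NULL DEFAULT 0,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  processed_at TIMESTAMPTZ,
  PRIMARY KEY (id, created_at)  -- must include the partition key
) PARTITION BY RANGE (created_at);

-- One partition per month; inserts route automatically by created_at
CREATE TABLE events_part_2025_09 PARTITION OF events_part
  FOR VALUES FROM ('2025-09-01') TO ('2025-10-01');
```

Each partition gets its own, smaller indexes, which keeps index maintenance cost per insert roughly constant as total data grows.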

Observability

  • pg_stat_user_tables: track n_tup_ins, n_tup_upd, n_tup_del rates
  • pg_stat_bgwriter: monitor buffers_checkpoint and checkpoint_write_time for checkpoint impact
  • pg_stat_wal: WAL generation rate and write latency
  • Application-side: track batch size distribution, write latency percentiles, and connection wait time
  • Alert on autovacuum falling behind (n_dead_tup growing faster than vacuuming)
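
The autovacuum alert in the last bullet can be driven by a query like this (the threshold is illustrative):

```sql
-- Tables where dead tuples are piling up faster than vacuuming
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;
```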

Key Takeaways

  • Batch inserts at 100 rows per statement deliver 12x throughput over single inserts with acceptable latency. This is the highest-impact optimization for write-heavy workloads.
  • Peak throughput occurs at moderate concurrency (10-25 writers), not at maximum concurrency. Over-parallelizing writes hurts performance.
  • Each index reduces write throughput by 20-35%. Only index what you query. Remove indexes you do not use.
  • COPY is the fastest option but is impractical for application-level writes. Reserve it for bulk data loading and migrations.
  • Monitor WAL generation and autovacuum progress. These are the leading indicators of write-related failures.

Final Thoughts

The most impactful change in this benchmark was switching from single inserts to batch inserts of 100 rows. This single change delivered 12x throughput improvement and required minimal code changes (accumulate rows in a buffer, flush when the buffer reaches 100 or a timeout expires). Everything else, including COPY, unlogged tables, and index removal, provides diminishing returns with increasing complexity. Start with batching.
