Why Simple Systems Scale Better
An argument for architectural simplicity as a scaling strategy, with examples of how complexity creates bottlenecks that simple designs avoid.
The systems that scale best are not the ones with the most sophisticated architecture. They are the ones where a new engineer can read the code, understand the data flow, and make changes without fear. Simplicity is not a constraint on scaling. It is a prerequisite.
Complexity as a Scaling Bottleneck
Scaling a system requires changing it. Adding capacity, sharding data, splitting traffic, optimizing hot paths. Every one of these changes requires understanding the existing system well enough to modify it safely.
Complex systems resist change. They have hidden dependencies, implicit assumptions, and emergent behaviors that are not documented because nobody fully understands them. The result: scaling efforts take longer, carry more risk, and often introduce new problems.
I have seen a team spend six months trying to shard a database that was entangled with application logic through stored procedures, triggers, and cross-schema joins. A simpler schema design would have made the sharding operation a two-week project.
What Simple Means in Practice
Simple does not mean naive or under-designed. It means:
- Few moving parts. Each component does one thing. The interaction between components is explicit.
- Predictable behavior. Given the same input, the system produces the same output. Side effects are contained and documented.
- Linear data flow. Data moves in one direction through the system. Feedback loops and circular dependencies are eliminated or made explicit.
- Minimal shared state. Components communicate through well-defined interfaces, not shared databases or global caches.
See also: Building a Minimal Feature Flag Service.
The Hidden Cost of Clever Solutions
Clever solutions optimize for one dimension (usually performance) at the expense of comprehensibility. A custom lock-free data structure might handle 10x the throughput of a standard concurrent hash map, but it also requires 10x the expertise to debug, modify, and operate.
| Clever approach | Simple alternative | When clever wins |
|---|---|---|
| Custom memory allocator | Standard allocator with pooling | Sub-millisecond latency requirements |
| Lock-free concurrent structures | Mutex-protected standard structures | Proven contention bottleneck at scale |
| Custom serialization format | Protocol Buffers or JSON | Extreme bandwidth constraints |
| In-process cache with custom eviction | Redis or Memcached | Network latency is the measured bottleneck |
| Hand-optimized SQL | ORM with query monitoring | Specific queries proven slow by profiling |
The "when clever wins" column is important. These are not hypothetical thresholds. They are specific, measured conditions that justify the complexity. Without measurement, the simple approach wins by default.
Related: Designing Event Schemas That Survive Product Changes.
Case Study: The Overengineered Event Pipeline
A team I worked with built an event processing pipeline with the following components: a custom message format, a schema registry, a custom serializer, an event router with pluggable strategies, a dead letter queue with automatic replay, and a monitoring dashboard. This was for processing about 500 events per second.
The pipeline worked, but modifying it required understanding all six components and their interactions. Adding a new event type took a week of development and testing. When it broke at 3 AM, the on-call engineer needed 45 minutes just to identify which component had failed.
We replaced it with a standard message broker, JSON serialization, and a single consumer service with straightforward error handling. Adding a new event type became a two-hour task. On-call diagnosis dropped to under 10 minutes. And the system handled 5,000 events per second without any optimization because we could now easily identify and fix the actual bottlenecks.
Simplicity Enables Horizontal Scaling
The simplest scaling strategy is horizontal: run more instances of the same thing behind a load balancer. This works when the system is stateless and the components are independent. It fails when there is shared state, global ordering requirements, or implicit coordination between instances.
Simple systems tend toward statelessness naturally. When you minimize shared state and make data flow linear, each instance can operate independently. Scaling becomes a capacity planning exercise rather than an architecture project.
Complex systems resist horizontal scaling because their components are tightly coupled. Scaling one component requires scaling its dependencies, which requires scaling their dependencies. The scaling unit is the entire system, not individual components.
Simplicity and Operational Scaling
Systems do not just scale in throughput. They scale in operational burden: more alerts, more dashboards, more runbooks, more on-call rotations. Simple systems have fewer failure modes, which means fewer alerts, shorter runbooks, and faster incident resolution.
A system with 5 components and well-defined failure modes might need 10 alerts. A system with 20 components and emergent failure modes might need 100 alerts. But alert fatigue sets in at around 20 alerts. The complex system generates so much noise that real signals get lost.
The Simplicity Discipline
Keeping systems simple requires active effort. Complexity is the natural state of evolving software. Without deliberate resistance, every feature request, every performance optimization, and every edge case adds complexity.
Practices I follow:
- Regularly remove unused code. Dead code is not free. It confuses new engineers and adds to the cognitive load of the codebase.
- Resist premature generalization. Build for the current requirements. Generalize only when you have three concrete use cases.
- Document the "why not." When you choose a simple approach over a complex one, document the reasoning. Future engineers will be tempted to add complexity unless they understand why it was avoided.
- Measure before optimizing. Most performance optimizations add complexity. Ensure the optimization targets a measured bottleneck, not a hypothetical one.
Key Takeaways
- Complexity is a scaling bottleneck because it makes the system harder to change, and scaling requires change.
- Simple systems enable horizontal scaling naturally by minimizing shared state and keeping components independent.
- Clever solutions should be justified by specific measurements, not hypothetical performance concerns.
- Operational scaling (alerts, runbooks, on-call burden) is as important as throughput scaling.
- Simplicity requires active maintenance. Without deliberate effort, complexity accumulates.
Further Reading
- Experimenting With Background Workers at Scale: Testing job queue architectures with BullMQ, Postgres-based queues, and SQS under increasing job volumes, with failure handling and scali...
- How I'd Design a Mobile Configuration System at Scale: Designing a configuration system for mobile apps at scale, covering config delivery, caching layers, override hierarchies, and safe rollo...
- What Breaks First When Traffic Scales: A catalog of components that fail first under increasing traffic, ordered by how commonly they become bottlenecks in web applications.
Final Thoughts
The instinct to build complex systems comes from a good place. Engineers want to handle every edge case, optimize every path, and prepare for every future requirement. But the systems that survive and scale are the ones where the complexity is proportional to the actual requirements, not the imagined ones. Simplicity is not about doing less. It is about doing the right amount.