Design Trade-offs I'd Make Differently Today
A retrospective on architectural decisions that seemed right at the time but aged poorly, and what I would choose instead with the benefit of hindsight.
Every system I have built carries decisions I would reverse if given the chance. Not because they were wrong at the time, but because the constraints shifted, the team changed, or the scale moved in a direction nobody predicted. This post is a catalog of those decisions and the reasoning I would apply today.
Premature Microservices
The most expensive architectural mistake I have made more than once: splitting a monolith into services before the domain boundaries were clear. The result was a distributed monolith with all the operational complexity of microservices and none of the independence.
What I would do differently: keep the monolith longer. Use module boundaries internally. Only extract a service when you can deploy it independently and the team boundary justifies the network hop.
| Signal to split | Signal to stay monolithic |
|---|---|
| Independent deploy cadence needed | Shared database transactions required |
| Separate team owns the domain | Domain boundaries still shifting |
| Different scaling characteristics | Team is under 10 engineers |
| Regulatory isolation required | No operational maturity for distributed systems |
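One way to "use module boundaries internally" is to give each domain an explicit port that the rest of the monolith depends on. The sketch below uses a hypothetical billing module; all names are illustrative, not from the original post.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Invoice:
    order_id: str
    amount_cents: int


class BillingPort(Protocol):
    """The only surface other modules are allowed to depend on."""

    def create_invoice(self, order_id: str, amount_cents: int) -> Invoice: ...


class InProcessBilling:
    """In-process implementation. Swapping in an HTTP client later
    changes this class, not the callers."""

    def create_invoice(self, order_id: str, amount_cents: int) -> Invoice:
        return Invoice(order_id=order_id, amount_cents=amount_cents)


def checkout(billing: BillingPort, order_id: str) -> Invoice:
    # Callers see only the port, so extracting billing into a service
    # becomes a transport change rather than a redesign.
    return billing.create_invoice(order_id, amount_cents=4999)
```

When the signals in the left column of the table finally appear, the extraction is mechanical: implement the same port over a network client and wire it in.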
Choosing Eventual Consistency Too Early
I once designed an order processing pipeline with eventual consistency because "we might need to scale." The system handled a few thousand orders per day. The debugging cost of stale reads and race conditions far exceeded any performance benefit.
Today I start with strong consistency and relax it only when measurements prove it necessary. The cognitive overhead of reasoning about eventual consistency is real and compounds across the team.
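Starting with strong consistency can be as simple as putting related writes in one transaction. A minimal sketch with SQLite and a hypothetical stock/orders schema; the point is that no reader can ever observe an order without its stock decrement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER);
    INSERT INTO stock VALUES ('widget', 5);
""")


def place_order(sku: str, qty: int) -> bool:
    # Both writes commit together or roll back together; there is no
    # window in which a stale read shows one without the other.
    with conn:
        cur = conn.execute(
            "UPDATE stock SET qty = qty - ? WHERE sku = ? AND qty >= ?",
            (qty, sku, qty),
        )
        if cur.rowcount == 0:
            raise ValueError("insufficient stock")
        conn.execute(
            "INSERT INTO orders (sku, qty) VALUES (?, ?)", (sku, qty)
        )
    return True
```

The eventually consistent version of this, with a stock service reacting to an order event, trades this one-line guarantee for compensating logic, retries, and the stale reads described above.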
Custom Frameworks Over Standard Libraries
Building internal frameworks felt productive. We had a custom ORM, a custom HTTP client wrapper, a custom configuration library. Each one was "better" than the open-source alternative in some narrow way. Each one also had exactly one person who understood it deeply, and that person eventually left.
The trade-off I missed: maintenance cost is not just about code quality. It is about how many engineers can contribute fixes without a knowledge transfer session.
Over-Indexing on DRY
I used to extract shared code aggressively. Two services need the same validation logic? Shared library. Three endpoints parse dates the same way? Utility module. The result was tight coupling disguised as code reuse.
What I learned: duplication is cheaper than the wrong abstraction. When you deduplicate prematurely, you create a coupling point that makes independent evolution impossible. I now tolerate duplication for at least two iterations before considering extraction, and only extract when the abstraction boundary is genuinely stable.
Relying on Synchronous Chains
A request that touches five services synchronously is a request with five points of failure. I built systems like this because synchronous calls are simple to reason about and easy to trace. But the tail latency and cascading failure modes were brutal.
Today I design for asynchronous communication by default for anything that does not need an immediate user-facing response. The mental model shift is significant: instead of "call and wait," think "emit and react."
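The "emit and react" model can be sketched with a toy in-process event bus. Everything here is illustrative; in production the bus would be a broker such as Kafka or SQS, and handlers would run in separate consumers.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._handlers[topic].append(handler)

    def emit(self, topic: str, event: dict) -> None:
        # The emitter returns immediately; it neither waits on nor
        # knows about the downstream consumers.
        for handler in self._handlers[topic]:
            handler(event)


bus = EventBus()
shipments: list[str] = []
bus.subscribe("order.placed", lambda e: shipments.append(e["order_id"]))
bus.emit("order.placed", {"order_id": "o-42"})
```

The structural difference from the synchronous chain: adding a sixth consumer is a new subscription, not a new point of failure inside the request path.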
Insufficient Schema Evolution Strategy
Early in my career, I treated database schemas as implementation details. Migrations were ad-hoc. Column renames happened in single deploys. This worked until the first time a migration locked a table for 40 minutes in production.
Now I plan schema changes as multi-phase operations:
- Add the new column (nullable, no constraints)
- Backfill data, dual-write from application
- Switch reads to new column
- Remove old column after verification period
This is slower. It is also the only approach that does not risk downtime at scale.
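The phases above can be sketched against SQLite with a hypothetical users table. In reality each phase is a separate deploy with a verification gap between them; here they are compressed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.execute("INSERT INTO users (fullname) VALUES ('Ada Lovelace')")

# Phase 1: add the new column, nullable, no constraints.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# Phase 2: backfill existing rows; the application dual-writes both
# columns for new rows during this window.
conn.execute(
    "UPDATE users SET display_name = fullname WHERE display_name IS NULL"
)

# Phase 3: reads switch to the new column.
row = conn.execute(
    "SELECT display_name FROM users WHERE id = 1"
).fetchone()

# Phase 4: drop `fullname` only after the verification period, in its
# own deploy (omitted here).
```

Each phase is individually reversible, which is exactly what the single-deploy rename lacks.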
Ignoring Operational Cost in Design Reviews
I spent years optimizing for developer experience during the build phase and ignoring the operational cost during the run phase. A system that is elegant to write but painful to operate is a net negative.
The shift: every design review now includes an "operational readiness" section. How do you deploy it? How do you roll it back? What alerts exist? What is the runbook? If these questions do not have answers before the first commit, the design is incomplete.
The Pattern I Follow Now
When facing a design trade-off today, I run through this checklist:
- Reversibility: Can I undo this decision in under a week? If yes, bias toward action. If no, invest more in analysis.
- Blast radius: If this decision is wrong, what breaks? A single endpoint or the entire platform?
- Team context: Will the median engineer on this team in 18 months understand why this choice was made?
- Operational burden: Does this add a new thing to monitor, a new failure mode, or a new on-call runbook entry?
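The checklist can be encoded as data so it shows up in design docs rather than living in one person's head. The thresholds and field names below are my own illustration, not a rule from the post.

```python
from dataclasses import dataclass


@dataclass
class TradeOff:
    reversible_within_a_week: bool
    blast_radius_platform_wide: bool
    survives_team_turnover: bool
    adds_operational_burden: bool

    def recommended_rigor(self) -> str:
        # Cheap-to-undo, contained decisions get a bias toward action;
        # everything else earns deeper analysis and review.
        if self.reversible_within_a_week and not self.blast_radius_platform_wide:
            return "bias toward action"
        return "invest in analysis and review"


decision = TradeOff(
    reversible_within_a_week=False,
    blast_radius_platform_wide=True,
    survives_team_turnover=False,
    adds_operational_burden=True,
)
```

Forcing the answers into explicit fields also makes the operational-readiness questions from the previous section impossible to skip silently.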
Key Takeaways
- Premature decomposition into microservices creates distributed monoliths that are worse than the original monolith.
- Strong consistency should be the default. Relax it only when you have data justifying the trade-off.
- Shared libraries create coupling. Tolerate duplication until abstraction boundaries stabilize.
- Every architectural decision has an operational cost that compounds over time.
- Reversibility is the most underrated property of a design decision.
Final Thoughts
Hindsight is not the point of this exercise. The goal is to build a decision framework that accounts for the failure modes I have already encountered. Every system I build today carries the scar tissue of past mistakes, and that scar tissue is the most valuable architectural input I have.