When to Rewrite vs Refactor
A decision framework for choosing between incremental refactoring and a full rewrite, based on system state, team context, and business constraints.
The rewrite question comes up in every long-lived codebase. The system is painful to work with, velocity has slowed, and someone suggests starting over. The decision is consequential: a wrong rewrite can set a team back by years, and a wrong refusal to rewrite can trap a team in a system that cannot evolve. Here is how I approach it.
The Default Is Refactor
Refactoring preserves working behavior while improving internal structure. It is incremental, reversible, and carries lower risk than a rewrite. My default is to refactor unless there is a specific, compelling reason to rewrite.
The reasons this default exists:
- Working software contains encoded knowledge. Every bug fix, every edge case handler, every "weird" conditional exists because someone encountered a real scenario. A rewrite loses this accumulated knowledge.
- Rewrites take 2-3x longer than estimated. I do not mean this as a loose rule of thumb: it is a pattern I have observed consistently across projects, teams, and technology stacks. A rewrite estimated at six months should be planned as a 12 to 18 month effort.
- The team must maintain the old system during the rewrite. Development effort is split between the old system (which still needs bug fixes and features) and the new system. Neither gets full attention.
- Feature parity is a moving target. By the time the rewrite catches up to the old system's features, the old system has accumulated new features that the rewrite does not yet support.
Signals That a Rewrite Is Justified
Despite the default, there are situations where refactoring is not viable:
The technology platform is end-of-life. If the language, framework, or runtime is no longer maintained and has known security vulnerabilities that cannot be patched, a rewrite may be the only option. Refactoring within a dead ecosystem does not solve the underlying problem.
The architecture fundamentally cannot meet a hard requirement. A single-tenant system that must become multi-tenant. A monolith that must support independent deployment of its components due to regulatory requirements. A synchronous pipeline that must process events in real time. When the gap is architectural rather than implementational, refactoring may be insufficient.
The codebase cannot be tested. If the system has no test coverage, no separation of concerns, and the code is so entangled that adding tests requires rewriting the code anyway, the distinction between "refactor" and "rewrite" becomes semantic.
The team has zero domain experts for the current implementation. If everyone who understood the system has left and the current team cannot safely make changes, the system is already effectively unmaintained. A rewrite with the current team may be faster than reverse-engineering the existing system.
The Strangler Fig Pattern
When a rewrite is justified, I never do a big-bang replacement. The strangler fig pattern replaces the old system incrementally:
1. Identify a bounded piece of functionality in the old system.
2. Build the replacement for that piece using the new architecture.
3. Route traffic to the new implementation, keeping the old one as fallback.
4. Verify correctness and performance.
5. Remove the old implementation for that piece.
6. Repeat for the next piece.
This gives you the benefits of a rewrite (new architecture, clean code) with the risk profile of a refactor (incremental, reversible, always shippable).
The key constraint: you must be able to draw clean boundaries around pieces of functionality. If the old system is so entangled that you cannot isolate a piece without rewriting everything, the strangler fig pattern does not work. In that case, you may need to refactor enough to create boundaries before you can start the incremental rewrite.
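The routing step is the part that is easiest to get wrong, so here is a minimal sketch of it in Python. It assumes each bounded piece of functionality can be addressed by name and called as a function; `StranglerRouter`, `migrate`, and `retire_legacy` are hypothetical names for illustration, not part of any library.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], dict]


class StranglerRouter:
    """Routes a feature to its replacement when one is registered,
    falling back to the legacy handler if the replacement raises."""

    def __init__(self, legacy: Dict[str, Handler]) -> None:
        self._legacy = dict(legacy)
        self._replacement: Dict[str, Handler] = {}
        self.fallbacks: List[str] = []  # features that fell back, for monitoring

    def migrate(self, feature: str, handler: Handler) -> None:
        """Start routing a feature to its new implementation."""
        if feature not in self._legacy:
            raise KeyError(f"unknown feature: {feature}")
        self._replacement[feature] = handler

    def retire_legacy(self, feature: str) -> None:
        """Remove the old implementation once the new one is verified."""
        if feature not in self._replacement:
            raise ValueError(f"{feature} has no replacement yet")
        del self._legacy[feature]

    def handle(self, feature: str, request: dict) -> dict:
        new = self._replacement.get(feature)
        old = self._legacy.get(feature)
        if new is None:
            if old is None:
                raise KeyError(f"unknown feature: {feature}")
            return old(request)
        try:
            return new(request)
        except Exception:
            if old is None:
                raise  # legacy already retired, nothing to fall back to
            self.fallbacks.append(feature)
            return old(request)
```

In a real system the fallback list would feed a metric or alert: a feature that keeps falling back is not ready to have its legacy implementation retired.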
Decision Framework
I use the following evaluation:
| Factor | Favors refactor | Favors rewrite |
|---|---|---|
| Codebase health | Tests exist, modules are separable | No tests, everything is coupled |
| Team knowledge | Team understands the existing system | Nobody understands the existing system |
| Platform viability | Platform is actively maintained | Platform is deprecated or insecure |
| Business tolerance | Cannot pause feature delivery | Can invest a quarter in infrastructure |
| Architectural fit | Architecture can evolve to meet needs | Architecture fundamentally cannot meet needs |
| Risk tolerance | Low (system is revenue-critical) | Higher (non-critical or has fallback) |
If the evaluation is mixed (some factors favor refactor, others favor rewrite), I default to refactor. The cost of an unnecessary refactor is wasted effort. The cost of a failed rewrite is a multi-month or multi-year setback.
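One way to make the "mixed means refactor" rule explicit is to encode it. The sketch below is an illustration only: the factor keys mirror the table rows, and treating anything short of a unanimous result as "refactor" is my reading of the rule, not a formula to apply mechanically.

```python
from typing import Dict

# Keys mirror the rows of the table; the identifiers are illustrative.
FACTORS = (
    "codebase_health",
    "team_knowledge",
    "platform_viability",
    "business_tolerance",
    "architectural_fit",
    "risk_tolerance",
)


def recommend(assessment: Dict[str, str]) -> str:
    """Return "rewrite" only when every factor favors it; otherwise "refactor"."""
    missing = [f for f in FACTORS if f not in assessment]
    if missing:
        raise ValueError(f"unassessed factors: {missing}")
    invalid = {f: v for f, v in assessment.items() if v not in ("refactor", "rewrite")}
    if invalid:
        raise ValueError(f"invalid assessments: {invalid}")
    votes = [assessment[f] for f in FACTORS]
    # Assumption: a mixed evaluation defaults to refactor, so rewrite must be unanimous.
    return "rewrite" if all(v == "rewrite" for v in votes) else "refactor"
```

The useful part is less the return value than the validation: the function refuses to answer until every factor has been assessed.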
Common Rewrite Mistakes
Reproducing the old system's mistakes. Without understanding why the old system was built the way it was, the new system often recapitulates the same decisions. I require the team to document the old system's design rationale before starting the rewrite.
Gold-plating the rewrite. "Since we are rewriting anyway, let's also add X, Y, and Z." Every additional feature increases the time to parity and the risk of never finishing. The rewrite's scope should be strictly limited to achieving feature parity with a better architecture.
Underestimating data migration. The old system's data model contains years of accumulated inconsistencies, edge cases, and schema variations. Migrating this data to a new model is often the hardest part of the rewrite and is frequently underestimated by a factor of 3 or more.
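A cheap way to size the migration early is a dry run that converts production-like rows and groups the failures by reason. The following is a sketch under stated assumptions: the legacy field names (`cust_id`, `email`) and the validation rules are hypothetical stand-ins for whatever the real model contains.

```python
from typing import Dict, Iterable, List, Tuple


def convert(legacy_row: dict) -> dict:
    """Map one legacy row to the new model, raising ValueError on bad data.
    Field names and rules here are hypothetical."""
    if legacy_row.get("cust_id") is None:
        raise ValueError("missing cust_id")
    email = (legacy_row.get("email") or "").strip().lower()
    if "@" not in email:
        raise ValueError("invalid email")
    return {"customer_id": int(legacy_row["cust_id"]), "email": email}


def dry_run(rows: Iterable[dict]) -> Tuple[List[dict], Dict[str, List[dict]]]:
    """Convert every row without writing anything; group rejects by reason."""
    migrated: List[dict] = []
    rejects: Dict[str, List[dict]] = {}
    for row in rows:
        try:
            migrated.append(convert(row))
        except ValueError as exc:
            rejects.setdefault(str(exc), []).append(row)
    return migrated, rejects
```

The reject groups are the real output. Each distinct reason is a decision someone has to make (fix, default, or drop), and the count of reasons is a better estimate of migration effort than the count of tables.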
Neglecting the transition period. During the transition, users interact with both old and new systems. Data must be synchronized. Bugs must be fixed in both systems. Support teams must understand both systems. The operational cost of the transition period is significant and must be planned for.
Refactoring Strategies That Avoid Rewrites
When the system is painful but does not justify a rewrite, these strategies provide relief:
Extract and replace modules. Identify the most painful module, define its interface, build a replacement behind the interface, and swap it in. This is a mini-rewrite within a refactor.
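As a rough illustration of extract-and-replace, the sketch below defines the interface first and makes the caller depend only on it. The tax example and all class names are hypothetical; the point is that `checkout_total` does not change when the implementation is swapped.

```python
from typing import Protocol


class TaxCalculator(Protocol):
    """The extracted interface: the only thing callers depend on."""

    def tax_for(self, amount_cents: int) -> int: ...


class LegacyTaxCalculator:
    """The painful module, wrapped as-is behind the interface."""

    def tax_for(self, amount_cents: int) -> int:
        return amount_cents * 20 // 100  # truncates fractional cents


class RateTableTaxCalculator:
    """The replacement, built behind the same interface."""

    def __init__(self, rate_bps: int = 2000) -> None:
        self._rate_bps = rate_bps  # rate in basis points, 2000 = 20%

    def tax_for(self, amount_cents: int) -> int:
        return (amount_cents * self._rate_bps + 5000) // 10000  # rounds half up


def checkout_total(amount_cents: int, tax: TaxCalculator) -> int:
    """Caller code: unchanged when the implementation is swapped."""
    return amount_cents + tax.tax_for(amount_cents)
```

Note that the two implementations deliberately differ in rounding. A swap like this is exactly where behavior changes slip in, which is why the replacement should be compared against the legacy module on real inputs before the swap.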
Add a testing layer. Before changing anything, add characterization tests that capture the current behavior. These tests allow refactoring with confidence that behavior is preserved.
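A characterization test records what the code does today, not what a spec says it should do. Here is a minimal sketch; `legacy_shipping_fee` is a made-up stand-in for an untested legacy function, and the expected values were obtained by running it, quirks included.

```python
def legacy_shipping_fee(weight_kg: float, express: bool) -> float:
    """Hypothetical stand-in for untested legacy code."""
    fee = 5.0 if weight_kg <= 1 else 5.0 + (weight_kg - 1) * 2.0
    if express:
        fee = fee * 1.5 + 1  # nobody remembers why the +1 is here
    return round(fee, 2)


# Outputs recorded from the current implementation, not derived from a spec.
RECORDED = [
    ((0.5, False), 5.0),
    ((3.0, False), 9.0),
    ((3.0, True), 14.5),
    ((0.0, True), 8.5),
]


def test_legacy_shipping_fee_is_unchanged() -> None:
    """Fails as soon as a refactor changes observable behavior."""
    for args, expected in RECORDED:
        assert legacy_shipping_fee(*args) == expected, args
```

The unexplained `+ 1` stays in the recorded outputs on purpose. Whether it is a bug or a business rule is a separate question; the test's job is only to make sure a refactor does not answer it by accident.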
Introduce an anti-corruption layer. When the old system's internal model is problematic, add a translation layer at the boundary. New code interacts with a clean model. The anti-corruption layer translates between the clean model and the legacy model.
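To make the translation layer concrete, here is a small sketch. The legacy field names (`CUST_NO`, `FNAME`, `LNAME`, `STATUS_CD`, `SIGNUP_DT`) and their encodings are invented for illustration; a real layer would mirror whatever the legacy model actually looks like.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Customer:
    """Clean model that new code works with."""
    customer_id: int
    full_name: str
    active: bool
    signup_date: date


class CustomerTranslator:
    """Anti-corruption layer: the only place that knows legacy field names."""

    @staticmethod
    def from_legacy(record: dict) -> Customer:
        return Customer(
            customer_id=int(record["CUST_NO"]),
            full_name=f'{record["FNAME"].strip()} {record["LNAME"].strip()}',
            active=record["STATUS_CD"] == "A",
            signup_date=datetime.strptime(record["SIGNUP_DT"], "%Y%m%d").date(),
        )

    @staticmethod
    def to_legacy(customer: Customer) -> dict:
        first, _, last = customer.full_name.partition(" ")
        return {
            "CUST_NO": f"{customer.customer_id:08d}",
            "FNAME": first,
            "LNAME": last,
            "STATUS_CD": "A" if customer.active else "I",
            "SIGNUP_DT": customer.signup_date.strftime("%Y%m%d"),
        }
```

Because every legacy quirk (padded IDs, status codes, string dates) is confined to one class, new code never needs to learn them, and the layer can be deleted once the legacy model is gone.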
Pay down debt incrementally. Allocate a fixed percentage of each sprint (15-20%) to technical debt reduction. This is sustainable and avoids the feast-or-famine cycle of debt accumulation followed by emergency rewrites.
The Rewrite Checklist
If the decision is to rewrite, verify these conditions before starting:
- The team has documented why the current system cannot be refactored
- The scope is limited to feature parity (no new features in v1)
- A strangler fig or incremental approach is planned (no big-bang cutover)
- Data migration has been prototyped with production-like data
- The team has capacity to maintain the old system during the transition
- Success criteria and timeline are defined (with a kill date if targets are not met)
- Stakeholders understand and accept the investment and the risk
The kill date is critical. A rewrite that is 18 months in with no end in sight should be evaluated for cancellation. The time and money already spent are sunk costs, and they are not by themselves a reason to continue.
Key Takeaways
- Default to refactoring. Rewrites take 2-3x longer than estimated and lose encoded knowledge from the existing system.
- Rewrites are justified when the platform is end-of-life, the architecture fundamentally cannot meet requirements, or the codebase cannot be tested.
- Use the strangler fig pattern for rewrites: incremental replacement, not big-bang cutover.
- Common rewrite mistakes: reproducing old design decisions, gold-plating, underestimating data migration, and neglecting the transition period.
- Allocate 15-20% of each sprint to technical debt reduction to avoid reaching the rewrite threshold.
- Set a kill date for rewrites and evaluate honestly whether to continue.
Further Reading
- How I Think About Engineering Risk: A framework for identifying, categorizing, and managing engineering risk across system design, team dynamics, and operational decisions.
- Making Trade-offs That Age Well: How to evaluate architectural trade-offs not just for current requirements but for how they will hold up as the system, team, and busines...
- Designing Systems for Humans, Not Just Machines: Why the human factors in system design, including cognitive load, operational ergonomics, and team structure, matter as much as the techn...
Final Thoughts
The rewrite temptation is strongest when frustration is highest. That is exactly when the decision should be most carefully evaluated. The best engineering teams I have worked with treat the rewrite question with the same rigor as any other architectural decision: defined criteria, documented trade-offs, and a clear plan for managing risk.