Engineering Principles I Apply Across Domains
Universal engineering principles that hold true regardless of language, framework, domain, or scale, distilled from years of building production systems.
Some principles transcend the specific technology stack, domain, or team context. Whether I am building a payment processor, a real-time messaging system, or a data pipeline, these principles shape my decisions. They are not abstract ideals. They are practical heuristics that have consistently led to better outcomes.
Make the Common Case Fast and the Edge Case Correct
Every system has a happy path that handles 95% of traffic and edge cases that handle the remaining 5%. The design should optimize the happy path for performance and the edge cases for correctness.
The mistake I see: optimizing edge cases for performance. A payment retry path that handles 0.1% of transactions does not need sub-millisecond latency. It needs to be correct, auditable, and idempotent. Spending engineering effort on its performance takes away from making the primary payment path faster.
Conversely, making the happy path "flexible" to handle every edge case degrades performance for the 95% of requests that do not need that flexibility.
The practical application: separate code paths for common and edge cases. The common path is lean, fast, and heavily optimized. The edge case path is thorough, well-logged, and prioritizes correctness over speed.
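The split can be sketched as follows. This is a minimal illustration, not a real payment system: `Payment`, `process_payment`, and the `charge`/`audit_log`/`seen_ids` parameters are all hypothetical names invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Payment:
    id: str
    amount_cents: int
    attempt: int = 1

def process_payment(payment, charge, audit_log, seen_ids):
    """Route payments: a lean fast path for first attempts, and a
    thorough, idempotent, well-logged path for retries."""
    if payment.attempt == 1:
        # Common case: minimal work, no extra bookkeeping.
        return charge(payment)
    # Edge case: idempotency check and audit trail come first;
    # latency is not the priority here.
    if payment.id in seen_ids:
        audit_log.append(f"duplicate retry skipped: {payment.id}")
        return "already_processed"
    audit_log.append(f"retry attempt {payment.attempt} for {payment.id}")
    result = charge(payment)
    seen_ids.add(payment.id)
    return result
```

The point is structural: the first branch stays small enough to optimize aggressively, while the retry branch is free to do slow, careful work.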
Fail Loudly, Recover Quietly
When something breaks, the failure should be immediately visible: alerts fire, errors are logged, dashboards turn red. Silent failures accumulate and compound.
When the system recovers, the recovery should be automatic and quiet. Circuit breakers close, retries succeed, caches refill, traffic shifts back. Recovery that requires human intervention for every incident does not scale.
The pattern:
- Detection: automated, aggressive, slightly over-sensitive (tune false positives later)
- Notification: immediate, to the right audience, with context
- Recovery: automated for known failure modes, manual only for novel situations
- Post-recovery verification: automated checks that confirm the system has returned to normal state
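A circuit breaker is one place where loud failure and quiet recovery meet in a few lines of code. The sketch below is illustrative: the threshold, cooldown, and the injectable `clock` parameter are choices made for the example, not a prescription.

```python
import time
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("breaker")

class CircuitBreaker:
    """Fails loudly (alert when the circuit opens), recovers quietly
    (automatic half-open retry after a cooldown, silent reset on success)."""

    def __init__(self, threshold=3, cooldown_s=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.clock = clock          # injectable for testing
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_s:
                raise RuntimeError("circuit open")  # fast-fail, visibly
            # Cooldown elapsed: allow one attempt through (half-open).
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                first_open = self.opened_at is None
                self.opened_at = self.clock()  # (re)start the cooldown
                if first_open:
                    # Loud: the transition to open is the alertable event.
                    log.error("ALERT: circuit opened after %d failures",
                              self.failures)
            raise
        # Quiet: success resets state automatically; nobody gets paged.
        self.failures = 0
        self.opened_at = None
        return result
```

Note the asymmetry: opening the circuit emits an alert, but closing it is silent, which is exactly the fail-loudly/recover-quietly pattern.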
Measure, Then Optimize
I never optimize without measurement. Intuition about performance bottlenecks is wrong more often than it is right, even for experienced engineers.
The measurement protocol:
- Define the metric you are trying to improve (P99 latency, throughput, memory usage)
- Measure the current baseline under realistic load
- Profile to identify the actual bottleneck
- Implement the optimization targeting the measured bottleneck
- Measure again under the same conditions
- If the improvement is less than expected, investigate why
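Step 3 of the protocol is the one teams skip. A few lines with the standard-library profiler are enough to see where time actually goes; in this contrived example, `slow_query` and `tiny_helper` are hypothetical stand-ins for the 60% database query and the 2% function.

```python
import cProfile
import pstats
import io
import time

def slow_query():
    time.sleep(0.05)          # stand-in for the database query at 60%
    return []

def tiny_helper():
    return sum(range(200))    # stand-in for the 2% function

def handle_request():
    tiny_helper()
    return slow_query()

def profile_top(fn, n=5):
    """Run fn under cProfile and return the top n entries by cumulative time."""
    pr = cProfile.Profile()
    pr.enable()
    fn()
    pr.disable()
    out = io.StringIO()
    pstats.Stats(pr, stream=out).sort_stats("cumulative").print_stats(n)
    return out.getvalue()

report = profile_top(handle_request)
# The report shows slow_query dominating cumulative time, which is
# where the optimization effort should go.
```

Running this takes seconds and answers the question the team would otherwise debate for weeks.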
I have watched teams spend weeks optimizing a function that accounted for 2% of total latency while ignoring a database query that accounted for 60%. Profiling would have revealed this in minutes.
Design for the Median Engineer
The median engineer on your team 18 months from now will not be as strong as your strongest engineer today. Designs that require exceptional skill to maintain are designs that will degrade over time.
This applies to:
- Code patterns. If a pattern requires deep understanding of concurrency primitives to use correctly, most engineers will use it incorrectly. Choose a simpler pattern that is harder to misuse.
- Operational procedures. If a deployment requires 12 manual steps executed in the right order, someone will skip step 7. Automate the procedure or simplify it.
- Architecture. If understanding the data flow requires expertise in event sourcing, CQRS, and saga patterns simultaneously, the learning curve will slow down every new team member.
Designing for the median engineer is not condescending. It is realistic. The system must be operable by the team you have, not the team you wish you had.
Prefer Explicit Over Implicit
Implicit behavior is behavior that happens without being visible in the code at the point where it matters. Spring's dependency injection is implicit. Python decorators that modify function behavior are implicit. Database triggers are implicit.
Implicit behavior is convenient for the author and confusing for the reader. I bias toward explicit code because:
- Explicit code is searchable. You can grep for where something happens.
- Explicit code is debuggable. The stack trace shows the actual execution path.
- Explicit code is reviewable. Code reviewers can see what the code does without knowing framework conventions.
- Explicit code is portable. Understanding it does not require knowing which framework version is in use.
I do not avoid frameworks or conventions entirely. But when I use them, I ensure the implicit behavior is documented at the point of use and that the team understands the conventions.
Separate Policy from Mechanism
The mechanism is how something works. The policy is when and why it is used. Separating these makes both easier to change.
Examples:
| Mechanism | Policy |
|---|---|
| Circuit breaker implementation | When to open, how long to wait before half-open |
| Rate limiter | How many requests per window, per user vs. per IP |
| Cache | TTL values, eviction strategy, which data to cache |
| Retry logic | How many retries, backoff curve, which errors are retryable |
When mechanism and policy are mixed, changing the retry count requires modifying the retry implementation. When they are separated, changing the retry count is a configuration change.
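The retry row of the table above can be sketched directly: the policy is a frozen piece of data, the mechanism is a generic loop that reads it. The names and defaults here are illustrative.

```python
import time
from dataclasses import dataclass, field

@dataclass(frozen=True)
class RetryPolicy:
    """Policy: pure data. Changing the retry count or backoff is a
    configuration change, not a code change."""
    max_attempts: int = 3
    base_delay_s: float = 0.1
    retryable: tuple = (TimeoutError, ConnectionError)

def retry(fn, policy, sleep=time.sleep):
    """Mechanism: a generic loop that knows nothing about specific
    counts, delays, or error types."""
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return fn()
        except policy.retryable:
            if attempt == policy.max_attempts:
                raise
            # Exponential backoff derived from the policy, not hard-coded.
            sleep(policy.base_delay_s * (2 ** (attempt - 1)))
```

The same `retry` mechanism serves every caller; each caller supplies the policy that fits its situation.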
Build for Testability From the Start
Testability is not an afterthought. It is a design property that affects architecture. Systems that are hard to test have specific structural problems:
- Hard-coded dependencies instead of injected ones
- Side effects mixed with business logic
- Global state that makes tests order-dependent
- External service calls without abstraction layers
I design for testability by:
- Injecting dependencies through constructors or factory methods
- Separating pure logic from I/O operations
- Avoiding global mutable state
- Providing test doubles for external dependencies
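Those four practices combine naturally in a small sketch: pure logic lives in a free function, and the one external dependency (a clock) is injected through the constructor so tests can substitute a fake. `SessionStore` and its API are invented for the example.

```python
from typing import Protocol

class Clock(Protocol):
    def now(self) -> float: ...

def is_expired(created_at: float, ttl_s: float, now: float) -> bool:
    """Pure logic: no I/O, no hidden state, trivially testable."""
    return now - created_at > ttl_s

class SessionStore:
    """The clock is injected, so tests control time instead of sleeping."""

    def __init__(self, clock: Clock, ttl_s: float = 3600.0):
        self.clock = clock
        self.ttl_s = ttl_s
        self.sessions: dict = {}

    def create(self, session_id: str) -> None:
        self.sessions[session_id] = self.clock.now()

    def is_valid(self, session_id: str) -> bool:
        created = self.sessions.get(session_id)
        if created is None:
            return False
        # Delegate the decision to the pure function.
        return not is_expired(created, self.ttl_s, self.clock.now())
```

A test constructs the store with a fake clock, advances it past the TTL, and asserts expiry, with no `time.sleep`, no global state, and no flakiness.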
Code that is hard to test is also hard to change, hard to understand, and hard to debug. Testability is a proxy for overall design quality.
Automate Repetitive Decisions
Any decision that the team makes more than once a week should either be automated or documented as a decision tree.
Examples:
- "Should I retry this error?" becomes a retry policy implemented in code.
- "How do I name this API endpoint?" becomes a naming convention document.
- "How many instances should I run?" becomes an auto-scaling policy.
- "Should I page for this alert?" becomes an alert severity classification.
Automating decisions removes inconsistency, reduces cognitive load, and frees engineers to focus on decisions that genuinely require human judgment.
Interfaces Over Implementations
When designing component boundaries, define the interface first. What does the consumer need? What does the provider guarantee? The implementation should satisfy the interface, not the other way around.
This applies at every scale:
- Function signatures define the contract between caller and callee
- API specifications define the contract between client and server
- Message schemas define the contract between producer and consumer
- SLAs define the contract between teams
When the interface is well-defined, the implementation can change without affecting consumers. When the implementation drives the interface, every internal change risks breaking consumers.
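In code, this shows up as depending on an abstract contract rather than a concrete class. The sketch below uses the rate limiter from the earlier table as its example; the names are hypothetical.

```python
from abc import ABC, abstractmethod

class RateLimiter(ABC):
    """Interface defined first: what the consumer needs and what
    the provider guarantees, before any implementation exists."""

    @abstractmethod
    def allow(self, key: str) -> bool:
        """Return True if the request identified by key may proceed."""

class FixedWindowLimiter(RateLimiter):
    """One implementation satisfying the contract. It can be swapped
    for a token-bucket variant without touching any caller."""

    def __init__(self, limit: int):
        self.limit = limit
        self.counts: dict = {}

    def allow(self, key: str) -> bool:
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.limit

def handle(request_key: str, limiter: RateLimiter) -> str:
    # The caller depends only on the interface, never the implementation.
    return "ok" if limiter.allow(request_key) else "throttled"
```

Because `handle` is written against `RateLimiter`, replacing the fixed-window algorithm is an internal change with zero consumer impact, which is the entire payoff of interface-first design.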
Key Takeaways
- Optimize the common case for speed and the edge case for correctness. Do not conflate the two.
- Fail loudly so problems are visible. Recover quietly through automation.
- Never optimize without measurement. Intuition about bottlenecks is unreliable.
- Design for the median engineer on the team, not the strongest one.
- Prefer explicit code over implicit behavior. Explicit is searchable, debuggable, and reviewable.
- Separate policy from mechanism so each can change independently.
- Testability is a design property, not an afterthought. Hard-to-test code is hard-to-maintain code.
- Automate repetitive decisions to reduce inconsistency and cognitive load.
- Define interfaces before implementations. The consumer's needs drive the contract.
Final Thoughts
These principles are not original. Most have been articulated by other engineers in other contexts. Their value is not in novelty but in consistent application. The teams I have seen succeed are not the ones using the newest technology or the cleverest patterns. They are the ones applying well-understood principles consistently, across every component, every decision, and every review.