Designing a Simple CMS From Scratch
Architecture decisions behind building a file-based CMS with MDX, Git-backed versioning, and incremental builds for a content-heavy site.
Context
I needed a CMS for a technical blog. Requirements: MDX support, Git-based version control, fast builds, structured frontmatter validation, and zero runtime dependencies. No database. No admin panel. Just files, a schema, and a build pipeline.
Problem
Existing headless CMS options (Contentful, Sanity, Strapi) add network dependencies, API rate limits, and vendor lock-in. WordPress is operationally expensive for a static site. A file-based approach eliminates these issues but requires building content ingestion, validation, and rendering from scratch.
See also: Building a Simple Search Index.
Constraints
- Content format: MDX (Markdown with JSX components)
- Storage: Git repository (GitHub)
- Build tool: Next.js with static export
- Frontmatter schema: must be validated at build time, not runtime
- Build time budget: under 60 seconds for 500 posts
- Authors: 1-3 contributors, all comfortable with Git
- No runtime API calls for content
Design
Content Structure
src/content/
  blog/
    post-slug.mdx
  projects/
    project-slug.mdx
  config/
    navigation.json
    site.json
Each MDX file contains frontmatter and content:
---
title: "Post Title"
description: "A short description"
date: "2025-10-18"
tags: ["architecture"]
draft: false
---
Content here with <CustomComponent /> support.
Schema Validation
Zod schemas validate frontmatter at build time:
import { z } from "zod";
// Frontmatter schema for blog posts
const blogSchema = z.object({
  title: z.string().min(1).max(120),
  description: z.string().min(1).max(300),
  date: z.string().regex(/^\d{4}-\d{2}-\d{2}$/), // YYYY-MM-DD shape only
  tags: z.array(z.string()).min(1).max(5),
  draft: z.boolean().default(false),
});
A build script runs validation before the Next.js build. Any schema violation fails the build with a clear error message pointing to the offending file and field.
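The error reporting described above might look roughly like the following. This is a minimal sketch, not the actual build script: the `Issue` interface mirrors the `path` and `message` fields that Zod attaches to each validation issue, while `formatIssues` and the sample issues are hypothetical names and data introduced for illustration.

```typescript
// Hypothetical shape mirroring the `path` and `message` fields of a Zod issue.
interface Issue {
  path: (string | number)[];
  message: string;
}

// Turn validation issues into one line per problem, naming the file and field.
function formatIssues(file: string, issues: Issue[]): string[] {
  return issues.map((issue) => {
    const field = issue.path.length > 0 ? issue.path.join(".") : "(root)";
    return `${file}: ${field}: ${issue.message}`;
  });
}

// Example: two issues as a schema check might report them (made-up data).
const lines = formatIssues("src/content/blog/post-slug.mdx", [
  { path: ["title"], message: "Required" },
  { path: ["tags", 0], message: "Expected string, received number" },
]);
for (const line of lines) console.error(line);
```

In a real build script, a non-empty result would presumably be followed by a non-zero exit code so the Next.js build never starts.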
Content Pipeline
| Stage | Tool | Output |
|---|---|---|
| Parse frontmatter | gray-matter | Structured metadata |
| Validate schema | Zod | Pass/fail with errors |
| Compile MDX | next-mdx-remote | Serialized React tree |
| Generate slugs | File path convention | URL structure |
| Build index | Custom script | JSON feed for search, RSS |
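The "file path convention" stage can be sketched as a pure function. This assumes URLs mirror the directory layout under `src/content/` (so `blog/post-slug.mdx` becomes `/blog/post-slug`); the function name `pathToUrl` and that mapping are assumptions for illustration, not the author's confirmed implementation.

```typescript
// Hypothetical helper: map a content file path to its URL path.
// Assumes the layout shown earlier: src/content/<section>/<slug>.mdx
function pathToUrl(filePath: string, contentRoot = "src/content"): string {
  const normalized = filePath.replace(/\\/g, "/"); // tolerate Windows separators
  const prefix = contentRoot.replace(/\/+$/, "") + "/";
  if (!normalized.startsWith(prefix) || !normalized.endsWith(".mdx")) {
    throw new Error(`Not a content file: ${filePath}`);
  }
  // Strip the root prefix and the extension; what remains is the URL path.
  const relative = normalized.slice(prefix.length, -".mdx".length);
  return "/" + relative;
}

const url = pathToUrl("src/content/blog/post-slug.mdx");
console.log(url);
```

Deriving the URL from the path alone means there is no separate slug field in frontmatter that can drift out of sync with the file name.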
Component Resolution
Custom MDX components are registered in a central map:
const components = {
  Callout: CalloutComponent,
  CodeBlock: CodeBlockComponent,
  Table: TableComponent,
  Image: OptimizedImage,
};
This keeps content portable. If a component is removed, the build fails with a clear reference error rather than silently rendering broken HTML.
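One way to surface a missing component even before MDX compilation is a small pre-check over the source text. The sketch below is an assumption layered on top of the component map, not part of the described pipeline: `findUnregistered` is a hypothetical helper, and the regex is an approximation that does not skip code samples the way a real MDX parser would.

```typescript
// Names registered in the component map (taken from the map above).
const registered = new Set(["Callout", "CodeBlock", "Table", "Image"]);

// Hypothetical check: list capitalized JSX tags in MDX source that have no
// registered component. Regex-based, so tags inside code samples also match.
function findUnregistered(mdxSource: string, known: Set<string>): string[] {
  const missing = new Set<string>();
  const tagPattern = /<([A-Z][A-Za-z0-9]*)/g;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(mdxSource)) !== null) {
    const name = match[1];
    if (!known.has(name)) missing.add(name);
  }
  return Array.from(missing).sort();
}

// Made-up MDX snippet: <Chart> is not in the registered set.
const source = `
<Callout type="info">Heads up</Callout>
<Chart data="x" />
<Image src="/a.png" />
`;
const unknown = findUnregistered(source, registered);
console.log(unknown);
```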
Trade-offs
| Aspect | File-Based CMS | Headless CMS (API) | Database CMS |
|---|---|---|---|
| Content editing | Code editor + Git | Web UI | Web UI |
| Version control | Git (native) | Vendor-specific | Manual/plugins |
| Build dependency | None (local files) | Network + API | Network + DB |
| Schema enforcement | Build-time | Runtime/webhook | Runtime |
| Preview | Local dev server | Preview API | Admin panel |
| Contributor friction | High (Git knowledge) | Low | Low |
| Vendor lock-in | None | High | Medium |
| Cost at scale | $0 | $50-500/month | $20-100/month |
The primary trade-off is contributor friction. Non-technical writers struggle with Git workflows. For a team of engineers writing technical content, this is not a constraint.
Build Performance
| Post Count | Full Build | Incremental (1 post changed) |
|---|---|---|
| 100 | 12s | 4s |
| 500 | 48s | 5s |
| 1,000 | 95s | 6s |
| 5,000 | 380s | 8s |
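Incremental builds combine file-system watching with a hash manifest to rebuild only changed content: the build script hashes each file, compares it against a cached manifest, and recompiles only the files whose hash differs. Next.js ISR is not part of this path, since it regenerates pages on a running server and is unsupported with static export.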
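The hash-and-compare step can be sketched as follows. This is an illustration under assumptions: SHA-256 as the hash, a flat path-to-hash manifest, and the names `hashContent` and `diffManifest` are all hypothetical, and in-memory strings stand in for files read from disk.

```typescript
import { createHash } from "node:crypto";

// Manifest maps a file path to the hash of its contents at the last build.
type Manifest = Record<string, string>;

// Hash file contents; SHA-256 is an assumption, any stable hash would do.
function hashContent(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Compare current file contents against the cached manifest.
function diffManifest(previous: Manifest, files: Record<string, string>) {
  const next: Manifest = {};
  const changed: string[] = [];
  for (const [path, content] of Object.entries(files)) {
    next[path] = hashContent(content);
    if (previous[path] !== next[path]) changed.push(path); // new or modified
  }
  const removed = Object.keys(previous).filter((path) => !(path in files));
  return { next, changed, removed };
}

// Example: one modified file, one unchanged, one new, one deleted.
const previous: Manifest = {
  "blog/a.mdx": hashContent("old text"),
  "blog/b.mdx": hashContent("same"),
  "blog/gone.mdx": hashContent("x"),
};
const result = diffManifest(previous, {
  "blog/a.mdx": "new text",
  "blog/b.mdx": "same",
  "blog/c.mdx": "fresh",
});
console.log(result.changed, result.removed);
```

Only the paths in `changed` would need recompiling, and `next` would be written back as the new cached manifest.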
Related: Failure Modes I Actively Design For.
Failure Modes
Broken MDX syntax: An unclosed JSX tag in an MDX file crashes the entire build. Mitigation: a pre-commit hook that compiles each changed MDX file in isolation. Failures block the commit.
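Frontmatter drift: Over time, authors add non-schema fields or use incorrect types. Zod validation catches incorrect types, but a plain z.object() silently strips unknown keys, so catching non-schema fields requires declaring the schema with .strict(). Either way, the check only runs at build time: if CI is slow, authors get feedback minutes after pushing. Mitigation: run validation as a pre-commit hook (sub-second for changed files).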
Image reference rot: MDX files reference images by path. If an image is moved or deleted, the build does not fail (images are resolved at runtime by the browser). Mitigation: a build script that checks all image references against the filesystem.
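The image reference check could be sketched like this. It is a hypothetical helper, not the actual build script: `findMissingImages` takes the set of known image paths as a parameter (in a real build that set would presumably come from walking the public directory), and the two regexes only cover Markdown image syntax and a literal `src="..."` attribute on `img` or `Image` tags.

```typescript
// Hypothetical check: collect image paths referenced in MDX source and
// report those that are absent from a set of known files.
function findMissingImages(mdxSource: string, existing: Set<string>): string[] {
  const refs = new Set<string>();
  const markdownImage = /!\[[^\]]*\]\(([^)\s]+)/g; // ![alt](path "title")
  const jsxSrc = /<(?:img|Image)\b[^>]*?\bsrc="([^"]+)"/g; // <Image src="path" />
  for (const pattern of [markdownImage, jsxSrc]) {
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(mdxSource)) !== null) refs.add(match[1]);
  }
  // Remote URLs cannot be checked against the filesystem, so skip them.
  return Array.from(refs)
    .filter((ref) => !/^https?:\/\//.test(ref) && !existing.has(ref))
    .sort();
}

// Made-up content and file set for illustration.
const post = `
![Diagram](/images/diagram.png)
<Image src="/images/hero.jpg" alt="Hero" />
![Remote](https://example.com/pic.png)
<img src="/images/deleted.png" />
`;
const onDisk = new Set(["/images/diagram.png", "/images/hero.jpg"]);
const missingImages = findMissingImages(post, onDisk);
console.log(missingImages);
```

Failing the build when the returned list is non-empty turns a silent runtime 404 into a build-time error.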
Git merge conflicts in MDX: Two authors editing the same file creates merge conflicts in content. MDX conflicts are harder to resolve than code conflicts because the diff context is prose. Mitigation: one file per post, small atomic commits, and a convention that each author works on separate posts.
Scaling Considerations
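- Full builds exceed the 60-second budget somewhere between 500 and 1,000 posts (95s at 1,000, 380s at 5,000). The solution is to keep full builds off the authoring path and reserve them for CI verification. Incremental builds handle single-post changes in under 10 seconds at every corpus size measured.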
- For multi-author workflows, consider a Git-based review process (pull requests for content) with automated preview deployments per PR.
- Content search requires a pre-built index. At 5,000 posts, the search index is ~2MB, which is large enough that it should be queried server-side rather than shipped to the browser (see the search post for details).
- RSS and sitemap generation should be part of the build pipeline, not a runtime concern.
Observability
- Build time per stage (parse, validate, compile, generate) logged to stdout and captured in CI
- Schema validation errors surfaced as GitHub Actions annotations on the PR
- Content statistics (word count, reading time, tag distribution) generated at build time and written to a JSON manifest
- Broken link detection as a post-build step using a crawler against the static output
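Surfacing validation errors as PR annotations relies on GitHub Actions workflow commands: a line of the form `::error file=...,line=...::message` printed to stdout during a job becomes an annotation on that file. The sketch below builds such lines; the `ValidationError` shape and the line number are assumptions, since the original does not say how errors are located within a file.

```typescript
// Hypothetical shape for a validation failure located in a content file.
interface ValidationError {
  file: string;
  line: number;
  message: string;
}

// Escape message data the way workflow commands expect.
function escapeData(value: string): string {
  return value.replace(/%/g, "%25").replace(/\r/g, "%0D").replace(/\n/g, "%0A");
}

// Property values additionally need ':' and ',' escaped.
function escapeProperty(value: string): string {
  return escapeData(value).replace(/:/g, "%3A").replace(/,/g, "%2C");
}

// Build an `::error` workflow command for one validation failure.
function toAnnotation(error: ValidationError): string {
  const file = escapeProperty(error.file);
  return `::error file=${file},line=${error.line}::${escapeData(error.message)}`;
}

// Made-up failure for illustration.
const annotation = toAnnotation({
  file: "src/content/blog/post-slug.mdx",
  line: 3,
  message: "title: Required",
});
console.log(annotation);
```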
Key Takeaways
- A file-based CMS eliminates runtime dependencies entirely. Content is an artifact of the build, not a service.
- Schema validation at build time catches errors earlier and more reliably than runtime validation.
- Incremental builds are essential for scaling. Full rebuilds are only for CI verification.
- The contributor experience is the main constraint. This approach works for technical teams, not content marketing departments.
- Pre-commit hooks are the most effective quality gate. Build-time validation is the second line of defense.
Final Thoughts
This CMS has served 500+ posts over 18 months with zero runtime incidents. The entire content pipeline runs in CI, produces static HTML, and deploys to a CDN. There is no server to monitor, no database to back up, and no API to rate-limit. The trade-off is contributor experience, which is acceptable when all contributors are engineers who already live in Git.