Designing a Simple Authentication Service

Dhruval Dhameliya·July 2, 2025·8 min read

Architecture for a session-based authentication service with JWT access tokens, refresh token rotation, and measured security trade-offs.

Context

A web application needed authentication for 50,000 users. Requirements: email/password login, session management, JWT access tokens for API authorization, refresh token rotation, and account lockout after failed attempts. No third-party auth provider (Auth0, Clerk) due to cost constraints at scale.

Problem

Authentication is a solved problem with unsolved trade-offs. JWTs are stateless but irrevocable. Sessions are revocable but require server-side state. Refresh tokens extend session duration but add complexity. The design must balance security, performance, and operational simplicity.

Constraints

  • Users: 50,000 registered, 15,000 DAU
  • Login frequency: average 1.2 logins per user per day (across devices)
  • Token verification: every API request (18,000 req/min at peak)
  • Storage: Postgres for user accounts, Redis for active sessions
  • Access token lifetime: 15 minutes
  • Refresh token lifetime: 7 days
  • Must support multiple concurrent sessions (mobile + web)
  • Password storage: bcrypt with cost factor 12

Design

Schema

CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email TEXT UNIQUE NOT NULL,
  password_hash TEXT NOT NULL,
  email_verified BOOLEAN NOT NULL DEFAULT false,
  failed_login_attempts INTEGER NOT NULL DEFAULT 0,
  locked_until TIMESTAMPTZ,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
 
CREATE TABLE refresh_tokens (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES users(id),
  token_hash TEXT NOT NULL,
  family_id UUID NOT NULL, -- for rotation detection
  expires_at TIMESTAMPTZ NOT NULL,
  revoked_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
 
CREATE INDEX idx_refresh_tokens_user ON refresh_tokens (user_id);
CREATE INDEX idx_refresh_tokens_family ON refresh_tokens (family_id);

Token Architecture

Login
  -> Verify credentials
  -> Generate access token (JWT, 15 min, signed with RS256)
  -> Generate refresh token (opaque, 7 days, stored in DB)
  -> Return both to client

API Request
  -> Verify access token (JWT signature check, no DB lookup)
  -> If expired, client uses refresh token to get new pair

Token Refresh
  -> Validate refresh token against DB
  -> Rotate: issue new refresh token, revoke old one
  -> Issue new access token
  -> Return both to client
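
The "opaque, stored in DB" step can be sketched as follows. This is a minimal illustration using only Node's built-in crypto module; `issueRefreshToken` and the `IssuedRefreshToken` shape are hypothetical names, not part of the original service. The point it shows is that the client receives the random value while the database only ever sees its SHA-256 hash.

```typescript
import { createHash, randomBytes, randomUUID } from 'node:crypto';

const REFRESH_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7-day refresh token lifetime

// Hypothetical shape: one field per column written to refresh_tokens,
// plus the raw value that is returned to the client and never stored.
interface IssuedRefreshToken {
  value: string;
  tokenHash: string;
  familyId: string;
  expiresAt: Date;
}

// A new login starts a new family; a rotation passes the existing familyId.
function issueRefreshToken(
  familyId: string = randomUUID(),
  now: Date = new Date()
): IssuedRefreshToken {
  const value = randomBytes(32).toString('hex'); // 256 bits of randomness
  const tokenHash = createHash('sha256').update(value).digest('hex');
  return {
    value,
    tokenHash,
    familyId,
    expiresAt: new Date(now.getTime() + REFRESH_TTL_MS),
  };
}
```

Storing only the hash means a leaked `refresh_tokens` table does not hand out usable tokens. A fast hash is acceptable here because the input is 256 random bits rather than a human-chosen password.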

Access Token (JWT)

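import jwt from 'jsonwebtoken';

// privateKey: PEM-encoded RSA private key, held only by the auth service
const now = Math.floor(Date.now() / 1000);
const accessToken = jwt.sign(
  {
    sub: user.id,
    email: user.email,
    roles: user.roles, // assumes roles are loaded with the user; the schema above has no roles column
    iat: now,
    exp: now + 900, // 15 minutes
  },
  privateKey,
  { algorithm: 'RS256' }
);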

RS256 (RSA + SHA-256) allows verification with the public key only. API servers do not need access to the signing key.
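
To make the key split concrete, here is a dependency-free sketch of RS256 signing and verification using only `node:crypto`. It is an illustration, not the service's code: `signRs256`, `verifyRs256`, and `demoKeys` are hypothetical names. In practice the same check is normally delegated to the JWT library (for example `jwt.verify(token, publicKey, { algorithms: ['RS256'] })` in `jsonwebtoken`).

```typescript
import { createSign, createVerify, generateKeyPairSync } from 'node:crypto';

const toB64Url = (text: string): string => Buffer.from(text, 'utf8').toString('base64url');
const fromB64Url = (text: string): string => Buffer.from(text, 'base64url').toString('utf8');

// Auth service only: signing needs the private key.
function signRs256(claims: Record<string, unknown>, privateKeyPem: string): string {
  const header = toB64Url(JSON.stringify({ alg: 'RS256', typ: 'JWT' }));
  const signingInput = `${header}.${toB64Url(JSON.stringify(claims))}`;
  const signature = createSign('RSA-SHA256').update(signingInput).sign(privateKeyPem, 'base64url');
  return `${signingInput}.${signature}`;
}

// API servers: verification needs only the public key. Returns the claims, or
// null if the token is malformed, not RS256, badly signed, or expired.
function verifyRs256(
  token: string,
  publicKeyPem: string,
  nowSec: number = Math.floor(Date.now() / 1000)
): Record<string, unknown> | null {
  const parts = token.split('.');
  if (parts.length !== 3) return null;
  const [header = '', payload = '', signature = ''] = parts;
  try {
    // Pin the algorithm instead of trusting whatever the header claims.
    if (JSON.parse(fromB64Url(header)).alg !== 'RS256') return null;
    const signatureOk = createVerify('RSA-SHA256')
      .update(`${header}.${payload}`)
      .verify(publicKeyPem, signature, 'base64url');
    if (!signatureOk) return null;
    const claims = JSON.parse(fromB64Url(payload));
    if (typeof claims.exp !== 'number' || claims.exp <= nowSec) return null;
    return claims;
  } catch {
    return null; // invalid JSON in header or payload
  }
}

// Demo key pair; a real deployment would load keys from a secret store.
const demoKeys = generateKeyPairSync('rsa', {
  modulusLength: 2048,
  publicKeyEncoding: { type: 'spki', format: 'pem' },
  privateKeyEncoding: { type: 'pkcs8', format: 'pem' },
});
const demoToken = signRs256(
  { sub: 'user-1', exp: Math.floor(Date.now() / 1000) + 900 },
  demoKeys.privateKey
);
console.log(verifyRs256(demoToken, demoKeys.publicKey));
```

Only `demoKeys.publicKey` has to be distributed to API servers, so a compromised API server cannot mint tokens.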

Refresh Token Rotation

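import crypto from 'node:crypto';

// `db` is assumed to be a node-postgres style client whose query() resolves to { rows, rowCount }.
const sha256 = (value: string) =>
  crypto.createHash('sha256').update(value).digest('hex');
const REFRESH_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

async function rotateRefreshToken(oldTokenValue: string) {
  // Look up the token even if it is revoked, so reuse can be traced to its family
  const result = await db.query(
    'SELECT * FROM refresh_tokens WHERE token_hash = $1',
    [sha256(oldTokenValue)]
  );
  const oldToken = result.rows[0];

  if (!oldToken) {
    // Unknown token: there is no family to revoke
    throw new Error('Invalid refresh token');
  }

  if (oldToken.revoked_at) {
    // A revoked token was presented again.
    // Possible token reuse attack: revoke entire family
    await db.query(
      'UPDATE refresh_tokens SET revoked_at = now() WHERE family_id = $1 AND revoked_at IS NULL',
      [oldToken.family_id]
    );
    throw new Error('Refresh token reuse detected');
  }

  if (oldToken.expires_at < new Date()) {
    throw new Error('Refresh token expired');
  }

  // Revoke old token. The revoked_at IS NULL guard lets only one of two
  // concurrent requests with the same token win the rotation.
  const revoked = await db.query(
    'UPDATE refresh_tokens SET revoked_at = now() WHERE id = $1 AND revoked_at IS NULL',
    [oldToken.id]
  );
  if (revoked.rowCount === 0) {
    throw new Error('Invalid refresh token');
  }

  // Issue new refresh token in the same family
  const newTokenValue = crypto.randomBytes(32).toString('hex');
  await db.query(
    `INSERT INTO refresh_tokens (user_id, token_hash, family_id, expires_at)
     VALUES ($1, $2, $3, $4)`,
    [oldToken.user_id, sha256(newTokenValue), oldToken.family_id, new Date(Date.now() + REFRESH_TTL_MS)]
  );

  return newTokenValue;
}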

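The family_id groups all refresh tokens descended from a single login session. If a revoked token is reused (indicating theft), the entire family is revoked, forcing re-authentication for that session regardless of who holds its tokens, attacker or legitimate client. Sessions created by other logins belong to different families and are unaffected.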

Account Lockout

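// In Postgres, expressions in SET see the pre-update row, so `>= 4` below
// means "this failure is the fifth in a row".
async function handleFailedLogin(userId: string) {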
  const result = await db.query(
    `UPDATE users
     SET failed_login_attempts = failed_login_attempts + 1,
         locked_until = CASE
           WHEN failed_login_attempts >= 4 THEN now() + interval '15 minutes'
           ELSE locked_until
         END
     WHERE id = $1
     RETURNING failed_login_attempts`,
    [userId]
  );
  return result.rows[0].failed_login_attempts;
}

After 5 failed attempts, the account locks for 15 minutes. Failed attempt count resets on successful login.
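
The lock check and the reset on success are described above but not shown. A minimal in-memory sketch of the same state machine follows; `LockoutState`, `isLocked`, `recordFailure`, and `recordSuccess` are hypothetical names, and in the real service each transition would be an `UPDATE` on the `users` row.

```typescript
const MAX_FAILED_ATTEMPTS = 5;
const LOCKOUT_MS = 15 * 60 * 1000; // 15 minutes

// Mirrors users.failed_login_attempts and users.locked_until.
interface LockoutState {
  failedLoginAttempts: number;
  lockedUntil: Date | null;
}

// Check this before running bcrypt, so locked accounts cost no hashing CPU.
function isLocked(state: LockoutState, now: Date): boolean {
  return state.lockedUntil !== null && state.lockedUntil.getTime() > now.getTime();
}

// Same rule as the SQL CASE: the fifth consecutive failure sets the lock.
function recordFailure(state: LockoutState, now: Date): LockoutState {
  const attempts = state.failedLoginAttempts + 1;
  return {
    failedLoginAttempts: attempts,
    lockedUntil:
      attempts >= MAX_FAILED_ATTEMPTS
        ? new Date(now.getTime() + LOCKOUT_MS)
        : state.lockedUntil,
  };
}

// A successful login clears both the counter and any stale lock.
function recordSuccess(): LockoutState {
  return { failedLoginAttempts: 0, lockedUntil: null };
}
```

Note that in this sketch, as in the SQL above, the counter is not cleared when a lock expires, so the first failure after a lockout locks the account again. Whether that is desirable is a policy choice.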

Trade-offs

Token Strategy Comparison

| Property | JWT Only (no refresh) | JWT + Refresh (this design) | Session Cookie Only |
|---|---|---|---|
| Stateless verification | Yes | Access: yes, Refresh: no | No |
| Revocability | No (until expiry) | Access: no, Refresh: yes | Yes (immediate) |
| DB lookups per API request | 0 | 0 | 1 |
| Token theft impact | Full access until expiry | 15 min max (access), detectable (refresh) | Until session invalidated |
| Multi-device support | Manual | Native (separate refresh tokens) | Native (separate sessions) |
| Complexity | Low | Medium | Low |

Performance Measurements

| Operation | Latency (p50) | Latency (p95) |
|---|---|---|
| JWT verification (RS256) | 0.3ms | 0.5ms |
| Login (bcrypt + DB + token gen) | 280ms | 420ms |
| Token refresh (DB lookup + token gen) | 8ms | 22ms |
| Session check (Redis) | 0.5ms | 1.2ms |

JWT verification at 0.3ms per request adds 5.4 seconds of cumulative CPU time per minute at 18,000 req/min. This is negligible.

bcrypt at cost factor 12 takes ~250ms. This is intentionally slow to resist brute-force attacks. At 18,000 logins/hour (a pessimistic peak: it equals the full daily volume of 15,000 DAU × 1.2 logins), this requires 4,500 CPU-seconds/hour of bcrypt computation, or 1.25 cores on average. A single core handles ~4 bcrypt operations/second (1 / 0.25 s), so a sustained 4 logins/second saturates one core.
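
The capacity arithmetic can be checked in a few lines. The function names below are hypothetical, and every input is a figure quoted in this article rather than a new measurement.

```typescript
// CPU-seconds consumed per minute by a per-request cost given in milliseconds.
function cpuSecondsPerMinute(requestsPerMinute: number, msPerRequest: number): number {
  return (requestsPerMinute * msPerRequest) / 1000;
}

// How many hashes one fully busy core completes per second.
function hashesPerCoreSecond(secondsPerHash: number): number {
  return 1 / secondsPerHash;
}

// Average number of cores kept busy by an hourly operation volume.
function averageCoresBusy(operationsPerHour: number, secondsPerOperation: number): number {
  return (operationsPerHour * secondsPerOperation) / 3600;
}

const jwtCpu = cpuSecondsPerMinute(18000, 0.3);    // about 5.4 CPU-seconds per minute
const bcryptRate = hashesPerCoreSecond(0.25);      // 4 hashes per core per second
const bcryptCores = averageCoresBusy(18000, 0.25); // 1.25 cores on average

console.log({ jwtCpu, bcryptRate, bcryptCores });
```

JWT verification therefore occupies roughly 9% of one core at peak (5.4 of every 60 seconds), while bcrypt at the quoted login peak needs more than one full core.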

Security Properties

| Attack Vector | Mitigation |
|---|---|
| Password brute force | bcrypt (250ms/attempt) + account lockout (5 attempts) |
| Token theft (access) | 15-minute expiry limits exposure window |
| Token theft (refresh) | Rotation detection via family_id, entire family revoked |
| Token replay | Short-lived access tokens, single-use refresh tokens |
| Credential stuffing | Rate limiting on login endpoint (10 attempts/IP/minute) |
| Password database leak | bcrypt with cost factor 12 (estimated crack time: years per hash) |

Failure Modes

Related: Failure Modes I Actively Design For.

Redis down for session storage: If Redis is unavailable, active session checks fail. JWT access tokens continue to work (stateless verification), but refresh token operations fail because they query Postgres through the session layer. Mitigation: separate the refresh token flow from session storage. Use Postgres directly for refresh token operations.

Clock skew on JWT verification: If the API server clock is behind the auth server clock, newly issued JWTs may appear to be "not yet valid"; if it is ahead, tokens appear to expire early. Mitigation: configure a 30-second clock skew tolerance in the JWT verification library.
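
What a skew tolerance does to the time checks can be sketched as a pure function. `checkValidity` and its types are hypothetical names; treating `iat` as a not-before bound when `nbf` is absent is an assumption of this sketch, not something every library does. With `jsonwebtoken`, the equivalent setting is the `clockTolerance` verify option, given in seconds.

```typescript
// Time-related JWT claims, in seconds since the Unix epoch.
interface TimeClaims {
  iat?: number;
  nbf?: number;
  exp: number;
}

type Validity = 'valid' | 'not_yet_valid' | 'expired';

function checkValidity(claims: TimeClaims, nowSec: number, toleranceSec: number = 30): Validity {
  // Assumption: fall back to iat as the earliest valid time when nbf is missing.
  const notBefore = claims.nbf ?? claims.iat;
  // Verifier clock behind the issuer: accept tokens up to toleranceSec "early".
  if (notBefore !== undefined && notBefore > nowSec + toleranceSec) return 'not_yet_valid';
  // Verifier clock ahead of the issuer: accept tokens up to toleranceSec "late".
  if (claims.exp <= nowSec - toleranceSec) return 'expired';
  return 'valid';
}
```

The tolerance widens the effective lifetime of every access token by up to 30 seconds, which is small next to the 15-minute expiry.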

Refresh token family false positive: If a client retries a refresh request (due to network timeout) and the first request succeeded, the retry uses a revoked token. This triggers the token reuse detection, revoking the entire family and forcing re-authentication. Mitigation: add a 10-second grace period for recently revoked tokens (allow a single reuse within the grace window).
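
One way to picture the grace window is an in-memory model of rotation. Everything here is an assumption for illustration: `RefreshStore`, the `graceUsed` flag, and the split between "rotated" and "family revoked" do not exist in the schema above, which has a single `revoked_at` column. Expiry checks are left out to keep the sketch short.

```typescript
import { createHash, randomBytes } from 'node:crypto';

const GRACE_MS = 10 * 1000; // 10-second grace window

interface TokenRow {
  familyId: string;
  rotatedAt: number | null; // when this token was exchanged for a successor
  graceUsed: boolean;       // whether the single allowed retry was consumed
  revoked: boolean;         // set when the whole family is revoked
}

const sha256Hex = (value: string): string => createHash('sha256').update(value).digest('hex');

class RefreshStore {
  private readonly rows = new Map<string, TokenRow>();

  issue(familyId: string): string {
    const value = randomBytes(32).toString('hex');
    this.rows.set(sha256Hex(value), { familyId, rotatedAt: null, graceUsed: false, revoked: false });
    return value;
  }

  rotate(oldValue: string, nowMs: number): string {
    const row = this.rows.get(sha256Hex(oldValue));
    if (!row || row.revoked) throw new Error('Invalid refresh token');

    if (row.rotatedAt !== null) {
      // Token was already rotated: allow one retry inside the grace window.
      const inGrace = !row.graceUsed && nowMs - row.rotatedAt <= GRACE_MS;
      if (!inGrace) {
        this.revokeFamily(row.familyId);
        throw new Error('Refresh token reuse detected');
      }
      row.graceUsed = true;
    } else {
      row.rotatedAt = nowMs;
    }
    return this.issue(row.familyId);
  }

  private revokeFamily(familyId: string): void {
    this.rows.forEach((row) => {
      if (row.familyId === familyId) row.revoked = true;
    });
  }
}
```

The trade-off is that a retry inside the window leaves two live tokens in the family (the one from the lost response and the one from the retry), and an attacker who replays a stolen token within 10 seconds of rotation goes undetected.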

bcrypt DoS: An attacker sending thousands of login requests with random passwords forces the server to compute bcrypt hashes for each one, consuming CPU. Mitigation: rate limit the login endpoint aggressively (10 req/IP/minute) and add a proof-of-work challenge (e.g., hashcash) for IPs exceeding the limit.
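
The 10 requests per IP per minute limit could look like the fixed-window counter below. This is a minimal single-process sketch with a hypothetical `FixedWindowRateLimiter` class; with several login instances the counters would need to live in a shared store such as the Redis instance already in use.

```typescript
interface WindowEntry {
  windowStart: number; // ms timestamp of the first request in the window
  count: number;       // requests seen in the current window
}

class FixedWindowRateLimiter {
  private readonly windows = new Map<string, WindowEntry>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number = 10, windowMs: number = 60 * 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request may proceed to the bcrypt check.
  allow(key: string, nowMs: number = Date.now()): boolean {
    const entry = this.windows.get(key);
    if (!entry || nowMs - entry.windowStart >= this.windowMs) {
      // First request from this key, or the previous window has ended.
      this.windows.set(key, { windowStart: nowMs, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

const loginLimiter = new FixedWindowRateLimiter();
console.log(loginLimiter.allow('203.0.113.7'));
```

The check has to run before password hashing for it to protect CPU. A fixed window admits short bursts around window boundaries, and entries are never evicted here, so a production version would need expiry or a sliding window.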

Scaling Considerations

  • JWT verification scales linearly with CPU. No database dependency for API authorization.
  • Login endpoint is CPU-bound (bcrypt). Scale horizontally with more instances, or use a dedicated login service.
  • Refresh token rotation requires a database write per refresh. At 15-minute access token lifetimes and 15,000 DAU, that is ~60,000 refresh operations/day, which corresponds to about 4 refreshes (roughly one hour of active use) per user per day. Postgres handles this easily.
  • For millions of users, partition the refresh_tokens table by user_id and set up automatic cleanup of expired tokens.

Observability

  • Track login success/failure rate per IP and per account
  • Monitor JWT verification errors (expired, invalid signature, malformed)
  • Alert on refresh token family revocations (potential token theft)
  • Dashboard: active sessions per user, login frequency, lockout events
  • Log (but do not expose) the reason for every authentication failure
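
The last point, logging the reason without exposing it, can be sketched as a function that produces two views of the same failure. The reason codes, `describeAuthFailure`, and the log fields are illustrative assumptions, not the service's actual log schema.

```typescript
// Illustrative reason codes; the real service may distinguish other cases.
type AuthFailureReason = 'unknown_email' | 'wrong_password' | 'account_locked' | 'email_unverified';

interface AuthFailure {
  publicMessage: string; // sent to the client, identical for every reason
  logEntry: {
    event: 'auth_failure';
    reason: AuthFailureReason;
    email: string;
    ip: string;
    at: string; // ISO 8601 timestamp
  };
}

function describeAuthFailure(
  reason: AuthFailureReason,
  email: string,
  ip: string,
  now: Date = new Date()
): AuthFailure {
  return {
    // A single generic message avoids revealing which emails are registered.
    publicMessage: 'Invalid email or password',
    logEntry: { event: 'auth_failure', reason, email, ip, at: now.toISOString() },
  };
}
```

The structured `logEntry` is what would feed the per-IP and per-account failure rates listed above.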

Key Takeaways

  • JWT access tokens (15 minutes) with refresh token rotation provide a good balance of stateless verification and revocability for this workload.
  • Refresh token family tracking detects token theft by identifying reuse of revoked tokens.
  • bcrypt at cost factor 12 is slow by design. Plan CPU capacity for the login endpoint accordingly.
  • Account lockout is a blunt instrument. Combine it with IP-based rate limiting for defense in depth.
  • Add a grace period for refresh token reuse to handle client retry scenarios without false-positive family revocations.

Final Thoughts

This authentication service handles 15,000 DAU with zero additional infrastructure cost beyond existing Postgres and Redis. The JWT + refresh token pattern eliminates database lookups on API requests (saving up to 18,000 queries/minute at peak) while keeping the ability to end a revoked session within 15 minutes, once its last access token expires. The total implementation is approximately 500 lines of TypeScript. The primary ongoing operational task is monitoring for credential stuffing attacks via the failed login rate dashboard.
