
P0 Hardening: Rate Limiting, Validation, and the Road to Production

#stonemaps#devlog#build#nextjs

There's a point in building a product where "it works on my machine" stops being acceptable. The early-access cohort for Stone Maps is small (50 people), but they're real users, and the app needed to behave like a real app. That meant a hardening sprint before we opened the doors.

Here's what we built, and why each piece earned its place.

Rate Limiting with Upstash Redis

Every API route in Stone Maps is rate-limited. Not "most routes." Every route.

The implementation is a withRateLimit() wrapper that takes an identifier (user ID or IP address) and max/window parameters, calls Upstash Redis to track the count, and either runs the handler or returns a 429:

```typescript
import { NextResponse } from 'next/server';
// rateLimit() is the app's Upstash-backed counter helper;
// RateLimitOptions carries { identifier, max, window }.

export async function withRateLimit(
  options: RateLimitOptions,
  handler: () => Promise<NextResponse>
): Promise<NextResponse> {
  const { allowed, remaining, resetIn } = await rateLimit(options.identifier, {
    max: options.max,
    window: options.window,
  });

  if (!allowed) {
    return NextResponse.json(
      { error: 'Too many requests', message: `Try again in ${resetIn} seconds.` },
      {
        status: 429,
        headers: {
          'X-RateLimit-Limit': options.max.toString(),
          'X-RateLimit-Remaining': '0',
          'X-RateLimit-Reset': resetIn.toString(),
          'Retry-After': resetIn.toString(),
        },
      }
    );
  }

  const response = await handler();
  response.headers.set('X-RateLimit-Limit', options.max.toString());
  response.headers.set('X-RateLimit-Remaining', remaining.toString());
  response.headers.set('X-RateLimit-Reset', resetIn.toString());
  return response;
}
```

The identifier strategy is: use user ID when available (authenticated requests), fall back to IP address (unauthenticated). IP-based limiting applies to login, registration, and any endpoint reachable without a session. User-based limiting applies to everything else.

The rate limit headers are set on every response, not just rejections. This lets clients implement backoff intelligently: they know how many requests remain before they'll be throttled.
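
As a sketch of how a client might use those headers (`backoffMs` is a hypothetical helper, not part of Stone Maps):

```typescript
// Decide how long a client should wait before its next request,
// based on the X-RateLimit-* headers described above.
export function backoffMs(headers: Headers): number {
  const remaining = Number(headers.get('X-RateLimit-Remaining') ?? '1');
  const resetIn = Number(headers.get('X-RateLimit-Reset') ?? '0');
  // Budget left: no delay needed.
  if (remaining > 0) return 0;
  // Out of budget: wait until the window resets (header is in seconds).
  return resetIn * 1000;
}
```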

Upstash Redis is serverless-native: it's an HTTP API, not a persistent TCP connection. This matters in Vercel's environment where each request might hit a different function instance. Redis state is shared across all instances; in-memory state is not.
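
The rateLimit() helper itself isn't shown above. Here's a minimal synchronous sketch of its fixed-window logic, with an in-memory Map standing in for Upstash Redis; the real helper would issue the equivalent increment-and-expire calls over Upstash's REST API, since an in-memory store wouldn't survive across serverless instances, which is exactly why Redis is used:

```typescript
// Fixed-window rate limiter sketch. The Map is a stand-in for Redis,
// for illustration only: production state must live in shared storage.
type Window = { count: number; resetAt: number };
const store = new Map<string, Window>();

export function rateLimit(
  identifier: string,
  opts: { max: number; window: number } // window in seconds
): { allowed: boolean; remaining: number; resetIn: number } {
  const now = Date.now();
  let w = store.get(identifier);

  // Start a fresh window if none exists or the current one has expired.
  if (!w || now >= w.resetAt) {
    w = { count: 0, resetAt: now + opts.window * 1000 };
    store.set(identifier, w);
  }

  w.count += 1;
  return {
    allowed: w.count <= opts.max,
    remaining: Math.max(0, opts.max - w.count),
    resetIn: Math.ceil((w.resetAt - now) / 1000),
  };
}
```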

Zod Validation on Every Input

Before P0, request bodies were parsed with await req.json() and typed with as SomeType. TypeScript satisfied, but no runtime guarantee. A malformed request could do whatever the downstream code allowed.

The validation layer is validateBody() and validateParams():

```typescript
import { NextResponse } from 'next/server';
import type { ZodSchema } from 'zod';
// formatZodError() is a local helper that flattens Zod issues
// into a client-friendly shape.

export async function validateBody<T>(
  req: Request,
  schema: ZodSchema<T>
): Promise<{ data: T } | { error: NextResponse }> {
  let body: unknown;
  try {
    body = await req.json();
  } catch {
    return { error: NextResponse.json({ error: 'Invalid JSON' }, { status: 400 }) };
  }

  const result = schema.safeParse(body);
  if (!result.success) {
    return {
      error: NextResponse.json(
        { error: 'Validation failed', details: formatZodError(result.error) },
        { status: 400 }
      ),
    };
  }
  return { data: result.data };
}
```

The discriminated union return type, { data: T } | { error: NextResponse }, makes it type-safe to use:

```typescript
const parsed = await validateBody(req, createPostSchema);
if ('error' in parsed) return parsed.error;
const { data } = parsed; // TypeScript knows this is the validated type
```

The Zod schemas live in lib/validations.ts. There's one per API surface: createPostSchema, updateUserSchema, createTeamSchema, etc. They're used in tests and in routes, which means a test failure due to validation change is a real signal โ€” not a mock diverging from production behavior.

Error Monitoring

The monitoring layer is captureException(): a thin wrapper around Sentry that also logs to the console in development. Every API route calls it from its catch block:

```typescript
captureException(error instanceof Error ? error : new Error(String(error)), {
  route: 'POST /api/posts',
});
```

The context parameter (route name, user ID, etc.) makes Sentry grouping useful. Without it, every uncaught error looks like a generic Error: Internal server error.

In production, Sentry captures unhandled errors with full stack traces and context. In development, it's a console logger. The same call site works in both environments; there are no if (process.env.NODE_ENV === 'production') guards scattered through the code.
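
One way such a wrapper can be structured, with the production reporter injected so the dev/console split is visible; the names and shape here are illustrative, not the actual Stone Maps code (the real wrapper calls Sentry directly):

```typescript
// Hypothetical sketch: report an error with context. In production this
// would forward to Sentry (e.g. Sentry.captureException(error, { extra })),
// here represented by an injected reporter so the logic is testable.
type Context = Record<string, string>;
type Reporter = (error: Error, context: Context) => void;

const consoleReporter: Reporter = (error, context) =>
  console.error('[captureException]', context, error);

export function makeCaptureException(
  prodReporter: Reporter,
  env = process.env.NODE_ENV
) {
  return (error: Error, context: Context = {}) => {
    if (env === 'production') {
      prodReporter(error, context); // Sentry in the real app
    } else {
      consoleReporter(error, context); // plain console in development
    }
  };
}
```

The same call site works everywhere because the environment check lives inside the wrapper, not at each caller.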

PostHog analytics is configured but disabled by default. The commented-out env vars in .env make it easy to enable when we want event tracking without it being on by default.

CI/CD Pipeline

The GitHub Actions workflow runs on every push:

  1. Type check: npm run type-check across all workspaces. This is what caught the dead-code comparison in the admin users page before it could cause a mysterious Vercel deployment failure.
  2. Lint: ESLint with the Next.js ruleset. Catches unescaped entities, missing keys in JSX lists, and a few patterns that TypeScript doesn't.
  3. Test: Vitest across 260 tests. The suite covers rate limiting, RBAC, validation, API routes (with a mocked DB), and the emissary prompt logic.
  4. Build: npm run build. If the app can't compile, nothing ships.

Vercel deploys on every successful push to main. There's no staging environment yet; the model is "tests pass, merge, deploy." For 50 users, this is fine. For a wider launch, we'd want a staging environment that mirrors production state.

The Test Suite

260 tests across 15 files. The coverage areas that matter most:

  • RBAC โ€” every permission combination tested. A moderator can't manage users. An admin can't manage other admins. A viewer can read but not write.
  • Rate limiting โ€” the Redis wrapper tested with mock counters. Window expiry, remaining-count header, identifier routing (user vs. IP).
  • Validation โ€” every Zod schema tested with valid and invalid inputs. Type coercion, missing fields, oversized strings.
  • Emissary prompts โ€” all five trigger types with boundary conditions. 7 days exactly vs. 6 days 23 hours. 3 posts at the same hour vs. 3 posts at different hours. Location cluster of exactly 3 vs. 2.
  • Media validation โ€” 60 tests for file type, size, and MIME type checks. The most tests by file, because media validation has the most attack surface.

The test infrastructure splits into two Vitest configs: a node-environment config for server-side logic, and a jsdom config for components that need DOM access (the settings store tests need localStorage). The distinction lives explicitly in the config files rather than in per-file environment comments, except for one file that uses // @vitest-environment jsdom to override, which is fine.
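
A sketch of what that split can look like; the paths and globs here are illustrative, not the actual configs:

```typescript
// vitest.config.ts -- server-side logic runs in a plain node environment.
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    environment: 'node',
    include: ['lib/**/*.test.ts', 'app/api/**/*.test.ts'],
  },
});

// vitest.config.jsdom.ts -- component/store tests that need the DOM
// (localStorage, document) run under jsdom instead:
//
// export default defineConfig({
//   test: { environment: 'jsdom', include: ['components/**/*.test.tsx'] },
// });
```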

What Hardening Actually Means

The P0 sprint wasn't glamorous. It was adding withRateLimit calls to 25 API routes, writing Zod schemas for inputs that already worked, and wiring up error monitoring that mostly doesn't trigger.

But it changes the character of the codebase. When a new API route gets added, the pattern is obvious: auth check, rate limit wrapper, validate body, do the thing. When something breaks in production, Sentry has the stack trace. When a build fails in CI, TypeScript told you why.

The work isn't visible to users. That's the point.