Caching & Redis

Upstash Redis for API rate limiting and caching — category-based limits, AI quotas, and fail-safe design

Kit ships with Upstash Redis as the caching and rate limiting backbone. The system provides category-based API rate limiting (upload, email, contact, payments, webhooks), tier-based AI quotas (free through enterprise), and a fail-safe design that degrades gracefully when Redis is unavailable.
This page covers setup, the rate limiting architecture, AI quotas, and debugging tools. For the security overview, see Security. For AI-specific cost management, see Cost Management.

How It Works

Rate limiting uses a two-layer architecture — an ephemeral in-memory cache reduces Redis calls by 50-80%, and Redis provides the persistent sliding window counters:
```
API Request
    |
    v
Rate Limit Middleware (withRateLimit)
    |--- Extracts userId (Clerk) and IP (headers)
    |
    v
Ephemeral Cache (in-memory Map)
    |--- Hit? → Return cached result (no Redis call)
    |--- Miss? → Continue to Redis
    |
    v
Upstash Redis (sliding window algorithm)
    |--- Checks user-based limit (if userId available)
    |--- Checks IP-based limit (if IP available)
    |--- Returns: { success, limit, remaining, reset }
    |
    v
Response
    |--- 200 OK + X-RateLimit-* headers (if allowed)
    |--- 429 Too Many Requests (if rate limited)
```
The ephemeral cache is a module-scope Map that persists across requests within the same serverless instance. This is safe because rate limiting is inherently approximate — a few extra requests during cache revalidation are acceptable.
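A minimal sketch of this pattern (names and the TTL value are illustrative, not Kit's actual internals): a module-scope Map caches recent rate limit results for a short TTL so repeated checks for the same identifier skip the Redis round trip.

```typescript
// Illustrative sketch of an ephemeral rate-limit cache (hypothetical names).
interface RateLimitResult {
  success: boolean
  limit: number
  remaining: number
  reset: number
}

// Module-scope: survives across requests within one serverless instance.
const ephemeralCache = new Map<string, { result: RateLimitResult; expires: number }>()
const EPHEMERAL_TTL_MS = 10_000 // short TTL keeps results approximately fresh

function getCachedResult(key: string, now = Date.now()): RateLimitResult | null {
  const entry = ephemeralCache.get(key)
  if (!entry) return null
  if (now > entry.expires) {
    ephemeralCache.delete(key) // expired: fall through to Redis
    return null
  }
  return entry.result
}

function setCachedResult(key: string, result: RateLimitResult, now = Date.now()): void {
  ephemeralCache.set(key, { result, expires: now + EPHEMERAL_TTL_MS })
}
```

Because the cached result can be slightly stale, a few extra requests may slip through near the window boundary, which is exactly the approximation the text above describes as acceptable.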

Setup

1. Create an Upstash Redis database

Sign up at upstash.com and create a new Redis database. Select the region closest to your deployment. The free tier includes 500K commands/month — sufficient for most SaaS applications.
2. Set environment variables

Copy the REST URL and token from your Upstash dashboard and add them to apps/boilerplate/.env.local:
```bash
UPSTASH_REDIS_REST_URL=https://your-database.upstash.io
UPSTASH_REDIS_REST_TOKEN=AXxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

API Rate Limiting

The API rate limiter uses a category-based system where each API category has independent limits for user-based and IP-based tracking:
src/lib/security/api-rate-limiter.ts — API_LIMITS Configuration
```typescript
const API_LIMITS: Record<
  APICategory,
  { user?: RateLimitConfig; ip?: RateLimitConfig }
> = {
  upload: {
    user: { requests: 10, window: '1 h', identifier: 'user' },
    ip: { requests: 20, window: '1 h', identifier: 'ip' },
  },
  email: {
    user: { requests: 5, window: '1 h', identifier: 'user' },
    ip: { requests: 10, window: '1 h', identifier: 'ip' },
  },
  contact: {
    ip: { requests: 3, window: '1 h', identifier: 'ip' },
  },
  payments: {
    user: { requests: 20, window: '1 h', identifier: 'user' },
  },
  webhooks: {
    ip: { requests: 100, window: '1 h', identifier: 'ip' },
  },
  api: {
    user: { requests: 100, window: '1 h', identifier: 'user' },
    ip: { requests: 200, window: '1 h', identifier: 'ip' },
  },
}
```

Rate Limit Categories

| Category | User Limit | IP Limit | Use Case |
| --- | --- | --- | --- |
| `upload` | 10 req/hour | 20 req/hour | File upload endpoint |
| `email` | 5 req/hour | 10 req/hour | Email sending endpoints |
| `contact` | | 3 req/hour | Contact form (IP-only, no auth required) |
| `payments` | 20 req/hour | | Payment and checkout endpoints |
| `webhooks` | | 100 req/hour | Webhook processing (high throughput) |
| `api` | 100 req/hour | 200 req/hour | General API endpoints (catch-all) |
Each category can have a user-based limit, IP-based limit, or both. When both are configured, the request must pass both checks.
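The "must pass both checks" rule can be sketched with an illustrative combiner (a hypothetical helper, not the actual api-rate-limiter code): the request succeeds only if every configured check succeeds, and the response reports the most restrictive counts.

```typescript
// Hypothetical combiner for user- and IP-based check results.
interface RateLimitResult {
  success: boolean
  limit: number
  remaining: number
  reset: number
}

function combineResults(results: RateLimitResult[]): RateLimitResult {
  if (results.length === 0) {
    throw new Error('at least one configured check is required')
  }
  return results.reduce((acc, r) => ({
    success: acc.success && r.success,           // all checks must pass
    limit: Math.min(acc.limit, r.limit),         // report the tightest limit
    remaining: Math.min(acc.remaining, r.remaining),
    reset: Math.max(acc.reset, r.reset),         // latest window reset wins
  }))
}
```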

Using the Rate Limit Middleware

The withRateLimit middleware factory wraps API route handlers with automatic rate limiting:
src/lib/security/rate-limit-middleware.ts — withRateLimit Signature
```typescript
export function withRateLimit(
  category: APICategory,
  handler: (request: NextRequest) => Promise<NextResponse>
): (request: NextRequest) => Promise<NextResponse> {
  // ...
}
```

Usage in API Routes

```typescript
import { NextResponse } from 'next/server'
import { withRateLimit } from '@/lib/security/rate-limit-middleware'

// Wrap your handler with a rate limit category
export const POST = withRateLimit('upload', async (request) => {
  // Your handler logic — only runs if rate limit passes
  return NextResponse.json({ success: true })
})
```
The middleware automatically:
  1. Extracts the user ID from Clerk authentication
  2. Extracts the client IP from x-forwarded-for or x-real-ip headers
  3. Checks both user and IP rate limits for the category
  4. Returns 429 Too Many Requests with retry information if exceeded
  5. Adds X-RateLimit-* headers to all responses
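Step 2's IP extraction can be sketched as follows (a simplified illustration; the real middleware may handle more edge cases): x-forwarded-for can carry a comma-separated proxy chain, and the first entry is the original client.

```typescript
// Simplified sketch of client-IP extraction from proxy headers.
function getClientIp(headers: Headers): string | null {
  const forwarded = headers.get('x-forwarded-for')
  if (forwarded) {
    // May contain "client, proxy1, proxy2" — the first entry is the client.
    return forwarded.split(',')[0].trim()
  }
  return headers.get('x-real-ip')
}
```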

AI Rate Limiting

The AI system has its own dedicated rate limiter with tier-based monthly quotas and global burst protection:

Tier-Based Monthly Quotas

| Tier | Monthly Requests | Window | Override Variable |
| --- | --- | --- | --- |
| Free | 20 | 30 days | `AI_FREE_TIER_REQUESTS` |
| Basic | 100 | 30 days | `AI_BASIC_TIER_REQUESTS` |
| Pro | 1,000 | 30 days | `AI_PRO_TIER_REQUESTS` |
| Enterprise | 10,000 | 30 days | `AI_ENTERPRISE_TIER_REQUESTS` |

Three-Layer Check

Every AI request goes through three sequential checks:
  1. Global burst protection — 10 requests per 10 seconds (configurable via AI_RATE_LIMIT_MAX_REQUESTS and AI_RATE_LIMIT_WINDOW). This prevents any single user from flooding the AI endpoints.
  2. Credit balance check — If the credit system is enabled (NEXT_PUBLIC_PRICING_MODE=credits), the system checks whether the user has sufficient credits for the operation.
  3. Auto-reset — If 30+ days have elapsed since the last credit reset, credits are automatically reset as a backup for missed webhook events.
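The three checks above can be sketched as a sequential pipeline (dependencies are stubbed and all names hypothetical; the real logic lives in src/lib/ai/rate-limiter.ts):

```typescript
// Hypothetical sketch of the three sequential AI checks.
interface AiCheckDeps {
  burstLimit: (userId: string) => Promise<boolean>   // global burst limiter
  creditsEnabled: boolean                            // NEXT_PUBLIC_PRICING_MODE=credits
  hasCredits: (userId: string, cost: number) => Promise<boolean>
  daysSinceReset: (userId: string) => Promise<number>
  resetCredits: (userId: string) => Promise<void>
}

async function checkAiRequest(
  userId: string,
  cost: number,
  deps: AiCheckDeps
): Promise<{ allowed: boolean; reason?: string }> {
  // 1. Global burst protection (e.g. 10 requests per 10 seconds).
  if (!(await deps.burstLimit(userId))) {
    return { allowed: false, reason: 'burst_limit' }
  }
  // 2. Credit balance check, only when the credit system is enabled.
  if (deps.creditsEnabled && !(await deps.hasCredits(userId, cost))) {
    return { allowed: false, reason: 'insufficient_credits' }
  }
  // 3. Auto-reset backup for missed webhook events.
  if ((await deps.daysSinceReset(userId)) >= 30) {
    await deps.resetCredits(userId)
  }
  return { allowed: true }
}
```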

Response Headers

Every rate-limited response includes standard headers so clients can track their usage:
| Header | Value | Description |
| --- | --- | --- |
| `X-RateLimit-Limit` | 100 | Maximum requests allowed in the window |
| `X-RateLimit-Remaining` | 87 | Requests remaining in current window |
| `X-RateLimit-Reset` | 1708012800000 | Unix timestamp (ms) when the window resets |
When the limit is exceeded, the 429 response body includes:
```json
{
  "error": "Rate limit exceeded",
  "message": "Too many requests. Please try again in 45 minutes.",
  "retryAfter": "45 minutes",
  "limit": 10,
  "remaining": 0,
  "reset": 1708012800000
}
```
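The human-readable retryAfter string can be derived from the reset timestamp. A minimal sketch (the field names match the response body above; the helper itself is hypothetical):

```typescript
// Hypothetical helper producing the "45 minutes" style retryAfter value.
function formatRetryAfter(resetMs: number, nowMs: number = Date.now()): string {
  const minutes = Math.max(1, Math.ceil((resetMs - nowMs) / 60_000))
  if (minutes < 60) return `${minutes} minutes`
  const hours = Math.ceil(minutes / 60)
  return `${hours} hour${hours > 1 ? 's' : ''}`
}
```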

Server-Side Caching Patterns

Beyond rate limiting, Kit uses module-scope caching for configuration data that rarely changes but is read frequently:
```typescript
// Example: Caching pricing configuration
// Module-scope variable persists across requests in the same serverless instance
let cachedPricingConfig: PricingConfig | null = null
let lastFetched = 0
const CACHE_TTL = 5 * 60 * 1000 // 5 minutes

export async function getPricingConfig(): Promise<PricingConfig> {
  const now = Date.now()
  if (cachedPricingConfig && now - lastFetched < CACHE_TTL) {
    return cachedPricingConfig
  }

  cachedPricingConfig = await fetchPricingFromDB()
  lastFetched = now
  return cachedPricingConfig
}
```
This pattern is used for payment plan configuration, credit cost tables, and other data that changes infrequently. It avoids database queries on every request without requiring Redis.

Fail-Safe Design

The rate limiting system is designed to fail open — if Redis is unavailable, requests are allowed through with a warning. This prevents Redis outages from blocking legitimate user traffic:
```
Redis Available?
    |
    ├── Yes → Normal rate limiting
    |         (check limits, return success/failure)
    |
    └── No  → Fail-open mode
              |--- Log warning: "Rate limiting disabled - Redis not available"
              |--- Return success with max limits
              |--- Request proceeds normally
```
This pattern is applied consistently across both the API rate limiter and the AI rate limiter. Individual Redis call failures are also caught and handled with fail-open behavior.
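A sketch of this fail-open behavior (illustrative only; the real code wraps its Redis-backed checks in a similar way): any error from the rate limit check is caught, logged, and converted into a permissive result.

```typescript
interface RateLimitResult {
  success: boolean
  limit: number
  remaining: number
  reset: number
}

// Fail-open wrapper: a Redis outage must not block legitimate traffic.
async function limitFailOpen(
  check: (id: string) => Promise<RateLimitResult>,
  id: string,
  maxLimit: number
): Promise<RateLimitResult> {
  try {
    return await check(id)
  } catch (err) {
    console.warn('Rate limiting disabled - Redis not available', err)
    // Permissive fallback: report full limits and let the request through.
    return { success: true, limit: maxLimit, remaining: maxLimit, reset: Date.now() }
  }
}
```

The trade-off is deliberate: during an outage some abusive traffic may get through, but no legitimate user is blocked by infrastructure failure.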

Debugging Tools

Kit includes two scripts for inspecting and resetting Redis rate limit data:
| Script | Command | Purpose |
| --- | --- | --- |
| `scripts/inspect-redis-keys.ts` | `npx tsx scripts/inspect-redis-keys.ts` | List all rate limit keys in Redis with their values and TTLs |
| `scripts/reset-rate-limits.ts` | `npx tsx scripts/reset-rate-limits.ts` | Reset all rate limit counters (useful during development) |
Both scripts connect to your Upstash Redis instance using the same environment variables.

Redis Key Patterns

Rate limit keys follow a structured prefix pattern for easy identification:
| Prefix | System | Example Key |
| --- | --- | --- |
| `@upstash/ratelimit/api/{category}/{type}` | API rate limiter | `@upstash/ratelimit/api/upload/user:user:abc123` |
| `@upstash/ratelimit/ai/global` | AI global burst | `@upstash/ratelimit/ai/global:user:abc123` |
| `@upstash/ratelimit/ai/{tier}` | AI tier limit | `@upstash/ratelimit/ai/pro:user:abc123` |
All keys are managed by the @upstash/ratelimit library and include automatic expiration based on the sliding window duration.

Environment Variables

| Variable | Required | Default | Purpose |
| --- | --- | --- | --- |
| `UPSTASH_REDIS_REST_URL` | No | | Upstash Redis REST API URL |
| `UPSTASH_REDIS_REST_TOKEN` | No | | Upstash Redis REST API token |
| `AI_RATE_LIMIT_WINDOW` | No | 10 | Global burst window in seconds |
| `AI_RATE_LIMIT_MAX_REQUESTS` | No | 10 | Global burst max requests per window |
| `AI_FREE_TIER_REQUESTS` | No | 20 | Free tier monthly AI request limit |
| `AI_BASIC_TIER_REQUESTS` | No | 100 | Basic tier monthly AI request limit |
| `AI_PRO_TIER_REQUESTS` | No | 1000 | Pro tier monthly AI request limit |
| `AI_ENTERPRISE_TIER_REQUESTS` | No | 10000 | Enterprise tier monthly AI request limit |

Key Files

| File | Purpose |
| --- | --- |
| `apps/boilerplate/src/lib/security/api-rate-limiter.ts` | Category-based API rate limiting (upload, email, contact, etc.) |
| `apps/boilerplate/src/lib/security/rate-limit-middleware.ts` | `withRateLimit()` middleware factory for API routes |
| `apps/boilerplate/src/lib/ai/rate-limiter.ts` | AI-specific rate limiting (tier-based, global burst) |
| `apps/boilerplate/scripts/inspect-redis-keys.ts` | Debug script to inspect Redis keys and TTLs |
| `apps/boilerplate/scripts/reset-rate-limits.ts` | Debug script to reset all rate limit counters |