Cost Management

Three-layer cost control — rate limiting, credit system, and usage tracking for AI operations

Kit protects your AI budget with a three-layer cost management system: global rate limiting for burst protection, a credit-based system for per-operation billing, and usage tracking for analytics and cost monitoring. All three layers work together — every AI request must pass through each layer before reaching the provider.
This page covers the rate limiting architecture, credit costs, usage tracking, token cost calculation, and feature flag integration. For the credit system's billing integration, see Credit System.

Three-Layer Architecture

Every AI request passes through three protection layers in sequence:
```
Incoming AI Request
    |
    v
Layer 1: Global Rate Limit (Burst Protection)
    |--- Upstash Redis sliding window
    |--- 10 requests per 10 seconds (configurable)
    |--- Applies to ALL users regardless of tier
    |--- Purpose: Prevent DDoS and burst abuse
    |--- Fail: 429 "Too many requests. Please slow down."
    |
    v
Layer 2: Credit System (Per-Operation Billing)
    |--- Check credit balance in database
    |--- Verify sufficient credits for operation
    |--- Auto-reset if 30+ days elapsed (webhook backup)
    |--- Atomic deduction BEFORE processing
    |--- Fail: 402 "Insufficient credits"
    |
    v
Layer 3: Usage Tracking (Analytics)
    |--- Non-blocking database write (after response)
    |--- Records: provider, model, tokens, cost, purpose
    |--- Monthly aggregation for quota checks
    |--- Used for billing analytics and dashboards
    |
    v
Request reaches AI Provider
```

Rate Limiting

Global Burst Protection

The global rate limiter uses Upstash Redis with a sliding window algorithm. It applies to all users and prevents burst abuse:
```bash
# Configure via environment variables
AI_RATE_LIMIT_WINDOW=10       # Window in seconds (default: 10)
AI_RATE_LIMIT_MAX_REQUESTS=10 # Max requests per window (default: 10)
```
Key features:
  • Sliding window — smoother than fixed windows, no burst at window boundaries
  • Ephemeral cache — in-memory cache reduces Redis calls by 50-80%
  • Analytics — Upstash analytics enabled for monitoring
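The boundary-burst problem that sliding windows avoid can be illustrated with a small in-memory version of the algorithm. This is a teaching sketch (a sliding log), not the Upstash implementation, which uses a weighted two-window approximation.

```typescript
// Minimal in-memory sliding-window limiter: a request is allowed only if
// fewer than `max` requests fall inside the trailing `windowMs` milliseconds.
function makeSlidingWindow(max: number, windowMs: number) {
  const timestamps: number[] = []
  return function allow(now: number): boolean {
    // Drop requests that have slid out of the trailing window
    while (timestamps.length > 0 && timestamps[0] <= now - windowMs) {
      timestamps.shift()
    }
    if (timestamps.length >= max) return false
    timestamps.push(now)
    return true
  }
}
```

With a fixed window, 10 requests at t=9.9s and 10 more at t=10.1s would all pass because they land in different windows; a sliding window sees 20 requests within 0.2s and rejects the second burst.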

Tier-Based Monthly Quotas

Each subscription tier has a monthly request limit, enforced via separate Redis rate limiters:
| Tier | Monthly Limit | Window | Env Override |
| --- | --- | --- | --- |
| Free | 500 | 30 days | AI_FREE_TIER_REQUESTS |
| Basic | 1,500 | 30 days | AI_BASIC_TIER_REQUESTS |
| Pro | 5,000 | 30 days | AI_PRO_TIER_REQUESTS |
| Enterprise | 15,000 | 30 days | AI_ENTERPRISE_TIER_REQUESTS |
The user's tier is determined from their Lemon Squeezy subscription variant ID. Users without a subscription default to the Free tier.
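Resolving a tier's limit with its env override might look like the helper below. The defaults come from the table above; the function name and shape are assumptions for illustration.

```typescript
type SubscriptionTier = 'free' | 'basic' | 'pro' | 'enterprise'

// Defaults from the tier table; each can be overridden via its env variable.
const TIER_DEFAULTS: Record<SubscriptionTier, { limit: number; envVar: string }> = {
  free: { limit: 500, envVar: 'AI_FREE_TIER_REQUESTS' },
  basic: { limit: 1_500, envVar: 'AI_BASIC_TIER_REQUESTS' },
  pro: { limit: 5_000, envVar: 'AI_PRO_TIER_REQUESTS' },
  enterprise: { limit: 15_000, envVar: 'AI_ENTERPRISE_TIER_REQUESTS' },
}

// Return the override when it parses to a positive number, else the default.
function getTierLimit(
  tier: SubscriptionTier,
  env: Record<string, string | undefined>,
): number {
  const { limit, envVar } = TIER_DEFAULTS[tier]
  const override = Number(env[envVar])
  return Number.isFinite(override) && override > 0 ? override : limit
}
```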

Rate Limit Check Flow

The comprehensive checkRateLimit function orchestrates all three checks. It is called by every AI API route:
src/lib/ai/rate-limiter.ts — Comprehensive Rate Limit Check
```typescript
export async function checkRateLimit(params: {
  userId?: string
  sessionId?: string
  ip?: string
  cost?: number // Credit cost for operation (default: 1)
}): Promise<{
  success: boolean
  limit: number
  remaining: number
  reset: number
  tier: SubscriptionTier
  reason?: string
  creditSystemEnabled: boolean
}> {
  const { userId, sessionId, ip, cost = 1 } = params

  // Get identifier for burst protection
  const identifier = getIdentifier(userId, sessionId, ip)

  // STEP 1: ALWAYS check global rate limit (DDoS/Burst protection)
  const globalResult = await checkGlobalRateLimit(identifier)
  if (!globalResult.success) {
    return {
      success: false,
      limit: globalResult.limit,
      remaining: globalResult.remaining,
      reset: globalResult.reset,
      tier: 'free',
      reason: 'Too many requests. Please slow down.',
      creditSystemEnabled: isCreditSystemEnabled(),
    }
  }

  // STEP 2: Check if credit system is enabled
  if (!isCreditSystemEnabled()) {
    console.log('[Rate Limit] Credit system disabled - allowing request')
    return {
      success: true,
      limit: 999999, // Unlimited
      remaining: 999999,
      reset: Date.now() + 30 * 24 * 60 * 60 * 1000,
      tier: userId ? await getUserTier(userId) : 'free',
      creditSystemEnabled: false,
    }
  }

  // STEP 3: Credit balance check, auto-reset, and atomic deduction
  // (remainder of the function omitted here)
```
The function returns a standardized result:
```typescript
{
  success: boolean          // true if all checks passed
  limit: number             // Total credit/request limit
  remaining: number         // Remaining credits/requests
  reset: number             // Unix timestamp for reset
  tier: SubscriptionTier    // User's subscription tier
  reason?: string           // Human-readable error message
  creditSystemEnabled: boolean  // Whether credit system is active
}
```
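A route handler typically maps this result onto the HTTP errors from the architecture diagram (429 for burst protection, 402 for credit exhaustion). The helper below is a hypothetical sketch, not the actual route code.

```typescript
interface RateLimitResult {
  success: boolean
  limit: number
  remaining: number
  reset: number
  reason?: string
  creditSystemEnabled: boolean
}

// Map a checkRateLimit result to an HTTP error, or null when the request
// may proceed. (Hypothetical helper; real routes may differ.)
function toHttpError(result: RateLimitResult): { status: number; message: string } | null {
  if (result.success) return null
  // 402 when the credit system rejected the request, 429 for burst protection
  const status =
    result.creditSystemEnabled && result.reason === 'Insufficient credits' ? 402 : 429
  return { status, message: result.reason ?? 'Rate limit exceeded' }
}
```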

Credit Costs

Every AI operation has a defined credit cost. Costs are based on estimated token usage and computational complexity:
src/lib/credits/credit-costs.ts — Operation Costs
```typescript
export const CREDIT_COSTS = {
  // ============================================================================
  // FAQ Operations
  // ============================================================================

  /**
   * Simple FAQ lookup using RAG (Retrieval-Augmented Generation)
   *
   * Uses vector search with minimal context window. Suitable for
   * straightforward questions with clear answers in the knowledge base.
   *
   * **Estimated tokens**: 500-1000
   *
   * **Example use cases**:
   * - "What are your business hours?"
   * - "How do I reset my password?"
   * - "What payment methods do you accept?"
   */
  faq_simple: 5,

  /**
   * Complex FAQ query with multi-step reasoning
   *
   * Uses larger context window and may require multiple RAG retrievals
   * or chain-of-thought reasoning. Suitable for nuanced questions.
   *
   * **Estimated tokens**: 2000-4000
   *
   * **Example use cases**:
   * - "Compare your pricing plans and recommend one for my use case"
   * - "Explain the difference between your two authentication methods"
   * - "How does your refund policy work for annual subscriptions?"
   */
  faq_complex: 15,

  // ============================================================================
  // Chat Operations
  // ============================================================================

  /**
   * Standard chat message (non-streaming)
   *
   * Single message exchange with context window up to 4000 tokens.
   * Suitable for most conversational interactions.
   *
   * **Estimated tokens**: 1000-4000
   *
   * **Example use cases**:
   * - General conversation
   * - Question answering
   * - Content suggestions
   */
  chat_message: 15,

  /**
   * Streaming chat message
   *
   * Real-time token streaming with same context as standard messages.
   * Higher cost due to streaming infrastructure and perceived value.
   *
   * **Estimated tokens**: 1000-4000
   *
   * **Example use cases**:
   * - Interactive chat experiences
   * - Real-time content generation
   * - Live coding assistance
   */
  chat_streaming: 20,

  /**
   * Chat with tool/function calling
   *
   * Chat message that can invoke external tools, APIs, or functions.
   * Includes extra tokens for tool definitions and result processing.
   *
   * **Estimated tokens**: 2000-6000
   *
   * **Example use cases**:
   * - Database queries via chat
   * - API integrations
   * - Calculator or data lookups
   */
  chat_with_tools: 30,

  /**
   * Image analysis in chat (Vision)
   *
   * Multimodal chat message with one or more images for visual analysis.
   * Higher cost due to image processing tokens (images consume ~85 tokens
   * per 512x512 tile in most providers).
   *
   * **Estimated tokens**: 2000-8000 (depends on image resolution)
   *
   * **Example use cases**:
   * - "What's in this image?"
   * - Screenshot analysis and debugging
   * - Document/receipt scanning via chat
   * - Design feedback and comparison
   */
  image_analysis: 30,

  /**
   * PDF document analysis in chat
   *
   * Upload and analyze PDF documents in the LLM Chat.
   * Server-side text extraction with pdf-parse, then AI analysis.
   * Higher cost than streaming due to extraction overhead.
   *
   * **Estimated tokens**: 3000-10000 (depends on document length)
   *
   * **Example use cases**:
   * - "Summarize this contract"
   * - "What are the key terms in this PDF?"
   * - "Extract the action items from this meeting notes PDF"
   */
  pdf_analysis: 40,

  // ============================================================================
  // Advanced AI Operations
  // ============================================================================

  /**
   * Image generation from text prompt
   *
   * Text-to-image generation using models like DALL-E or Stable Diffusion.
   * Highest single-operation cost due to computational requirements.
   *
   * **Estimated tokens**: N/A (GPU-based operation)
   *
   * **Example use cases**:
   * - Marketing visual generation
   * - Product mockups
   * - Concept art creation
   */
  image_gen: 80,

  /**
   * Image editing/manipulation
   *
   * Modify existing images using text prompts or masks.
   * Includes inpainting, outpainting, and style transfer.
   *
   * **Estimated tokens**: N/A (GPU-based operation)
   *
   * **Example use cases**:
   * - Background removal
   * - Object replacement
   * - Image enhancement
   */
  image_edit: 50,

  /**
   * Code analysis and review
   *
   * Static analysis, bug detection, and code quality assessment.
   * Analyzes code structure, patterns, and potential issues.
   *
   * **Estimated tokens**: 3000-8000
   *
   * **Example use cases**:
   * - Security vulnerability scanning
   * - Performance optimization suggestions
   * - Code smell detection
   */
  code_analysis: 40,

  /**
   * Code generation from specifications
   *
   * Generate complete code files or functions from natural language
   * descriptions. Includes language-specific syntax and best practices.
   *
   * **Estimated tokens**: 4000-10000
   *
   * **Example use cases**:
   * - Component scaffolding
   * - API endpoint generation
   * - Test case creation
   */
  code_gen: 50,

  // ============================================================================
  // Embeddings and Vector Operations
  // ============================================================================

  /**
   * Single text embedding generation
   *
   * Convert text to vector representation for semantic search.
   * Typically 1536-dimensional vector (OpenAI ada-002).
   *
   * **Estimated tokens**: 100-500
   *
   * **Example use cases**:
   * - Document indexing
   * - Semantic search preparation
   * - Content similarity calculation
   */
  embedding_single: 5,

  /**
   * Batch embedding generation
   *
   * Process multiple texts in a single batch operation.
   * More efficient than individual embeddings for bulk operations.
   *
   * **Estimated tokens**: 1000-5000
   *
   * **Example use cases**:
   * - Bulk document processing
   * - Knowledge base initialization
   * - Large-scale content indexing
   */
  embedding_batch: 10,

  /**
   * Vector similarity search
   *
   * Query vector database to find semantically similar content.
   * Cost covers embedding query text and database lookup.
   *
   * **Estimated tokens**: 200-800
   *
   * **Example use cases**:
   * - Semantic document search
   * - Recommendation systems
   * - Duplicate content detection
   */
  vector_search: 5,

  // ============================================================================
  // Audio Operations
  // ============================================================================

  /**
   * Audio transcription (speech-to-text)
   *
   * Convert audio files to text using Whisper or similar models.
   * Cost per minute of audio content.
   *
   * **Estimated tokens**: N/A (audio processing)
   *
   * **Example use cases**:
   * - Meeting transcription
   * - Podcast notes generation
   * - Voice command processing
   */
  transcription: 30,

  /**
   * Speech-to-text for chat voice input
   *
   * Short audio recordings from microphone input in LLM Chat,
   * transcribed via OpenAI Whisper. Lower cost than general transcription
   * because chat recordings are typically shorter (max 120s).
   *
   * **Estimated tokens**: N/A (audio processing)
   *
   * **Example use cases**:
   * - Voice input in chat (microphone button)
   * - Quick voice messages for AI conversation
   */
  speech_to_text: 20,

  /**
   * Text-to-speech synthesis
   *
   * Generate natural-sounding audio from text input.
   * Includes voice selection and audio quality options.
   *
   * **Estimated tokens**: N/A (audio synthesis)
   *
   * **Example use cases**:
   * - Voiceover generation
   * - Accessibility features
   * - Audio content creation
   */
  tts: 20,

  // ============================================================================
  // Document Processing
  // ============================================================================

  /**
   * PDF parsing and text extraction
   *
   * Extract text, tables, and metadata from PDF documents.
   * Handles multi-page documents with layout preservation.
   *
   * **Estimated tokens**: 1000-3000
   *
   * **Example use cases**:
   * - Document digitization
   * - Invoice processing
   * - Contract analysis
   */
  pdf_parse: 15,

  /**
   * Optical Character Recognition (OCR)
   *
   * Extract text from images and scanned documents.
   * Includes text detection, recognition, and layout analysis.
   *
   * **Estimated tokens**: N/A (image processing)
   *
   * **Example use cases**:
   * - Receipt scanning
   * - Handwriting recognition
   * - Screenshot text extraction
   */
  ocr: 30,

  /**
   * Document summarization
   *
   * Generate concise summaries of long documents.
   * Uses extractive or abstractive summarization techniques.
   *
   * **Estimated tokens**: 5000-12000
   *
   * **Example use cases**:
   * - Research paper summaries
   * - Meeting notes condensation
   * - Article key points extraction
   */
  document_summary: 65,

  // ============================================================================
  // Content Generation
  // ============================================================================

  /**
   * Template-based content generation
   *
   * Generate text from templates (email, product description, blog outline,
   * social media post, marketing copy) with streaming output.
   * Cost covers template processing + text generation.
   *
   * **Estimated tokens**: 1000-4000
   *
   * **Example use cases**:
   * - Professional email drafting
   * - Product description writing
   * - Blog post outline generation
   * - Social media post creation
   * - Marketing copy generation
   */
  content_generation: 25,
}
```

Full Cost Table

| Operation | Credits | Estimated Tokens | Category |
| --- | --- | --- | --- |
| faq_simple | 5 | 500-1,000 | FAQ |
| faq_complex | 15 | 2,000-4,000 | FAQ |
| chat_message | 15 | 1,000-4,000 | Chat |
| chat_streaming | 20 | 1,000-4,000 | Chat |
| content_generation | 25 | 1,000-4,000 | Content |
| chat_with_tools | 30 | 2,000-6,000 | Chat |
| image_analysis | 30 | 2,000-8,000 | Chat |
| pdf_analysis | 40 | 3,000-10,000 | Chat |
| image_gen | 80 | N/A (GPU) | Advanced AI |
| image_edit | 50 | N/A (GPU) | Advanced AI |
| code_analysis | 40 | 3,000-8,000 | Advanced AI |
| code_gen | 50 | 4,000-10,000 | Advanced AI |
| embedding_single | 5 | 100-500 | Embeddings |
| embedding_batch | 10 | 1,000-5,000 | Embeddings |
| vector_search | 5 | 200-800 | Embeddings |
| transcription | 30 | N/A (audio) | Audio |
| tts | 20 | N/A (audio) | Audio |
| speech_to_text | 20 | N/A (audio) | Audio |
| pdf_parse | 15 | 1,000-3,000 | Document |
| ocr | 30 | N/A (image) | Document |
| document_summary | 65 | 5,000-12,000 | Document |

Credit Cost Utilities

The credit costs module provides helper functions:
| Function | Purpose |
| --- | --- |
| getCreditCost(operation) | Get cost for a single operation |
| calculateBatchCost(operation, quantity) | Calculate total cost for batch operations |
| getAllCreditCosts() | Get all costs as a plain object (for admin UI) |
| isValidOperation(string) | Type guard — check if a string is a valid operation |
| getOperationsByCategory() | Group operations by category (faq, chat, etc.) |
| estimateOperationCount(operation, credits) | How many operations can X credits afford? |
| formatCreditAmount(credits, includeUnit) | Format for display (20 → "20 credits") |
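Under stated assumptions (a trimmed-down cost map standing in for the full CREDIT_COSTS object), a few of these helpers might look like:

```typescript
// A small subset of CREDIT_COSTS, enough to illustrate the helpers.
const COSTS = { faq_simple: 5, chat_message: 15, image_gen: 80 } as const
type Operation = keyof typeof COSTS

function getCreditCost(operation: Operation): number {
  return COSTS[operation]
}

function calculateBatchCost(operation: Operation, quantity: number): number {
  return COSTS[operation] * quantity
}

// How many operations can `credits` afford?
function estimateOperationCount(operation: Operation, credits: number): number {
  return Math.floor(credits / COSTS[operation])
}

function formatCreditAmount(credits: number, includeUnit = true): string {
  return includeUnit ? `${credits} credits` : String(credits)
}
```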

Usage Tracking

Every AI request is tracked to the AIUsage database table for analytics and cost monitoring. Tracking is non-blocking — failures are logged but never prevent the AI response from being delivered.
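The non-blocking pattern amounts to a fire-and-forget wrapper around the database write. The sketch below illustrates the idea; the wrapper name and signature are hypothetical.

```typescript
// Fire-and-forget usage tracking: the write runs in the background and a
// failure is logged, never surfaced to the caller.
function trackUsageNonBlocking(
  write: () => Promise<void>,
  log: (msg: string) => void = console.error,
): void {
  void write().catch((err) => log(`[Usage Tracker] write failed: ${String(err)}`))
}
```

Because the returned promise is deliberately not awaited, the AI response streams back to the user even if the analytics database is down.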

TrackUsageParams

```typescript
interface TrackUsageParams {
  userId?: string
  sessionId?: string
  provider: string        // "openai", "anthropic", etc.
  model: string           // "gpt-5-nano", "claude-haiku", etc.
  tokens: number          // Total tokens used
  cost?: TokenCost | number  // USD cost (from provider pricing)
  purpose: 'faq' | 'chat' | 'completion' | 'stream' | 'embedding' | 'general'
  metadata?: Record<string, unknown>  // Additional context
}
```

Monthly Aggregation

Usage is aggregated monthly for quota checks and analytics:
```typescript
interface MonthlyUsage {
  totalTokens: number
  totalCost: number
  requestCount: number
  byProvider: Record<string, { tokens: number; cost: number; requests: number }>
  byPurpose: Record<string, { tokens: number; cost: number; requests: number }>
}
```
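The monthly rollup can be computed in a single pass over that month's AIUsage rows. This is a sketch assuming a minimal row shape; the real aggregation may run in SQL.

```typescript
interface UsageRow { provider: string; purpose: string; tokens: number; cost: number }
interface Bucket { tokens: number; cost: number; requests: number }

// Accumulate one row into a per-key bucket, creating the bucket on first use.
function addTo(map: Record<string, Bucket>, key: string, row: UsageRow): void {
  if (!map[key]) map[key] = { tokens: 0, cost: 0, requests: 0 }
  map[key].tokens += row.tokens
  map[key].cost += row.cost
  map[key].requests += 1
}

function aggregateMonthly(rows: UsageRow[]) {
  const result = {
    totalTokens: 0,
    totalCost: 0,
    requestCount: 0,
    byProvider: {} as Record<string, Bucket>,
    byPurpose: {} as Record<string, Bucket>,
  }
  for (const row of rows) {
    result.totalTokens += row.tokens
    result.totalCost += row.cost
    result.requestCount += 1
    addTo(result.byProvider, row.provider, row)
    addTo(result.byPurpose, row.purpose, row)
  }
  return result
}
```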

Usage API

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /api/ai/usage | GET | Current month's usage statistics |
The usage endpoint returns aggregated data broken down by provider and purpose, suitable for dashboard charts and usage meters.

Token Cost Calculation

Kit calculates the USD cost of each request using per-model pricing tables. The calculateCost method on BaseProvider converts token counts to dollar amounts:
src/lib/ai/providers/base-provider.ts — Cost Calculation
```typescript
calculateCost(usage: TokenUsage, model?: string): TokenCost {
  const modelInfo = this.getModelInfo(model ?? this.defaultModel)
  if (!modelInfo) {
    return {
      promptCost: 0,
      completionCost: 0,
      totalCost: 0,
      currency: 'USD',
    }
  }

  const promptCost =
    (usage.promptTokens / 1_000_000) * modelInfo.costPerMillionPromptTokens
  const completionCost =
    (usage.completionTokens / 1_000_000) *
    modelInfo.costPerMillionCompletionTokens

  return {
    promptCost,
    completionCost,
    totalCost: promptCost + completionCost,
    currency: 'USD',
  }
}
```
The calculation uses each model's costPerMillionPromptTokens and costPerMillionCompletionTokens from the model info registry. This provides accurate cost tracking across all four providers.
Example: A request using claude-haiku-4-5 with 500 prompt tokens and 200 completion tokens:
```
Prompt cost:     (500 / 1,000,000) × $0.80 = $0.000400
Completion cost: (200 / 1,000,000) × $4.00 = $0.000800
Total cost:      $0.001200
```
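The same arithmetic as a standalone function, checked against the worked example (the per-million pricing figures are the ones assumed in the example):

```typescript
// Convert token counts to USD using per-million-token pricing.
function tokenCost(
  promptTokens: number,
  completionTokens: number,
  costPerMillionPrompt: number,
  costPerMillionCompletion: number,
) {
  const promptCost = (promptTokens / 1_000_000) * costPerMillionPrompt
  const completionCost = (completionTokens / 1_000_000) * costPerMillionCompletion
  return { promptCost, completionCost, totalCost: promptCost + completionCost }
}
```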

Feature Flag Integration

The cost management system behaves differently based on whether the credit system is enabled:
| Behavior | Credit System ON | Credit System OFF |
| --- | --- | --- |
| Rate Limiting | Global burst + credit balance check | Global burst only |
| Credit Deduction | Atomic deduction before processing | Skipped |
| Usage Tracking | Full tracking with costs | Full tracking (analytics only) |
| Monthly Limit | Based on credit balance | Unlimited (999999) |
| 402 Errors | "Insufficient credits" | Never sent |
| Auto-Reset | Checks if 30+ days elapsed | Skipped |
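Given the NEXT_PUBLIC_PRICING_MODEL variable documented below, the flag itself plausibly reduces to a single env check. This is an assumption for illustration; src/lib/credits/config.ts holds the real implementation.

```typescript
// Assumed shape of the feature flag: the credit system is on when the
// pricing model is 'credit_based' (the documented default).
function isCreditSystemEnabled(env: Record<string, string | undefined>): boolean {
  return (env.NEXT_PUBLIC_PRICING_MODEL ?? 'credit_based') === 'credit_based'
}
```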
The feature flag check happens inside checkRateLimit:
```
checkRateLimit()
    |
    |--- ALWAYS: Check global rate limit
    |
    |--- isCreditSystemEnabled()?
    |    |
    |    |--- YES: Check credit balance, auto-reset, deduct
    |    |--- NO:  Allow request (log "Credit system disabled")
```

Environment Variables

| Variable | Default | Purpose |
| --- | --- | --- |
| AI_RATE_LIMIT_WINDOW | 10 | Global rate limit window in seconds |
| AI_RATE_LIMIT_MAX_REQUESTS | 10 | Max requests per global window |
| AI_FREE_TIER_REQUESTS | 500 | Monthly limit for free tier |
| AI_BASIC_TIER_REQUESTS | 1500 | Monthly limit for basic tier |
| AI_PRO_TIER_REQUESTS | 5000 | Monthly limit for pro tier |
| AI_ENTERPRISE_TIER_REQUESTS | 15000 | Monthly limit for enterprise tier |
| UPSTASH_REDIS_REST_URL | (none) | Redis URL for rate limiting |
| UPSTASH_REDIS_REST_TOKEN | (none) | Redis token for rate limiting |
| NEXT_PUBLIC_PRICING_MODEL | credit_based | credit_based or classic_saas |

Key Files

| File | Purpose |
| --- | --- |
| apps/boilerplate/src/lib/ai/rate-limiter.ts | Global burst and tier-based rate limiting |
| apps/boilerplate/src/lib/credits/credit-costs.ts | Per-operation credit costs (21 operations) |
| apps/boilerplate/src/lib/credits/credit-manager.ts | Atomic credit deductions with SELECT FOR UPDATE |
| apps/boilerplate/src/lib/credits/config.ts | Credit system feature flag (isCreditSystemEnabled()) |
| apps/boilerplate/src/lib/ai/usage-tracker.ts | Non-blocking usage tracking to database |
| apps/boilerplate/src/lib/ai/providers/base-provider.ts | Token cost calculation per provider |
| apps/boilerplate/src/app/api/ai/usage/route.ts | Usage statistics endpoint |