Kit protects your AI budget with a three-layer cost management system: global rate limiting for burst protection, a credit-based system for per-operation billing, and usage tracking for analytics and cost monitoring. All three layers work together — every AI request must pass through each layer before reaching the provider.
This page covers the rate limiting architecture, credit costs, usage tracking, token cost calculation, and feature flag integration. For the credit system's billing integration, see Credit System.
Three-Layer Architecture
Every AI request passes through three protection layers in sequence:
Incoming AI Request
|
v
Layer 1: Global Rate Limit (Burst Protection)
|--- Upstash Redis sliding window
|--- 10 requests per 10 seconds (configurable)
|--- Applies to ALL users regardless of tier
|--- Purpose: Prevent DDoS and burst abuse
|--- Fail: 429 "Too many requests. Please slow down."
|
v
Layer 2: Credit System (Per-Operation Billing)
|--- Check credit balance in database
|--- Verify sufficient credits for operation
|--- Auto-reset if 30+ days elapsed (webhook backup)
|--- Atomic deduction BEFORE processing
|--- Fail: 402 "Insufficient credits"
|
v
Layer 3: Usage Tracking (Analytics)
|--- Non-blocking database write (after response)
|--- Records: provider, model, tokens, cost, purpose
|--- Monthly aggregation for quota checks
|--- Used for billing analytics and dashboards
|
v
Request reaches AI Provider
Rate limiting and credit checks use a fail-open strategy. If Redis is unavailable or a database query fails, the request is allowed with a warning logged. This prevents infrastructure issues from blocking all users. Monitor your logs for ⚠️ Rate limiting disabled warnings.
Rate Limiting
Global Burst Protection
The global rate limiter uses Upstash Redis with a sliding window algorithm. It applies to all users and prevents burst abuse:
```bash
# Configure via environment variables
AI_RATE_LIMIT_WINDOW=10          # Window in seconds (default: 10)
AI_RATE_LIMIT_MAX_REQUESTS=10    # Max requests per window (default: 10)
```
Key features:
- Sliding window — smoother than fixed windows, no burst at window boundaries
- Ephemeral cache — in-memory cache reduces Redis calls by 50-80%
- Analytics — Upstash analytics enabled for monitoring
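The production limiter runs on Upstash Redis so counts are shared across serverless instances, but the sliding-window idea itself can be illustrated with a minimal in-memory sketch (this helper is hypothetical, not the Kit implementation):

```typescript
// Minimal in-memory sliding-window limiter (illustration only; Kit uses
// Upstash Redis so limits are shared across instances).
function createSlidingWindow(maxRequests: number, windowMs: number) {
  const hits = new Map<string, number[]>() // identifier -> request timestamps

  return function check(identifier: string, now = Date.now()): boolean {
    const windowStart = now - windowMs
    // Drop timestamps that have slid out of the window
    const recent = (hits.get(identifier) ?? []).filter((t) => t > windowStart)
    if (recent.length >= maxRequests) {
      hits.set(identifier, recent)
      return false // rate limited
    }
    recent.push(now)
    hits.set(identifier, recent)
    return true // allowed
  }
}

// 10 requests per 10 seconds, matching the defaults above
const check = createSlidingWindow(10, 10_000)
```

Because the window slides continuously, a burst at a fixed-window boundary cannot double the effective rate, which is why this algorithm is preferred over fixed windows.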
Tier-Based Monthly Quotas
Each subscription tier has a monthly request limit, enforced via separate Redis rate limiters:
| Tier | Monthly Limit | Window | Env Override |
|---|---|---|---|
| Free | 500 | 30 days | AI_FREE_TIER_REQUESTS |
| Basic | 1,500 | 30 days | AI_BASIC_TIER_REQUESTS |
| Pro | 5,000 | 30 days | AI_PRO_TIER_REQUESTS |
| Enterprise | 15,000 | 30 days | AI_ENTERPRISE_TIER_REQUESTS |
The user's tier is determined from their Lemon Squeezy subscription variant ID. Users without a subscription default to the Free tier.
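The table above can be read as a defaults-plus-override lookup. A sketch under assumed names (the actual resolution lives in src/lib/ai/rate-limiter.ts; monthlyLimitFor is illustrative):

```typescript
// Illustrative tier-limit resolution with env overrides.
// Defaults mirror the table above; env names follow AI_<TIER>_TIER_REQUESTS.
type SubscriptionTier = 'free' | 'basic' | 'pro' | 'enterprise'

const TIER_DEFAULTS: Record<SubscriptionTier, number> = {
  free: 500,
  basic: 1500,
  pro: 5000,
  enterprise: 15000,
}

function monthlyLimitFor(
  tier: SubscriptionTier,
  env: Record<string, string | undefined> = process.env
): number {
  const override = env[`AI_${tier.toUpperCase()}_TIER_REQUESTS`]
  const parsed = override ? Number.parseInt(override, 10) : NaN
  return Number.isFinite(parsed) ? parsed : TIER_DEFAULTS[tier]
}
```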
Rate Limit Check Flow
The comprehensive checkRateLimit function orchestrates all three checks. It is called by every AI API route:
src/lib/ai/rate-limiter.ts — Comprehensive Rate Limit Check
```typescript
export async function checkRateLimit(params: {
  userId?: string
  sessionId?: string
  ip?: string
  cost?: number // Credit cost for operation (default: 1)
}): Promise<{
  success: boolean
  limit: number
  remaining: number
  reset: number
  tier: SubscriptionTier
  reason?: string
  creditSystemEnabled: boolean
}> {
  const { userId, sessionId, ip, cost = 1 } = params

  // Get identifier for burst protection
  const identifier = getIdentifier(userId, sessionId, ip)

  // STEP 1: ALWAYS check global rate limit (DDoS/burst protection)
  const globalResult = await checkGlobalRateLimit(identifier)
  if (!globalResult.success) {
    return {
      success: false,
      limit: globalResult.limit,
      remaining: globalResult.remaining,
      reset: globalResult.reset,
      tier: 'free',
      reason: 'Too many requests. Please slow down.',
      creditSystemEnabled: isCreditSystemEnabled(),
    }
  }

  // STEP 2: Check if credit system is enabled
  if (!isCreditSystemEnabled()) {
    console.log('[Rate Limit] Credit system disabled - allowing request')
    return {
      success: true,
      limit: 999999, // Unlimited
      remaining: 999999,
      reset: Date.now() + 30 * 24 * 60 * 60 * 1000,
      tier: userId ? await getUserTier(userId) : 'free',
      creditSystemEnabled: false,
    }
  }

  // The remaining steps (credit balance check, 30-day auto-reset, and
  // atomic deduction) are elided here; see the full source file.
  // ...
}
```
The function returns a standardized result:
```typescript
{
  success: boolean             // true if all checks passed
  limit: number                // Total credit/request limit
  remaining: number            // Remaining credits/requests
  reset: number                // Unix timestamp for reset
  tier: SubscriptionTier       // User's subscription tier
  reason?: string              // Human-readable error message
  creditSystemEnabled: boolean // Whether credit system is active
}
```
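A route typically maps this result to an HTTP status before doing any AI work: 429 for the global burst limit, 402 when credits run out. A minimal sketch (statusForRateLimit is a hypothetical helper, not part of Kit):

```typescript
// Hypothetical helper: translate a checkRateLimit result into an HTTP status.
// 429 = global burst limit hit; 402 = insufficient credits.
function statusForRateLimit(result: {
  success: boolean
  reason?: string
  creditSystemEnabled: boolean
}): number {
  if (result.success) return 200
  if (result.creditSystemEnabled && result.reason === 'Insufficient credits') {
    return 402
  }
  return 429
}

// Sketch of usage inside a route handler:
// const result = await checkRateLimit({ userId, cost: 15 })
// if (!result.success) {
//   return Response.json(
//     { error: result.reason },
//     { status: statusForRateLimit(result) }
//   )
// }
```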
Every AI API route MUST call checkRateLimit() before processing. Skipping this check means no burst protection, no credit validation, and no auto-reset — leaving your AI budget unprotected.
Credit Costs
Every AI operation has a defined credit cost. Costs are based on estimated token usage and computational complexity:
src/lib/credits/credit-costs.ts — Operation Costs
export const CREDIT_COSTS = {
// ============================================================================
// FAQ Operations
// ============================================================================
/**
* Simple FAQ lookup using RAG (Retrieval-Augmented Generation)
*
* Uses vector search with minimal context window. Suitable for
* straightforward questions with clear answers in the knowledge base.
*
* **Estimated tokens**: 500-1000
*
* **Example use cases**:
* - "What are your business hours?"
* - "How do I reset my password?"
* - "What payment methods do you accept?"
*/
faq_simple: 5,
/**
* Complex FAQ query with multi-step reasoning
*
* Uses larger context window and may require multiple RAG retrievals
* or chain-of-thought reasoning. Suitable for nuanced questions.
*
* **Estimated tokens**: 2000-4000
*
* **Example use cases**:
* - "Compare your pricing plans and recommend one for my use case"
* - "Explain the difference between your two authentication methods"
* - "How does your refund policy work for annual subscriptions?"
*/
faq_complex: 15,
// ============================================================================
// Chat Operations
// ============================================================================
/**
* Standard chat message (non-streaming)
*
* Single message exchange with context window up to 4000 tokens.
* Suitable for most conversational interactions.
*
* **Estimated tokens**: 1000-4000
*
* **Example use cases**:
* - General conversation
* - Question answering
* - Content suggestions
*/
chat_message: 15,
/**
* Streaming chat message
*
* Real-time token streaming with same context as standard messages.
* Higher cost due to streaming infrastructure and perceived value.
*
* **Estimated tokens**: 1000-4000
*
* **Example use cases**:
* - Interactive chat experiences
* - Real-time content generation
* - Live coding assistance
*/
chat_streaming: 20,
/**
* Chat with tool/function calling
*
* Chat message that can invoke external tools, APIs, or functions.
* Includes extra tokens for tool definitions and result processing.
*
* **Estimated tokens**: 2000-6000
*
* **Example use cases**:
* - Database queries via chat
* - API integrations
* - Calculator or data lookups
*/
chat_with_tools: 30,
/**
* Image analysis in chat (Vision)
*
* Multimodal chat message with one or more images for visual analysis.
* Higher cost due to image processing tokens (images consume ~85 tokens
* per 512x512 tile in most providers).
*
* **Estimated tokens**: 2000-8000 (depends on image resolution)
*
* **Example use cases**:
* - "What's in this image?"
* - Screenshot analysis and debugging
* - Document/receipt scanning via chat
* - Design feedback and comparison
*/
image_analysis: 30,
/**
* PDF document analysis in chat
*
* Upload and analyze PDF documents in the LLM Chat.
* Server-side text extraction with pdf-parse, then AI analysis.
* Higher cost than streaming due to extraction overhead.
*
* **Estimated tokens**: 3000-10000 (depends on document length)
*
* **Example use cases**:
* - "Summarize this contract"
* - "What are the key terms in this PDF?"
* - "Extract the action items from this meeting notes PDF"
*/
pdf_analysis: 40,
// ============================================================================
// Advanced AI Operations
// ============================================================================
/**
* Image generation from text prompt
*
* Text-to-image generation using models like DALL-E or Stable Diffusion.
* Highest single-operation cost due to computational requirements.
*
* **Estimated tokens**: N/A (GPU-based operation)
*
* **Example use cases**:
* - Marketing visual generation
* - Product mockups
* - Concept art creation
*/
image_gen: 80,
/**
* Image editing/manipulation
*
* Modify existing images using text prompts or masks.
* Includes inpainting, outpainting, and style transfer.
*
* **Estimated tokens**: N/A (GPU-based operation)
*
* **Example use cases**:
* - Background removal
* - Object replacement
* - Image enhancement
*/
image_edit: 50,
/**
* Code analysis and review
*
* Static analysis, bug detection, and code quality assessment.
* Analyzes code structure, patterns, and potential issues.
*
* **Estimated tokens**: 3000-8000
*
* **Example use cases**:
* - Security vulnerability scanning
* - Performance optimization suggestions
* - Code smell detection
*/
code_analysis: 40,
/**
* Code generation from specifications
*
* Generate complete code files or functions from natural language
* descriptions. Includes language-specific syntax and best practices.
*
* **Estimated tokens**: 4000-10000
*
* **Example use cases**:
* - Component scaffolding
* - API endpoint generation
* - Test case creation
*/
code_gen: 50,
// ============================================================================
// Embeddings and Vector Operations
// ============================================================================
/**
* Single text embedding generation
*
* Convert text to vector representation for semantic search.
* Typically 1536-dimensional vector (OpenAI ada-002).
*
* **Estimated tokens**: 100-500
*
* **Example use cases**:
* - Document indexing
* - Semantic search preparation
* - Content similarity calculation
*/
embedding_single: 5,
/**
* Batch embedding generation
*
* Process multiple texts in a single batch operation.
* More efficient than individual embeddings for bulk operations.
*
* **Estimated tokens**: 1000-5000
*
* **Example use cases**:
* - Bulk document processing
* - Knowledge base initialization
* - Large-scale content indexing
*/
embedding_batch: 10,
/**
* Vector similarity search
*
* Query vector database to find semantically similar content.
* Cost covers embedding query text and database lookup.
*
* **Estimated tokens**: 200-800
*
* **Example use cases**:
* - Semantic document search
* - Recommendation systems
* - Duplicate content detection
*/
vector_search: 5,
// ============================================================================
// Audio Operations
// ============================================================================
/**
* Audio transcription (speech-to-text)
*
* Convert audio files to text using Whisper or similar models.
* Cost per minute of audio content.
*
* **Estimated tokens**: N/A (audio processing)
*
* **Example use cases**:
* - Meeting transcription
* - Podcast notes generation
* - Voice command processing
*/
transcription: 30,
/**
* Speech-to-text for chat voice input
*
* Short audio recordings from microphone input in LLM Chat,
* transcribed via OpenAI Whisper. Lower cost than general transcription
* because chat recordings are typically shorter (max 120s).
*
* **Estimated tokens**: N/A (audio processing)
*
* **Example use cases**:
* - Voice input in chat (microphone button)
* - Quick voice messages for AI conversation
*/
speech_to_text: 20,
/**
* Text-to-speech synthesis
*
* Generate natural-sounding audio from text input.
* Includes voice selection and audio quality options.
*
* **Estimated tokens**: N/A (audio synthesis)
*
* **Example use cases**:
* - Voiceover generation
* - Accessibility features
* - Audio content creation
*/
tts: 20,
// ============================================================================
// Document Processing
// ============================================================================
/**
* PDF parsing and text extraction
*
* Extract text, tables, and metadata from PDF documents.
* Handles multi-page documents with layout preservation.
*
* **Estimated tokens**: 1000-3000
*
* **Example use cases**:
* - Document digitization
* - Invoice processing
* - Contract analysis
*/
pdf_parse: 15,
/**
* Optical Character Recognition (OCR)
*
* Extract text from images and scanned documents.
* Includes text detection, recognition, and layout analysis.
*
* **Estimated tokens**: N/A (image processing)
*
* **Example use cases**:
* - Receipt scanning
* - Handwriting recognition
* - Screenshot text extraction
*/
ocr: 30,
/**
* Document summarization
*
* Generate concise summaries of long documents.
* Uses extractive or abstractive summarization techniques.
*
* **Estimated tokens**: 5000-12000
*
* **Example use cases**:
* - Research paper summaries
* - Meeting notes condensation
* - Article key points extraction
*/
document_summary: 65,
// ============================================================================
// Content Generation
// ============================================================================
/**
* Template-based content generation
*
* Generate text from templates (email, product description, blog outline,
* social media post, marketing copy) with streaming output.
* Cost covers template processing + text generation.
*
* **Estimated tokens**: 1000-4000
*
* **Example use cases**:
* - Professional email drafting
* - Product description writing
* - Blog post outline generation
* - Social media post creation
* - Marketing copy generation
*/
content_generation: 25,
}
Full Cost Table
| Operation | Credits | Estimated Tokens | Category |
|---|---|---|---|
| faq_simple | 5 | 500-1,000 | FAQ |
| faq_complex | 15 | 2,000-4,000 | FAQ |
| chat_message | 15 | 1,000-4,000 | Chat |
| chat_streaming | 20 | 1,000-4,000 | Chat |
| content_generation | 25 | 1,000-4,000 | Content |
| chat_with_tools | 30 | 2,000-6,000 | Chat |
| image_analysis | 30 | 2,000-8,000 | Chat |
| pdf_analysis | 40 | 3,000-10,000 | Chat |
| image_gen | 80 | N/A (GPU) | Advanced AI |
| image_edit | 50 | N/A (GPU) | Advanced AI |
| code_analysis | 40 | 3,000-8,000 | Advanced AI |
| code_gen | 50 | 4,000-10,000 | Advanced AI |
| embedding_single | 5 | 100-500 | Embeddings |
| embedding_batch | 10 | 1,000-5,000 | Embeddings |
| vector_search | 5 | 200-800 | Embeddings |
| transcription | 30 | N/A (audio) | Audio |
| tts | 20 | N/A (audio) | Audio |
| speech_to_text | 20 | N/A (audio) | Audio |
| pdf_parse | 15 | 1,000-3,000 | Document |
| ocr | 30 | N/A (image) | Document |
| document_summary | 65 | 5,000-12,000 | Document |
Add your own operations to CREDIT_COSTS in apps/boilerplate/src/lib/credits/credit-costs.ts. The type system auto-generates a CreditOperation union type from the object keys — new operations get type safety automatically.
Credit Cost Utilities
The credit costs module provides helper functions:
| Function | Purpose |
|---|---|
| getCreditCost(operation) | Get cost for a single operation |
| calculateBatchCost(operation, quantity) | Calculate total cost for batch operations |
| getAllCreditCosts() | Get all costs as a plain object (for admin UI) |
| isValidOperation(string) | Type guard — check if a string is a valid operation |
| getOperationsByCategory() | Group operations by category (faq, chat, etc.) |
| estimateOperationCount(operation, credits) | How many operations can X credits afford? |
| formatCreditAmount(credits, includeUnit) | Format for display (20 → "20 credits") |
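Two of these helpers can be sketched over a small sample of the cost table; this is an illustration of the pattern, not the real implementations in credit-costs.ts:

```typescript
// Sample of the cost table, keyed so the union type falls out of the keys.
const SAMPLE_COSTS = {
  faq_simple: 5,
  chat_message: 15,
  image_gen: 80,
} as const

type Operation = keyof typeof SAMPLE_COSTS

// Cost for a single operation
function getCreditCost(operation: Operation): number {
  return SAMPLE_COSTS[operation]
}

// How many whole operations a credit balance can afford
function estimateOperationCount(operation: Operation, credits: number): number {
  return Math.floor(credits / getCreditCost(operation))
}

// 100 credits buys 6 chat messages (15 each) but only 1 image (80 each)
```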
Usage Tracking
Every AI request is tracked to the AIUsage database table for analytics and cost monitoring. Tracking is non-blocking — failures are logged but never prevent the AI response from being delivered.
TrackUsageParams
```typescript
interface TrackUsageParams {
  userId?: string
  sessionId?: string
  provider: string // "openai", "anthropic", etc.
  model: string // "gpt-5-nano", "claude-haiku", etc.
  tokens: number // Total tokens used
  cost?: TokenCost | number // USD cost (from provider pricing)
  purpose: 'faq' | 'chat' | 'completion' | 'stream' | 'embedding' | 'general'
  metadata?: Record<string, unknown> // Additional context
}
```
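Because cost accepts either a TokenCost object or a plain number, the tracker must normalize it before writing. A sketch, assuming the TokenCost shape from the cost-calculation section below (normalizeCost is illustrative, not Kit's actual code):

```typescript
interface TokenCost {
  promptCost: number
  completionCost: number
  totalCost: number
  currency: string
}

// Illustrative only: collapse the `TokenCost | number` union to a single
// USD amount before the non-blocking database write.
function normalizeCost(cost?: TokenCost | number): number {
  if (cost === undefined) return 0
  return typeof cost === 'number' ? cost : cost.totalCost
}
```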
Monthly Aggregation
Usage is aggregated monthly for quota checks and analytics:
```typescript
interface MonthlyUsage {
  totalTokens: number
  totalCost: number
  requestCount: number
  byProvider: Record<string, { tokens: number; cost: number; requests: number }>
  byPurpose: Record<string, { tokens: number; cost: number; requests: number }>
}
```
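Aggregation is a straightforward fold over the month's usage rows. A sketch under assumed row field names (Kit's actual query lives in usage-tracker.ts):

```typescript
// Assumed shape of one AIUsage row (field names are illustrative).
interface UsageRow {
  provider: string
  purpose: string
  tokens: number
  cost: number
}

type Bucket = { tokens: number; cost: number; requests: number }

// Fold one month of usage rows into the MonthlyUsage shape above.
function aggregateMonthly(rows: UsageRow[]) {
  const add = (map: Record<string, Bucket>, key: string, row: UsageRow) => {
    const b = (map[key] ??= { tokens: 0, cost: 0, requests: 0 })
    b.tokens += row.tokens
    b.cost += row.cost
    b.requests += 1
  }
  const byProvider: Record<string, Bucket> = {}
  const byPurpose: Record<string, Bucket> = {}
  let totalTokens = 0
  let totalCost = 0
  for (const row of rows) {
    totalTokens += row.tokens
    totalCost += row.cost
    add(byProvider, row.provider, row)
    add(byPurpose, row.purpose, row)
  }
  return { totalTokens, totalCost, requestCount: rows.length, byProvider, byPurpose }
}
```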
Usage API
| Endpoint | Method | Purpose |
|---|---|---|
| /api/ai/usage | GET | Current month's usage statistics |
The usage endpoint returns aggregated data broken down by provider and purpose, suitable for dashboard charts and usage meters.
Token Cost Calculation
Kit calculates the USD cost of each request using per-model pricing tables. The calculateCost method on BaseProvider converts token counts to dollar amounts:
src/lib/ai/providers/base-provider.ts — Cost Calculation
```typescript
calculateCost(usage: TokenUsage, model?: string): TokenCost {
  const modelInfo = this.getModelInfo(model ?? this.defaultModel)
  if (!modelInfo) {
    return {
      promptCost: 0,
      completionCost: 0,
      totalCost: 0,
      currency: 'USD',
    }
  }

  const promptCost =
    (usage.promptTokens / 1_000_000) * modelInfo.costPerMillionPromptTokens
  const completionCost =
    (usage.completionTokens / 1_000_000) *
    modelInfo.costPerMillionCompletionTokens

  return {
    promptCost,
    completionCost,
    totalCost: promptCost + completionCost,
    currency: 'USD',
  }
}
```
The calculation uses each model's costPerMillionPromptTokens and costPerMillionCompletionTokens from the model info registry. This provides accurate cost tracking across all four providers.
Example: a request using claude-haiku-4-5 with 500 prompt tokens and 200 completion tokens:
- Prompt cost: (500 / 1,000,000) × $0.80 = $0.000400
- Completion cost: (200 / 1,000,000) × $4.00 = $0.000800
- Total cost: $0.001200
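The worked example can be reproduced with a standalone version of the formula (pricing figures taken from the example itself, not from the live model registry):

```typescript
// Standalone version of the cost formula, using the example's pricing
// ($0.80 / $4.00 per million prompt/completion tokens for claude-haiku-4-5).
function tokenCost(
  promptTokens: number,
  completionTokens: number,
  pricePerMillionPrompt: number,
  pricePerMillionCompletion: number
) {
  const promptCost = (promptTokens / 1_000_000) * pricePerMillionPrompt
  const completionCost = (completionTokens / 1_000_000) * pricePerMillionCompletion
  return { promptCost, completionCost, totalCost: promptCost + completionCost }
}

const cost = tokenCost(500, 200, 0.8, 4.0)
// cost.totalCost is $0.0012 (within floating-point tolerance)
```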
Feature Flag Integration
The cost management system behaves differently based on whether the credit system is enabled:
| Behavior | Credit System ON | Credit System OFF |
|---|---|---|
| Rate Limiting | Global burst + credit balance check | Global burst only |
| Credit Deduction | Atomic deduction before processing | Skipped |
| Usage Tracking | Full tracking with costs | Full tracking (analytics only) |
| Monthly Limit | Based on credit balance | Unlimited (999999) |
| 402 Errors | "Insufficient credits" | Never sent |
| Auto-Reset | Checks if 30+ days elapsed | Skipped |
When NEXT_PUBLIC_PRICING_MODEL=classic_saas, the credit system is disabled. Users get unlimited AI requests (no credit checks), but global burst protection still applies. Usage tracking continues for analytics purposes. This is suitable for traditional subscription models where AI is an included feature rather than a metered resource.
The feature flag check happens inside checkRateLimit:
checkRateLimit()
|
|--- ALWAYS: Check global rate limit
|
|--- isCreditSystemEnabled()?
| |
| |--- YES: Check credit balance, auto-reset, deduct
| |--- NO: Allow request (log "Credit system disabled")
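The flag itself reduces to a single comparison against the pricing model. A minimal sketch (the real helper is isCreditSystemEnabled() in src/lib/credits/config.ts; this parameterized version is for illustration):

```typescript
// Illustration of the pricing-model flag: credit checks run unless the app
// is configured as classic_saas. The documented default is credit_based.
function creditSystemEnabled(
  pricingModel: string | undefined = process.env.NEXT_PUBLIC_PRICING_MODEL
): boolean {
  return (pricingModel ?? 'credit_based') !== 'classic_saas'
}
```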
Environment Variables
| Variable | Default | Purpose |
|---|---|---|
| AI_RATE_LIMIT_WINDOW | 10 | Global rate limit window in seconds |
| AI_RATE_LIMIT_MAX_REQUESTS | 10 | Max requests per global window |
| AI_FREE_TIER_REQUESTS | 500 | Monthly limit for free tier |
| AI_BASIC_TIER_REQUESTS | 1500 | Monthly limit for basic tier |
| AI_PRO_TIER_REQUESTS | 5000 | Monthly limit for pro tier |
| AI_ENTERPRISE_TIER_REQUESTS | 15000 | Monthly limit for enterprise tier |
| UPSTASH_REDIS_REST_URL | — | Redis URL for rate limiting |
| UPSTASH_REDIS_REST_TOKEN | — | Redis token for rate limiting |
| NEXT_PUBLIC_PRICING_MODEL | credit_based | credit_based or classic_saas |
Key Files
| File | Purpose |
|---|---|
| apps/boilerplate/src/lib/ai/rate-limiter.ts | Global burst and tier-based rate limiting |
| apps/boilerplate/src/lib/credits/credit-costs.ts | Per-operation credit costs (21 operations) |
| apps/boilerplate/src/lib/credits/credit-manager.ts | Atomic credit deductions with SELECT FOR UPDATE |
| apps/boilerplate/src/lib/credits/config.ts | Credit system feature flag (isCreditSystemEnabled()) |
| apps/boilerplate/src/lib/ai/usage-tracker.ts | Non-blocking usage tracking to database |
| apps/boilerplate/src/lib/ai/providers/base-provider.ts | Token cost calculation per provider |
| apps/boilerplate/src/app/api/ai/usage/route.ts | Usage statistics endpoint |