Chat System

Streaming SSE protocol, React hooks, API routes, chat components, and quick prompts

Kit provides a complete chat system with two modes — LLM Chat for direct AI conversations and RAG Chat for knowledge-base-powered responses. Both use streaming SSE (Server-Sent Events) for real-time token delivery, share a common component library, and integrate with the credit system.
This page covers the streaming protocol, API routes, React hooks, and chat components. For the RAG-specific pipeline, see RAG System.

Two Chat Modes

| Aspect | LLM Chat | RAG Chat |
| --- | --- | --- |
| Dashboard Route | /dashboard/chat-llm | /dashboard/chat-rag |
| API Route | /api/ai/stream (SSE), /api/ai/chat (JSON) | /api/ai/rag/ask |
| Hook | useAIChat() | Custom RAG hook |
| Context Source | Conversation history only | pgvector search + conversation |
| Feature Flag | NEXT_PUBLIC_AI_LLM_CHAT_ENABLED | NEXT_PUBLIC_AI_RAG_CHAT_ENABLED |
| Credit Cost | 20 credits (streaming), 30 (image analysis), 40 (PDF analysis), 20 (voice input), 15 (sync) | 15 credits |
| Auth Required | Yes (Clerk) | Yes (Clerk) |

Vision Chat (Image Analysis)

When NEXT_PUBLIC_AI_VISION_ENABLED=true (default) and LLM Chat is enabled, users can attach images to messages for AI analysis. Vision Chat adds image upload capabilities to the existing LLM Chat interface.
Upload Methods: Drag & drop onto chat area, paste from clipboard (Ctrl+V), or file picker button.
Constraints:
| Constraint | Value |
| --- | --- |
| Max image size | 4.5 MB per image |
| Max images per message | 4 |
| Accepted types | PNG, JPEG, WebP, GIF |
Images are encoded as Base64 data URIs and sent as ContentPart[] in the message content field:
```json
{
  "messages": [{
    "role": "user",
    "content": [
      { "type": "image", "image": "data:image/png;base64,..." },
      { "type": "text", "text": "Describe this image" }
    ]
  }]
}
```
The stream route auto-detects image content and selects the image_analysis credit operation (30 credits) instead of chat_streaming (20 credits). The core Message.content type stays string for backward compatibility — multimodal ContentPart[] is handled at the API boundary only.
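As a rough sketch, the detection step could look like the following. Note that `ContentPart`, `ChatMessage`, and `selectCreditOperation` are illustrative names for this sketch, not Kit's actual implementation:

```typescript
// Illustrative sketch: pick the credit operation based on message content.
// Kit's real detection lives inside the stream route handler.
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image'; image: string }

interface ChatMessage {
  role: string
  // string for plain chat; ContentPart[] only at the API boundary
  content: string | ContentPart[]
}

function hasImageContent(messages: ChatMessage[]): boolean {
  return messages.some(
    (m) => Array.isArray(m.content) && m.content.some((p) => p.type === 'image')
  )
}

function selectCreditOperation(
  messages: ChatMessage[]
): 'image_analysis' | 'chat_streaming' {
  // 30 credits for image analysis, 20 for plain streaming chat
  return hasImageContent(messages) ? 'image_analysis' : 'chat_streaming'
}
```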

PDF Chat (Document Analysis)

When NEXT_PUBLIC_AI_PDF_CHAT_ENABLED=true (default) and LLM Chat is enabled, users can attach PDF documents to messages for AI analysis. PDF Chat uses server-side text extraction with pdf-parse — no vision API required, so it works with all providers (OpenAI, Anthropic, Google, xAI).
Upload Methods: Drag & drop onto chat area or file picker button (paperclip icon).
Constraints:
| Constraint | Value |
| --- | --- |
| Max file size | 10 MB per PDF |
| Max PDFs per message | 1 |
| Accepted types | PDF only (.pdf) |
| Max extracted text | 50,000 characters |
PDFs are read as ArrayBuffer, sent to the server as Base64, and extracted server-side. The extracted text is prepended to the user's message as context:
```json
{
  "messages": [{
    "role": "user",
    "content": "--- PDF Document: report.pdf ---\n[extracted text]\n--- End PDF ---\n\nSummarize this document"
  }],
  "pdfAttached": true
}
```
The stream route auto-detects pdfAttached: true and selects the pdf_analysis credit operation (40 credits) instead of chat_streaming (20 credits).
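The context-building step on the server can be sketched as follows. `buildPdfContext` and `MAX_EXTRACTED_CHARS` are hypothetical names; only the 50,000-character cap and the message framing come from the docs above:

```typescript
// Illustrative sketch: extracted PDF text is capped at 50,000 characters
// and prepended to the user's message as context.
const MAX_EXTRACTED_CHARS = 50_000

function buildPdfContext(
  fileName: string,
  extractedText: string,
  userMessage: string
): string {
  const text = extractedText.slice(0, MAX_EXTRACTED_CHARS)
  return `--- PDF Document: ${fileName} ---\n${text}\n--- End PDF ---\n\n${userMessage}`
}
```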

Audio Input (Speech-to-Text)

When NEXT_PUBLIC_AI_AUDIO_INPUT_ENABLED=true (default) and LLM Chat is enabled, a microphone button appears in the chat input area. Users can record voice messages which are transcribed via the OpenAI Whisper API and inserted into the chat input field.
Recording Flow: Press microphone button → grant permission → record (with live audio level visualization and timer) → press stop → audio is transcribed → text appears in input field.
Constraints:
| Constraint | Value |
| --- | --- |
| Max recording duration | 120 seconds |
| Audio format | WebM (preferred), WAV (fallback) |
| Max file size | 25 MB |
| Transcription model | Whisper (whisper-1) |
The recorded audio is sent as multipart/form-data to /api/ai/speech-to-text, which forwards it to the OpenAI Whisper API. The transcribed text is returned with language detection and duration metadata.
Credit cost: 20 credits per transcription (speech_to_text operation). Credits are deducted before the Whisper API call.
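A minimal client-side sketch of the pre-upload validation, mirroring the constraints table above (`validateRecording` is a hypothetical helper, not Kit's actual API):

```typescript
// Illustrative sketch: validate a recording against the documented limits
// before spending credits on the upload.
const MAX_DURATION_SECONDS = 120
const MAX_FILE_BYTES = 25 * 1024 * 1024

function validateRecording(
  durationSeconds: number,
  sizeBytes: number
): string | null {
  if (durationSeconds > MAX_DURATION_SECONDS) return 'Recording exceeds 120 seconds'
  if (sizeBytes > MAX_FILE_BYTES) return 'Audio file exceeds 25 MB'
  return null // OK to upload
}

// Upload sketch (browser context):
//   const form = new FormData()
//   form.append('audio', blob, 'recording.webm')
//   await fetch('/api/ai/speech-to-text', { method: 'POST', body: form })
```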

Loading State Pattern

The chat UI uses a two-phase loading indicator:
  1. Phase A — "Thinking..." block: A separate loading block appears below the user message while waiting for the server to begin streaming. This is NOT an assistant message — it is removed when streaming starts.
  2. Phase B — Streaming content: Once the first chunk arrives, an assistant message is created with isStreaming: true. The loading indicator moves inside the message bubble until streaming completes.
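The two phases can be modeled as a small state transition; `LoadingPhase` and `nextPhase` are descriptive names for this sketch, not Kit's actual state shape:

```typescript
// Illustrative sketch of the two-phase loading indicator transitions.
type LoadingPhase = 'idle' | 'thinking' | 'streaming'

function nextPhase(
  phase: LoadingPhase,
  event: 'submit' | 'first_chunk' | 'done'
): LoadingPhase {
  if (event === 'submit') return 'thinking'       // Phase A: standalone block below user message
  if (event === 'first_chunk') return 'streaming' // Phase B: indicator inside the assistant bubble
  return 'idle'                                   // stream complete
}
```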

Streaming Request Flow

Every streaming chat message follows this path from the client to the provider and back:
```text
Client (useAIChat hook)
    |--- POST /api/ai/stream
    |    Body: { messages: [...], stream: true }
    |
    v
API Route (stream/route.ts)
    |--- 1. guardLLMChat()         → 404 if disabled
    |--- 2. getAuthUserId()        → 401 if unauthenticated
    |--- 3. ensureUserExists()     → Clerk ID → DB user ID
    |--- 4. checkRateLimit()       → 429/402 if exceeded
    |--- 5. checkUsageQuota()      → 402 if monthly quota exceeded
    |--- 6. deductCredits()        → 402 if insufficient
    |--- 7. Zod validation         → 400 if invalid
    |
    v
AI Service
    |--- resolveModelAlias()
    |--- createAIProvider()
    |--- streamResponse()
    |
    v
Provider API (OpenAI/Anthropic/Google/xAI)
    |
    v
SSE Stream (text/event-stream)
    |--- data: {"choices":[{"delta":{"content":"Hello"}}]}
    |--- data: {"choices":[{"delta":{"content":" world"}}]}
    |--- data: [DONE]
    |
    v
Client parses chunks → updates message state → renders in UI
```
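The ordered pre-flight checks can be sketched as a short-circuiting guard chain. `GuardResult` and `runGuards` are hypothetical names; the real route calls guardLLMChat(), getAuthUserId(), and the other steps directly:

```typescript
// Illustrative sketch: each check returns an error result or null, and the
// first failure short-circuits the remaining checks.
type GuardResult = { status: number; error: string } | null

function runGuards(checks: Array<() => GuardResult>): GuardResult {
  for (const check of checks) {
    const result = check()
    if (result) return result // e.g. 404, 401, 429, 402, 400
  }
  return null // all checks passed; continue to the AI service
}
```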

API Routes

POST /api/ai/stream

The primary endpoint for streaming chat. Returns an SSE stream with real-time token delivery.
Request:
```json
{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant" },
    { "role": "user", "content": "Explain React hooks" }
  ],
  "model": "claude",
  "temperature": 0.7,
  "maxTokens": 1000,
  "systemPrompt": "Optional system prompt",
  "context": "Optional context string"
}
```
Response: SSE stream with Content-Type: text/event-stream
src/app/api/ai/stream/route.ts — Request Schema
```ts
const StreamRequestSchema = z.object({
  messages: z.array(
    z.object({
      role: z.enum(['system', 'user', 'assistant', 'function', 'tool']),
      content: z.union([z.string(), z.array(ContentPartSchema)]),
      name: z.string().optional(),
    })
  ),
  model: z.string().optional(),
  temperature: z.number().min(0).max(2).optional(),
  maxTokens: z.number().positive().optional(),
  systemPrompt: z.string().optional(),
  context: z.string().optional(),
})
```

POST /api/ai/chat

Synchronous endpoint that returns a complete JSON response (no streaming). Supports single and batch requests.
Single Request:
```json
{
  "messages": [{ "role": "user", "content": "What is Next.js?" }],
  "model": "gpt-5-mini",
  "temperature": 0.5
}
```
Batch Request:
```json
{
  "requests": [
    { "messages": [{ "role": "user", "content": "Question 1" }] },
    { "messages": [{ "role": "user", "content": "Question 2" }] }
  ],
  "parallel": true
}
```
Batch requests are limited to 10 items. Set parallel: true for concurrent processing or false for sequential.
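A minimal sketch of the parallel-vs-sequential behavior with the 10-item cap (`processBatch` is a hypothetical name; the handler signature is an assumption):

```typescript
// Illustrative sketch: run batch items concurrently with Promise.all or
// sequentially with an awaited loop, enforcing the 10-item limit.
const MAX_BATCH_SIZE = 10

async function processBatch<T, R>(
  requests: T[],
  handler: (req: T) => Promise<R>,
  parallel: boolean
): Promise<R[]> {
  if (requests.length > MAX_BATCH_SIZE) {
    throw new Error(`Batch limited to ${MAX_BATCH_SIZE} items`)
  }
  if (parallel) {
    return Promise.all(requests.map((req) => handler(req))) // concurrent
  }
  const results: R[] = []
  for (const req of requests) {
    results.push(await handler(req)) // one at a time, in order
  }
  return results
}
```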

React Hooks

useAIChat

The primary hook for streaming chat with full message history management. Uses a custom SSE parser that handles five response formats across all providers:
src/hooks/use-ai.ts — useAIChat Hook
```ts
export function useAIChat(options: UseAIChatOptions = {}) {
  const {
    api = '/api/ai/stream',
    initialMessages = [],
    onFinish,
    onError,
  } = options
  // ... SSE parsing and message state management
}
```
Usage Example:
```tsx
'use client'

import { useAIChat } from '@/hooks/use-ai'

export function ChatPage() {
  const {
    messages,
    input,
    handleInputChange,
    handleSubmit,
    isLoading,
    error,
    stop,
    reload,
    clearMessages,
  } = useAIChat({
    api: '/api/ai/stream',
    onFinish: (message) => console.log('Done:', message.content),
    onError: (error) => console.error('Error:', error),
  })

  return (
    <form onSubmit={handleSubmit}>
      {messages.map((msg) => (
        <div key={msg.id}>{msg.content}</div>
      ))}
      <input value={input} onChange={handleInputChange} />
      <button type="submit" disabled={isLoading}>Send</button>
      {/* type="button" prevents this button from also submitting the form */}
      {isLoading && <button type="button" onClick={stop}>Stop</button>}
    </form>
  )
}
```
Return values:
| Property | Type | Description |
| --- | --- | --- |
| messages | AIChatMessage[] | Full conversation history |
| input | string | Current input field value |
| handleInputChange | (e) => void | Input change handler |
| handleSubmit | (e?) => void | Form submit handler |
| append | (msg) => Promise | Manually append a message |
| isLoading | boolean | True while streaming |
| error | Error \| null | Last error, if any |
| stop | () => void | Cancel current stream |
| reload | () => void | Retry last message |
| setMessages | (msgs) => void | Replace message history |
| clearMessages | () => void | Clear all messages |
| bonusHint | string \| null | Hint message when bonus credits were used |
| clearBonusHint | () => void | Clear the bonus hint message |

Additional Hooks

Kit provides four more hooks for different use cases:
| Hook | Purpose | API Endpoint |
| --- | --- | --- |
| useAICompletion() | Non-streaming completions via Vercel AI SDK | /api/ai/chat |
| useAIQuery() | Cached AI queries with TanStack Query | /api/ai/chat |
| useAIMutation() | One-off AI requests via useMutation | /api/ai/chat |
| useAIStream() | Low-level streaming with manual control | /api/ai/stream |

Chat Components

The chat UI is built from composable components in apps/boilerplate/src/components/ai/:
| Component | File | Purpose |
| --- | --- | --- |
| ChatContainer | chat-container.tsx | Main chat layout with header, messages, and input |
| ChatMessage | chat-message.tsx | Single message bubble (user/assistant) |
| ChatInput | chat-input.tsx | Text input with send button and keyboard shortcuts |
| ChatHeader | chat-header.tsx | Chat title, model info, clear button |
| QuickPrompts | quick-prompts.tsx | Category-based suggestion buttons |
| SourceAttribution | source-attribution.tsx | RAG source references with similarity scores |
| ChatSkeleton | chat-skeleton.tsx | Loading skeleton for chat messages |
| StreamingIndicator | streaming-indicator.tsx | Animated typing indicator during streaming |
| ImagePreview | image-preview.tsx | Thumbnail grid above input with remove buttons (Vision Chat) |
| ImageLightbox | image-lightbox.tsx | Full-screen image viewer with keyboard navigation (Vision Chat) |
| PdfAttachment | pdf-attachment.tsx | PDF file preview chip with name, size, and remove button (PDF Chat) |
| AudioRecorder | audio-recorder.tsx | Voice recording with audio level visualization and STT transcription (Audio Input) |

Quick Prompts

Both chat modes have configurable suggestion buttons organized by category. Each category has an icon and a set of prompts:
src/lib/ai/quick-prompts.ts — Types and Configuration
```ts
/**
 * Single suggestion within a category
 */
export interface QuickPromptSuggestion {
  /** Unique identifier */
  id: string
  /** Short display label (max ~50 chars for UI) */
  label: string
  /** Full prompt text to be sent */
  prompt: string
}

/**
 * Category grouping related suggestions
 */
export interface QuickPromptCategory {
  /** Unique identifier */
  id: string
  /** Button label */
  label: string
  /** Lucide icon component */
  icon: LucideIcon
  /** List of suggestions in this category */
  suggestions: QuickPromptSuggestion[]
}

/**
 * Complete configuration for a chat type
 */
export interface QuickPromptConfig {
  chatType: 'llm' | 'rag'
  categories: QuickPromptCategory[]
}
```
LLM Chat categories: Code, Write, Debug, Learn, Ideas (25 prompts total)
RAG Chat categories: Setup, Auth, Payments, Features, Customize (25 prompts total)

Streaming Protocol

Kit uses Server-Sent Events (SSE) for streaming. The shared SSE parser in src/lib/ai/sse-parser.ts handles five response formats to support all providers. The parser provides two key classes:
  • SSEStreamError — Distinguishes server-sent errors (e.g., { "error": "Insufficient credits" }) from JSON parse errors. Server-sent errors are re-thrown and surfaced to the user; malformed JSON chunks are safely ignored.
  • SSELineBuffer — Accumulates partial lines across TCP packet boundaries, ensuring complete SSE lines are processed even when data arrives in fragments.
When the stream completes with zero content chunks (empty response), the client hooks remove the placeholder "thinking" message and display a user-friendly error. Diagnostic data (finishReason, usage, warnings) is logged for debugging.
The five supported response formats:
```text
Format 1: OpenAI-style delta
  data: {"choices":[{"delta":{"content":"token"}}]}

Format 2: OpenAI-style text
  data: {"choices":[{"text":"token"}]}

Format 3: Direct content
  data: {"content":"token"}

Format 4: Direct text
  data: {"text":"token"}

Format 5: Anthropic-style delta
  data: {"delta":{"text":"token"}}

Termination:
  data: [DONE]
```
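A minimal token extractor covering these five formats might look like this. Kit's real parser lives in src/lib/ai/sse-parser.ts; `extractToken` is a hypothetical name for this sketch:

```typescript
// Illustrative sketch: pull the token out of one SSE data payload,
// trying each of the five documented formats in order.
function extractToken(data: string): string | null {
  if (data === '[DONE]') return null // stream termination
  const parsed = JSON.parse(data)
  return (
    parsed.choices?.[0]?.delta?.content ?? // Format 1: OpenAI-style delta
    parsed.choices?.[0]?.text ??           // Format 2: OpenAI-style text
    parsed.content ??                      // Format 3: direct content
    parsed.text ??                         // Format 4: direct text
    parsed.delta?.text ??                  // Format 5: Anthropic-style delta
    null
  )
}
```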
The stream response includes standard headers:
```text
Content-Type: text/event-stream
Cache-Control: no-cache, no-transform
Connection: keep-alive
X-Accel-Buffering: no
```
The X-Accel-Buffering: no header disables nginx proxy buffering, which is critical for real-time streaming on reverse proxy setups (including Vercel).
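As a sketch, a route handler could assemble these headers like so (`sseHeaders` is a hypothetical helper name):

```typescript
// Illustrative sketch: the standard SSE response headers listed above.
function sseHeaders(): Record<string, string> {
  return {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache, no-transform',
    'Connection': 'keep-alive',
    'X-Accel-Buffering': 'no', // disable nginx/reverse-proxy buffering
  }
}

// Usage in a route handler (sketch):
//   return new Response(stream, { headers: sseHeaders() })
```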

Feature Guards

Route guards ensure disabled features return proper 404 responses instead of crashing. There are two types — API guards (return NextResponse) and page guards (return booleans for notFound()):
src/lib/ai/route-guards.ts — API Route Guards
```ts
export function guardRAGChat(): NextResponse<FeatureDisabledError> | null {
  if (!isRAGChatEnabled()) {
    return createFeatureDisabledResponse('RAG Chat')
  }
  return null
}

/**
 * Guard for LLM Chat API routes
 *
 * Use at the start of:
 * - /api/ai/chat
 * - /api/ai/stream
 *
 * @returns NextResponse if feature disabled, null if enabled
 *
 * @example
 * export async function POST(request: Request) {
 *   const guard = guardLLMChat()
 *   if (guard) return guard
 *   // Feature is enabled, continue with handler
 * }
 */
export function guardLLMChat(): NextResponse<FeatureDisabledError> | null {
  if (!isLLMChatEnabled()) {
    return createFeatureDisabledResponse('LLM Chat')
  }
  return null
}
```
API guards (for API routes):
| Function | Protects | Returns |
| --- | --- | --- |
| guardRAGChat() | /api/ai/rag/* routes | NextResponse (404) or null |
| guardLLMChat() | /api/ai/stream, /api/ai/chat | NextResponse (404) or null |
| guardAnyChat() | /api/ai/usage | NextResponse (404) or null |
| guardAudioInput() | /api/ai/speech-to-text | NextResponse (404) or null |
| guardImageGen() | /api/ai/image-gen | NextResponse (404) or null |
Page guards (for Next.js pages):
| Function | Protects | Usage |
| --- | --- | --- |
| shouldShowRAGChat() | RAG Chat page | if (!shouldShowRAGChat()) notFound() |
| shouldShowLLMChat() | LLM Chat page | if (!shouldShowLLMChat()) notFound() |
| shouldShowImageGen() | Image Gen page | if (!shouldShowImageGen()) notFound() |

Error Handling

The chat system handles errors at multiple levels:
| Level | Error | Response |
| --- | --- | --- |
| Feature disabled | Guard returns 404 | { error: "Feature not available", code: "FEATURE_DISABLED" } |
| Not authenticated | Clerk check fails | { error: "Unauthorized" } (401) |
| Rate limited | Global burst exceeded | { error: "Too many requests" } (429) |
| Insufficient credits | Credit balance too low | { error: "Insufficient credits" } (402) |
| Invalid request | Zod validation fails | { error: "Validation error", details: [...] } (400) |
| Provider error | API call fails | { error: "...", provider: "openai", retryable: true } (5xx) |
| Stream error | Mid-stream failure | Error sent as SSE event, stream closes |
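On the client, these statuses can be mapped to user-facing messages. The sketch below is illustrative (`statusMessage` is a hypothetical helper; the wording is not Kit's actual copy):

```typescript
// Illustrative sketch: map an HTTP error status from the chat API to a
// message suitable for display in the chat UI.
function statusMessage(status: number): string {
  switch (status) {
    case 400: return 'Invalid request'
    case 401: return 'Please sign in to use chat'
    case 402: return 'Insufficient credits'
    case 404: return 'This feature is not available'
    case 429: return 'Too many requests, please slow down'
    default:
      return status >= 500 ? 'Provider error, please retry' : 'Unexpected error'
  }
}
```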