Chat System

Streaming SSE protocol, React hooks, API routes, chat components, and quick prompts

Kit provides a complete chat system with two modes — LLM Chat for direct AI conversations and RAG Chat for knowledge-base-powered responses. Both use streaming SSE (Server-Sent Events) for real-time token delivery, share a common component library, and integrate with the credit system.
This page covers the streaming protocol, API routes, React hooks, and chat components. For the RAG-specific pipeline, see RAG System.

Two Chat Modes

| Aspect | LLM Chat | RAG Chat |
| --- | --- | --- |
| Dashboard Route | /dashboard/chat-llm | /dashboard/chat-rag |
| API Route | /api/ai/stream (SSE), /api/ai/chat (JSON) | /api/ai/rag/ask |
| Hook | useAIChat() | Custom RAG hook |
| Context Source | Conversation history only | pgvector search + conversation |
| Feature Flag | NEXT_PUBLIC_AI_LLM_CHAT_ENABLED | NEXT_PUBLIC_AI_RAG_CHAT_ENABLED |
| Credit Cost | 20 credits (streaming), 30 (image analysis), 40 (PDF analysis), 20 (voice input), 15 (sync) | 15 credits |
| Auth Required | Yes (Clerk) | Yes (Clerk) |

Vision Chat (Image Analysis)

When NEXT_PUBLIC_AI_VISION_ENABLED=true (default) and LLM Chat is enabled, users can attach images to messages for AI analysis. Vision Chat adds image upload capabilities to the existing LLM Chat interface.
Upload Methods: Drag & drop onto chat area, paste from clipboard (Ctrl+V), or file picker button.
Constraints:
| Constraint | Value |
| --- | --- |
| Max image size | 4.5 MB per image |
| Max images per message | 4 |
| Accepted types | PNG, JPEG, WebP, GIF |
Images are encoded as Base64 data URIs and sent as ContentPart[] in the message content field:
```json
{
  "messages": [{
    "role": "user",
    "content": [
      { "type": "image", "image": "data:image/png;base64,..." },
      { "type": "text", "text": "Describe this image" }
    ]
  }]
}
```
The stream route auto-detects image content and selects the image_analysis credit operation (30 credits) instead of chat_streaming (20 credits). The core Message.content type stays string for backward compatibility — multimodal ContentPart[] is handled at the API boundary only.
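As a rough sketch, the detection step could look like the following. Note that `ContentPart`, `ChatMessage`, and `selectCreditOperation` are illustrative names for this sketch, not Kit's actual implementation:

```typescript
// Illustrative sketch: pick the credit operation based on message content.
// Kit's real detection lives inside the stream route handler.
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image'; image: string }

interface ChatMessage {
  role: string
  // string for plain chat; ContentPart[] only at the API boundary
  content: string | ContentPart[]
}

function hasImageContent(messages: ChatMessage[]): boolean {
  return messages.some(
    (m) => Array.isArray(m.content) && m.content.some((p) => p.type === 'image')
  )
}

function selectCreditOperation(
  messages: ChatMessage[]
): 'image_analysis' | 'chat_streaming' {
  // 30 credits for image analysis, 20 for plain streaming chat
  return hasImageContent(messages) ? 'image_analysis' : 'chat_streaming'
}
```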

PDF Chat (Document Analysis)

When NEXT_PUBLIC_AI_PDF_CHAT_ENABLED=true (default) and LLM Chat is enabled, users can attach PDF documents to messages for AI analysis. PDF Chat uses server-side text extraction with pdf-parse — no vision API required, so it works with all providers (OpenAI, Anthropic, Google, xAI).
Upload Methods: Drag & drop onto chat area or file picker button (paperclip icon).
Constraints:
| Constraint | Value |
| --- | --- |
| Max file size | 10 MB per PDF |
| Max PDFs per message | 1 |
| Accepted types | PDF only (.pdf) |
| Max extracted text | 50,000 characters |
PDFs are read as ArrayBuffer, sent to the server as Base64, and extracted server-side. The extracted text is prepended to the user's message as context:
```json
{
  "messages": [{
    "role": "user",
    "content": "--- PDF Document: report.pdf ---\n[extracted text]\n--- End PDF ---\n\nSummarize this document"
  }],
  "pdfAttached": true
}
```
The stream route auto-detects pdfAttached: true and selects the pdf_analysis credit operation (40 credits) instead of chat_streaming (20 credits).
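The context-building step on the server can be sketched as follows. `buildPdfContext` and `MAX_EXTRACTED_CHARS` are hypothetical names; only the 50,000-character cap and the message framing come from the docs above:

```typescript
// Illustrative sketch: extracted PDF text is capped at 50,000 characters
// and prepended to the user's message as context.
const MAX_EXTRACTED_CHARS = 50_000

function buildPdfContext(
  fileName: string,
  extractedText: string,
  userMessage: string
): string {
  const text = extractedText.slice(0, MAX_EXTRACTED_CHARS)
  return `--- PDF Document: ${fileName} ---\n${text}\n--- End PDF ---\n\n${userMessage}`
}
```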

Audio Input (Speech-to-Text)

When NEXT_PUBLIC_AI_AUDIO_INPUT_ENABLED=true (default) and LLM Chat is enabled, a microphone button appears in the chat input area. Users can record voice messages which are transcribed via the OpenAI Whisper API and inserted into the chat input field.
Recording Flow: Press microphone button → grant permission → record (with live audio level visualization and timer) → press stop → audio is transcribed → text appears in input field.
Constraints:
| Constraint | Value |
| --- | --- |
| Max recording duration | 120 seconds |
| Audio format | WebM (preferred), WAV (fallback) |
| Max file size | 25 MB |
| Transcription model | Whisper (whisper-1) |
The recorded audio is sent as multipart/form-data to /api/ai/speech-to-text, which forwards it to the OpenAI Whisper API. The transcribed text is returned with language detection and duration metadata.
Credit cost: 20 credits per transcription (speech_to_text operation). Credits are deducted before the Whisper API call.
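A minimal client-side sketch of the pre-upload validation, mirroring the constraints table above (`validateRecording` is a hypothetical helper, not Kit's actual API):

```typescript
// Illustrative sketch: validate a recording against the documented limits
// before spending credits on the upload.
const MAX_DURATION_SECONDS = 120
const MAX_FILE_BYTES = 25 * 1024 * 1024

function validateRecording(
  durationSeconds: number,
  sizeBytes: number
): string | null {
  if (durationSeconds > MAX_DURATION_SECONDS) return 'Recording exceeds 120 seconds'
  if (sizeBytes > MAX_FILE_BYTES) return 'Audio file exceeds 25 MB'
  return null // OK to upload
}

// Upload sketch (browser context):
//   const form = new FormData()
//   form.append('audio', blob, 'recording.webm')
//   await fetch('/api/ai/speech-to-text', { method: 'POST', body: form })
```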

Loading State Pattern

The chat UI uses a two-phase loading indicator:
  1. Phase A — "Thinking..." block: A separate loading block appears below the user message while waiting for the server to begin streaming. This is NOT an assistant message — it is removed when streaming starts.
  2. Phase B — Streaming content: Once the first chunk arrives, an assistant message is created with isStreaming: true. The loading indicator moves inside the message bubble until streaming completes.
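The two phases can be modeled as a small state transition; `LoadingPhase` and `nextPhase` are descriptive names for this sketch, not Kit's actual state shape:

```typescript
// Illustrative sketch of the two-phase loading indicator transitions.
type LoadingPhase = 'idle' | 'thinking' | 'streaming'

function nextPhase(
  phase: LoadingPhase,
  event: 'submit' | 'first_chunk' | 'done'
): LoadingPhase {
  if (event === 'submit') return 'thinking'       // Phase A: standalone block below user message
  if (event === 'first_chunk') return 'streaming' // Phase B: indicator inside the assistant bubble
  return 'idle'                                   // stream complete
}
```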

Streaming Request Flow

Every streaming chat message follows this path from the client to the provider and back:
```text
Client (useAIChat hook)
    |--- POST /api/ai/stream
    |    Body: { messages: [...], stream: true }
    |
    v
API Route (stream/route.ts)
    |--- 1. guardLLMChat()         → 404 if disabled
    |--- 2. getAuthUserId()        → 401 if unauthenticated
    |--- 3. ensureUserExists()     → Clerk ID → DB user ID
    |--- 4. checkRateLimit()       → 429/402 if exceeded
    |--- 5. checkUsageQuota()      → 402 if monthly quota exceeded
    |--- 6. deductCredits()        → 402 if insufficient
    |--- 7. Zod validation         → 400 if invalid
    |
    v
AI Service
    |--- resolveModelAlias()
    |--- createAIProvider()
    |--- streamResponse()
    |
    v
Provider API (OpenAI/Anthropic/Google/xAI)
    |
    v
SSE Stream (text/event-stream)
    |--- data: {"choices":[{"delta":{"content":"Hello"}}]}
    |--- data: {"choices":[{"delta":{"content":" world"}}]}
    |--- data: [DONE]
    |
    v
Client parses chunks → updates message state → renders in UI
```
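The ordered pre-flight checks can be sketched as a short-circuiting guard chain. `GuardResult` and `runGuards` are hypothetical names; the real route calls guardLLMChat(), getAuthUserId(), and the other steps directly:

```typescript
// Illustrative sketch: each check returns an error result or null, and the
// first failure short-circuits the remaining checks.
type GuardResult = { status: number; error: string } | null

function runGuards(checks: Array<() => GuardResult>): GuardResult {
  for (const check of checks) {
    const result = check()
    if (result) return result // e.g. 404, 401, 429, 402, 400
  }
  return null // all checks passed; continue to the AI service
}
```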

API Routes

POST /api/ai/stream

The primary endpoint for streaming chat. Returns an SSE stream with real-time token delivery.
Request:
```json
{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant" },
    { "role": "user", "content": "Explain React hooks" }
  ],
  "model": "claude",
  "temperature": 0.7,
  "maxTokens": 1000,
  "systemPrompt": "Optional system prompt",
  "context": "Optional context string"
}
```
Response: SSE stream with Content-Type: text/event-stream
src/app/api/ai/stream/route.ts — Request Schema
```ts
const StreamRequestSchema = z.object({
  messages: z.array(
    z.object({
      role: z.enum(['system', 'user', 'assistant', 'function', 'tool']),
      content: z.union([z.string(), z.array(ContentPartSchema)]),
      name: z.string().optional(),
    })
  ),
  model: z.string().optional(),
  temperature: z.number().min(0).max(2).optional(),
  maxTokens: z.number().positive().optional(),
  systemPrompt: z.string().optional(),
  context: z.string().optional(),
})
```

POST /api/ai/chat

Synchronous endpoint that returns a complete JSON response (no streaming). Supports single and batch requests.
Single Request:
```json
{
  "messages": [{ "role": "user", "content": "What is Next.js?" }],
  "model": "gpt-5-mini",
  "temperature": 0.5
}
```
Batch Request:
```json
{
  "requests": [
    { "messages": [{ "role": "user", "content": "Question 1" }] },
    { "messages": [{ "role": "user", "content": "Question 2" }] }
  ],
  "parallel": true
}
```
Batch requests are limited to 10 items. Set parallel: true for concurrent processing or false for sequential.
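A minimal sketch of the parallel-vs-sequential behavior with the 10-item cap (`processBatch` is a hypothetical name; the handler signature is an assumption):

```typescript
// Illustrative sketch: run batch items concurrently with Promise.all or
// sequentially with an awaited loop, enforcing the 10-item limit.
const MAX_BATCH_SIZE = 10

async function processBatch<T, R>(
  requests: T[],
  handler: (req: T) => Promise<R>,
  parallel: boolean
): Promise<R[]> {
  if (requests.length > MAX_BATCH_SIZE) {
    throw new Error(`Batch limited to ${MAX_BATCH_SIZE} items`)
  }
  if (parallel) {
    return Promise.all(requests.map((req) => handler(req))) // concurrent
  }
  const results: R[] = []
  for (const req of requests) {
    results.push(await handler(req)) // one at a time, in order
  }
  return results
}
```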

React Hooks

useAIChat

The primary hook for streaming chat with full message history management. Uses a custom SSE parser that handles five response formats across all providers:
src/hooks/use-ai.ts — useAIChat Hook
```ts
export function useAIChat(options: UseAIChatOptions = {}) {
  const {
    api = '/api/ai/stream',
    initialMessages = [],
    onFinish,
    onError,
  } = options
  // ... SSE parsing and message state management
}
```
Usage Example:
```tsx
'use client'

import { useAIChat } from '@/hooks/use-ai'

export function ChatPage() {
  const {
    messages,
    input,
    handleInputChange,
    handleSubmit,
    isLoading,
    error,
    stop,
    reload,
    clearMessages,
  } = useAIChat({
    api: '/api/ai/stream',
    onFinish: (message) => console.log('Done:', message.content),
    onError: (error) => console.error('Error:', error),
  })

  return (
    <form onSubmit={handleSubmit}>
      {messages.map((msg) => (
        <div key={msg.id}>{msg.content}</div>
      ))}
      <input value={input} onChange={handleInputChange} />
      <button type="submit" disabled={isLoading}>Send</button>
      {/* type="button" prevents this button from also submitting the form */}
      {isLoading && <button type="button" onClick={stop}>Stop</button>}
    </form>
  )
}
```
Return values:
| Property | Type | Description |
| --- | --- | --- |
| messages | AIChatMessage[] | Full conversation history |
| input | string | Current input field value |
| handleInputChange | (e) => void | Input change handler |
| handleSubmit | (e?) => void | Form submit handler |
| append | (msg) => Promise | Manually append a message |
| isLoading | boolean | True while streaming |
| error | Error \| null | Last error, if any |
| stop | () => void | Cancel current stream |
| reload | () => void | Retry last message |
| setMessages | (msgs) => void | Replace message history |
| clearMessages | () => void | Clear all messages |
| bonusHint | string \| null | Hint message when bonus credits were used |
| clearBonusHint | () => void | Clear the bonus hint message |

Additional Hooks

Kit provides four more hooks for different use cases:
| Hook | Purpose | API Endpoint |
| --- | --- | --- |
| useAICompletion() | Non-streaming completions via Vercel AI SDK | /api/ai/chat |
| useAIQuery() | Cached AI queries with TanStack Query | /api/ai/chat |
| useAIMutation() | One-off AI requests via useMutation | /api/ai/chat |
| useAIStream() | Low-level streaming with manual control | /api/ai/stream |

Chat Components

The chat UI is built from composable components in apps/boilerplate/src/components/ai/:
| Component | File | Purpose |
| --- | --- | --- |
| ChatContainer | chat-container.tsx | Main chat layout with header, messages, and input |
| ChatMessage | chat-message.tsx | Single message bubble (user/assistant) |
| ChatInput | chat-input.tsx | Text input with send button and keyboard shortcuts |
| ChatHeader | chat-header.tsx | Chat title, model info, clear button |
| QuickPrompts | quick-prompts.tsx | Category-based suggestion buttons |
| SourceAttribution | source-attribution.tsx | RAG source references with similarity scores |
| ChatSkeleton | chat-skeleton.tsx | Loading skeleton for chat messages |
| StreamingIndicator | streaming-indicator.tsx | Animated typing indicator during streaming |
| ImagePreview | image-preview.tsx | Thumbnail grid above input with remove buttons (Vision Chat) |
| ImageLightbox | image-lightbox.tsx | Full-screen image viewer with keyboard navigation (Vision Chat) |
| PdfAttachment | pdf-attachment.tsx | PDF file preview chip with name, size, and remove button (PDF Chat) |
| AudioRecorder | audio-recorder.tsx | Voice recording with audio level visualization and STT transcription (Audio Input) |

Quick Prompts

Both chat modes have configurable suggestion buttons organized by category. Each category has an icon and a set of prompts:
src/lib/ai/quick-prompts.ts — Types and Configuration
```ts
/**
 * Single suggestion within a category
 */
export interface QuickPromptSuggestion {
  /** Unique identifier */
  id: string
  /** Short display label (max ~50 chars for UI) */
  label: string
  /** Full prompt text to be sent */
  prompt: string
}

/**
 * Category grouping related suggestions
 */
export interface QuickPromptCategory {
  /** Unique identifier */
  id: string
  /** Button label */
  label: string
  /** Lucide icon component */
  icon: LucideIcon
  /** List of suggestions in this category */
  suggestions: QuickPromptSuggestion[]
}

/**
 * Complete configuration for a chat type
 */
export interface QuickPromptConfig {
  chatType: 'llm' | 'rag'
  categories: QuickPromptCategory[]
}
```
LLM Chat categories: Code, Write, Debug, Learn, Ideas (25 prompts total)
RAG Chat categories: Setup, Auth, Payments, Features, Customize (25 prompts total)

Streaming Protocol

Kit uses Server-Sent Events (SSE) for streaming. The shared SSE parser in src/lib/ai/sse-parser.ts handles five response formats to support all providers. The parser provides two key classes:
  • SSEStreamError — Distinguishes server-sent errors (e.g., { "error": "Insufficient credits" }) from JSON parse errors. Server-sent errors are re-thrown and surfaced to the user; malformed JSON chunks are safely ignored.
  • SSELineBuffer — Accumulates partial lines across TCP packet boundaries, ensuring complete SSE lines are processed even when data arrives in fragments.
When the stream completes with zero content chunks (empty response), the client hooks remove the placeholder "thinking" message and display a user-friendly error. Diagnostic data (finishReason, usage, warnings) is logged for debugging.
The five supported response formats:
```text
Format 1: OpenAI-style delta
  data: {"choices":[{"delta":{"content":"token"}}]}

Format 2: OpenAI-style text
  data: {"choices":[{"text":"token"}]}

Format 3: Direct content
  data: {"content":"token"}

Format 4: Direct text
  data: {"text":"token"}

Format 5: Anthropic-style delta
  data: {"delta":{"text":"token"}}

Termination:
  data: [DONE]
```
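A minimal token extractor covering these five formats might look like this. Kit's real parser lives in src/lib/ai/sse-parser.ts; `extractToken` is a hypothetical name for this sketch:

```typescript
// Illustrative sketch: pull the token out of one SSE data payload,
// trying each of the five documented formats in order.
function extractToken(data: string): string | null {
  if (data === '[DONE]') return null // stream termination
  const parsed = JSON.parse(data)
  return (
    parsed.choices?.[0]?.delta?.content ?? // Format 1: OpenAI-style delta
    parsed.choices?.[0]?.text ??           // Format 2: OpenAI-style text
    parsed.content ??                      // Format 3: direct content
    parsed.text ??                         // Format 4: direct text
    parsed.delta?.text ??                  // Format 5: Anthropic-style delta
    null
  )
}
```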
The stream response includes standard headers:
```text
Content-Type: text/event-stream
Cache-Control: no-cache, no-transform
Connection: keep-alive
X-Accel-Buffering: no
```
The X-Accel-Buffering: no header disables nginx proxy buffering, which is critical for real-time streaming on reverse proxy setups (including Vercel).
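As a sketch, a route handler could assemble these headers like so (`sseHeaders` is a hypothetical helper name):

```typescript
// Illustrative sketch: the standard SSE response headers listed above.
function sseHeaders(): Record<string, string> {
  return {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache, no-transform',
    'Connection': 'keep-alive',
    'X-Accel-Buffering': 'no', // disable nginx/reverse-proxy buffering
  }
}

// Usage in a route handler (sketch):
//   return new Response(stream, { headers: sseHeaders() })
```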

Feature Guards

Route guards ensure disabled features return proper 404 responses instead of crashing. There are two types — API guards (return NextResponse) and page guards (return booleans for notFound()):
src/lib/ai/route-guards.ts — API Route Guards
```ts
export function guardRAGChat(): NextResponse<FeatureDisabledError> | null {
  if (!isRAGChatEnabled()) {
    return createFeatureDisabledResponse('RAG Chat')
  }
  return null
}

/**
 * Guard for LLM Chat API routes
 *
 * Use at the start of:
 * - /api/ai/chat
 * - /api/ai/stream
 *
 * @returns NextResponse if feature disabled, null if enabled
 *
 * @example
 * export async function POST(request: Request) {
 *   const guard = guardLLMChat()
 *   if (guard) return guard
 *   // Feature is enabled, continue with handler
 * }
 */
export function guardLLMChat(): NextResponse<FeatureDisabledError> | null {
  if (!isLLMChatEnabled()) {
    return createFeatureDisabledResponse('LLM Chat')
  }
  return null
}
```
API guards (for API routes):
| Function | Protects | Returns |
| --- | --- | --- |
| guardRAGChat() | /api/ai/rag/* routes | NextResponse (404) or null |
| guardLLMChat() | /api/ai/stream, /api/ai/chat | NextResponse (404) or null |
| guardAnyChat() | /api/ai/usage | NextResponse (404) or null |
| guardAudioInput() | /api/ai/speech-to-text | NextResponse (404) or null |
| guardImageGen() | /api/ai/image-gen | NextResponse (404) or null |
Page guards (for Next.js pages):
| Function | Protects | Usage |
| --- | --- | --- |
| shouldShowRAGChat() | RAG Chat page | if (!shouldShowRAGChat()) notFound() |
| shouldShowLLMChat() | LLM Chat page | if (!shouldShowLLMChat()) notFound() |
| shouldShowImageGen() | Image Gen page | if (!shouldShowImageGen()) notFound() |

Error Handling

The chat system handles errors at multiple levels:
| Level | Error | Response |
| --- | --- | --- |
| Feature disabled | Guard returns 404 | { error: "Feature not available", code: "FEATURE_DISABLED" } |
| Not authenticated | Clerk check fails | { error: "Unauthorized" } (401) |
| Rate limited | Global burst exceeded | { error: "Too many requests" } (429) |
| Insufficient credits | Credit balance too low | { error: "Insufficient credits" } (402) |
| Invalid request | Zod validation fails | { error: "Validation error", details: [...] } (400) |
| Provider error | API call fails | { error: "...", provider: "openai", retryable: true } (5xx) |
| Stream error | Mid-stream failure | Error sent as SSE event, stream closes |
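On the client, these statuses can be mapped to user-facing messages. The sketch below is illustrative (`statusMessage` is a hypothetical helper; the wording is not Kit's actual copy):

```typescript
// Illustrative sketch: map an HTTP error status from the chat API to a
// message suitable for display in the chat UI.
function statusMessage(status: number): string {
  switch (status) {
    case 400: return 'Invalid request'
    case 401: return 'Please sign in to use chat'
    case 402: return 'Insufficient credits'
    case 404: return 'This feature is not available'
    case 429: return 'Too many requests, please slow down'
    default:
      return status >= 500 ? 'Provider error, please retry' : 'Unexpected error'
  }
}
```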