Kit provides a complete chat system with two modes — LLM Chat for direct AI conversations and RAG Chat for knowledge-base-powered responses. Both use streaming SSE (Server-Sent Events) for real-time token delivery, share a common component library, and integrate with the credit system.
This page covers the streaming protocol, API routes, React hooks, and chat components. For the RAG-specific pipeline, see RAG System.
Two Chat Modes
| Aspect | LLM Chat | RAG Chat |
|---|---|---|
| Dashboard Route | /dashboard/chat-llm | /dashboard/chat-rag |
| API Route | /api/ai/stream (SSE), /api/ai/chat (JSON) | /api/ai/rag/ask |
| Hook | useAIChat() | Custom RAG hook |
| Context Source | Conversation history only | pgvector search + conversation |
| Feature Flag | NEXT_PUBLIC_AI_LLM_CHAT_ENABLED | NEXT_PUBLIC_AI_RAG_CHAT_ENABLED |
| Credit Cost | 20 credits (streaming), 30 (image analysis), 40 (PDF analysis), 20 (voice input), 15 (sync) | 15 credits |
| Auth Required | Yes (Clerk) | Yes (Clerk) |
Vision Chat (Image Analysis)
When NEXT_PUBLIC_AI_VISION_ENABLED=true (default) and LLM Chat is enabled, users can attach images to messages for AI analysis. Vision Chat adds image upload capabilities to the existing LLM Chat interface.
Upload Methods: Drag & drop onto chat area, paste from clipboard (Ctrl+V), or file picker button.
Constraints:
| Constraint | Value |
|---|---|
| Max image size | 4.5 MB per image |
| Max images per message | 4 |
| Accepted types | PNG, JPEG, WebP, GIF |
Images are encoded as Base64 data URIs and sent as ContentPart[] in the message content field:

```json
{
  "messages": [{
    "role": "user",
    "content": [
      { "type": "image", "image": "data:image/png;base64,..." },
      { "type": "text", "text": "Describe this image" }
    ]
  }]
}
```
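As an illustration, the payload above can be assembled client-side. This is a sketch under the limits from the constraints table; validateImage and buildVisionContent are hypothetical helper names, not Kit exports.

```typescript
type ContentPart =
  | { type: 'image'; image: string }
  | { type: 'text'; text: string }

// Limits from the Vision Chat constraints table.
const MAX_IMAGE_BYTES = 4.5 * 1024 * 1024
const MAX_IMAGES_PER_MESSAGE = 4
const ACCEPTED_TYPES = ['image/png', 'image/jpeg', 'image/webp', 'image/gif']

// Returns an error message, or null when the attachment is acceptable.
export function validateImage(
  type: string,
  sizeBytes: number,
  alreadyAttached: number
): string | null {
  if (!ACCEPTED_TYPES.includes(type)) return 'Unsupported image type'
  if (sizeBytes > MAX_IMAGE_BYTES) return 'Image exceeds the 4.5 MB limit'
  if (alreadyAttached >= MAX_IMAGES_PER_MESSAGE) return 'At most 4 images per message'
  return null
}

// Images go first, then the text prompt, matching the example payload above.
export function buildVisionContent(dataUris: string[], prompt: string): ContentPart[] {
  return [
    ...dataUris.map((uri): ContentPart => ({ type: 'image', image: uri })),
    { type: 'text', text: prompt },
  ]
}
```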
The stream route auto-detects image content and selects the image_analysis credit operation (30 credits) instead of chat_streaming (20 credits). The core Message.content type stays string for backward compatibility — multimodal ContentPart[] is handled at the API boundary only.
Vision Chat requires LLM Chat to be enabled. Both NEXT_PUBLIC_AI_LLM_CHAT_ENABLED and NEXT_PUBLIC_AI_VISION_ENABLED must be true for image upload features to appear. Set NEXT_PUBLIC_AI_VISION_ENABLED=false to hide image upload while keeping LLM Chat active.
PDF Chat (Document Analysis)
When NEXT_PUBLIC_AI_PDF_CHAT_ENABLED=true (default) and LLM Chat is enabled, users can attach PDF documents to messages for AI analysis. PDF Chat uses server-side text extraction with pdf-parse — no vision API required, so it works with all providers (OpenAI, Anthropic, Google, xAI).
Upload Methods: Drag & drop onto chat area or file picker button (paperclip icon).
Constraints:
| Constraint | Value |
|---|---|
| Max file size | 10 MB per PDF |
| Max PDFs per message | 1 |
| Accepted types | PDF only (.pdf) |
| Max extracted text | 50,000 characters |
PDFs are read as ArrayBuffer, sent to the server as Base64, and extracted server-side. The extracted text is prepended to the user's message as context:

```json
{
  "messages": [{
    "role": "user",
    "content": "--- PDF Document: report.pdf ---\n[extracted text]\n--- End PDF ---\n\nSummarize this document"
  }],
  "pdfAttached": true
}
```
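The context format above is straightforward to reproduce. Here is a sketch of the server-side assembly step; buildPdfContext is a hypothetical name (the real route extracts the text with pdf-parse before building something like this), and the 50,000-character cap comes from the constraints table.

```typescript
// Cap from the PDF Chat constraints table.
const MAX_EXTRACTED_CHARS = 50_000

// Prepend extracted PDF text to the user's prompt, matching the
// "--- PDF Document ---" framing shown in the example payload.
export function buildPdfContext(
  filename: string,
  extractedText: string,
  userPrompt: string
): string {
  const text = extractedText.slice(0, MAX_EXTRACTED_CHARS) // truncate, don't reject
  return `--- PDF Document: ${filename} ---\n${text}\n--- End PDF ---\n\n${userPrompt}`
}
```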
The stream route auto-detects pdfAttached: true and selects the pdf_analysis credit operation (40 credits) instead of chat_streaming (20 credits).
PDF Chat requires LLM Chat to be enabled. Both NEXT_PUBLIC_AI_LLM_CHAT_ENABLED and NEXT_PUBLIC_AI_PDF_CHAT_ENABLED must be true for PDF upload features to appear. Set NEXT_PUBLIC_AI_PDF_CHAT_ENABLED=false to hide PDF upload while keeping LLM Chat active.
Audio Input (Speech-to-Text)
When NEXT_PUBLIC_AI_AUDIO_INPUT_ENABLED=true (default) and LLM Chat is enabled, a microphone button appears in the chat input area. Users can record voice messages, which are transcribed via the OpenAI Whisper API and inserted into the chat input field.
Recording Flow: Press microphone button → grant permission → record (with live audio level visualization and timer) → press stop → audio is transcribed → text appears in input field.
Constraints:
| Constraint | Value |
|---|---|
| Max recording duration | 120 seconds |
| Audio format | WebM (preferred), WAV (fallback) |
| Max file size | 25 MB |
| Transcription model | Whisper (whisper-1) |
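The recording constraints can be enforced client-side before any upload happens. A sketch; pickMimeType, shouldStopRecording, and transcribe are illustrative names, and the multipart field name and response shape are assumptions, not Kit's actual contract.

```typescript
// Cap from the Audio Input constraints table.
const MAX_RECORDING_SECONDS = 120

// WebM is preferred, WAV is the fallback (per the constraints table).
// isSupported would typically wrap MediaRecorder.isTypeSupported in a browser.
export function pickMimeType(isSupported: (mime: string) => boolean): string | null {
  for (const mime of ['audio/webm', 'audio/wav']) {
    if (isSupported(mime)) return mime
  }
  return null
}

// Drives the recording timer: stop once the 120-second cap is reached.
export function shouldStopRecording(elapsedSeconds: number): boolean {
  return elapsedSeconds >= MAX_RECORDING_SECONDS
}

// Browser-side upload: post the recorded Blob as multipart/form-data.
export async function transcribe(audio: Blob, filename: string): Promise<string> {
  const form = new FormData()
  form.append('audio', audio, filename) // field name is an assumption
  const res = await fetch('/api/ai/speech-to-text', { method: 'POST', body: form })
  if (!res.ok) throw new Error(`Transcription failed: ${res.status}`)
  const data = await res.json()
  return data.text // transcribed text (response shape assumed)
}
```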
The recorded audio is sent as multipart/form-data to /api/ai/speech-to-text, which forwards it to the OpenAI Whisper API. The transcribed text is returned with language detection and duration metadata.
Credit cost: 20 credits per transcription (speech_to_text operation). Credits are deducted before the Whisper API call.
Audio Input requires LLM Chat to be enabled. Both NEXT_PUBLIC_AI_LLM_CHAT_ENABLED and NEXT_PUBLIC_AI_AUDIO_INPUT_ENABLED must be true for the microphone button to appear. Set NEXT_PUBLIC_AI_AUDIO_INPUT_ENABLED=false to hide voice input while keeping LLM Chat active.
Loading State Pattern
The chat UI uses a two-phase loading indicator:
- Phase A — "Thinking..." block: A separate loading block appears below the user message while waiting for the server to begin streaming. This is NOT an assistant message — it is removed when streaming starts.
- Phase B — Streaming content: Once the first chunk arrives, an assistant message is created with isStreaming: true. The loading indicator moves inside the message bubble until streaming completes.
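The two phases can be modeled as a tiny state machine. A sketch of the transitions; nextPhase and the event names are illustrative, not Kit's actual implementation.

```typescript
type LoadingPhase = 'idle' | 'thinking' | 'streaming'
type LoadingEvent = 'submit' | 'first_chunk' | 'done' | 'error'

export function nextPhase(phase: LoadingPhase, event: LoadingEvent): LoadingPhase {
  switch (event) {
    case 'submit':
      return 'thinking' // Phase A: standalone "Thinking..." block below the user message
    case 'first_chunk':
      return 'streaming' // Phase A block removed; indicator moves into the assistant bubble
    case 'done':
    case 'error':
      return 'idle' // streaming finished or failed
    default:
      return phase
  }
}
```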
Streaming Request Flow
Every streaming chat message follows this path from the client to the provider and back:
```
Client (useAIChat hook)
|--- POST /api/ai/stream
|    Body: { messages: [...], stream: true }
|
v
API Route (stream/route.ts)
|--- 1. guardLLMChat() → 404 if disabled
|--- 2. getAuthUserId() → 401 if unauthenticated
|--- 3. ensureUserExists() → Clerk ID → DB user ID
|--- 4. checkRateLimit() → 429/402 if exceeded
|--- 5. checkUsageQuota() → 402 if monthly quota exceeded
|--- 6. deductCredits() → 402 if insufficient
|--- 7. Zod validation → 400 if invalid
|
v
AI Service
|--- resolveModelAlias()
|--- createAIProvider()
|--- streamResponse()
|
v
Provider API (OpenAI/Anthropic/Google/xAI)
|
v
SSE Stream (text/event-stream)
|--- data: {"choices":[{"delta":{"content":"Hello"}}]}
|--- data: {"choices":[{"delta":{"content":" world"}}]}
|--- data: [DONE]
|
v
Client parses chunks → updates message state → renders in UI
```
API Routes
POST /api/ai/stream
The primary endpoint for streaming chat. Returns an SSE stream with real-time token delivery.
Request:
```json
{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant" },
    { "role": "user", "content": "Explain React hooks" }
  ],
  "model": "claude",
  "temperature": 0.7,
  "maxTokens": 1000,
  "systemPrompt": "Optional system prompt",
  "context": "Optional context string"
}
```
Response: SSE stream with Content-Type: text/event-stream

src/app/api/ai/stream/route.ts — Request Schema

```typescript
const StreamRequestSchema = z.object({
  messages: z.array(
    z.object({
      role: z.enum(['system', 'user', 'assistant', 'function', 'tool']),
      content: z.union([z.string(), z.array(ContentPartSchema)]),
      name: z.string().optional(),
    })
  ),
  model: z.string().optional(),
  temperature: z.number().min(0).max(2).optional(),
  maxTokens: z.number().positive().optional(),
  systemPrompt: z.string().optional(),
  context: z.string().optional(),
})
```
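For debugging or non-React clients, the endpoint can be consumed with plain fetch. A minimal sketch assuming the OpenAI-style delta format from the flow diagram; tokenFromLine and streamChat are illustrative names, not Kit exports (in app code, use useAIChat instead).

```typescript
// Pure helper: pull the token out of one "data: ..." SSE line, or null.
export function tokenFromLine(line: string): string | null {
  if (!line.startsWith('data: ')) return null
  const payload = line.slice('data: '.length)
  if (payload === '[DONE]') return null // stream termination marker
  try {
    const json = JSON.parse(payload)
    return json.choices?.[0]?.delta?.content ?? null
  } catch {
    return null // malformed JSON chunks are safely ignored
  }
}

// Read the SSE body chunk by chunk, buffering partial lines across reads.
export async function streamChat(messages: object[], onToken: (t: string) => void) {
  const res = await fetch('/api/ai/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages, stream: true }),
  })
  if (!res.ok || !res.body) throw new Error(`Stream failed: ${res.status}`)
  const reader = res.body.getReader()
  const decoder = new TextDecoder()
  let buffer = ''
  for (;;) {
    const { done, value } = await reader.read()
    if (done) break
    buffer += decoder.decode(value, { stream: true })
    const lines = buffer.split('\n')
    buffer = lines.pop() ?? '' // keep any partial trailing line for the next read
    for (const line of lines) {
      const token = tokenFromLine(line)
      if (token) onToken(token)
    }
  }
}
```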
Both /api/ai/stream and /api/ai/chat require Clerk authentication. The route converts the Clerk ID to a database user ID via ensureUserExists() — this is required for the credit system to identify the correct user.
POST /api/ai/chat
Synchronous endpoint that returns a complete JSON response (no streaming). Supports single and batch requests.
Single Request:
```json
{
  "messages": [{ "role": "user", "content": "What is Next.js?" }],
  "model": "gpt-5-mini",
  "temperature": 0.5
}
```
Batch Request:
```json
{
  "requests": [
    { "messages": [{ "role": "user", "content": "Question 1" }] },
    { "messages": [{ "role": "user", "content": "Question 2" }] }
  ],
  "parallel": true
}
```
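Building that body programmatically lets you enforce the 10-item cap before the request leaves the client. A sketch; buildBatchBody and batchChat are illustrative names, and the response shape depends on the route implementation.

```typescript
// Assemble a batch body in the shape shown above, rejecting oversized batches.
export function buildBatchBody(questions: string[], parallel = true) {
  if (questions.length > 10) {
    throw new Error('Batch requests are limited to 10 items')
  }
  return {
    requests: questions.map((q) => ({
      messages: [{ role: 'user' as const, content: q }],
    })),
    parallel,
  }
}

export async function batchChat(questions: string[]) {
  const res = await fetch('/api/ai/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildBatchBody(questions)),
  })
  if (!res.ok) throw new Error(`Batch request failed: ${res.status}`)
  return res.json() // response shape depends on the route implementation
}
```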
Batch requests are limited to 10 items. Set parallel: true for concurrent processing or false for sequential.
React Hooks
useAIChat
The Vercel AI SDK's useChat hook uses a proprietary Data Stream Protocol that is incompatible with Kit's standard SSE streaming. If you import useChat from ai/react, streaming will silently fail — no error, no tokens, just an empty response.

```typescript
// WRONG — uses Vercel's proprietary protocol, incompatible with Kit's SSE
import { useChat } from 'ai/react'
const { messages } = useChat({ api: '/api/ai/stream' }) // Silent failure!

// CORRECT — Kit's custom hook with multi-provider SSE parser
import { useAIChat } from '@/hooks/use-ai'
const { messages } = useAIChat({ api: '/api/ai/stream' }) // Works!
```
This is the most common mistake for developers coming from other Vercel AI SDK projects. Kit uses standard OpenAI-compatible SSE because it needs to support multiple providers (OpenAI, Anthropic, Google, xAI) — Vercel's Data Stream Protocol only works with their specific backend format.
The primary hook for streaming chat with full message history management. Uses a custom SSE parser that handles five response formats across all providers:
src/hooks/use-ai.ts — useAIChat Hook

```typescript
export function useAIChat(options: UseAIChatOptions = {}) {
  const {
    api = '/api/ai/stream',
    initialMessages = [],
    onFinish,
    onError,
  } = options
  // ... (rest of hook implementation)
}
```
Usage Example:
```tsx
'use client'
import { useAIChat } from '@/hooks/use-ai'

export function ChatPage() {
  const {
    messages,
    input,
    handleInputChange,
    handleSubmit,
    isLoading,
    error,
    stop,
    reload,
    clearMessages,
  } = useAIChat({
    api: '/api/ai/stream',
    onFinish: (message) => console.log('Done:', message.content),
    onError: (error) => console.error('Error:', error),
  })

  return (
    <form onSubmit={handleSubmit}>
      {messages.map((msg) => (
        <div key={msg.id}>{msg.content}</div>
      ))}
      <input value={input} onChange={handleInputChange} />
      <button type="submit" disabled={isLoading}>Send</button>
      {/* type="button" so Stop doesn't also submit the form */}
      {isLoading && <button type="button" onClick={stop}>Stop</button>}
    </form>
  )
}
```
Return values:
| Property | Type | Description |
|---|---|---|
| messages | AIChatMessage[] | Full conversation history |
| input | string | Current input field value |
| handleInputChange | (e) => void | Input change handler |
| handleSubmit | (e?) => void | Form submit handler |
| append | (msg) => Promise | Manually append a message |
| isLoading | boolean | True while streaming |
| error | Error \| null | Last error, if any |
| stop | () => void | Cancel current stream |
| reload | () => void | Retry last message |
| setMessages | (msgs) => void | Replace message history |
| clearMessages | () => void | Clear all messages |
| bonusHint | string \| null | Hint message when bonus credits were used |
| clearBonusHint | () => void | Clear the bonus hint message |
Additional Hooks
Kit provides four more hooks for different use cases:
| Hook | Purpose | API Endpoint |
|---|---|---|
| useAICompletion() | Non-streaming completions via Vercel AI SDK | /api/ai/chat |
| useAIQuery() | Cached AI queries with TanStack Query | /api/ai/chat |
| useAIMutation() | One-off AI requests via useMutation | /api/ai/chat |
| useAIStream() | Low-level streaming with manual control | /api/ai/stream |
Chat Components
The chat UI is built from composable components in apps/boilerplate/src/components/ai/:
| Component | File | Purpose |
|---|---|---|
| ChatContainer | chat-container.tsx | Main chat layout with header, messages, and input |
| ChatMessage | chat-message.tsx | Single message bubble (user/assistant) |
| ChatInput | chat-input.tsx | Text input with send button and keyboard shortcuts |
| ChatHeader | chat-header.tsx | Chat title, model info, clear button |
| QuickPrompts | quick-prompts.tsx | Category-based suggestion buttons |
| SourceAttribution | source-attribution.tsx | RAG source references with similarity scores |
| ChatSkeleton | chat-skeleton.tsx | Loading skeleton for chat messages |
| StreamingIndicator | streaming-indicator.tsx | Animated typing indicator during streaming |
| ImagePreview | image-preview.tsx | Thumbnail grid above input with remove buttons (Vision Chat) |
| ImageLightbox | image-lightbox.tsx | Full-screen image viewer with keyboard navigation (Vision Chat) |
| PdfAttachment | pdf-attachment.tsx | PDF file preview chip with name, size, and remove button (PDF Chat) |
| AudioRecorder | audio-recorder.tsx | Voice recording with audio level visualization and STT transcription (Audio Input) |
Quick Prompts
Both chat modes have configurable suggestion buttons organized by category. Each category has an icon and a set of prompts:
src/lib/ai/quick-prompts.ts — Types and Configuration

```typescript
/**
 * Single suggestion within a category
 */
export interface QuickPromptSuggestion {
  /** Unique identifier */
  id: string
  /** Short display label (max ~50 chars for UI) */
  label: string
  /** Full prompt text to be sent */
  prompt: string
}

/**
 * Category grouping related suggestions
 */
export interface QuickPromptCategory {
  /** Unique identifier */
  id: string
  /** Button label */
  label: string
  /** Lucide icon component */
  icon: LucideIcon
  /** List of suggestions in this category */
  suggestions: QuickPromptSuggestion[]
}

/**
 * Complete configuration for a chat type
 */
export interface QuickPromptConfig {
  chatType: 'llm' | 'rag'
  categories: QuickPromptCategory[]
}
```

LLM Chat categories: Code, Write, Debug, Learn, Ideas (25 prompts total)
RAG Chat categories: Setup, Auth, Payments, Features, Customize (25 prompts total)
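For example, a custom category might look like the sketch below (hypothetical content; in the real file, icon is a LucideIcon component imported from lucide-react, replaced here by a string so the sketch is self-contained).

```typescript
// Local stand-ins for the QuickPrompt* interfaces, with icon as a string
// instead of a LucideIcon component (an assumption for this sketch).
interface Suggestion { id: string; label: string; prompt: string }
interface Category { id: string; label: string; icon: string; suggestions: Suggestion[] }

// Hypothetical "Deploy" category to add alongside the built-in ones.
export const deployCategory: Category = {
  id: 'deploy',
  label: 'Deploy',
  icon: 'Rocket', // real config: a LucideIcon component such as Rocket
  suggestions: [
    {
      id: 'deploy-vercel',
      label: 'Deploy to Vercel',
      prompt: 'How do I deploy this app to Vercel?',
    },
    {
      id: 'deploy-env',
      label: 'Production env vars',
      prompt: 'Which environment variables do I need in production?',
    },
  ],
}
```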
Edit apps/boilerplate/src/lib/ai/quick-prompts.ts to customize the suggestion buttons for your application. Each category needs an id, label, icon (Lucide icon component), and an array of suggestions with id, label, and prompt fields.
Streaming Protocol
Kit uses Server-Sent Events (SSE) for streaming. The shared SSE parser in src/lib/ai/sse-parser.ts handles five response formats to support all providers. The parser provides two key classes:
- SSEStreamError — Distinguishes server-sent errors (e.g., { "error": "Insufficient credits" }) from JSON parse errors. Server errors are re-thrown to the user; malformed JSON chunks are safely ignored.
- SSELineBuffer — Accumulates partial lines across TCP packet boundaries, ensuring complete SSE lines are processed even when data arrives in fragments.
When the stream completes with zero content chunks (empty response), the client hooks remove the placeholder "thinking" message and display a user-friendly error. Diagnostic data (finishReason, usage, warnings) is logged for debugging.
The five supported response formats:
Format 1: OpenAI-style delta
data: {"choices":[{"delta":{"content":"token"}}]}
Format 2: OpenAI-style text
data: {"choices":[{"text":"token"}]}
Format 3: Direct content
data: {"content":"token"}
Format 4: Direct text
data: {"text":"token"}
Format 5: Anthropic-style delta
data: {"delta":{"text":"token"}}
Termination:
data: [DONE]
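The five formats above can be handled with one fall-through chain. This is a simplified re-implementation for illustration; the real parser in src/lib/ai/sse-parser.ts also handles line buffering and server error chunks.

```typescript
// Extract the token from one decoded SSE payload, trying each of the
// five provider formats in order; returns null for unknown or malformed chunks.
export function extractToken(payload: string): string | null {
  let data: any
  try {
    data = JSON.parse(payload)
  } catch {
    return null // malformed JSON chunks are ignored
  }
  const choice = data.choices?.[0]
  return (
    choice?.delta?.content ?? // Format 1: OpenAI-style delta
    choice?.text ??           // Format 2: OpenAI-style text
    data.content ??           // Format 3: direct content
    data.text ??              // Format 4: direct text
    data.delta?.text ??       // Format 5: Anthropic-style delta
    null
  )
}
```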
The stream response includes standard headers:
Content-Type: text/event-stream
Cache-Control: no-cache, no-transform
Connection: keep-alive
X-Accel-Buffering: no
The X-Accel-Buffering: no header disables nginx proxy buffering, which is critical for real-time streaming on reverse proxy setups (including Vercel).
Credits are deducted before the streaming response begins — not after. This is intentional: it prevents users from receiving a full AI response and then having the credit deduction fail. If the stream fails mid-response, the credit is not refunded automatically.
Feature Guards
Route guards ensure disabled features return proper 404 responses instead of crashing. There are two types — API guards (return NextResponse) and page guards (return booleans for notFound()):

src/lib/ai/route-guards.ts — API Route Guards

```typescript
export function guardRAGChat(): NextResponse<FeatureDisabledError> | null {
  if (!isRAGChatEnabled()) {
    return createFeatureDisabledResponse('RAG Chat')
  }
  return null
}

/**
 * Guard for LLM Chat API routes
 *
 * Use at the start of:
 * - /api/ai/chat
 * - /api/ai/stream
 *
 * @returns NextResponse if feature disabled, null if enabled
 *
 * @example
 * export async function POST(request: Request) {
 *   const guard = guardLLMChat()
 *   if (guard) return guard
 *   // Feature is enabled, continue with handler
 * }
 */
export function guardLLMChat(): NextResponse<FeatureDisabledError> | null {
  if (!isLLMChatEnabled()) {
    return createFeatureDisabledResponse('LLM Chat')
  }
  return null
}
```
API guards (for API routes):
| Function | Protects | Returns |
|---|---|---|
| guardRAGChat() | /api/ai/rag/* routes | NextResponse (404) or null |
| guardLLMChat() | /api/ai/stream, /api/ai/chat | NextResponse (404) or null |
| guardAnyChat() | /api/ai/usage | NextResponse (404) or null |
| guardAudioInput() | /api/ai/speech-to-text | NextResponse (404) or null |
| guardImageGen() | /api/ai/image-gen | NextResponse (404) or null |
Page guards (for Next.js pages):
| Function | Protects | Usage |
|---|---|---|
| shouldShowRAGChat() | RAG Chat page | if (!shouldShowRAGChat()) notFound() |
| shouldShowLLMChat() | LLM Chat page | if (!shouldShowLLMChat()) notFound() |
| shouldShowImageGen() | Image Gen page | if (!shouldShowImageGen()) notFound() |
Error Handling
The chat system handles errors at multiple levels:
| Level | Error | Response |
|---|---|---|
| Feature disabled | Guard returns 404 | { error: "Feature not available", code: "FEATURE_DISABLED" } |
| Not authenticated | Clerk check fails | { error: "Unauthorized" } (401) |
| Rate limited | Global burst exceeded | { error: "Too many requests" } (429) |
| Insufficient credits | Credit balance too low | { error: "Insufficient credits" } (402) |
| Invalid request | Zod validation fails | { error: "Validation error", details: [...] } (400) |
| Provider error | API call fails | { error: "...", provider: "openai", retryable: true } (5xx) |
| Stream error | Mid-stream failure | Error sent as SSE event, stream closes |
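On the client, these responses can be mapped to user-facing messages in one place. A sketch; chatErrorMessage and the exact wording are illustrative, not Kit's implementation.

```typescript
// Map the status codes and error codes from the table above to
// messages suitable for display in the chat UI.
export function chatErrorMessage(status: number, code?: string): string {
  if (code === 'FEATURE_DISABLED' || status === 404) {
    return 'This feature is not available.'
  }
  switch (status) {
    case 401:
      return 'Please sign in to use chat.'
    case 402:
      return 'Not enough credits. Top up or wait for your quota to reset.'
    case 429:
      return 'Too many requests. Please slow down and try again.'
    case 400:
      return 'Invalid request. Please adjust your input.'
    default:
      // Provider errors surface as 5xx and are often retryable.
      return status >= 500 ? 'The AI provider had a problem. Please retry.' : 'Something went wrong.'
  }
}
```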