
Chat

Chat lets you ask questions about your documents using retrieval-augmented generation (RAG). The AI searches your indexed content, retrieves relevant passages, and generates answers grounded in your actual documents.

Conversations

Each chat is a conversation with its own message history. You can maintain multiple conversations simultaneously.

Creating a Conversation

Start a new chat from the sidebar. Give it a title or let the system auto-generate one from your first message.

[Figure: Chat sidebar with conversation list and search]

Auto-Generated Titles

After your first message, Chaos Cypher can automatically generate a concise 3-6 word title using a lightweight LLM call. This keeps your conversation list organized without manual naming.

Sending Messages

Type your question and the AI will:

  1. Search your indexed documents for relevant chunks
  2. Include the most relevant passages as context
  3. Generate a response grounded in that context
  4. Include citations linking back to source documents
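The four steps above can be sketched in a few lines. This is a minimal illustration, not Chaos Cypher's implementation: `Chunk`, `score`, `retrieve`, `build_prompt`, `answer`, and the `generate` callback are hypothetical names, and a toy word-overlap score stands in for the real vector and graph search.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # name of the source document (hypothetical field)
    text: str    # passage text


def score(question: str, chunk: Chunk) -> int:
    """Toy relevance score: count of distinct question words found in the chunk."""
    words = set(question.lower().split())
    return len(words & set(chunk.text.lower().split()))


def retrieve(question: str, index: list[Chunk], top_k: int = 2) -> list[Chunk]:
    """Steps 1-2: rank indexed chunks and keep the most relevant ones."""
    ranked = sorted(index, key=lambda c: score(question, c), reverse=True)
    return [c for c in ranked[:top_k] if score(question, c) > 0]


def build_prompt(question: str, context: list[Chunk]) -> str:
    """Number each passage so the generated answer can cite it as [n]."""
    passages = "\n".join(
        f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(context, 1)
    )
    return f"Answer using only these passages:\n{passages}\n\nQuestion: {question}"


def answer(question: str, index: list[Chunk], generate) -> dict:
    """Steps 3-4: generate a grounded reply and attach citations."""
    context = retrieve(question, index)
    reply = generate(build_prompt(question, context))
    citations = [
        {"ref": i, "source": c.source, "text": c.text}
        for i, c in enumerate(context, 1)
    ]
    return {"answer": reply, "citations": citations}
```

Here `generate` is any callable that takes a prompt string and returns text, so the flow can be exercised with a stub before wiring in a real LLM provider.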

Type your message in the chat input and press Enter or click Send. Responses stream in real time.

[Figure: Chat conversation with AI response and citations]

AI Tools

The chat assistant has access to several tools for retrieving information:

| Tool | Description |
| --- | --- |
| GraphRAG Search | Graph-enhanced retrieval that fuses knowledge graph traversal with vector search. Automatically prioritized when your database has extracted entities. Best for multi-hop questions spanning multiple documents. |
| Semantic Search | Vector similarity search across document chunks. Used for direct content retrieval. |
| Graph Search | Search for specific nodes and relationships in the knowledge graph. |
| Summarize | Retrieves document chunks, clusters them for representative selection, and generates a compressed summary using the LLM. Useful for condensing long documents or sets of sources. |

The system automatically selects the best tool based on your question and the available data. When a knowledge graph with entities exists, GraphRAG is prioritized for richer, more connected answers.
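The prioritization rule can be pictured as an ordered preference list. The sketch below is an assumption-laden illustration: the tool identifiers and the `rank_tools` function are hypothetical, and the actual selection also depends on the question itself, which this sketch does not model.

```python
def rank_tools(has_entities: bool) -> list[str]:
    """Return tool names in order of preference (hypothetical identifiers).

    Assumption: the graph-based tools are only worth trying when the
    database has extracted entities; otherwise vector search comes first.
    """
    tools = ["semantic_search", "summarize"]
    if has_entities:
        # GraphRAG is prioritized when a knowledge graph with entities exists.
        tools = ["graphrag_search", "graph_search"] + tools
    return tools
```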

Message Types

| Role | Description |
| --- | --- |
| User | Your questions and messages |
| Assistant | AI-generated responses with citations |
| System | Automatic messages (e.g., scope changes) |

Scoped Chat

By default, chat searches across all enabled sources in the current database. Scoped chat restricts the AI's context to specific sources.

Source Scoping

Open the chat dropdown on a source in the Sources list to start a source-scoped conversation. The AI will search only the content of the selected source.

[Figure: Sources list with action buttons]

Tag Scoping

Scope by tags to include all sources with matching tags. Select one or more tags when creating or updating the chat scope.

Combining Scopes

You can combine source IDs and tag IDs in one scope. The system merges them and removes duplicates, so a source that is listed directly and also matches a tag is searched only once.
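A minimal sketch of that merge, assuming tags resolve to lists of source IDs. The function name `resolve_scope` and the `sources_by_tag` mapping are hypothetical and only illustrate the deduplication step.

```python
def resolve_scope(
    source_ids: list[str],
    tag_ids: list[str],
    sources_by_tag: dict[str, list[str]],
) -> list[str]:
    """Return the de-duplicated source IDs covered by a scope, in first-seen order."""
    merged = list(source_ids)
    for tag in tag_ids:
        # Unknown tags contribute no sources.
        merged.extend(sources_by_tag.get(tag, []))
    # dict.fromkeys keeps the first occurrence of each ID and preserves order.
    return list(dict.fromkeys(merged))
```

For example, a scope with sources `s1` and `s2` plus a tag that covers `s2` and `s3` resolves to `s1`, `s2`, `s3`.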

Clearing Scope

Remove the scope to return to searching all sources. Scope changes are logged as system messages in the conversation.

Tip: Scoped chat is useful when you have many sources but want to ask questions about a specific document or topic. It reduces noise and ensures the AI focuses on relevant content.

Citations

When the AI references information from your documents, responses include citations that link back to the specific source chunks used. Click a citation to see:

  • The source document name
  • The exact text passage
  • Page number and section (when available)

[Figure: Chat showing AI response with inline source citations]

Citations help you verify the AI's answers against your original documents.

Streaming

Chat responses use Server-Sent Events for real-time streaming. The stream sends several event types:

| Event | Description |
| --- | --- |
| content | Text chunks as they're generated |
| thinking_delta | AI reasoning steps (when thinking is enabled) |
| tool_calls | Tool invocations during response generation |
| tool_result | Results from tool calls |
| done | Response complete |
| error | Error during generation |

If you close the browser during streaming, the response continues in the background and is saved to the conversation.
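A client consuming this stream might look like the sketch below. It assumes the event types above arrive in the standard SSE `event:` field and that `content` events carry a JSON payload with a `text` key. Both are assumptions about the wire format, and `parse_sse` and `collect_answer` are hypothetical helper names.

```python
import json
from typing import Iterator


def parse_sse(raw: str) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from Server-Sent Events text."""
    event, data_lines = "message", []
    # A trailing blank line guarantees the last event is flushed.
    for line in raw.splitlines() + [""]:
        if line == "":
            if data_lines:
                yield event, "\n".join(data_lines)
            event, data_lines = "message", []
        elif line.startswith(":"):
            continue  # SSE comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            # Per the SSE format, a single leading space is dropped.
            data_lines.append(line[len("data:"):].removeprefix(" "))


def collect_answer(raw: str) -> str:
    """Join the text of all content events; stop at done, raise on error."""
    parts = []
    for event, data in parse_sse(raw):
        if event == "content":
            parts.append(json.loads(data)["text"])  # assumed payload shape
        elif event == "error":
            raise RuntimeError(data)
        elif event == "done":
            break
        # thinking_delta, tool_calls, and tool_result are ignored in this sketch.
    return "".join(parts)
```

Because the server keeps generating and saves the response after a disconnect, a client like this can drop the stream at any point and reload the conversation later to see the full reply.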

LLM Configuration

Chat behavior is controlled through LLM settings:

  • Provider — Ollama, OpenAI, Anthropic, or Gemini
  • Temperature — Controls response creativity (default: 0.3)
  • Max tokens — Maximum response length (default: 65536)
  • Thinking mode — Enable to see the AI's reasoning process

Configure these in Settings or settings.yaml. See Configuration for details.
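As a rough picture of these settings and their defaults, the sketch below models them as a validated Python dataclass. The field names, the `LLMSettings` class, and the validation rules are hypothetical and do not reflect the actual keys in settings.yaml; only the provider list and the two stated defaults come from this page.

```python
from dataclasses import dataclass

PROVIDERS = {"ollama", "openai", "anthropic", "gemini"}


@dataclass
class LLMSettings:
    provider: str               # no default is documented, so it is required here
    temperature: float = 0.3    # documented default
    max_tokens: int = 65536     # documented default
    thinking: bool = False      # assumed default; not stated in the docs

    def __post_init__(self) -> None:
        if self.provider not in PROVIDERS:
            raise ValueError(f"unsupported provider: {self.provider!r}")
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
```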