Data Flow

This page describes how requests and data flow through the Chaos Cypher system.

API Request Lifecycle

Every request follows this path:

Interface sends HTTP request to the API
Cortex (API) validates input with Pydantic models
Factory function creates service with injected dependencies
Service orchestrates business logic, calls repository
Repository queries the database
Response flows back as dicts, serialized to JSON
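
The lifecycle above can be sketched in a few lines. This is a minimal illustration, not the project's code: a standard-library dataclass stands in for the Pydantic model so the example has no dependencies, and every name here (`CreateSourceRequest`, `InMemorySourceRepository`, `SourceService`, `make_source_service`, `handle_request`) is hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class CreateSourceRequest:
    """Stand-in for a Pydantic request model (hypothetical fields)."""
    title: str

    def __post_init__(self) -> None:
        # Validation step: reject empty titles before any service code runs.
        if not self.title.strip():
            raise ValueError("title must not be empty")


class InMemorySourceRepository:
    """Hypothetical repository; a real one would query the database."""

    def __init__(self) -> None:
        self._rows: dict = {}

    def insert(self, title: str) -> dict:
        row = {"id": len(self._rows) + 1, "title": title, "status": "pending"}
        self._rows[row["id"]] = row
        return row


class SourceService:
    """Orchestrates business logic and delegates persistence to the repository."""

    def __init__(self, repository: InMemorySourceRepository) -> None:
        self._repository = repository

    def create(self, request: CreateSourceRequest) -> dict:
        return self._repository.insert(request.title.strip())


def make_source_service(
    repository: Optional[InMemorySourceRepository] = None,
) -> SourceService:
    """Factory function: builds the service with its dependencies injected."""
    if repository is None:
        repository = InMemorySourceRepository()
    return SourceService(repository)


def handle_request(payload: dict, service: SourceService) -> str:
    request = CreateSourceRequest(**payload)  # 1. validate input
    result = service.create(request)          # 2. service calls repository
    return json.dumps(result)                 # 3. dict serialized to JSON
```

Because the factory accepts the repository as a parameter, a test can pass an in-memory repository while production code passes one backed by the database.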

Document Processing Pipeline

Stage Details

Upload (synchronous)

File received via multipart upload
Source record created in database with pending status
Indexing job queued to Operations queue
202 response returned immediately

Indexing (Operations queue)

Worker picks up job
Loader plugin parses file format → raw text
Content normalized (optional)
Text chunked into segments (~900 chars default)
Chunk embedding is queued as a separate embed_chunks task on the LLM queue; the embeddings are generated by the configured embedding provider (local sentence-transformers by default), which is separate from the chat/extraction LLM provider
Chunks and embeddings stored
The source reaches indexed only after the embedding task completes
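
To make the chunking step concrete, here is a minimal sketch of a character-based splitter with a 900-character default. The actual chunker's splitting rules are not described on this page, so the whitespace-preferring behaviour and the `chunk_text` name are assumptions.

```python
def chunk_text(text: str, max_chars: int = 900) -> list:
    """Split text into segments of at most max_chars, preferring whitespace breaks."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Back up to the last space inside the window, if there is one,
            # so that words are not cut in half.
            split = text.rfind(" ", start, end)
            if split > start:
                end = split
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        start = end
    return chunks
```

A word longer than `max_chars` is cut at the limit, so the loop always makes progress.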

Awaiting confirmation (default)

The extraction domain is auto-detected during indexing and recorded as a detection_proposal
Unless the domain was forced at upload or the gate is disabled (mcp.confirmation_required_default: false), the source parks with status awaiting_confirmation and extraction does not start yet
A human confirms or overrides the detected domain via the web wizard, CLI, API, or the MCP confirm_extraction tool
Extraction is then queued

See the Extraction Pipeline Overview for the full gate semantics.
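
The gate logic described above reduces to a small decision function. The sketch below is an illustration under stated assumptions: `gate_decision`, `resolve_domain`, and the `"queue_extraction"` label are hypothetical names, and only `awaiting_confirmation` is a status taken from this page.

```python
from typing import Optional


def gate_decision(forced_domain: Optional[str],
                  confirmation_required: bool = True) -> str:
    """Decide what happens to a source once indexing and detection finish."""
    # A domain forced at upload, or a disabled gate, skips the human step.
    if forced_domain is not None or not confirmation_required:
        return "queue_extraction"  # hypothetical label for "extraction is queued"
    return "awaiting_confirmation"


def resolve_domain(detected_domain: str, override: Optional[str] = None) -> str:
    """A human either confirms the detected domain or overrides it."""
    return override if override is not None else detected_domain
```

Here `confirmation_required` plays the role of the `mcp.confirmation_required_default` setting.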

Extraction (LLM queue)

Content filtering strips non-essential content (TOC, legal, boilerplate) from copies of the chunks; the originals are preserved for RAG
Filtered chunks grouped by token budget and sent to LLM for entity extraction
Template matching applied
Extraction limits enforced (per-entity degree cap, same-pair cap, total ratio cap)
Entities deduplicated (exact or semantic)
Relationships mapped between entities
Entity embeddings generated
Results stored as JSON
Status updated to extracted
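
As an illustration of grouping filtered chunks by token budget, the following sketch packs chunks greedily in order. It is not the project's implementation: the four-characters-per-token estimate is a rough assumption standing in for a real tokenizer, and both function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4)


def group_by_token_budget(chunks: list, budget: int) -> list:
    """Greedily pack consecutive chunks into groups that fit the token budget."""
    groups = []
    current = []
    used = 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        # Start a new group when adding this chunk would exceed the budget.
        if current and used + cost > budget:
            groups.append(current)
            current, used = [], 0
        current.append(chunk)
        used += cost
    if current:
        groups.append(current)
    return groups
```

A chunk that is larger than the budget on its own still gets a group to itself, so no content is dropped. Each group would then be sent to the LLM as one extraction request.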

Commit (Operations queue)

Entities created as graph nodes in the database
Relationships created as graph edges in the database
Templates created for new entity types
Document node linked to source
Status updated to committed

Chat / RAG Flow

Chat turns execute in the Neuron worker process — the API only enqueues the turn and relays events. If the worker is down, sent messages sit in the queue and no response is produced until it comes back.

RAG Details

User message received via POST /chats/{id}/send — the API saves it, sets the chat status to processing, enqueues a chat_background task on the LLM queue, and returns 202 Accepted with a task_id
The client subscribes to GET /chats/{id}/events (SSE) for live updates
The Neuron worker pops the task and runs the shared chat tool loop (chaoscypher_core/streaming/chat/loop.py):
- Conversation history loaded (with token budget management)
- RAG search executed against indexed chunks
- GraphRAG analysis via Personalized PageRank over the knowledge graph (when enabled) — seed entities are matched by vector similarity, then graph context is assembled from high-rank neighbors
- Reciprocal Rank Fusion (RRF) merges vector search and graph context results into a unified ranking
- Relevant chunks assembled as context
- LLM called with system prompt + context + chat history + tools
- LLM may call tools (search nodes, get relationships, graphrag_search) during the response
The worker publishes streaming events over Valkey pub/sub; the API relays them to the client as SSE
The worker persists the final message + citations and publishes done
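
The Reciprocal Rank Fusion step can be sketched as follows. Each item receives a score of 1 / (k + rank) from every ranking it appears in, and the scores are summed. The constant k = 60 is the value commonly used with RRF; whether this system uses it is an assumption, as are the function name and the tie-breaking rule.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Merge several ranked lists of ids into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the order is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


vector_hits = ["c1", "c2", "c3"]   # hypothetical chunk ids from vector search
graph_hits = ["c2", "c4"]          # hypothetical ids from graph context
merged = reciprocal_rank_fusion([vector_hits, graph_hits])
```

An item that appears in both lists, such as `c2` here, accumulates two contributions and therefore outranks items that appear in only one.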

POST /chats/{id}/retry and POST /chats/{id}/regenerate re-enqueue the same chat_background operation without adding a new user message. POST /chats/{id}/cancel sets a cooperative Valkey cancel flag that the loop polls at step boundaries; on cancellation, the loop persists the partial answer before publishing done.
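
Cooperative cancellation means the loop is never interrupted mid-step; it checks a flag between steps and stops cleanly. A minimal sketch, assuming a `threading.Event` as a stand-in for the Valkey flag and a hypothetical `run_chat_loop` function:

```python
import threading


def run_chat_loop(steps: list, cancel_flag: threading.Event) -> dict:
    """Run steps in order, checking the cancel flag at each step boundary."""
    partial = []
    cancelled = False
    for step in steps:
        if cancel_flag.is_set():
            cancelled = True
            break
        partial.append(step())
    # Whatever was produced so far is kept, then "done" is reported.
    return {"answer": "".join(partial), "cancelled": cancelled, "event": "done"}
```

A step that is already running finishes before the flag is checked again, which is why the partial answer is always in a consistent state when it is saved.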

Queue Processing Flow

LLM queue — Priority-ordered (ZPOPMAX semantics, higher = first). Interactive requests (priority 100) run before background tasks (priority 50).
Operations queue — FIFO with 8 concurrent slots for I/O-bound work.
Both queues are polled by the same worker process with independent concurrency limits.
Cancellation — Both queued and running tasks can be cancelled. Running tasks use a cooperative Valkey flag that workers check between processing batches.
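
The priority ordering of the LLM queue can be illustrated with an in-memory heap. This is a sketch only: the real queue lives in Valkey, the `PriorityTaskQueue` class is hypothetical, and serving equal-priority tasks in insertion order is a choice made for this example rather than a documented behaviour.

```python
import heapq
import itertools


class PriorityTaskQueue:
    """Pops the highest-priority task first (higher number = earlier)."""

    def __init__(self) -> None:
        self._heap = []
        self._counter = itertools.count()

    def push(self, task: str, priority: int) -> None:
        # heapq is a min-heap, so the priority is negated to pop the highest
        # first. The counter breaks ties in insertion order (an assumption).
        heapq.heappush(self._heap, (-priority, next(self._counter), task))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = PriorityTaskQueue()
queue.push("extract-a", 50)   # background task
queue.push("chat-1", 100)     # interactive request
queue.push("extract-b", 50)   # background task
order = [queue.pop() for _ in range(len(queue))]
```

The interactive request is served first even though it was enqueued after a background task.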

Storage Architecture

Each database has isolated storage:

Storage	Default	Contents
Database	SQLite	Sources, chunks, chats, workflows, tools, triggers, metrics, graph nodes, edges, templates
Search Index	FTS5 + sqlite-vec	Fulltext index + vector similarity index (in app.db)
Graph Analytics	rustworkx (compiled Rust)	On-demand in-memory loading of graph subsets for analytics (PageRank, community detection, centrality). See Knowledge Graph Storage.

The storage layer is pluggable via Core's hexagonal architecture. SQLite is the default backend — additional backends (PostgreSQL, etc.) can be added by implementing the storage protocols.

Analytics operations load relevant subsets of the graph into memory on-demand with configurable limits (default 1.5M nodes, 4M edges). Override in settings.yaml:

batching:
  graph_analysis_node_limit: 1500000   # increase for larger graphs
  graph_analysis_edge_limit: 4000000   # increase for larger graphs

Memory sizing guide: Use the table below to estimate RAM usage and set limits based on available memory. Each node uses ~670 bytes and each edge uses ~200 bytes when loaded for analytics.

Nodes	Edges	Approximate RAM
100,000	250,000	~120 MB
500,000	1,000,000	~540 MB
1,000,000	2,000,000	~1.1 GB
1,500,000	4,000,000	~1.8 GB
3,000,000	8,000,000	~3.6 GB
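
The table rows follow directly from the per-node and per-edge figures quoted above. The sketch below reproduces the arithmetic so that limits can be checked against available memory; the function names are hypothetical and the byte costs are the approximate values from this page.

```python
NODE_BYTES = 670  # approximate cost per loaded node, from the sizing guide
EDGE_BYTES = 200  # approximate cost per loaded edge, from the sizing guide


def estimate_ram_bytes(nodes: int, edges: int) -> int:
    """Approximate RAM needed to load a graph subset for analytics."""
    return nodes * NODE_BYTES + edges * EDGE_BYTES


def fits_in_memory(nodes: int, edges: int, available_bytes: int) -> bool:
    """Check whether the given limits fit in the memory set aside for analytics."""
    return estimate_ram_bytes(nodes, edges) <= available_bytes


default_limits = estimate_ram_bytes(1_500_000, 4_000_000)
```

For the default limits this gives 1,805,000,000 bytes, which matches the ~1.8 GB row in the table.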

The message queue, settings.yaml, and credentials.json (single-user auth: bcrypt password hash + API keys, stored at the data root) are shared across all databases.
