Chaos Cypher Blog

Extract Smarter: How Domain-Aware AI Builds Better Knowledge Graphs

Thu, 12 Mar 2026 00:00:00 GMT

Most AI extraction tools treat every document the same way. Upload a medical paper or a legal contract and you get the same generic entity types, the same vague relationships, the same disappointing graph. Chaos Cypher takes a different approach: it detects what kind of document you uploaded and adapts its entire extraction pipeline to match.

The Problem with Generic Extraction

Here's a sentence you might find in a clinical document:

Patient with hypertension started on lisinopril 10mg daily. The ACE inhibitor is contraindicated with potassium supplements. Side effects include dry cough and dizziness.

A generic extraction pipeline -- the kind most tools use -- will pull out a handful of entities and connect them with whatever relationship labels the LLM feels like inventing. You might get "Lisinopril" typed as an Item, "Hypertension" as a Concept, and "Dry Cough" as another Concept. The relationships between them? Probably related_to and influences. Maybe associated_with if you are lucky.

This is the "garbage in, garbage out" of knowledge graphs. It's not that the AI failed to read the text. It read it fine. The problem is that nobody told it what to look for, what types are valid, or what the relationships between those types should mean.

The graph you get is technically correct and practically useless. You cannot query "which drugs treat hypertension" because the system does not know what a Drug is. You cannot find contraindications because related_to could mean anything. Every edge in the graph carries the same semantic weight as a shrug.

Now run the same sentence through Chaos Cypher with the medical domain active:

Lisinopril becomes a Drug with dosage form and mechanism of action as properties
Hypertension becomes a Condition
Dry Cough and Dizziness become Side Effects
Potassium Supplements gets recognized as a Drug (because supplements have drug interactions too)

The relationships are just as precise: treats, contraindicated_with, produces_side_effect. Each one is typed, directional, and constrained. Only a Drug, Treatment, or Procedure can treat a Condition or Symptom. Only a Drug can be the source of produces_side_effect, and only a Side Effect can be its target. The LLM isn't guessing -- it's following a schema.

That's what domain-aware extraction does. It turns a language model from a general-purpose pattern matcher into a domain specialist.

How It Works: Upload to Knowledge Graph

The workflow is straightforward. You upload a document. Chaos Cypher figures out what domain it belongs to, loads the right extraction rules, and runs the pipeline. You don't need to configure anything upfront -- though you can override the detected domain if you want.

Here's what happens behind the scenes:

Detection -- Chaos Cypher samples the first few thousand characters of your document and scores it against all registered domains simultaneously. Each domain has weighted keyword groups, regex patterns, and file type signals. The highest-scoring domain wins.
Guidance injection -- The winning domain's extraction rules get injected into the LLM prompt. This includes entity type definitions, relationship constraints, exclusion rules (what not to extract), and worked examples of correct extractions.
Strict type enforcement -- The LLM is instructed to only use entity types from the domain's template list. After extraction, a code-level filter drops any entity whose type does not match a known template. No hallucinated types survive.
Relationship validation -- Each relationship is checked against source/target type constraints. A treats relationship must flow from a Drug, Treatment, or Procedure to a Condition or Symptom. Anything else gets rejected.
Quality scoring -- Extracted entities and relationships are scored by domain relevance. Domain-specific types like Drug and Condition score higher than generic fallbacks. This surfaces the most valuable parts of your graph.

Chaos Cypher ships with 16 built-in domains, each tuned for a different category of document:

Domain	Typical Entity Types	Best For
Biographical	Person, Life Event, Achievement, Relationship	Biographies, memoirs, personal histories
Cybersecurity	Threat Actor, Vulnerability, Malware, Attack Technique	Threat intel, incident reports, CVE research
Educational	Course, Learning Objective, Concept, Assessment	Textbooks, curricula, instructional materials
Financial	Company, Financial Instrument, Market Event, Regulation	Earnings reports, market analysis, SEC filings
Generic	Person, Organization, Event, Concept, Location	General-purpose fallback for any content
Historical	Historical Figure, Event, Treaty, Dynasty, Territory	Primary sources, historiography, timelines
Investigation	Suspect, Evidence, Witness, Case, Incident	Criminal/civil investigations, case files, forensics
Legal	Statute, Case, Party, Obligation, Legal Principle	Contracts, court opinions, regulatory filings
Literary	Character, Setting, Theme, Plot Element	Novels, poetry, drama, literary criticism
Medical	Drug, Condition, Symptom, Procedure, Side Effect	Clinical documents, pharmaceutical literature
News	Person, Organization, Event, Statement, Policy	News articles, press releases, journalism
Philosophical	Philosopher, Argument, Concept, School of Thought	Philosophy texts across global traditions
Political	Political Entity, Policy, Election, Legislation	Governance docs, political theory, policy analysis
Scientific	Hypothesis, Method, Finding, Dataset, Organism	Research papers, experiments, academic publications
Technical	Module, Class, Function, Endpoint, Design Pattern	API docs, codebases, technical specifications
Theological	Deity, Scripture, Doctrine, Ritual, Religious Figure	Sacred texts, theology, comparative religion

Every domain uses strict entity type enforcement by default. The medical domain defines 17 entity types. The technical domain has 14. These aren't suggestions -- they're the only types the LLM is allowed to produce. That constraint is what separates a clean, queryable graph from a noisy soup of ad-hoc labels.

Under the Hood: Domain Detection and Extraction Quality

How Detection Works

Domain detection runs a scoring algorithm across all registered domains simultaneously. Each domain defines its detection rules in a JSON-LD config file with three signal types:

Weighted keyword groups. The medical domain has six keyword groups: clinical_core (weight 1.2), pharmaceutical (weight 1.0), diagnostic (weight 0.9), anatomy (weight 0.8), procedures (weight 0.9), and clinical_terms (weight 0.8). Each keyword match boosts the confidence score by per_keyword_boost * weight. A document full of "diagnosis", "treatment", and "symptoms" racks up points fast in the clinical_core group, while scattered mentions of "cardiac" and "pulmonary" add smaller anatomy-weighted boosts.

Regex patterns. Keywords catch common terms, but patterns catch domain-specific notation. The medical domain matches dosage expressions like \d+\s*(mg|mcg|ml), ICD codes like ICD-10:J45, and prescription abbreviations like b.i.d. and p.r.n. Each pattern carries its own weight -- dosage notation at 1.4x, ICD codes at 1.5x. A single ICD code in a document is a strong medical signal.

File and document type signals. File extensions (.py for technical) and document type metadata (medical_document, openapi) provide additional boosts.

The final confidence score is compared against a per-domain minimum threshold. Medical requires 0.4 minimum confidence. The generic domain has a threshold of 0.0 -- it always matches as a fallback, but with the lowest possible score (0.1), so any specialized domain that passes its threshold will win.
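The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not Chaos Cypher's actual code. The score_domain and pick_domain helpers, the purely additive formula, the sample term lists, and the base_score and boost values are all assumptions. Only the group weights, the pattern weights, the per_keyword_boost * weight rule, and the 0.4 / 0.1 thresholds come from the text.

```python
import re

# Hypothetical, trimmed-down detection config. Group weights, pattern weights
# and the 0.4 threshold follow the article; the term lists, base_score and
# boost values are illustrative assumptions.
MEDICAL = {
    "keywords": {
        "clinical_core": {"terms": ["diagnosis", "treatment", "symptoms"], "weight": 1.2},
        "anatomy": {"terms": ["cardiac", "pulmonary"], "weight": 0.8},
    },
    "patterns": [
        {"regex": r"\d+\s*(mg|mcg|ml)", "weight": 1.4},
        {"regex": r"ICD-10:[A-Z]\d{2}", "weight": 1.5},
    ],
    "confidence": {"base_score": 0.2, "per_keyword_boost": 0.05,
                   "pattern_boost": 0.15, "min_threshold": 0.4},
}


def score_domain(text, config):
    """Return a confidence score, assuming boosts simply add up (capped at 1.0)."""
    conf = config["confidence"]
    lowered = text.lower()
    score = conf["base_score"]
    for group in config["keywords"].values():
        hits = sum(1 for term in group["terms"] if term.lower() in lowered)
        score += hits * conf["per_keyword_boost"] * group["weight"]
    for pattern in config["patterns"]:
        if re.search(pattern["regex"], text):
            score += conf["pattern_boost"] * pattern["weight"]
    return min(score, 1.0)


def pick_domain(text, domains, sample_chars=4000):
    """Score every domain on a leading sample; fall back to generic at 0.1."""
    sample = text[:sample_chars]
    best_name, best_score = "generic", 0.1
    for name, config in domains.items():
        score = score_domain(sample, config)
        if score >= config["confidence"]["min_threshold"] and score > best_score:
            best_name, best_score = name, score
    return best_name, best_score
```

Under these assumed numbers, a short clinical note with three clinical_core keywords, one dosage expression, and one ICD code clears the 0.4 threshold comfortably, while unrelated text stays at the generic fallback.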

How Domains Shape Extraction Quality

Detection picks the right domain. But the real value is in what happens next -- how the selected domain controls the extraction pipeline.

Entity guidance tells the LLM what to extract and what to skip. The medical domain instructs: "Extract conditions, symptoms, treatments, drugs, procedures, and anatomical locations. Include dosage information as properties on drug entities." It also lists explicit exclusion rules: don't extract dosage numbers alone ("500mg" is a property of a Drug, not a standalone entity), don't extract study references ("Figure 1", "Table 2"), don't extract administrative codes as entities.

Strict type enforcement prevents hallucinated types. When strict mode is on -- and it is on for every specialized domain -- the LLM receives a closed list of valid entity types. The medical domain allows exactly 17 types: Condition, Symptom, Treatment, Drug, Procedure, Diagnostic Test, Anatomy, Pathogen, Clinical Trial, Dosage, Side Effect, Risk Factor, Gene/Biomarker, Patient Population, Guideline/Protocol, Outcome/Endpoint, and Mechanism of Action. Anything the LLM produces outside that list gets dropped in post-processing. No more "Medical Concept" or "Health Thing" cluttering your graph.

Relationship constraints validate source and target combinations. The medical domain's treats relationship is constrained: source must be Drug, Treatment, or Procedure; target must be Condition or Symptom. If the LLM tries to say a Symptom treats a Drug, the relationship fails validation. This catches the most common extraction error -- reversed or semantically nonsensical edges.
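The two post-processing checks, strict type filtering and relationship validation, could look roughly like this. It is a sketch under assumptions: the filter_extraction helper and the dict shapes for entities and relationships are hypothetical, and the type list is a small subset of the 17 medical types. The treats constraint is the one stated above.

```python
# Hypothetical subset of the medical schema; the real domain defines 17 types.
ALLOWED_TYPES = {"Drug", "Condition", "Symptom", "Treatment", "Procedure", "Side Effect"}

# Allowed (source types, target types) per relationship. The `treats` rule is
# from the article; the `produces_side_effect` rule follows its earlier example.
EDGE_RULES = {
    "treats": ({"Drug", "Treatment", "Procedure"}, {"Condition", "Symptom"}),
    "produces_side_effect": ({"Drug"}, {"Side Effect"}),
}


def filter_extraction(entities, relationships):
    """Drop entities with unknown types, then relationships that break a rule."""
    kept = {e["name"]: e for e in entities if e["type"] in ALLOWED_TYPES}
    valid = []
    for rel in relationships:
        rule = EDGE_RULES.get(rel["type"])
        src, dst = kept.get(rel["source"]), kept.get(rel["target"])
        if rule is None or src is None or dst is None:
            continue  # unknown relationship type, or an endpoint was dropped
        sources, targets = rule
        if src["type"] in sources and dst["type"] in targets:
            valid.append(rel)
    return list(kept.values()), valid
```

A reversed edge such as Hypertension treats Lisinopril fails the source check and is discarded, as is any edge that touches an entity whose type was not in the closed list.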

Compatibility groups enable smart deduplication. When the same entity appears in different chunks with slightly different types -- "Hypertension" as a Condition in one chunk and as a Symptom in another -- the compatibility groups determine whether they can be merged. In the medical domain, Condition and Symptom share the clinical group, so they are merge-eligible. Drug, Treatment, and Procedure share the treatment group. This prevents duplicate entities without losing type precision.
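As a rough illustration of the merge rule, here is a small sketch. The can_merge and dedupe helpers, the case-insensitive name match, and the "first type wins" policy are assumptions for the example. Only the two group names and their members come from the text.

```python
# Compatibility groups named in the article for the medical domain.
COMPATIBILITY_GROUPS = {
    "clinical": {"Condition", "Symptom"},
    "treatment": {"Drug", "Treatment", "Procedure"},
}


def can_merge(type_a, type_b):
    """Two mentions are merge-eligible if their types match or share a group."""
    if type_a == type_b:
        return True
    return any(type_a in group and type_b in group
               for group in COMPATIBILITY_GROUPS.values())


def dedupe(mentions):
    """Merge same-named mentions with compatible types (first type wins)."""
    merged = []
    for mention in mentions:
        for existing in merged:
            same_name = existing["name"].lower() == mention["name"].lower()
            if same_name and can_merge(existing["type"], mention["type"]):
                existing["chunks"].extend(mention["chunks"])
                break
        else:
            # No compatible entity yet, so this mention starts a new one.
            merged.append({"name": mention["name"], "type": mention["type"],
                           "chunks": list(mention["chunks"])})
    return merged
```

Mentions whose types fall in different groups stay separate even when the names match, which is what keeps type precision intact.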

Property type mapping rescues mistyped entities. Sometimes the LLM extracts "Severity" as a standalone entity when it should be a property on a Condition. The medical domain's property mapping knows that "Severity" should be absorbed into Condition as a severity property, and "Mechanism" into Drug as a mechanism property. Instead of cluttering the graph with orphaned attribute nodes, they get folded into the right place.
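One way to picture the absorption step is below. This is a sketch only: the absorb_properties helper and the PROPERTY_TYPE_MAP structure are hypothetical. It also assumes an attribute node is folded into whichever entity of the right type it is linked to, which the article does not spell out.

```python
# Hypothetical mapping: mistyped entity type -> (owner type, property name).
PROPERTY_TYPE_MAP = {
    "Severity": ("Condition", "severity"),
    "Mechanism": ("Drug", "mechanism"),
}


def absorb_properties(entities, relationships):
    """Fold attribute-like entities into a linked entity of the owner type."""
    by_name = {e["name"]: e for e in entities}
    absorbed = set()
    for rel in relationships:
        # Try both directions: the attribute node may sit on either end.
        for attr_name, owner_name in ((rel["source"], rel["target"]),
                                      (rel["target"], rel["source"])):
            attr, owner = by_name.get(attr_name), by_name.get(owner_name)
            if attr is None or owner is None or attr["type"] not in PROPERTY_TYPE_MAP:
                continue
            owner_type, prop = PROPERTY_TYPE_MAP[attr["type"]]
            if owner["type"] == owner_type:
                owner.setdefault("properties", {})[prop] = attr["name"]
                absorbed.add(attr_name)
    kept_entities = [e for e in entities if e["name"] not in absorbed]
    kept_rels = [r for r in relationships
                 if r["source"] not in absorbed and r["target"] not in absorbed]
    return kept_entities, kept_rels
```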

Try It Yourself

Every built-in domain is just a JSON-LD file. No Python, no compilation, no framework code. If you need a domain for your field that doesn't exist yet, you can create one in about 20 minutes.

Let's build a startup domain for analyzing pitch decks, funding announcements, and tech industry news.

Create a file called startup.jsonld in your data/plugins/domains/ directory:

{
  "@context": {
    "@vocab": "https://chaoscypher.io/schema/domain#",
    "schema": "https://schema.org/",
    "name": "schema:name",
    "description": "schema:description"
  },
  "@type": "ExtractionDomain",
  "@id": "domain:startup",

  "name": "startup",
  "version": "1.0.0",
  "description": "Startup ecosystem: funding, founders, products, and acquisitions",
  "strict_entity_types": true,

  "detection": {
    "keywords": {
      "funding": {
        "terms": ["series A", "series B", "seed round", "venture capital",
                  "valuation", "fundraise", "runway", "cap table"],
        "weight": 1.3
      },
      "ecosystem": {
        "terms": ["startup", "founder", "co-founder", "incubator",
                  "accelerator", "pivot", "product-market fit", "MVP"],
        "weight": 1.1
      }
    },
    "patterns": [
      {"regex": "(?i)\\$\\d+[MBK]\\s+(seed|series|round|valuation)", "weight": 1.5},
      {"regex": "(?i)Y Combinator|Techstars|500 Startups", "weight": 1.3}
    ],
    "confidence": {
      "base_score": 0.2,
      "per_keyword_boost": 0.05,
      "pattern_boost": 0.15,
      "min_threshold": 0.4
    }
  },

  "entity_guidance": "Extract companies, founders, investors, funding rounds, and products. Attach dollar amounts and dates as properties on Funding Round entities, not as standalone entities.",

  "templates": {
    "node_templates": [
      {
        "id": "startup_company", "name": "Company",
        "description": "A startup, corporation, or business entity",
        "requires_named_referent": true,
        "quality_score": 25,
        "properties": [
          {"name": "stage", "display_name": "Stage", "property_type": "text"},
          {"name": "industry", "display_name": "Industry", "property_type": "text"}
        ]
      },
      {
        "id": "startup_person", "name": "Founder",
        "description": "A founder, co-founder, or key executive",
        "requires_named_referent": true,
        "quality_score": 25
      },
      {
        "id": "startup_investor", "name": "Investor",
        "description": "A VC firm, angel investor, or investment entity",
        "requires_named_referent": true,
        "quality_score": 25
      },
      {
        "id": "startup_round", "name": "Funding Round",
        "description": "A specific funding event (seed, Series A, etc.)",
        "requires_named_referent": false,
        "quality_score": 25,
        "properties": [
          {"name": "amount", "display_name": "Amount", "property_type": "text"},
          {"name": "date", "display_name": "Date", "property_type": "date"}
        ]
      },
      {
        "id": "startup_product", "name": "Product",
        "description": "A software product, platform, or service",
        "requires_named_referent": true,
        "quality_score": 18
      }
    ],
    "edge_templates": [
      {
        "id": "startup_founded_by", "name": "founded_by",
        "description": "Company was founded by a person",
        "inverse": "founded",
        "source_types": ["Company"], "target_types": ["Founder"]
      },
      {
        "id": "startup_invested_in", "name": "invested_in",
        "description": "Investor participated in a funding round",
        "inverse": "funded_by",
        "source_types": ["Investor"], "target_types": ["Funding Round"]
      },
      {
        "id": "startup_raised", "name": "raised",
        "description": "Company raised a funding round",
        "inverse": "round_for",
        "source_types": ["Company"], "target_types": ["Funding Round"]
      },
      {
        "id": "startup_acquired_by", "name": "acquired_by",
        "description": "Company was acquired by another company",
        "inverse": "acquired",
        "source_types": ["Company"], "target_types": ["Company"]
      },
      {
        "id": "startup_builds", "name": "builds",
        "description": "Company builds or maintains a product",
        "inverse": "built_by",
        "source_types": ["Company"], "target_types": ["Product"]
      }
    ]
  }
}

A few things to notice about this file:

The detection section defines how Chaos Cypher recognizes startup content. The keyword groups are weighted -- "series A" and "venture capital" in the funding group carry more weight (1.3x) than general ecosystem terms (1.1x). The regex patterns catch dollar-amount-plus-round expressions like "$50M Series B" at 1.5x weight. These signals stack: a pitch deck mentioning several funding terms and a dollar figure will score well above the 0.4 threshold.

The templates section defines the vocabulary. Five entity types, five relationship types. Each entity template has an id, a name (the type label that appears in the graph), and a description that helps the LLM understand what qualifies. The requires_named_referent flag tells the system whether an entity needs a proper name -- a Company does, but a Funding Round does not (it can be "Series A round" or just "the seed round"). Properties like amount and stage get attached to entities rather than floating as separate nodes.

The edge_templates constrain which entity types can appear on each side of a relationship. founded_by only flows from Company to Founder. invested_in only flows from Investor to Funding Round. The inverse field defines the reverse label for bidirectional traversal.
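Because edge templates refer to entity types by name, a typo in source_types or target_types is easy to miss. A small pre-flight check like the one below can catch it before you restart. The check_edge_templates and load_and_check helpers are hypothetical, not part of Chaos Cypher, and the sketch assumes the file layout shown above.

```python
import json


def check_edge_templates(domain):
    """List every edge endpoint that names an entity type no node template defines."""
    templates = domain["templates"]
    known = {node["name"] for node in templates["node_templates"]}
    problems = []
    for edge in templates["edge_templates"]:
        for side in ("source_types", "target_types"):
            for type_name in edge[side]:
                if type_name not in known:
                    problems.append(f"{edge['name']}.{side}: unknown type {type_name!r}")
    return problems


def load_and_check(path):
    """Load a domain file (JSON-LD is plain JSON) and run the check on it."""
    with open(path, encoding="utf-8") as handle:
        return check_edge_templates(json.load(handle))
```

Calling load_and_check("data/plugins/domains/startup.jsonld") on the file above should return an empty list, since every edge endpoint matches one of the five entity names.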

Restart Chaos Cypher and your domain is live. Upload a TechCrunch article or a pitch deck and watch the detection engine pick it up. Your custom entity types appear in the graph, constrained by the relationships you defined.

If you need to go deeper, the built-in domains show what else is possible: normalization keywords that fix LLM type inconsistencies, compatibility groups for smart deduplication, property type mappings that absorb mistyped entities, alias examples that teach the LLM about synonym handling, and extraction limits that tune relationship density. The medical domain is the most comprehensive example -- it defines 17 entity types, 20 relationship types, dosage regex patterns, ICD code detection, and evidence validation in strict mode. Study it when you want the full picture.

What's Next

We're planning more specialized domains -- supply chain, environmental science, and music theory are on the shortlist. But the real potential is in what users build. Every field has its own vocabulary, its own entity types, its own relationship patterns. A materials scientist cares about Crystal Structure, Synthesis Method, and Property. A genealogist needs Person, Family, Vital Record, and Census Entry. A cybersecurity analyst -- who already has a built-in domain -- might want to fork it and add types specific to their organization's threat model.

If you build a domain for your field, share it. A JSON-LD file is small, portable, and easy to review. Drop it in data/plugins/domains/ and it works. No pull request required to use it, but we would love to include community domains in the built-in set for others to benefit from.

Domains work identically whether you're running locally with Ollama or with a cloud provider. The domain system documentation covers the full JSON-LD schema, all available configuration options, and advanced features like extraction density tuning and evidence validation modes. Start with the five-entity example above, test it on your documents, and iterate from there.

Why Your RAG Chat is Missing Half the Answers (And How GraphRAG Fixes It)

Thu, 12 Mar 2026 00:00:00 GMT

You upload four research papers to your RAG chatbot. You ask: "How does Dr. Chen's CRISPR research connect to the gene therapy trials at Stanford?" The chatbot thinks for a moment and gives you... a paragraph about CRISPR. Generic, shallow, pulled from whichever single chunk happened to mention the word. The actual answer -- that Chen published a paper on CRISPR delivery mechanisms, which was cited by a Stanford clinical trial for retinal gene therapy, which built on a funding collaboration between both institutions -- exists across three different documents. Your chatbot never even tried to find it.

This is the multi-hop problem, and it's the silent failure mode of every vector-only RAG system. Vector search embeds your question, compares it against document chunks, and returns the closest matches by cosine similarity. It works for single-hop questions: "What is CRISPR?" or "When did the Stanford trial begin?" But the moment an answer requires connecting information across documents -- following a citation chain, tracing a person through multiple sources, linking a cause in one report to an effect in another -- vector search falls apart. It can't follow relationships. It doesn't know that entities in different documents refer to the same thing. It just sees text.

The worst part: it fails silently. No error message, no "I couldn't find a complete answer." You get a confident-sounding response that happens to be shallow or wrong.

Chaos Cypher's GraphRAG search fixes this by fusing knowledge graph traversal with vector search. When you ask a multi-hop question, it walks the graph of entities and relationships extracted from your documents, finds structurally connected information you didn't ask about, retrieves the source passages that prove those connections, and merges everything into a single ranked result set. The answer you get isn't just semantically similar text. It's the actual chain of evidence.

What Happens When You Ask a Multi-Hop Question

Let's walk through a real scenario. You have uploaded three documents into Chaos Cypher: a research paper by Dr. Sarah Chen on CRISPR delivery vectors, a Stanford clinical trial report on retinal gene therapy, and a grant proposal connecting both institutions. You type into the chat: "How does Chen's CRISPR work relate to the Stanford gene therapy trial?"

Here's what happens behind the scenes, in seven steps.

Step 1: Embed the query. Your question gets converted into a vector embedding -- the same starting point as any RAG system.

Step 2: Match seed entities. Instead of immediately searching document chunks, GraphRAG first searches the knowledge graph. It finds entities whose embeddings are closest to your query vector. In this case, it matches "Dr. Sarah Chen" (a Person node) and "CRISPR delivery vectors" (a Concept node) as high-confidence seeds -- the anchor points for graph exploration.

Step 3: Personalized PageRank. This is where it gets interesting. Standard PageRank finds globally important nodes. Personalized PageRank is different: it starts from your seed entities and performs a biased random walk through the graph. At each step, there is an 85% chance of following a relationship to a neighbor, and a 15% chance of teleporting back to a seed. Entities structurally close to your seeds get high scores, even if they were never mentioned in your query.

In our example, the algorithm discovers that "Dr. Sarah Chen" has a "published" relationship to "Lipid Nanoparticle Delivery Study," which has a "cited_by" edge pointing to "Stanford Retinal Gene Therapy Trial Phase II," which in turn has a "funded_by" connection to "NIH CRISPR Therapeutics Grant" -- a grant that also lists Chen as a co-investigator. None of these intermediate entities matched your query by text similarity. The graph surfaced them.

Step 4: Assemble graph context. The top-scoring entities from PageRank are collected along with their relationships. This produces a structured context: seed entities you asked about, related entities the graph discovered, and the relationship triples connecting them. This context gets passed to the language model alongside the document chunks, giving it the structural "map" it needs to reason about connections.

Step 5: Retrieve provenance chunks. This is the first of two parallel retrieval paths. For each entity the graph surfaced, GraphRAG looks up which document chunks it was originally extracted from. Chen was extracted from page 3 of the research paper. The Stanford trial came from the clinical report abstract. The funding connection came from page 12 of the grant proposal. These "provenance chunks" contain the actual evidence for the graph relationships.

Step 6: Retrieve vector chunks. The second path runs simultaneously -- standard hybrid search (semantic + keyword) against all document chunks. It catches relevant passages that might not have generated graph entities but still contain useful context.

Step 7: Merge and rank. The two paths produce two independently ranked lists. GraphRAG merges them using Reciprocal Rank Fusion, which combines rankings without normalizing scores across systems. Chunks appearing in both lists get a combined boost. The result is a single, deduplicated, ranked list of the most relevant passages across all your documents.

Instead of a shallow answer about CRISPR, you get the full chain: Chen's delivery mechanism research led to a cited clinical application at Stanford, connected through shared funding. The chat response includes both the graph context (discovered entities and relationships) and the document passages that prove those connections.

Under the Hood (Technical Deep-Dive)

This section is for developers who want to understand the algorithms. Skip ahead to "Try It Yourself" if you just want to use it.

Personalized PageRank

Standard PageRank models a "random surfer" following links uniformly across a network. Personalized PageRank changes one thing: instead of teleporting to a random node, the surfer teleports back to seed nodes. This transforms a global importance metric into a query-specific relevance metric.

Chaos Cypher's implementation uses power iteration. Starting from scores concentrated on seed entities, it iteratively updates every node from the scores of its inbound neighbors, with each neighbor's score split evenly across that neighbor's outgoing edges. The damping factor (0.85 default) controls the balance: higher values explore further from seeds; lower values keep scores tightly clustered.

Convergence is detected when the maximum score change drops below 1e-6, or after 100 iterations. In practice, most graphs converge in 15-30 iterations. The computation runs in-process with no external dependencies -- a graph of 10,000 nodes and 40,000 edges typically completes in under 100ms.

The seed weights come from vector similarity scores in Step 2. If "Dr. Sarah Chen" matched at 0.82 and "CRISPR delivery vectors" at 0.71, those scores become the personalization weights. The random walk isn't just seeded on the right entities -- it's biased toward the ones most relevant to your specific question.
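For readers who want to see the moving parts, here is a compact power-iteration sketch using the defaults quoted above (damping 0.85, tolerance 1e-6, 100 iterations). It is not Chaos Cypher's implementation. The personalized_pagerank function, the directed edge-list input, and the choice to send the score of dead-end nodes back to the seeds are assumptions made for the example.

```python
def personalized_pagerank(edges, seeds, damping=0.85, tol=1e-6, max_iter=100):
    """Power-iteration PPR over a directed edge list.

    edges: iterable of (source, target) pairs
    seeds: dict of node -> seed weight (for example, vector similarity)
    """
    nodes = set(seeds)
    out_links = {}
    for src, dst in edges:
        nodes.update((src, dst))
        out_links.setdefault(src, []).append(dst)

    # Normalised seed weights double as the teleport distribution.
    total = sum(seeds.values())
    teleport = {n: seeds.get(n, 0.0) / total for n in nodes}
    scores = dict(teleport)

    for _ in range(max_iter):
        new = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for node, score in scores.items():
            targets = out_links.get(node)
            if targets:
                share = damping * score / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # Dead end: return its score to the seeds (an assumption).
                for n in nodes:
                    new[n] += damping * score * teleport[n]
        delta = max(abs(new[n] - scores[n]) for n in nodes)
        scores = new
        if delta < tol:
            break
    return scores
```

Seeding it with {"chen": 0.82, "crispr": 0.71} on a small chain of published, cited_by, and funded_by edges gives nonzero scores to the study, trial, and grant nodes even though none of them were seeds, while nodes unreachable from the seeds stay at zero.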

Reciprocal Rank Fusion

Provenance chunks have graph-connectivity scores. Vector chunks have cosine similarity scores. These aren't on the same scale, so you can't just sort by score.

RRF (Cormack, Clarke & Büttcher, 2009) sidesteps this by ignoring scores entirely and using only rank positions. Each chunk's RRF score is the sum of 1 / (k + rank) across all lists where it appears. The smoothing constant k (60, matching the original paper) dampens the advantage of being ranked first versus second.

The key property: chunks appearing in both lists get contributions from both, naturally boosting results validated by two independent signals. With k = 60, a chunk ranked 5th in provenance and 8th in vector search scores 1/65 + 1/68, about 0.030, which beats the 1/61, about 0.016, of a chunk that is 1st in vector but absent from provenance. Evidence confirmed by graph structure is worth more than text similarity alone.
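The fusion rule itself is only a few lines. The sketch below applies the 1 / (k + rank) formula with k = 60. The reciprocal_rank_fusion function and the list-of-IDs input format are assumptions for illustration, not the project's actual interface.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of chunk IDs; ranks are 1-based, scores are ignored."""
    scores = {}
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

If a chunk sits 5th in the provenance list and 8th in the vector list, it lands at the top of the fused ranking, ahead of the chunks that were 1st in only one list.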

Graceful Degradation

Not every database has a knowledge graph. Not every query matches graph entities. GraphRAG picks its operating mode automatically:

full_graphrag -- Seeds found, PPR succeeded. Graph context + provenance chunks + vector chunks + RRF fusion.
vector_only -- Embeddings work but no graph seeds found. Standard hybrid search, no graph context.
keyword_only -- Embeddings unavailable. Pure SQLite FTS keyword search.

A missing graph or missing embeddings never produces an error -- the system always returns the best results it can with what is available. The retrieval stats in each response tell you exactly what happened: mode used, seeds found, entities explored, provenance versus vector chunk counts.
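The fallback order can be summarised as a small decision function. This is a sketch of the logic as described in this post, not the real code path. The choose_mode name and its arguments are hypothetical, and the node-count check reflects the max_graph_nodes safety limit from the settings.

```python
def choose_mode(embeddings_available, seed_count, node_count, max_graph_nodes=50_000):
    """Pick the retrieval mode from what the database can support."""
    if not embeddings_available:
        return "keyword_only"   # no embeddings: full-text keyword search only
    if seed_count == 0 or node_count > max_graph_nodes:
        return "vector_only"    # no seeds, or graph too large for PPR
    return "full_graphrag"      # seeds found and PPR is affordable
```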

Tunable Parameters

Six parameters in settings.yaml control the GraphRAG pipeline. The defaults work well for most databases, but here they are if you want to tune:

Parameter	Default	What It Controls
`seed_similarity_threshold`	0.3	Minimum cosine similarity for a graph entity to qualify as a PPR seed. Lower values cast a wider net but may introduce noise.
`ppr_top_k`	20	Number of top-scoring entities from PageRank to include in graph context. Higher values give the LLM more structural context at the cost of token budget.
`ppr_damping`	0.85	PageRank damping factor. Higher means more exploration away from seeds. Lower keeps results closer to directly matched entities.
`max_triples`	200	Maximum relationship triples included in the graph context summary. Capped to avoid flooding the LLM context window.
`vector_overfetch_multiplier`	3	When searching for seed entities, fetch 3x the seed limit from the vector index to account for non-entity results (chunks) that need filtering.
`max_graph_nodes`	50,000	Safety limit. If your graph exceeds this, PPR is skipped (too expensive) and the system falls back to vector-only mode.
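To make seed_similarity_threshold and vector_overfetch_multiplier concrete, here is a sketch of how seed selection might use them. The select_seeds and fetch_size helpers, the seed_limit value of 5, and the (id, kind, similarity) tuple format are assumptions for illustration.

```python
def fetch_size(seed_limit=5, overfetch_multiplier=3):
    """How many hits to request from the vector index before filtering."""
    return seed_limit * overfetch_multiplier


def select_seeds(vector_hits, seed_limit=5, threshold=0.3):
    """Keep entity hits at or above the threshold, up to seed_limit.

    vector_hits: (item_id, kind, similarity) tuples, best match first.
    """
    seeds = {}
    for item_id, kind, similarity in vector_hits:
        if kind != "entity" or similarity < threshold:
            continue  # skip chunks and weak matches
        seeds[item_id] = similarity
        if len(seeds) == seed_limit:
            break
    return seeds
```

The overfetch exists because the index returns chunks as well as entities. Asking for three times the seed limit leaves room to discard the non-entity hits and still fill the seed list.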

Try It Yourself

Here's the good news: you don't need to configure anything. GraphRAG is the default search mode behind every chat conversation in Chaos Cypher. When you type a question, the chat system automatically calls graphrag_search as its first tool. If your database has extracted entities and embeddings, you get the full pipeline. If not, it degrades gracefully to vector or keyword search.

The simplest way to see it in action:

Upload 3-4 related documents. Pick sources that share entities -- research papers from the same field, chapters from the same book, reports about the same project. The key is overlap: the documents should reference some of the same people, organizations, concepts, or events.
Wait for extraction to complete. Chaos Cypher chunks the documents and generates embeddings automatically; entity extraction is an optional step you run afterward to build the knowledge graph. That step creates the graph nodes and edges that GraphRAG traverses. Without it, you still get vector-only search, which is fine -- but you miss the multi-hop connections.
Ask a question that spans documents. Don't ask something that a single document can answer. Ask about connections: "How does X relate to Y?" or "What is the link between the findings in paper A and the methodology in paper B?" This is where GraphRAG earns its keep.
Check the retrieval stats. In the chat response metadata, you'll see the retrieval mode (full_graphrag, vector_only, or keyword_only), the number of seed entities found, how many entities PageRank explored, and the breakdown of provenance versus vector chunks. This tells you exactly what the pipeline did for your query.

GraphRAG is also available as an MCP tool called graphrag_search, meaning any AI assistant that supports MCP can use it directly against your Chaos Cypher instance. See our MCP launch post for setup instructions with Claude Desktop, Cursor, and others. The tool accepts a query, an optional chunk limit, and optional source ID filters for scoping searches to specific documents.

If you want to fine-tune the pipeline for your specific use case, add a graphrag section to your settings.yaml:

graphrag:
  seed_similarity_threshold: 0.3
  ppr_top_k: 20
  ppr_damping: 0.85
  max_triples: 200
  vector_overfetch_multiplier: 3
  max_graph_nodes: 50000

Most users will never need to touch these. The defaults were chosen based on the GraphRAG literature and testing across databases of varying sizes -- from small personal collections (hundreds of entities) to larger research corpora (tens of thousands of entities).

What's Next

GraphRAG in Chaos Cypher today handles local queries well -- questions where you have a specific starting point and want to follow connections outward. But there's a class of questions it doesn't yet handle optimally: corpus-wide questions like "What are the main themes across all my documents?" or "Summarize everything related to sustainability."

These require what the research literature calls community summaries -- pre-computed summaries of entity clusters in the graph that can answer high-level questions without traversing the entire structure at query time. That's on the roadmap.

If you're working with a use case where multi-hop retrieval matters -- legal discovery, academic research, intelligence analysis, medical literature review -- we'd love to hear about your experience. What kinds of multi-hop questions does your work require? Where does the current pipeline fall short? The best way to reach us is through the project's GitHub discussions.

For a deeper look at the architecture, see the Search documentation and the Architecture overview.

Build a Private AI Knowledge Graph That Never Leaves Your Machine

Thu, 12 Mar 2026 00:00:00 GMT

Every week, another AI tool asks you to upload your most sensitive documents to someone else's servers. Your contracts, medical records, internal research, personal journals -- all piped through APIs you don't control, stored in logs you can't audit, governed by terms of service that change without notice.

For a lot of use cases, that's fine. But there's a whole class of knowledge that simply cannot leave your network. Healthcare organizations bound by HIPAA. Law firms handling privileged communications. Financial institutions with regulatory obligations around client data. Companies whose competitive advantage lives in proprietary research. Or maybe you just have a journal and you'd rather not feed your inner monologue to a data center in Virginia.

The usual answer is "just don't use AI tools." That's not really an answer anymore. The productivity gap between AI-assisted knowledge work and manual knowledge work is too wide to ignore. The real question is: can you get the benefits of AI-powered knowledge graphs without the privacy tradeoffs?

Yes. Chaos Cypher paired with Ollama runs a complete AI knowledge graph pipeline -- document ingestion, entity extraction, relationship mapping, semantic search, and conversational chat -- entirely on your local machine. No API keys. No usage limits. No monthly bills. No data leaving your network. You install it, you run it, you own it.

This isn't a compromise or a toy demo. It's the same extraction pipeline, the same graph visualization, the same chat interface that works with cloud providers. You're just swapping the LLM backend from a remote API to a local one.

From Zero to Local Knowledge Graph

Here's the full workflow, start to finish. Fifteen minutes if you're following along, five if you've done this before.

Step 1: Install Ollama and pull a model.

Head to ollama.com and install it for your platform. Then pull a model:

ollama pull qwen3:30b

That downloads the model weights once. After that, Ollama runs as a local API server on localhost:11434 -- it exposes an OpenAI-compatible REST interface alongside its own native API.
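To make "local API server" concrete, here is a minimal sketch of a call to Ollama's native /api/generate endpoint using only the Python standard library. The model name comes from the pull command above; the helper names and the prompt are illustrative assumptions, not anything Chaos Cypher ships.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama address


def build_generate_request(prompt: str, model: str = "qwen3:30b") -> urllib.request.Request:
    """Build (but do not send) a non-streaming /api/generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "qwen3:30b") -> str:
    """Send the request to a running Ollama server and return the reply text."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With Ollama running, generate("Say hello in five words.") should return the model's reply as a string. Nothing in the request leaves localhost.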

Step 2: Start the Chaos Cypher stack.

make docker-dev

This brings up four containers: the Cortex API server, a Neuron background worker, the web Interface, and Valkey for job queuing. Everything talks to Ollama on your host machine through Docker's host.docker.internal bridge. No external network calls.

Step 3: Upload a document.

Open http://localhost:3000, create a database (or use the default), and drag a PDF, DOCX, or text file into the Sources page. Chaos Cypher immediately begins indexing -- chunking the document, generating embeddings, and building a search index. This takes about 30 seconds for a 100-page PDF and requires no GPU at all (more on that below).

Step 4: Extract entities and relationships.

Once indexing completes, kick off entity extraction. This is where the LLM does its work -- reading through each chunk, identifying entities (people, organizations, concepts, events), discovering relationships between them, and building a structured knowledge graph. Chaos Cypher automatically detects the type of document and applies domain-specific extraction rules for higher quality results. For a 100-page document with a 30B model, expect roughly 5-10 minutes.

Step 5: Chat with your knowledge graph.

Once extraction finishes and the results are committed to your graph, open the Chat page and start asking questions. The chat system uses RAG (retrieval-augmented generation) to search your indexed documents and graph, then feeds the relevant context to your local LLM for a grounded answer. Everything stays on your machine -- the search, the retrieval, the generation.

Pick Your Preset

Not everyone has the same GPU. Chaos Cypher ships with VRAM presets that auto-configure the right model, context window, and batch size for your hardware. Select a preset in Settings and it handles the rest.

VRAM	Chat Model	Extraction Model	Context	GPU Examples
16 GB	Phi4 14B	Phi4 14B	16K	RTX 4080, RTX 5080
20 GB	Phi4 14B	Phi4 14B	24K	RTX 5080 Super
24 GB	Qwen3 30B	Qwen3 30B Instruct	16K	RTX 4090, RTX 3090
32 GB	Qwen3 30B	Qwen3 30B Instruct	32K	RTX 5090
48 GB	Qwen3 30B	Qwen3 30B Instruct	48K	A6000, 2x 4090
96 GB	Qwen 2.5 72B	Qwen 2.5 72B Instruct	48K	H100
128 GB	Qwen 2.5 72B	Qwen 2.5 72B Instruct	64K	Multi-H100

The sweet spot for most people is 24 GB. An RTX 4090 running Qwen3 30B gives you strong chat quality and solid extraction results. If you're on 16 GB, you'll still get a good experience for chat and search -- extraction quality will be noticeably lower on complex documents, but perfectly usable for straightforward material.
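If you want to reason about the tiers programmatically, the table can be treated as a simple lookup. The sketch below is a hypothetical helper, not Chaos Cypher's preset code: it copies the rows above, assumes "16K" means 16 x 1024 tokens, and returns the largest preset that fits a given amount of VRAM.

```python
from typing import NamedTuple, Optional


class Preset(NamedTuple):
    vram_gb: int
    chat_model: str
    extraction_model: str
    context_tokens: int


# Rows copied from the preset table above (context assumed to be K x 1024 tokens).
PRESETS = [
    Preset(16, "Phi4 14B", "Phi4 14B", 16 * 1024),
    Preset(20, "Phi4 14B", "Phi4 14B", 24 * 1024),
    Preset(24, "Qwen3 30B", "Qwen3 30B Instruct", 16 * 1024),
    Preset(32, "Qwen3 30B", "Qwen3 30B Instruct", 32 * 1024),
    Preset(48, "Qwen3 30B", "Qwen3 30B Instruct", 48 * 1024),
    Preset(96, "Qwen 2.5 72B", "Qwen 2.5 72B Instruct", 48 * 1024),
    Preset(128, "Qwen 2.5 72B", "Qwen 2.5 72B Instruct", 64 * 1024),
]


def pick_preset(available_vram_gb: float) -> Optional[Preset]:
    """Return the largest preset that fits, or None below the smallest tier."""
    fitting = [p for p in PRESETS if p.vram_gb <= available_vram_gb]
    return max(fitting, key=lambda p: p.vram_gb) if fitting else None
```

For example, pick_preset(24) lands on the Qwen3 30B tier, while a 40 GB card falls back to the 32 GB preset rather than overcommitting to the 48 GB one.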

Under the Hood

A few things are worth knowing about how the local pipeline actually works.

Embeddings Are Local by Default

Here's something that surprises people: by default, the embedding model that powers semantic search runs on CPU. It has nothing to do with Ollama or your GPU. Chaos Cypher defaults to Qwen3-Embedding-0.6B, a compact model that downloads once and runs locally via sentence-transformers. Any HuggingFace sentence-transformers model can be used instead, and external embedding providers (OpenAI, Ollama, Gemini) are supported if you'd rather not use the built-in model.

This means semantic search works even if Ollama is offline. It means you can index thousands of documents on a machine with no GPU at all. The embeddings are generated in the Neuron worker during indexing and stored in your local SQLite database (via sqlite-vec). Search queries generate an embedding on the fly, compare it against the index, and return results -- all on CPU, all local, typically in under a second.
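The "compare it against the index" step is, at its core, a nearest-neighbor lookup. Here is a deliberately tiny sketch of that idea with brute-force cosine similarity in plain Python. The three-dimensional vectors and chunk IDs are made up for illustration; the real pipeline uses model-generated embeddings and sqlite-vec, not this loop.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, index, top_k=2):
    """Rank stored chunks by similarity to the query vector."""
    scored = [(cosine(query_vec, vec), chunk_id) for chunk_id, vec in index.items()]
    return [chunk_id for _, chunk_id in sorted(scored, reverse=True)[:top_k]]


# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
index = {
    "chunk-a": [0.9, 0.1, 0.0],
    "chunk-b": [0.0, 1.0, 0.0],
    "chunk-c": [0.7, 0.3, 0.1],
}
print(search([1.0, 0.0, 0.0], index))  # ['chunk-a', 'chunk-c']
```

Everything here is arithmetic on small arrays, which is why this stage is comfortable on CPU.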

Re-ranking also runs locally. Chaos Cypher uses a cross-encoder model (Alibaba-NLP/gte-reranker-modernbert-base, 149M parameters, ~600 MB) via sentence-transformers to re-rank search results by relevance before passing them to the LLM. No API calls involved. The ModernBERT-based model scores ~56.2 NDCG@10 on the BEIR benchmark -- significantly more accurate than smaller models on diverse, out-of-domain queries. Any HuggingFace cross-encoder can be swapped in via settings.

Multi-Instance Load Balancing

Have multiple machines with GPUs? Or multiple GPUs in one workstation? You can point Chaos Cypher at all of them. Configure multiple Ollama instances in your settings, and the load balancer distributes requests across them with three strategies:

Round-robin -- simple alternation, good for identical hardware
Least-loaded -- sends requests to whichever instance has the fewest active jobs
Random -- exactly what it sounds like

Each instance gets independent health checks. If one goes down, the load balancer automatically fails over to the healthy instances. When it comes back, it rejoins the pool. The configuration is hot-reloadable -- add or remove instances from the Settings page without restarting anything. In-flight requests drain gracefully before an instance is removed.
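The three strategies and the failover behavior fit in a few lines. The following is a simplified sketch under stated assumptions: the Instance and LoadBalancer classes are hypothetical, health is just a flag rather than a real health check, and draining of in-flight requests is left out.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Instance:
    url: str
    healthy: bool = True
    active_jobs: int = 0


class LoadBalancer:
    def __init__(self, instances, strategy="round_robin"):
        self.instances = list(instances)
        self.strategy = strategy
        self._counter = itertools.count()

    def pick(self) -> Instance:
        """Choose a healthy instance; unhealthy ones are skipped (failover)."""
        healthy = [i for i in self.instances if i.healthy]
        if not healthy:
            raise RuntimeError("no healthy Ollama instances")
        if self.strategy == "least_loaded":
            return min(healthy, key=lambda i: i.active_jobs)
        if self.strategy == "random":
            return random.choice(healthy)
        return healthy[next(self._counter) % len(healthy)]  # round-robin
```

Because pick() filters on the health flag every time, an instance that recovers rejoins the rotation on the next call without any extra bookkeeping.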

This is particularly useful for extraction workloads. A 500-page document produces hundreds of chunk groups to process. Spreading that across two or three GPUs cuts extraction time roughly in proportion to the number of GPUs.

Thinking Mode

Qwen3 models support an extended reasoning mode using <think> tags -- the model works through its reasoning step by step before producing a final answer. Chaos Cypher detects and handles this automatically. When thinking is enabled for chat, the model's internal reasoning is extracted and available separately from the final response. For models that don't support thinking tags, everything works normally -- no configuration needed, graceful fallback.

Thinking is currently best suited for chat interactions where you want more careful, reasoned responses. For extraction tasks, the overhead of reasoning tokens tends to slow things down without a proportional quality improvement, so Chaos Cypher disables it for extraction by default. You can toggle this per-operation type in settings.
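Separating reasoning from the final answer is mostly string handling. This is a minimal sketch, assuming the model wraps its reasoning in the <think>...</think> pair that Qwen3 emits; the function name is made up and this is not Chaos Cypher's parser.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(raw: str):
    """Return (reasoning, answer); reasoning is "" when no tags are present."""
    reasoning = "\n".join(part.strip() for part in THINK_RE.findall(raw))
    answer = THINK_RE.sub("", raw).strip()
    return reasoning, answer
```

A response without tags passes through untouched, which is the graceful fallback described above.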

Performance Reality Check

Let's be honest about the tradeoffs, because nobody benefits from hype.

Chat is great locally. Interactive question-answering with RAG retrieval works well on 24 GB+ hardware. The model has context from your documents, it generates coherent answers, latency is acceptable for interactive use. Streaming means you see tokens as they arrive -- the experience feels responsive even when total generation takes a few seconds.

Simple extraction works well. Documents with clear entity boundaries -- people's names, organization names, dates, locations -- extract reliably on local models. Good candidates include legal contracts with named parties and defined obligations, research papers with cited authors and institutions, and meeting notes with action items and owners.

Complex extraction is where you notice the gap. Dense academic papers with nuanced conceptual relationships, documents where entities are implied rather than stated, multi-hop reasoning about how concepts relate to each other -- this is where cloud models with 100B+ parameters still have a meaningful advantage. A Qwen3 30B model will get you 70-80% of what Claude or GPT-4.1 would produce on hard extraction tasks. For many use cases, that's more than enough. For others, you'll want to use a cloud provider for the extraction pass and keep everything else local.

The good news: Chaos Cypher lets you mix and match. Use Ollama for chat and search (where privacy matters most, since those are interactive queries about your data), and use a cloud provider for the one-time extraction pass if you need maximum quality. Or keep everything local and accept the quality tradeoff. Your call.

Four Providers, One Interface

Chaos Cypher supports four LLM providers through a unified interface:

Ollama -- local models, no API key, no cost
OpenAI -- GPT-4.1, high-quality extraction
Anthropic -- Claude Sonnet 4.5, strong reasoning
Gemini -- Gemini 2.5 Pro, massive context window

Switching between them is a single config change. The same entity extraction pipeline, the same chat system, the same search infrastructure. You can start with Ollama to prove the workflow works, then switch to a cloud provider for production extraction, or vice versa. You can even use different providers for different operations -- Ollama for chat, OpenAI for extraction.

Try It Yourself

Minimal configuration in data/settings.yaml:

LLM:
  chat_provider: "ollama"
  ollama_chat_model: "qwen3:30b-instruct"
  ollama_num_ctx: 32768

The default Ollama instance points at http://host.docker.internal:11434, which Just Works™ for the all-in-one container talking to a host-side Ollama. To override the URL or add multi-GPU instances, use ollama_instances.

Or skip the YAML entirely -- open the Settings page in the UI, select Ollama as your provider, pick a VRAM preset that matches your GPU, and you're done. The preset fills in the model name, context window, batch size, and extraction model automatically.

Then start everything:

make docker-dev

Upload a document, wait for indexing (30 seconds) and extraction (a few minutes), and you have a working knowledge graph built entirely on your hardware.

A few tips for getting the best results:

Pull models before starting Chaos Cypher. Run ollama pull qwen3:30b (or whichever model your preset uses) before your first extraction. The Neuron worker will wait for Ollama, but pre-pulling avoids the initial download delay.
Monitor VRAM usage. Run nvidia-smi to see how much VRAM your model is using. If you're near the limit, drop to a smaller context window or a smaller model. OOM kills during extraction are recoverable (the job retries), but they're slow.
Start with shorter documents. Your first upload should be a 10-20 page document so you can see the full pipeline complete in a couple of minutes. Scale up once you're comfortable with the output quality.
Experiment with extraction models. The presets pair specific extraction models with chat models. The extraction model uses an instruct-tuned variant optimized for structured output. If extraction quality isn't where you want it, try the next VRAM tier up -- the jump from 14B to 30B parameters makes a significant difference in extraction accuracy.

What's Next

Running everything locally is the starting point, not the ceiling.

If you outgrow a single GPU, the multi-instance setup lets you spread load across multiple machines on your network -- a small GPU cluster for your team, still fully private, still no cloud dependency. Configure two or three Ollama instances on different machines, point Chaos Cypher at all of them, and extraction workloads parallelize automatically.

When you do need cloud-tier quality for specific tasks, the cloud providers are there. Chaos Cypher doesn't lock you into local-only or cloud-only. You choose per-operation, per-database, whenever you want. The architecture is the same either way -- the only thing that changes is where the LLM inference happens.

The privacy argument isn't really about paranoia. It's about control. Your knowledge graph is a map of everything you know -- your research, your relationships, your institutional memory. Keeping that map on your own hardware isn't a limitation. It's a feature.

Give Any AI Assistant Direct Access to Your Knowledge Graph with MCP

Thu, 12 Mar 2026 00:00:00 GMT

Your knowledge graph is stuck in a browser tab. You built something valuable -- a map of entities, relationships, and source documents that represents real understanding of a domain. But the moment you switch to Claude to write a report, or open Cursor to write code, or ask ChatGPT to help with analysis, that knowledge graph might as well not exist. You're back to copying text, pasting context, and manually cross-referencing. Two tools that should be working together are stuck in separate worlds.

Chaos Cypher now speaks MCP, which means any AI assistant that supports the protocol -- Claude Desktop, Claude Code, Cursor, Windsurf, and a growing list of others -- can directly query, search, traverse, and even write to your knowledge graph. No copy-paste. No context switching. Just ask.

This post walks through what that actually looks like, what's under the hood, and how to set it up in about two minutes.

What Is MCP, and Why Should You Care?

MCP stands for Model Context Protocol. Anthropic released it as an open standard, and the simplest analogy is USB-C for AI tools. Before USB-C, every device had its own charger, its own cable, its own connector; USB-C replaced them with a single standard. MCP does the same thing for AI integrations: it defines one protocol that any AI host can use to talk to any tool server.

Instead of building a custom plugin for Claude, another for ChatGPT, another for Cursor, and another for every new AI tool that launches next month, you build one MCP server. Every compatible AI tool can use it immediately.

The adoption has been fast. Claude Desktop, Claude Code, Cursor, Windsurf, Cline, and Continue all support MCP today. The protocol handles tool discovery (the AI asks "what can you do?"), tool invocation (the AI calls a function with parameters), and result streaming. From the AI's perspective, your knowledge graph becomes just another set of capabilities it can use to answer questions.

From your perspective, it means you stop being the middleman between your data and your AI.

What This Actually Looks Like

The best way to understand MCP is to see the before and after.

Before MCP: You have a knowledge graph with 200 entities extracted from research papers on gene therapy. You're writing a literature review in Claude. To reference your graph, you open Chaos Cypher in another tab, run a search, copy the results, paste them into Claude, ask your question, realize you need more context, go back to the graph, find related entities, copy those too, paste again. Repeat until frustrated.

After MCP: You tell Claude: "Search my knowledge graph for all entities related to CRISPR and find the shortest path to gene therapy applications." Claude calls graphrag_search to find relevant entities and document passages, then calls find_shortest_path to trace the relationship chain. You get a grounded answer with specific entities and relationships from your own research, in one turn.

Here are three scenarios that show the range of what's possible.

Scenario 1: Research -- Connecting the Dots

You've been building a knowledge graph from papers on quantum computing and machine learning. You're deep in a writing session in Claude Desktop and want to understand where these two fields intersect in your collected research.

You ask: "What are the connections between quantum computing and machine learning in my research? Show me the key entities and how they're related."

Claude calls search_nodes to find nodes matching both topics, then get_node_context to pull the immediate neighborhood of the most central ones, including the edges that connect them and the source document chunks that support each relationship. You get back a structured map of how your research connects these fields -- not a generic internet answer, but one grounded in the specific papers you've indexed.

Scenario 2: Coding -- Your Project's Knowledge Base in Your Editor

You're in Cursor, working on a codebase that has an associated knowledge graph mapping its architecture -- services, APIs, data flows, dependencies. You need to understand how the authentication service connects to the billing pipeline.

You ask: "Traverse from the Authentication Service node to anything related to billing. What's the path?"

Cursor calls resolve_node to find the canonical node for "Authentication Service" (even if you didn't remember the exact label), then traverse_path to walk the graph three hops out, filtered to the relevant edge types. You see the chain: Authentication Service -> User Session -> Subscription Manager -> Billing Pipeline. Without leaving your editor.
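Path finding of this kind is typically a breadth-first search. The sketch below reproduces the chain from this scenario on a hand-written adjacency list. The graph, including the extra "Audit Log" node, is hypothetical, and this is not the code behind find_shortest_path or traverse_path.

```python
from collections import deque

# Hypothetical adjacency list mirroring the chain described above.
GRAPH = {
    "Authentication Service": ["User Session", "Audit Log"],
    "User Session": ["Subscription Manager"],
    "Subscription Manager": ["Billing Pipeline"],
    "Audit Log": [],
    "Billing Pipeline": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list of a shortest path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(" -> ".join(shortest_path(GRAPH, "Authentication Service", "Billing Pipeline")))
```

Breadth-first order guarantees the first path that reaches the goal uses the fewest edges, so dead ends like "Audit Log" never show up in the result.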

Scenario 3: Writing -- Summarize With Citations

You're drafting a report and need to summarize everything in your knowledge graph about a specific topic, with citations back to the original source documents.

You ask: "Summarize all my sources related to climate policy in the European Union. Include which documents each claim comes from."

Claude calls get_summary_context to retrieve and cluster document chunks relevant to the query. Because this tool returns the raw chunks with their source metadata rather than making an LLM call, Claude itself does the summarization -- giving you a synthesis grounded in your documents, with each claim traced back to a specific source.

Under the Hood: 30 Tools, 7 Categories

Chaos Cypher exposes 30 tools through MCP, organized into seven categories. The design principle is that read operations are always safe and always available. Write operations are opt-in.

Category	Read	Write	What It Does
GraphRAG	`graphrag_search`	--	The flagship tool. Fuses Personalized PageRank over the knowledge graph with hybrid vector/keyword search. Finds answers that pure vector search misses because it follows relationships.
Nodes	`search_nodes`, `search_chunks`, `get_node`, `get_node_context`, `resolve_node`	`create_node`, `update_node`, `delete_node`	Full CRUD for graph nodes. Search by name, properties, or semantic similarity. Resolve aliases to canonical nodes. Get a node's full neighborhood with edges and supporting document chunks.
Edges	`list_edges`, `get_node_edges`	`create_edge`	Explore and create relationships. Filter by direction (incoming/outgoing), edge type, or connected node.
Templates	`list_templates`, `search_templates`	`create_template`, `delete_template`	Templates define the schema for nodes and edges. Search by name or description. Create new types on the fly.
Analytics	`analyze_graph_structure`, `find_shortest_path`, `find_similar_nodes`, `traverse_path`	--	Structural analysis: community detection, PageRank centrality, degree distribution. Path finding between any two nodes. Semantic similarity via embeddings. Multi-hop traversal with depth and type filters.
Documents	`get_summary_context`, `get_document_status`	`add_document`, `wait_for_document`, `remove_document`	MCP-native document management. Queue files for background indexing and entity extraction. Check processing status. Wait for completion. Retrieve clustered chunks for summarization. Full cascade delete.
Extraction	`get_extraction_tasks`, `get_extraction_chunks`, `get_extraction_progress`	`submit_chunk_extraction`, `finalize_extraction`	Client-driven entity extraction. The AI assistant reads chunks, extracts entities itself, and submits results back -- no server LLM required. Track progress and finalize to commit to the knowledge graph.

Read/write mode split: 19 tools are read-only and always available. 11 tools require write mode to be explicitly enabled. This is controlled by a single setting -- if you're not comfortable with an AI modifying your graph, just leave it in read mode. The AI can still search, traverse, and analyze everything.
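The split is easy to picture as two sets. The sketch below lists the tool names from the table and filters them by mode. The mode names "read" and "write" and the exposed_tools helper are illustrative assumptions rather than the server's actual code.

```python
READ_TOOLS = {
    "graphrag_search",
    "search_nodes", "search_chunks", "get_node", "get_node_context", "resolve_node",
    "list_edges", "get_node_edges",
    "list_templates", "search_templates",
    "analyze_graph_structure", "find_shortest_path", "find_similar_nodes", "traverse_path",
    "get_summary_context", "get_document_status",
    "get_extraction_tasks", "get_extraction_chunks", "get_extraction_progress",
}
WRITE_TOOLS = {
    "create_node", "update_node", "delete_node",
    "create_edge",
    "create_template", "delete_template",
    "add_document", "wait_for_document", "remove_document",
    "submit_chunk_extraction", "finalize_extraction",
}


def exposed_tools(mode: str = "read") -> set:
    """Tools visible to an MCP client for a given access mode."""
    if mode not in ("read", "write"):
        raise ValueError(f"unknown mode: {mode!r}")
    return READ_TOOLS | WRITE_TOOLS if mode == "write" else set(READ_TOOLS)
```

Counting the sets gives 19 read tools and 30 in total, matching the numbers above.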

Two transport modes: The MCP server runs in two ways depending on your setup:

stdio -- For desktop AI tools like Claude Desktop and Cursor. The CLI starts a server that communicates over standard input/output. No network involved.
Streamable HTTP -- For the Docker stack. The Cortex API exposes MCP at /api/v1/mcp using the Streamable HTTP transport, so any MCP client on the network can connect.

Both transports expose the same 30 tools with the same behavior. The only difference is how they're connected.

Try It Yourself

Setup depends on how you run Chaos Cypher. Four paths, all quick.

Path 1: CLI + Claude Desktop

If you have Chaos Cypher installed as a CLI tool, add this to your Claude Desktop configuration file (claude_desktop_config.json):

{
  "mcpServers": {
    "chaoscypher": {
      "command": "chaoscypher",
      "args": ["mcp"]
    }
  }
}

Restart Claude Desktop. You should see Chaos Cypher listed as an available MCP server in the tools panel.

Path 2: CLI + Claude Code

One command:

claude mcp add chaoscypher -- chaoscypher mcp

That's it. Claude Code will discover Chaos Cypher's tools automatically on the next session.

Path 3: CLI + Cursor

Add this to your Cursor MCP configuration (.cursor/mcp.json in your project, or the global settings):

{
  "mcpServers": {
    "chaoscypher": {
      "command": "chaoscypher",
      "args": ["mcp"],
      "transportType": "stdio"
    }
  }
}

Path 4: Docker Stack (Already Running)

If you run Chaos Cypher via docker-compose, the MCP endpoint is already live. Your Cortex API serves MCP at:

http://localhost:8080/api/v1/mcp

Any MCP client that supports the Streamable HTTP transport can connect directly. No additional configuration on the Chaos Cypher side.
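Under the hood, an MCP session over HTTP is JSON-RPC 2.0: the client POSTs an initialize message first, then calls methods such as tools/list. Here is a rough sketch of building that first message with the standard library. In practice you would use an MCP client SDK. The client name, the protocol revision string, and the helper function are assumptions, and a real session also needs the follow-up initialized notification and any session header the server returns.

```python
import json
import urllib.request

MCP_URL = "http://localhost:8080/api/v1/mcp"  # endpoint quoted above


def jsonrpc_request(method: str, params: dict, request_id: int) -> urllib.request.Request:
    """Build (but do not send) one JSON-RPC 2.0 POST for the MCP endpoint."""
    body = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


# The first message of an MCP session is `initialize`.
init = jsonrpc_request(
    "initialize",
    {
        "protocolVersion": "2025-03-26",  # assumed revision; match your server
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},
    },
    request_id=1,
)
```

Seeing the raw message is mainly useful for debugging connection problems, for example with a proxy in between.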

Configuring Access Mode

By default, MCP runs in read-only mode. To enable write tools (creating nodes, adding documents, etc.), update your settings.yaml:

mcp:
  mode: write         # "read" (default) or "write" for full access
  auto_extract: true  # auto-extract entities from documents uploaded via MCP

Read mode exposes the 19 read tools. Write mode exposes all 30. The auto_extract flag controls whether documents uploaded via the add_document tool automatically go through entity extraction after indexing, or just get chunked and embedded for RAG search.

If you're using the CLI with a specific database, pass it as a flag:

chaoscypher mcp --database my-research

Your Data Stays Local

This is worth stating explicitly: Chaos Cypher's MCP server doesn't send your knowledge graph to any external service on its own. The protocol is a communication channel between the AI tool running on your machine and the Chaos Cypher server running on your machine (or your network, if you use Docker). When Claude calls graphrag_search, the query goes from Claude to your local MCP server, your server searches your local database, and only the results of that call go back to Claude. The database itself -- your full set of documents, entities, and relationships -- never leaves your infrastructure.

The AI model itself runs wherever it runs -- that's between you and your provider. But the knowledge graph data stays entirely under your control. If you pair Chaos Cypher with a local model via Ollama, the entire pipeline is air-gapped. See our local AI setup guide for the full walkthrough.

What's Next

MCP support is the foundation for a broader vision: your knowledge graph as a persistent layer that any tool in your workflow can tap into. Here's what's on the roadmap:

Prompt templates -- Pre-built MCP prompts for common patterns like "summarize this topic with citations" or "find contradictions in my sources," so you don't have to craft the right question every time.
Resource exposure -- Making graph nodes and documents available as MCP resources, so AI tools can browse your knowledge graph like a file system.
Multi-database switching -- Seamlessly switch between knowledge graphs within a single MCP session.

The flagship graphrag_search tool deserves its own explanation -- it's doing a lot more than keyword lookup. Read how GraphRAG works for the full deep-dive on the retrieval pipeline.

The MCP server ships with Chaos Cypher today. If you're already running it, you have it -- just configure your AI tool and go.

Documentation: Full MCP setup guide and tool reference in the docs
Source: The MCP implementation lives in the chaoscypher_core.mcp package.
Issues: Found a bug or have a feature request? Open an issue or start a discussion

The gap between "having a knowledge graph" and "using a knowledge graph" has always been the friction of switching contexts. MCP closes that gap. Your knowledge graph is no longer a destination you visit -- it's a capability that follows you into whatever tool you're already working in.

Automate Your Knowledge Pipeline: Triggers, Workflows, and AI Tools

Thu, 12 Mar 2026 00:00:00 GMT

Most knowledge management tools treat you like a filing clerk. Upload a document, wait for extraction, manually review entities, fix errors, tag things, connect things. Then do it all again for the next document. And the next. And the next fifty.

This is fine when you have ten documents. It falls apart at a hundred. It becomes genuinely painful at a thousand. The bottleneck is never the AI -- it's the human loop. Every document requires your attention, your judgment calls, your clicks. The extraction might take thirty seconds. Your review and cleanup take ten minutes.

Chaos Cypher's workflow engine exists to close that gap. You define a processing pipeline once -- what to extract, how to validate it, where to send notifications -- and every new document flows through it automatically. No babysitting. No repetitive clicking. You set the rules, the system follows them.

This isn't a cron job bolted onto the side. It's a proper workflow engine with event-driven triggers, conditional branching, step-to-step data passing, and composable AI tools. Let me walk you through how it works.

A Concrete Workflow: Auto-Processing Research Papers

Abstractions are boring. Let's look at a real workflow you might build: automatically processing research papers as they're uploaded.

Here's the pipeline:

Trigger: A new source file is uploaded (file.upload event fires).

Step 1 -- AI Prompt: Summarize the document in three sentences. The ai.prompt tool sends the document text to your configured LLM with instructions to produce a concise summary. If the document is long, it automatically chunks the text and processes sections in parallel, then merges the results.

Step 2 -- AI Extract JSON: Pull out structured metadata. Authors, publication date, journal name, key findings, methodology type. The ai.extract_json tool takes the document text and a JSON schema defining exactly what you want, then returns validated structured data. It retries if the extraction doesn't match the schema.

Step 3 -- Conditional: Check if this is a clinical study. The logic.conditional tool evaluates whether {{steps.step_2.extracted_data.methodology_type}} equals "clinical_trial". If true, the workflow branches to run additional medical-domain extraction. If false, it skips ahead.

Step 4 -- HTTP Request: Post a notification to a Slack webhook with the summary from Step 1 and the metadata from Step 2. The http.request tool sends a POST to your webhook URL with a JSON body containing {{steps.step_1.result}} and {{steps.step_2.extracted_data}}.

Notice the {{steps.step_1.result}} syntax. That's the interpolation engine at work. Every step's output is available to every subsequent step via dot-notation paths. You can reference {{inputs.document_text}} for the original trigger data, {{steps.step_2.extracted_data.authors}} for a nested field from a previous step, or even {{steps.step_3.branch_taken}} to see which conditional path was followed. The interpolation preserves types too -- if a previous step returned a number, you get a number, not the string "42".
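To show how type-preserving interpolation can work, here is a small sketch. It assumes one plausible rule: a string that is nothing but a single placeholder returns the referenced value as-is, while a placeholder embedded in other text is rendered as a string. The function names and sample data are invented for illustration, and the real engine may differ.

```python
import re

FULL = re.compile(r"^\{\{\s*([\w.]+)\s*\}\}$")
PART = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def lookup(context, path):
    """Walk a dot-notation path through nested dicts and lists."""
    value = context
    for key in path.split("."):
        value = value[int(key)] if isinstance(value, list) else value[key]
    return value


def interpolate(template, context):
    """A lone placeholder keeps its original type; mixed text becomes a string."""
    whole = FULL.match(template)
    if whole:
        return lookup(context, whole.group(1))
    return PART.sub(lambda m: str(lookup(context, m.group(1))), template)


# Made-up step outputs standing in for a real workflow run.
context = {
    "inputs": {"document_text": "Example text"},
    "steps": {"step_2": {"extracted_data": {"authors": ["Ada", "Grace"], "year": 2024}}},
}
```

Here interpolate("{{steps.step_2.extracted_data.year}}", context) returns the integer 2024, while "Published {{steps.step_2.extracted_data.year}}" becomes the string "Published 2024".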

This entire pipeline runs without human intervention. Upload a PDF, walk away, come back to a summarized, metadata-tagged, conditionally-processed document with a Slack notification waiting for you.

Under the Hood: 10 Built-In Tools

The workflow engine ships with ten built-in tools organized into five categories. Each tool has a defined input schema and output schema, so the system validates your configuration before anything runs.

Category	Tools	What They Do
AI	`ai.prompt`, `ai.extract_json`, `ai.vector_search`, `ai.generate_embedding`	LLM interactions with chunking support, structured JSON extraction with schema validation and retries, semantic search across your knowledge graph, vector embedding generation for entities
Data	`data.extract`, `data.merge`	Pull values from nested objects using dot-notation paths (`user.addresses.0.city`), merge multiple dictionaries with shallow or deep strategies
Logic	`logic.conditional`, `logic.loop`	If/then branching with safe expression evaluation, iterate over collections with configurable limits
HTTP	`http.request`	External API calls with all HTTP methods, bearer/basic auth, configurable timeouts, and SSRF protection that blocks localhost access
Templates	`templates.list`	Query your knowledge graph schema to discover available node templates

A few things worth highlighting about specific tools:

ai.prompt is smarter than a simple LLM call. It supports chunk strategies (quick and full) for documents that exceed the model's context window. When chunking is enabled, it splits the document on paragraph boundaries, processes each chunk in parallel via the LLM queue, and intelligently merges the results -- concatenating text outputs, extending arrays, and merging objects.
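The merge step described above can be sketched as a function that dispatches on the type of the per-chunk outputs. This is an illustration of the idea, not the tool's implementation. It assumes every chunk returns the same type, and the blank-line separator for text is a guess.

```python
def merge_results(results):
    """Merge per-chunk outputs: join strings, extend lists, merge dicts."""
    if not results:
        return None
    first = results[0]
    if isinstance(first, str):
        return "\n\n".join(results)
    if isinstance(first, list):
        return [item for chunk in results for item in chunk]
    if isinstance(first, dict):
        merged = {}
        for chunk in results:
            for key, value in chunk.items():
                # Recurse so nested lists/strings under the same key also combine.
                merged[key] = merge_results([merged[key], value]) if key in merged else value
        return merged
    return results[-1]  # scalars: last value wins (an arbitrary choice)
```

With this shape, two chunks that each return {"entities": [...]} end up as one object with a single combined entities list.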

ai.extract_json enforces structure. You provide a JSON schema defining what you expect (say, {"entities": [{"name": "string", "type": "string"}]}), and the tool validates the LLM's output against it. If the output doesn't match, it retries automatically. This makes extraction reliable enough to run unattended.
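A validate-and-retry loop like the one described might look roughly like this. The sketch treats the schema as the simple shape notation quoted above rather than full JSON Schema, and takes the LLM as a plain callable so it can be swapped out. All names and the retry count are assumptions.

```python
import json


def matches(value, schema):
    """Check a value against a tiny shape schema like the one quoted above."""
    if schema == "string":
        return isinstance(value, str)
    if isinstance(schema, list):  # [item_schema]: every element must match
        return isinstance(value, list) and all(matches(v, schema[0]) for v in value)
    if isinstance(schema, dict):  # every declared key must be present and match
        return isinstance(value, dict) and all(
            k in value and matches(value[k], s) for k, s in schema.items()
        )
    return False


def extract_json(call_llm, prompt, schema, max_retries=3):
    """Call the LLM until its output parses and matches the schema."""
    for _ in range(max_retries):
        try:
            data = json.loads(call_llm(prompt))
        except json.JSONDecodeError:
            continue  # malformed JSON counts as a failed attempt
        if matches(data, schema):
            return data
    raise ValueError(f"no valid output after {max_retries} attempts")
```

Raising after the final attempt, instead of returning partial data, is what lets a downstream step trust that anything it receives has the expected shape.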

ai.vector_search lets workflows query the knowledge graph semantically. Give it a natural language query and it performs hybrid search -- combining vector similarity with keyword fallback -- to find matching nodes. You can filter by template type and set a similarity threshold. This is how you build workflows that reason about existing knowledge: "find all entities similar to what we just extracted and check for duplicates."

http.request has built-in security. URLs are validated before any request is sent -- only http and https schemes are allowed, and direct localhost access is blocked to prevent SSRF attacks. It supports all standard HTTP methods (GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS), bearer and basic authentication, custom headers, and JSON or string request bodies. Since Chaos Cypher runs in Docker, access to other containers via their service names works fine.
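The URL check described above can be approximated with the standard library. This sketch covers only what the text states, namely scheme allow-listing and blocking localhost or loopback hosts. A hardened implementation would also need to consider DNS resolution and redirects, which are out of scope here, and the function name is hypothetical.

```python
import ipaddress
from urllib.parse import urlparse


def validate_url(url: str) -> str:
    """Reject non-HTTP schemes and loopback hosts; return the URL if allowed."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"scheme not allowed: {parsed.scheme!r}")
    host = parsed.hostname or ""
    if not host:
        raise ValueError("URL has no host")
    if host == "localhost" or host.endswith(".localhost"):
        raise ValueError("localhost is blocked")
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        address = None  # not an IP literal: a DNS or container service name
    if address is not None and address.is_loopback:
        raise ValueError("loopback address is blocked")
    return url
```

A container service name such as http://cortex:8080/health passes this check, consistent with the note about reaching other containers by name.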

How Triggers Work

Triggers are the entry point for automated workflows. They listen for events in the system and fire workflows when conditions are met.

Event sources define what happened: node.create (a new node was added to the graph), node.update (an existing node was modified), file.upload (a new source file was uploaded), import.completed (a batch import finished). The system ships with built-in triggers for auto-embedding -- every time a node is created or updated, a workflow automatically generates vector embeddings for it.

Filters let you narrow the scope. A trigger on node.create with a filter {"template_id": "person_template"} only fires when a Person node is created, not when any node is created. Filters use exact key-value matching against the event data.

Statistics tracking gives you visibility. Every trigger execution records success/failure status, execution time, and error messages. You can see your success rate, average execution time, and recent execution history -- useful for debugging workflows that occasionally fail.
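Exact key-value filter matching is simple enough to sketch in a few lines. The trigger dictionary layout and the function name below are assumptions made for illustration; only the event names and the template_id filter come from the text above.

```python
def trigger_matches(event_source, event_data, trigger):
    """True when the source matches and every filter key equals the event value."""
    if trigger["event_source"] != event_source:
        return False
    filters = trigger.get("filters", {})
    return all(k in event_data and event_data[k] == v for k, v in filters.items())


# A trigger that fires only for newly created Person nodes.
person_trigger = {
    "event_source": "node.create",
    "filters": {"template_id": "person_template"},
}
```

Extra fields in the event data are ignored. Only the keys named in the filter have to match, so a trigger with no filters fires on every event from its source.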

The "Expose as AI Tool" Feature

Here's where things get composable. Any workflow can be exposed as a callable AI tool by setting expose_as_ai_tool: true and defining input/output schemas. Once exposed, that workflow appears alongside the built-in tools and can be used as a step in other workflows.

Think about what this enables. You build a workflow that extracts and validates medical terminology. You expose it as a tool. Now your "process research papers" workflow can call it as Step 3 instead of hardcoding medical-domain logic. You have a workflow that enriches person entities by cross-referencing external APIs? Expose it, and any other workflow can use it.

Workflows calling workflows. Each one focused on a single job, composed together into pipelines of arbitrary complexity. The step type workflow (alongside system_tool and user_tool) tells the engine to execute another workflow as a step, passing inputs and receiving outputs just like any other tool.
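Here is a toy model of that dispatch, to show why a workflow step looks like any other step from the engine's point of view. Everything in it is hypothetical: the registries, the text.upper tool, and the rule that each step's output is merged into a shared data dictionary are assumptions, not the engine's real behavior.

```python
from typing import Callable

# Hypothetical registries: tools are plain functions, workflows are step lists.
TOOLS: dict[str, Callable[[dict], dict]] = {
    "text.upper": lambda inputs: {"text": inputs["text"].upper()},
}
WORKFLOWS: dict[str, list[dict]] = {}


def run_workflow(name: str, inputs: dict) -> dict:
    """Run each step in order, merging its output into the running data."""
    data = dict(inputs)
    for step in WORKFLOWS[name]:
        if step["tool_type"] == "workflow":
            # A workflow step recurses into another workflow.
            output = run_workflow(step["tool_id"], data)
        else:
            # system_tool and user_tool steps call a registered function.
            output = TOOLS[step["tool_id"]](data)
        data.update(output)
    return data


WORKFLOWS["shout"] = [{"tool_type": "system_tool", "tool_id": "text.upper"}]
WORKFLOWS["pipeline"] = [{"tool_type": "workflow", "tool_id": "shout"}]
result = run_workflow("pipeline", {"text": "dry cough"})
```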

Workflow Portability

Workflows are portable. You can export any workflow to a version-stamped JSON file that includes the workflow definition, all its steps, and their configurations. Import it into another Chaos Cypher instance -- or share it with someone else running their own instance.

The import process is deliberate about safety. Before importing, the system validates the export version for compatibility and checks that all referenced tools exist in the target instance. It walks through every step, resolves each tool_id against the registry of system tools and user tools, and fails early if anything is missing. If a workflow references ai.prompt and http.request, those tools must be available. If a custom tool plugin is missing, the import fails with a clear error message rather than creating a broken workflow.
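The fail-early check can be sketched as follows. The registry contents, the set of supported versions, and the function name are assumptions, and workflow-type steps are skipped here because this sketch only resolves system and user tools.

```python
# Assumed registries for illustration; a real instance would load these.
SYSTEM_TOOLS = {"ai.prompt", "ai.extract_json", "ai.vector_search", "http.request"}
USER_TOOLS: set[str] = set()
SUPPORTED_VERSIONS = {"1.0"}


def validate_import(export: dict) -> None:
    """Raise ValueError before anything is created if the export cannot run here."""
    version = export.get("version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported export version: {version!r}")
    registry = SYSTEM_TOOLS | USER_TOOLS
    missing = sorted(
        {
            step["tool_id"]
            for step in export.get("steps", [])
            if step.get("tool_type") != "workflow" and step["tool_id"] not in registry
        }
    )
    if missing:
        raise ValueError("missing tools: " + ", ".join(missing))


good_export = {
    "version": "1.0",
    "steps": [{"tool_type": "system_tool", "tool_id": "ai.prompt"}],
}
validate_import(good_export)  # passes silently
```

Collecting every missing tool before raising means one failed import tells you everything you need to install, not just the first gap.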

This design means workflows are self-describing and portable. The JSON file contains everything needed to reconstruct the workflow -- no hidden state, no implicit dependencies on database IDs. Export from your laptop, import on a server, share with a colleague. The only requirement is that the target instance has the same tools installed.

Try It Yourself

The fastest way to see the workflow engine in action is to look at the export format. Here's a minimal workflow that summarizes documents on upload:

{
  "version": "1.0",
  "workflow": {
    "name": "Summarize on Upload",
    "description": "Auto-summarize new documents when uploaded",
    "input_schema": {
      "type": "object",
      "properties": {
        "document_text": {
          "type": "string",
          "description": "The document content to summarize"
        }
      },
      "required": ["document_text"]
    },
    "output_schema": {
      "type": "object",
      "properties": {
        "summary": {
          "type": "string",
          "description": "Three-point summary of the document"
        }
      }
    }
  },
  "steps": [
    {
      "step_number": 1,
      "name": "Summarize Document",
      "tool_type": "system_tool",
      "tool_id": "ai.prompt",
      "configuration": {
        "prompt": "Summarize this document in 3 key points:\n\n{{inputs.document_text}}",
        "output_format": "text"
      }
    }
  ]
}

This is everything the system needs. The version field lets the importer check compatibility before creating anything. The input_schema and output_schema define the contract. The steps array contains the pipeline.

Each step specifies its tool_type (system_tool, user_tool, or workflow), a tool_id that references a registered tool, and a configuration object whose shape matches the tool's input schema. The {{inputs.document_text}} template variable gets resolved at execution time with the actual trigger data.
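Template resolution of this kind can be sketched with a regular expression. The placeholder grammar (dotted names inside double braces) and the function name are assumptions based on the one example above; the engine's real resolver may support more.

```python
import re

# Matches {{ dotted.path }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def resolve(template: str, context: dict) -> str:
    """Replace each {{a.b}} placeholder with context["a"]["b"]."""

    def lookup(match: re.Match) -> str:
        value = context
        for part in match.group(1).split("."):
            value = value[part]  # raises KeyError if the path is missing
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


prompt = resolve(
    "Summarize this document in 3 key points:\n\n{{inputs.document_text}}",
    {"inputs": {"document_text": "Lisinopril treats hypertension."}},
)
```

Raising on a missing path, rather than leaving the placeholder in the prompt, keeps a misconfigured step from silently sending literal braces to the LLM.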

When importing, you have three options for handling name conflicts:

fail -- refuse to import if a workflow with the same name exists (the default, prevents accidental overwrites)
skip -- silently keep the existing workflow and skip the import
rename -- import with (imported) appended to the name
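The three modes reduce to a small decision function. This is a sketch with a made-up function name; it returns the name to import under, or None when the import should be skipped, and it does not handle the case where the renamed workflow also already exists.

```python
from typing import Optional


def resolve_import_name(name: str, existing: set, on_conflict: str = "fail") -> Optional[str]:
    """Decide what name to import under, given the names already in use."""
    if name not in existing:
        return name
    if on_conflict == "fail":
        raise ValueError(f"workflow already exists: {name}")
    if on_conflict == "skip":
        return None  # keep the existing workflow untouched
    if on_conflict == "rename":
        return f"{name} (imported)"
    raise ValueError(f"unknown conflict mode: {on_conflict}")


existing = {"Summarize on Upload"}
renamed = resolve_import_name("Summarize on Upload", existing, on_conflict="rename")
skipped = resolve_import_name("Summarize on Upload", existing, on_conflict="skip")
```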

You can also import as inactive (import_as_inactive: true) to test a workflow before enabling it in production. This creates the workflow with is_active: false, letting you review the steps and do a manual test run before flipping it on.

To set up the trigger, create a trigger record with the event source (like file.upload), link it to your workflow, and optionally add filters. The trigger system runs as a background event loop -- events are queued and processed asynchronously, so trigger evaluation never blocks the main API.
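The queue-and-worker pattern behind that can be sketched with asyncio. This is an illustration of the general idea, not the trigger system's code: the event and trigger dictionaries, the None shutdown sentinel, and recording fired workflows in a list instead of executing them are all assumptions.

```python
import asyncio


async def trigger_loop(queue: asyncio.Queue, triggers: list, fired: list) -> None:
    """Consume events one at a time and record which workflows should fire."""
    while True:
        event = await queue.get()
        if event is None:  # sentinel used here to stop the loop
            break
        for trigger in triggers:
            if trigger["event_source"] == event["type"]:
                fired.append((trigger["workflow"], event["data"]))


async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    fired: list = []
    triggers = [{"event_source": "file.upload", "workflow": "Summarize on Upload"}]
    worker = asyncio.create_task(trigger_loop(queue, triggers, fired))
    # Producers only enqueue; they never wait for a workflow to finish.
    await queue.put({"type": "file.upload", "data": {"filename": "paper.pdf"}})
    await queue.put({"type": "node.create", "data": {}})
    await queue.put(None)
    await worker
    return fired


fired = asyncio.run(main())
```

The API side only pays the cost of a queue put, which is why trigger evaluation stays off the request path.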

For more complex workflows, the step dependency system lets you control execution order beyond simple sequential numbering. Each step can declare depends_on (a list of step IDs that must complete before it runs) and continue_on_error (proceed even if the step fails). You can also set retry_on_failure to have the engine retry a step automatically, and timeout_seconds to cap how long any individual step can run. Combined with logic.conditional for branching and logic.loop for iteration, you can express surprisingly sophisticated pipelines.
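Ordering steps by depends_on is a topological sort, which Python's standard graphlib module handles directly. The step IDs below are invented, and the sketch covers ordering only: it does not model continue_on_error, retries, or timeouts.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical steps; only the dependency fields matter for ordering.
steps = [
    {"id": "extract", "depends_on": []},
    {"id": "dedupe", "depends_on": ["extract"]},
    {"id": "summarize", "depends_on": ["extract"]},
    {"id": "notify", "depends_on": ["dedupe", "summarize"]},
]


def execution_order(steps: list) -> list:
    """Return step IDs so that every step comes after its dependencies."""
    graph = {step["id"]: set(step.get("depends_on", [])) for step in steps}
    return list(TopologicalSorter(graph).static_order())


order = execution_order(steps)


def has_cycle(steps: list) -> bool:
    """True if the dependencies loop back on themselves."""
    try:
        execution_order(steps)
    except CycleError:
        return True
    return False
```

Here extract always runs first and notify last; dedupe and summarize have no dependency on each other, so their relative order is not fixed and an engine could run them in parallel.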

The execution model tracks everything. Each workflow run produces an execution record with status (pending, running, completed, failed, cancelled), the inputs that were provided, the outputs that were produced, timing data for each step, and -- critically -- which step failed and why if something goes wrong. This execution history is what makes workflows debuggable. When a workflow fails at 3am, you don't have to guess what happened. You look at the execution detail, see that Step 3 timed out after 120 seconds waiting for the LLM, and adjust accordingly.
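As a rough picture of what such a record might hold, here is a sketch using dataclasses. The class and field names are guesses for illustration, not the actual schema; only the status values and the 120-second timeout example come from the description above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StepResult:
    step_number: int
    status: str  # e.g. "completed" or "failed"
    duration_seconds: float
    error: Optional[str] = None


@dataclass
class ExecutionRecord:
    status: str  # pending, running, completed, failed, cancelled
    inputs: dict
    outputs: dict = field(default_factory=dict)
    steps: list = field(default_factory=list)

    def first_failure(self) -> Optional[StepResult]:
        """Return the first failed step, or None if every step succeeded."""
        return next((s for s in self.steps if s.status == "failed"), None)


record = ExecutionRecord(
    status="failed",
    inputs={"document_text": "..."},
    steps=[
        StepResult(1, "completed", 0.4),
        StepResult(2, "completed", 2.1),
        StepResult(3, "failed", 120.0, error="timed out after 120 seconds"),
    ],
)
failure = record.first_failure()
```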

What's Next

The workflow engine is designed to grow. The tool system uses a plugin architecture -- the same pattern that powers Chaos Cypher's loader plugins, domain plugins, and LLM providers. Custom tool plugins in Python are on the roadmap for users who need capabilities beyond the ten built-in tools. The intended model: implement a class with tool_id, input_schema, output_schema, and an execute method, drop it in the plugins directory, and it auto-registers.
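A plugin with that shape might look something like the sketch below. Since the plugin API has not shipped, treat every detail as an assumption: the text.word_count tool is invented, and the register decorator stands in for whatever directory-based discovery the engine ends up using.

```python
# Hypothetical registry; the real engine would discover plugins on disk.
REGISTRY: dict = {}


def register(tool_cls):
    """Class decorator standing in for directory-based auto-registration."""
    REGISTRY[tool_cls.tool_id] = tool_cls()
    return tool_cls


@register
class WordCountTool:
    tool_id = "text.word_count"  # invented for this example
    input_schema = {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    }
    output_schema = {
        "type": "object",
        "properties": {"count": {"type": "integer"}},
    }

    def execute(self, inputs: dict) -> dict:
        """Count whitespace-separated words in the input text."""
        return {"count": len(inputs["text"].split())}


result = REGISTRY["text.word_count"].execute({"text": "dry cough and dizziness"})
```

Because the schemas live on the class, an importer could check a workflow's step configuration against them without running the tool.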

More trigger event sources are coming as the platform grows. Scheduling (run a workflow every Tuesday at 9am) and webhook triggers (fire a workflow from an external system) are natural extensions of the existing event-driven architecture.

If you've built an interesting automation workflow -- whether it's a multi-step research pipeline, a quality assurance checker, or an integration with external tools -- I'd genuinely like to hear about it. The export format makes sharing straightforward: export your workflow, share the JSON, and someone else can import it and adapt it to their use case. That's the whole point of portability.

For the full API reference and detailed configuration options, check out the workflow documentation. The built-in system workflows (like auto-embedding on node create/update) are also good starting points -- export them and study the step configurations to see how the engine's own automation is wired together.
