Skip to main content

Quick Start

This guide gets you productive with chaoscypher-core in minutes. It covers the most common tasks -- chatting with an LLM, extracting entities, building a knowledge graph, and processing documents -- using the simplest API surface first.

One-Liner LLM Chat

The ChaosCypher facade gives you zero-boilerplate access to LLM chat, embeddings, extraction, and search. No database, no setup.

from chaoscypher_core import ChaosCypher

response = ChaosCypher.chat_sync("What is a knowledge graph?")
print(response.content)

That's it. Defaults to Ollama on localhost:11434. To switch providers:

ChaosCypher.configure(provider="openai", api_key="sk-...")
response = ChaosCypher.chat_sync("What is a knowledge graph?")
Provider auto-detection

If you skip the provider argument, Chaos Cypher detects it from the API key prefix: sk-ant- maps to Anthropic, sk- to OpenAI. You can also set environment variables (CHAOSCYPHER_LLM_PROVIDER, OPENAI_API_KEY, etc.) and skip configure() entirely.

Extract Entities from a Document

Run LLM-powered entity extraction on any file -- no database, no graph setup.

from chaoscypher_core import ChaosCypher

result = ChaosCypher.extract_sync("paper.pdf")
print(result.model_dump_json(indent=2))

ChaosCypher auto-detects the file type, chunks the text, and extracts entities and relationships. The result is an ExtractionResult model with attribute access and .model_dump_json() for JSON output.

To extract from raw text instead of a file, use the text= keyword argument to skip file detection:

result = ChaosCypher.extract_sync(text="Albert Einstein developed the theory of relativity...")
print(result.entities)
Chunking without extraction

Inspect intermediate chunking results before extraction:

chunks = ChaosCypher.chunk_sync("paper.pdf")
print(f"{chunks.total_small_chunks} chunks in {chunks.total_groups} groups")

# Or from raw text
chunks = ChaosCypher.chunk_sync(text="Long document content here...")

Or just load file text directly (no LLM needed):

text = ChaosCypher.load("paper.pdf")

Generate Embeddings

Single text or batch -- both sync and async:

from chaoscypher_core import ChaosCypher

# Single embedding
result = ChaosCypher.embed_sync("quantum entanglement")
print(f"Dimensions: {len(result.embedding)}")

# Batch embedding (always returns BatchEmbedResult)
batch = ChaosCypher.embed_batch_sync(["text one", "text two", "text three"])
print(f"{len(batch.embeddings)} embeddings generated")

embed() accepts a string and returns EmbedResult. embed_batch() accepts a list and always returns BatchEmbedResult -- no runtime type checking needed.

Build a Knowledge Graph

For persistent storage with a full graph database, use Engine. It wires up SQLite storage, repositories, and services in one call.

Pass provider= and api_key= directly to Engine -- no need to construct settings objects:

from chaoscypher_core import Engine

with Engine(database="demo", provider="openai", api_key="sk-...") as engine:
alice = engine.add_node("Person", "Alice", properties={"role": "Engineer"})
bob = engine.add_node("Person", "Bob", properties={"role": "Designer"})
engine.add_edge("knows", alice, bob)

stats = engine.get_stats()
print(f"Graph: {stats.nodes} nodes, {stats.edges} edges")

add_node and add_edge automatically create templates if they don't exist. The database and tables are created automatically on first use.

Engine also inherits ChaosCypher.configure() settings, so you can configure once at program start:

ChaosCypher.configure(provider="openai", api_key="sk-...")

# All Engine instances inherit the configured provider
with Engine(database="demo") as engine:
alice = engine.add_node("Person", "Alice")

Query the Graph

# List all nodes (returns PaginatedResult with .data and .total)
result = engine.list_nodes()
for node in result.data:
print(f"{node.label} (template: {node.template_id})")

# Get database statistics
stats = engine.get_stats()
print(f"Nodes: {stats.nodes}, Edges: {stats.edges}")

Search (Sync and Async)

Engine provides both sync and async search:

from chaoscypher_core import Engine

# Sync (scripts, notebooks)
with Engine(database="demo") as engine:
results = engine.search_sync("quantum entanglement")
for r in results:
print(f"{r.label} ({r.score:.2f})")
# Async
import asyncio

async def main():
async with Engine(database="demo") as engine:
results = await engine.search("quantum entanglement", mode="semantic")
for r in results:
print(f"{r.label} ({r.score:.2f})")

asyncio.run(main())

Search supports three modes: "hybrid" (default), "semantic", and "keyword".

Chat and Embed through Engine

Engine exposes LLM methods that use the engine's configured provider:

# Sync
with Engine(database="demo", provider="openai", api_key="sk-...") as engine:
response = engine.chat_sync("Summarize this graph")
print(response.content)

embedding = engine.embed_sync("quantum entanglement")
print(f"Dimensions: {len(embedding.embedding)}")

batch = engine.batch_embed_sync(["text one", "text two"])
print(f"{len(batch.embeddings)} embeddings")
# Async
async with Engine(database="demo") as engine:
response = await engine.chat("Summarize this graph")
embedding = await engine.embed("quantum entanglement")
batch = await engine.batch_embed(["text one", "text two"])

Clean Up

Always close the engine to release database connections. The recommended pattern is the context manager (with Engine(...)) shown above.


Process Documents End-to-End

Single Document (Facade)

The simplest path from a file on disk to a fully populated knowledge graph:

from chaoscypher_core import ChaosCypher

# Sync
result = ChaosCypher.add_document_sync("paper.pdf", database="demo")
print(f"Created {len(result.nodes)} nodes, {len(result.edges)} edges")
# Async
result = await ChaosCypher.add_document("paper.pdf", database="demo")

This handles loading, chunking, indexing, extraction, and commit in a single call.

Batch Documents (Facade)

Process multiple files at once with add_documents:

from chaoscypher_core import ChaosCypher

# Sync -- glob pattern or explicit list
results = ChaosCypher.add_documents_sync("papers/*.pdf", database="demo")
print(f"Processed {len(results)} documents")

# Or pass a list
results = ChaosCypher.add_documents_sync(
["doc1.pdf", "doc2.pdf", "notes.txt"],
database="demo",
)
# Async
results = await ChaosCypher.add_documents(["doc1.pdf", "doc2.pdf"], database="demo")

Single Document (Engine)

For repeated operations or more control, use Engine directly:

from chaoscypher_core import Engine

# Sync
with Engine(database="demo") as engine:
result = engine.add_document_sync("paper.pdf")
print(f"Created {len(result.nodes)} nodes, {len(result.edges)} edges")
# Async
async with Engine(database="demo") as engine:
result = await engine.add_document("paper.pdf")

Batch Documents (Engine)

# Sync
with Engine(database="demo") as engine:
results = engine.add_documents_sync(["doc1.pdf", "doc2.pdf"])
print(f"Processed {len(results)} documents")
# Async
async with Engine(database="demo") as engine:
results = await engine.add_documents("papers/*.pdf")

Process Text Already in Memory

If you have text from a web scrape or user input (no file on disk), use process_document:

# Sync
with Engine(database="demo") as engine:
result = engine.process_document_sync(text, filename="scraped_article.txt")
print(f"Created {len(result.nodes)} nodes")
# Async
async with Engine(database="demo") as engine:
result = await engine.process_document(text, filename="scraped_article.txt")

Tracking Progress

Use the on_progress callback to monitor long-running operations:

from chaoscypher_core import ChaosCypher

def on_progress(stage, result):
"""Called after each pipeline stage completes."""
print(f"Completed: {stage}") # "chunking", "indexing", "extraction"

result = ChaosCypher.add_document_sync("paper.pdf", on_progress=on_progress)
print(f"Nodes created: {len(result.nodes)}")

Error Handling

The SDK raises specific exceptions for different failure modes:

from chaoscypher_core import Engine, NotFoundError, ValidationError, OperationError

with Engine(database="demo") as engine:
try:
node = engine.get_node("nonexistent-id")
except NotFoundError:
print("Node does not exist")
except ValidationError as e:
print(f"Invalid input: {e}")
except OperationError as e:
print(f"Operation failed: {e}")
ExceptionWhen raised
NotFoundErrorEntity (node, edge, template, source) not found by ID
ValidationErrorInvalid input data (bad types, missing required fields)
OperationErrorOperation failed (LLM unavailable, storage error)
ConflictErrorDuplicate entity or constraint violation

Configuration

Quick: configure() or Kwargs

The simplest way to set up a provider -- works for both the ChaosCypher facade and Engine:

from chaoscypher_core import ChaosCypher, Engine

# Option 1: Global configure (affects all subsequent calls)
ChaosCypher.configure(provider="openai", api_key="sk-...")
result = ChaosCypher.extract_sync("paper.pdf") # uses OpenAI

# Option 2: Per-engine inline kwargs
with Engine(database="demo", provider="anthropic", api_key="sk-ant-...") as engine:
response = engine.chat_sync("Hello")

You can also configure embedding models and chunking parameters:

ChaosCypher.configure(
provider="openai",
api_key="sk-...",
embedding_model="BAAI/bge-large-en-v1.5",
chunk_size=512,
)

Environment Variables (Zero Code)

export CHAOSCYPHER_LLM_PROVIDER=openai
export OPENAI_API_KEY=sk-...

Advanced: EngineSettings for Full Control

For complete control over all settings, pass an EngineSettings instance:

from chaoscypher_core import Engine, EngineSettings, LLMSettings

settings = EngineSettings(
current_database="mydb",
llm=LLMSettings(
chat_provider="openai",
openai_api_key="sk-...",
openai_chat_model="gpt-4.1",
),
)

engine = Engine(database="mydb", settings=settings)

These same settings work for the standalone extraction pipeline -- just pass settings to ChunkingService(settings) instead of using the defaults.


Advanced: Explicit Template Control

When you need to define property schemas, descriptions, or other template metadata, use the explicit create_template / create_node / create_edge methods instead of add_node / add_edge:

from chaoscypher_core import Engine, TemplateCreate, NodeCreate, EdgeCreate

with Engine(database="demo") as engine:
person = engine.create_template(
TemplateCreate(name="Person", template_type="node")
)
alice = engine.create_node(
NodeCreate(template_id=person.id, label="Alice")
)
bob = engine.create_node(
NodeCreate(template_id=person.id, label="Bob")
)

knows = engine.create_template(
TemplateCreate(name="knows", template_type="edge")
)
engine.create_edge(
EdgeCreate(
template_id=knows.id,
source_node_id=alice.id,
target_node_id=bob.id,
label="knows",
)
)
Two-step chunking (inspect before persisting)

If you need to inspect chunks before storing them (e.g., for debugging or custom filtering), access the chunking service directly:

from chaoscypher_core import Engine

async with Engine(database="mydb") as engine:
# Step 1: Chunk the text (not yet persisted)
chunks = await engine.chunking_service.create_chunks(text)
print(f"{chunks.total_small_chunks} chunks in {chunks.total_groups} groups")

# Step 2: Persist after inspection
engine.chunking_service.store_chunks(chunks)

For most use cases, engine.chunk_document(text) or engine.add_document("file.pdf") handles both steps automatically.


API Reference

Chaos Cypher Facade

All methods are static -- no instantiation needed.

MethodSync variantDescription
configure()--Set global provider/API key
reset()--Clear cached settings
extract()extract_sync()Extract entities from file or text=
chat()chat_sync()LLM chat
embed()embed_sync()Single embedding
embed_batch()embed_batch_sync()Batch embeddings (always BatchEmbedResult)
search()search_sync()Search a knowledge graph database
chunk()chunk_sync()Chunk file or text= for RAG
add_document()add_document_sync()Full file-to-graph pipeline
add_documents()add_documents_sync()Batch file-to-graph pipeline
load()--Load file text (sync, no LLM)

Engine

Accepts provider=, api_key=, and all configure() aliases as constructor kwargs. Inherits ChaosCypher.configure() settings when no explicit config is given.

MethodSync variantDescription
add_node()--Create node (auto-creates template)
add_edge()--Create edge (auto-creates template)
search()search_sync()Hybrid/semantic/keyword search
chat()chat_sync()LLM chat through engine provider
embed()embed_sync()Single embedding
batch_embed()batch_embed_sync()Batch embeddings
add_document()add_document_sync()Full file-to-graph pipeline
add_documents()add_documents_sync()Batch file-to-graph pipeline
process_document()process_document_sync()Process text (no file) to graph
chunk_document()--Chunk and store text
commit()--Extract + commit stored chunks
index_source()--Generate embeddings for chunks
get_stats()--Database statistics
check_health()--LLM provider health check

Next Steps