
Core Concepts

This page explains the architecture of chaoscypher-core and the design rules you need to follow when working with it.

Hexagonal architecture

The core library uses hexagonal architecture (ports and adapters) to stay framework-agnostic. Business logic never depends on a specific database, web framework, or LLM provider.

The dependency direction is always the same: services depend on ports (protocols), adapters implement ports, and services never import adapter code directly.
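
The relationship between services, ports, and adapters can be sketched in a few lines. This is only an illustration of the pattern: NoteStoragePort, InMemoryNoteAdapter, and NoteService are hypothetical names and do not exist in chaoscypher-core.

```python
from typing import Any, Optional, Protocol


class NoteStoragePort(Protocol):
    """Hypothetical port: the contract a service depends on."""

    def get_note(self, note_id: str) -> Optional[dict[str, Any]]: ...

    def save_note(self, note: dict[str, Any]) -> None: ...


class InMemoryNoteAdapter:
    """Hypothetical adapter: implements the port with a plain dict as storage."""

    def __init__(self) -> None:
        self._rows: dict[str, dict[str, Any]] = {}

    def get_note(self, note_id: str) -> Optional[dict[str, Any]]:
        row = self._rows.get(note_id)
        return dict(row) if row is not None else None

    def save_note(self, note: dict[str, Any]) -> None:
        self._rows[note["id"]] = dict(note)


class NoteService:
    """Hypothetical service: business logic that only knows about the port."""

    def __init__(self, storage: NoteStoragePort) -> None:
        self._storage = storage

    def rename(self, note_id: str, title: str) -> dict[str, Any]:
        note = self._storage.get_note(note_id)
        if note is None:
            raise KeyError(note_id)
        if not title.strip():
            raise ValueError("title must not be empty")
        note["title"] = title.strip()
        self._storage.save_note(note)
        return note


# The adapter is chosen at wiring time; the service never imports it.
adapter = InMemoryNoteAdapter()
adapter.save_note({"id": "n1", "title": "draft"})
service = NoteService(adapter)
renamed = service.rename("n1", "  Final  ")
```

Swapping the in-memory adapter for a database-backed one would not require any change to NoteService, which is the point of the pattern.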

Ports (protocols)

Ports are Python Protocol classes that define contracts for data access. They live in chaoscypher_core.ports:

Graph and search:

Protocol | File | Purpose
GraphRepositoryProtocol | ports/graph.py | Node, edge, and template CRUD
SearchRepositoryProtocol | ports/search.py | Keyword, vector, and hybrid search
SearchRetryQueueProtocol | ports/search_retry.py | Retry queue for failed search index operations

Document processing:

Protocol | File | Purpose
ChunkingProtocol | ports/chunk.py | Hierarchical document chunking
IndexingProtocol | ports/index.py | Document chunk embedding storage
StructuredExtractorPort | ports/structured_extraction.py | LLM-backed structured extraction
EmbeddingProviderProtocol | ports/embedding.py | Embedding model provider

Per-feature storage protocols (one file per domain):

Protocol | File | Purpose
WorkflowStorageProtocol | ports/storage_workflows.py | Workflow definitions and steps
WorkflowExecutionStorageProtocol | ports/storage_workflow_executions.py | Workflow execution tracking
SourceStorageProtocol | ports/storage_sources.py | Source documents and processing lifecycle
ChatStorageProtocol | ports/storage_chats.py | Chat conversations and messages
TriggerStorageProtocol | ports/storage_triggers.py | Event triggers
ToolStorageProtocol | ports/storage_tools.py | Tool registry (system + user tools)
LLMMetricsStorageProtocol | ports/storage_llm_metrics.py | LLM call cost and token tracking
ChunkStorageProtocol | ports/storage_chunks.py | Document chunk persistence
CitationStorageProtocol | ports/storage_citations.py | Citation tracking
EntityEmbeddingStorageProtocol | ports/storage_embeddings.py | Entity embedding vectors
SourceTagStorageProtocol | ports/storage_source_tags.py | Source tag management
ExtractionQueueStorageProtocol | ports/storage_extraction_queue.py | Extraction job queue state
ExtractionSubmissionStorageProtocol | ports/storage_extraction_submissions.py | Extraction submission tracking
GraphSnapshotStorageProtocol | ports/storage_graph_snapshot.py | Graph snapshot staleness and breakdown queries

Infrastructure:

Protocol | File | Purpose
DatabaseProtocol | ports/db.py | Database metadata
LLMProviderPort | ports/llm.py | LLM provider interface
RetryPolicyPort | ports/retry.py | Retry policy configuration
TransactionalAdapterProtocol | ports/transactional.py | Unit-of-work transaction support
SourceRecoveryPorts | ports/source_recovery.py | Source recovery operations

All protocols use structural typing -- any class with matching method signatures satisfies the protocol. No inheritance required.
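
A short sketch of what structural typing means in practice. EmbeddingPort and both classes below are hypothetical; the real EmbeddingProviderProtocol may define different methods.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class EmbeddingPort(Protocol):
    """Hypothetical protocol with a single method."""

    def embed(self, texts: list[str]) -> list[list[float]]: ...


class LengthEmbedder:
    """Does not inherit from EmbeddingPort, but has a matching embed() method."""

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(text))] for text in texts]


class NotAnEmbedder:
    """Has no embed() method, so it does not satisfy the protocol."""

    def encode(self, texts: list[str]) -> list[list[float]]:
        return []


def embed_all(provider: EmbeddingPort, texts: list[str]) -> list[list[float]]:
    # A static type checker accepts any object whose methods match the protocol.
    return provider.embed(texts)


vectors = embed_all(LengthEmbedder(), ["ab", "abcd"])

# runtime_checkable isinstance() only checks that the methods exist,
# not that their signatures match.
satisfies = isinstance(LengthEmbedder(), EmbeddingPort)
rejected = isinstance(NotAnEmbedder(), EmbeddingPort)
```

Static checkers such as mypy or pyright compare full signatures; the runtime isinstance check shown here is weaker and is included only to make the idea observable.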

Services

Services contain business logic and depend only on protocols. They live in chaoscypher_core.services and are organized by domain:

Domain | Key Services | Description
Graph management | NodeService, EdgeService, TemplateService, SourceService | CRUD with validation and search indexing
Search | SearchService, IndexingService | RAG retrieval and embedding generation
Sources | SourceProcessingService, ExtractionService, SourceCommitService | Document ingestion pipeline
Workflows | WorkflowService, WorkflowExecutor, ToolService | Workflow definitions and LangGraph execution
Quality | QualityScorer | Entity and relationship quality scoring

Internal services

Additional services such as GraphAnalyticsService, CountsService, ChatService, ChatExecutor, and ResearchAgent exist in the codebase, but they are internal implementation details and are not exported from the public API. Access their functionality through Engine convenience methods instead.

Adapters

Adapters implement protocols with concrete technology. The core library ships with:

  • SqliteAdapter -- implements all storage protocols (WorkflowStorageProtocol, SourceStorageProtocol, ChatStorageProtocol, etc.) using SQLite + SQLModel
  • GraphRepository -- implements GraphRepositoryProtocol using SQLite
  • SearchRepository -- implements SearchRepositoryProtocol using a sqlite-vec vector index
  • LLM providers -- Ollama, OpenAI, Anthropic, and Gemini adapters via ProviderFactory

The dict-not-entities rule

Critical: Storage protocols return dicts, not ORM entities

This is the most important rule when working with the core library. Violating it causes AttributeError at runtime.

All storage protocol methods return dict[str, Any] (or list[dict[str, Any]]). This keeps the core library portable across storage backends.

# WRONG -- storage returns a dict, not an ORM entity
workflow = adapter.get_workflow(workflow_id)
name = workflow.name # AttributeError!

# CORRECT -- use dict access
workflow = adapter.get_workflow(workflow_id)
name = workflow["name"] # Works
name = workflow.get("name") # Safe (returns None if missing)

The only places where attribute access (instead of dict access) is valid are:

  1. Inside the SQLite adapter mixins (before conversion to dict)
  2. GraphRepository return values -- GraphRepositoryProtocol returns Pydantic model objects (Node, Edge, Template), not dicts
  3. Database model definitions

Everywhere else, use dict access patterns.
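
The rule can be pictured as a conversion at the adapter boundary. The sketch below is not the real SqliteAdapter: WorkflowRow stands in for an ORM entity, its fields are invented, and returning None for a missing workflow is an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class WorkflowRow:
    """Stand-in for an ORM entity (hypothetical fields)."""

    id: str
    name: str


class SketchWorkflowStorage:
    """Hypothetical adapter illustrating the entity-to-dict conversion."""

    def __init__(self) -> None:
        self._rows = {"wf-1": WorkflowRow(id="wf-1", name="ingest")}

    def get_workflow(self, workflow_id: str) -> Optional[dict[str, Any]]:
        row = self._rows.get(workflow_id)
        if row is None:
            return None  # assumption: a missing workflow yields None
        # Attribute access (row.name) is fine here, inside the adapter.
        # The entity is converted before it crosses the port boundary.
        return asdict(row)


storage = SketchWorkflowStorage()
workflow = storage.get_workflow("wf-1")

name = workflow["name"]                    # dict access works
description = workflow.get("description")  # missing key gives None, no exception
has_attribute = hasattr(workflow, "name")  # False: a dict has no .name attribute
```

Callers outside the adapter only ever see the dict, so they stay independent of whichever ORM or database sits behind the port.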

Framework-agnostic design

The core library has no dependency on FastAPI, Valkey, or any web framework. This means:

  • It can be used in CLI tools, Jupyter notebooks, or background scripts
  • No async runtime is required for basic operations (graph CRUD, search)
  • Services accept plain Python types (dicts, strings, Pydantic models)
  • Configuration is pure Pydantic (EngineSettings), not tied to environment variable loaders

Async operations

Some methods on GraphRepositoryProtocol are async (e.g., create_nodes_batch, create_edges_batch, create_templates_batch). These are used during bulk operations like source commits. Standard CRUD methods (create_node, list_nodes, etc.) are synchronous.
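
A minimal sketch of mixing the two call styles from a synchronous script. SketchGraphRepository is a hypothetical stand-in: the method names come from this page, but the parameters and return values are assumptions, and plain dicts are used only for brevity (the real repository returns Pydantic models).

```python
import asyncio
from typing import Any


class SketchGraphRepository:
    """Hypothetical stand-in; real method signatures may differ."""

    def __init__(self) -> None:
        self.nodes: list[dict[str, Any]] = []

    def create_node(self, node: dict[str, Any]) -> dict[str, Any]:
        # Synchronous single-item CRUD: call it directly.
        self.nodes.append(node)
        return node

    async def create_nodes_batch(self, nodes: list[dict[str, Any]]) -> int:
        # Async bulk path: sleep(0) stands in for awaited database I/O.
        await asyncio.sleep(0)
        self.nodes.extend(nodes)
        return len(nodes)


repo = SketchGraphRepository()
repo.create_node({"id": "a"})

# From synchronous code, drive the coroutine with asyncio.run().
# Inside an already running event loop, use `await` instead.
created = asyncio.run(repo.create_nodes_batch([{"id": "b"}, {"id": "c"}]))
```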

EngineSettings

EngineSettings is a Pydantic BaseModel that configures the entire engine. It uses nested settings groups:

from chaoscypher_core import EngineSettings

settings = EngineSettings(current_database="mydb")

Group | Class | What it configures
paths | PathSettings | Data and config directories (XDG-compliant)
llm | LLMSettings | Provider selection, API keys, model names, token limits
batching | BatchingSettings | Embedding batch sizes, graph analysis limits
chunking | ChunkingSettings | Chunk sizes, overlap, grouping for extraction
extraction | ExtractionSettings | LLM retry backoff, quality thresholds, loop detection
source_processing | SourceProcessingSettings | Deduplication, web scraping, analysis depth
normalizer | NormalizerSettings | Content cleaning (encoding, OCR, markdown)
search | SearchSettings | Vector dimensions, re-ranking, result limits
database | DatabaseSettings | SQLite connection timeouts and retry config
pagination | PaginationSettings | Default and max page sizes
graph | GraphSettings | Default templates, relationship types, export limits
archive | ArchiveSettings | Archive extraction limits and format detection
chat | ChatSettings | Tool-calling iteration limits

When constructing EngineSettings directly, current_database is the only required field. However, when using Engine("./data/databases/mydb"), the database name is auto-inferred from the directory name, so you don't need to set it explicitly. All other settings groups have sensible defaults.
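
The name inference can be pictured with pathlib. This is only a guess at the rule, assuming the name is the final path component; infer_database_name is a hypothetical helper and not part of chaoscypher-core.

```python
from pathlib import Path


def infer_database_name(database_dir: str) -> str:
    """Hypothetical helper: use the last path component as the database name."""
    return Path(database_dir).name


inferred = infer_database_name("./data/databases/mydb")
# pathlib ignores a trailing slash, so this resolves to the same name.
inferred_trailing = infer_database_name("./data/databases/mydb/")
```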

Settings in the full stack

When using Chaos Cypher as a full platform (Cortex + Neuron + Docker), settings are loaded from settings.yaml and converted to EngineSettings via a bridge. When using chaoscypher-core standalone, you construct EngineSettings directly.

The Engine class

The Engine class wires up all services with proper dependency injection:

from chaoscypher_core import Engine

with Engine("./data/databases/mydb") as engine:
    engine.node_service        # NodeService instance
    engine.edge_service        # EdgeService instance
    engine.template_service    # TemplateService instance
    engine.workflow_service    # WorkflowService instance
    engine.chunking_service    # ChunkingService instance
    engine.indexing_service    # IndexingService instance
    engine.search_service      # SearchService instance
    engine.llm_provider        # LLMProvider instance (lazy)
    engine.extraction_service  # ExtractionService instance (lazy)
    engine.commit_service      # SourceCommitService instance (lazy)
    engine.graph_repository    # GraphRepository instance
    engine.search_repository   # SearchRepository instance
    engine.storage_adapter     # SqliteAdapter instance
    engine.settings            # EngineSettings instance

Engine supports the context manager protocol -- calling engine.close() (or exiting the with block) disconnects the storage adapter and releases database locks.
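
The context manager behavior can be sketched with a small stand-in class. SketchEngine is hypothetical and shows only the general __enter__/__exit__ pattern, not the real Engine internals.

```python
class SketchEngine:
    """Hypothetical stand-in showing the close-on-exit pattern."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.closed = False

    def close(self) -> None:
        # A real engine would disconnect storage and release locks here.
        self.closed = True

    def __enter__(self) -> "SketchEngine":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()
        return False  # do not suppress exceptions raised in the with block


engine = SketchEngine("./data/databases/mydb")
try:
    with engine:
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    pass

# close() ran even though the block raised.
closed_after_error = engine.closed
```

Using the with block is the safer choice in scripts, because cleanup still runs when an exception interrupts the work.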

All public Engine convenience methods (e.g., create_node, get_stats, process_document, add_document, search) return Pydantic models with attribute access. Underlying service methods still return dicts -- the Engine wraps them for a cleaner API.
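
The wrapping step might look roughly like the sketch below. Every name here is hypothetical, and a frozen dataclass stands in for a Pydantic model so the example runs with the standard library alone.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class NodeResult:
    """Stand-in for a Pydantic result model (hypothetical fields)."""

    id: str
    label: str


class SketchNodeService:
    """Hypothetical service that returns a plain dict."""

    def create_node(self, label: str) -> dict[str, Any]:
        return {"id": "node-1", "label": label}


class SketchFacade:
    """Hypothetical facade that wraps service dicts in typed results."""

    def __init__(self, node_service: SketchNodeService) -> None:
        self._node_service = node_service

    def create_node(self, label: str) -> NodeResult:
        raw = self._node_service.create_node(label)
        # Convert the dict into a typed object at the public boundary.
        return NodeResult(id=raw["id"], label=raw["label"])


facade = SketchFacade(SketchNodeService())
node = facade.create_node("Ada Lovelace")
label_via_attribute = node.label  # attribute access on the wrapped result
```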

Next steps