Plugin System

Chaos Cypher uses a unified plugin architecture for extensibility. All plugin registries extend BaseRegistry[T] with auto-discovery from designated directories.

Plugin Types

| Type | Location | Pattern | Count | Purpose |
| --- | --- | --- | --- | --- |
| Loaders | services/sources/loaders/ | *_loader.py | 8 | Parse different file formats |
| Tool Plugins | services/workflows/tools/plugins/ | *_plugin.py | 10 | Workflow step implementations |
| Domain Plugins | services/sources/engine/extraction/domains/plugins/ | *.jsonld | 16 | Extraction domain configurations |
| LLM Providers | adapters/llm/providers/ | *_provider.py | 4 | LLM backend implementations |
| Cleaners | services/sources/normalizer/cleaners/ | *_cleaner.py | | Content normalization rules |
| Archive Handlers | services/sources/loaders/archive/handlers/ | *_handler.py | | Archive format detection |
| Presets | services/presets/plugins/ | *.json | 7 | Ollama VRAM configuration presets |

Loaders

Document loaders parse different file formats into text for indexing and extraction.

| Loader | Formats |
| --- | --- |
| PDF | .pdf |
| Text | .txt, .md, .log |
| CSV | .csv |
| JSON | .json, .jsonl |
| Image | .jpg, .png, .gif, .webp, .tiff, .bmp (OCR) |
| Audio | .mp3, .wav, .m4a, .flac, .ogg (transcription) |
| Video | .mp4, .mkv, .avi, .mov, .webm (audio extraction + transcription) |
| Archive | .zip, .tar.gz (auto-detect format: Sphinx, Markdown, OpenAPI, mixed) |
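
As a rough illustration of how the format list above could drive loader selection, the sketch below maps file suffixes to loader names. The `LOADER_BY_SUFFIX` table and the `pick_loader` function are hypothetical names invented for this example, not part of Chaos Cypher's API; only the extensions themselves come from the table.

```python
from pathlib import Path
from typing import Optional

# Hypothetical lookup table; the extensions are copied from the loader table.
LOADER_BY_SUFFIX = {
    ".pdf": "PDF",
    ".txt": "Text", ".md": "Text", ".log": "Text",
    ".csv": "CSV",
    ".json": "JSON", ".jsonl": "JSON",
    ".jpg": "Image", ".png": "Image", ".gif": "Image",
    ".webp": "Image", ".tiff": "Image", ".bmp": "Image",
    ".mp3": "Audio", ".wav": "Audio", ".m4a": "Audio",
    ".flac": "Audio", ".ogg": "Audio",
    ".mp4": "Video", ".mkv": "Video", ".avi": "Video",
    ".mov": "Video", ".webm": "Video",
    ".zip": "Archive",
}


def pick_loader(filename: str) -> Optional[str]:
    """Return the loader name for a file, or None if the format is unknown."""
    name = filename.lower()
    # ".tar.gz" is a compound suffix, so check it before the single-suffix lookup.
    if name.endswith(".tar.gz"):
        return "Archive"
    return LOADER_BY_SUFFIX.get(Path(name).suffix)
```

Matching is case-insensitive here, which is a design choice of the sketch rather than documented behavior.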

User Plugins

Place custom loaders in data/plugins/loaders/ to add support for new file formats. User plugins override built-in plugins with the same ID.
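
A custom loader might look roughly like the sketch below. The real `BaseLoader` interface is not shown on this page, so the stand-in base class, the `load` method name, the metadata keys, and the `tsv_loader.py` file name are all assumptions made for illustration. The only grounded details are the `*_loader.py` naming pattern and the fact that Python plugins expose a `metadata` property alongside their core methods.

```python
# data/plugins/loaders/tsv_loader.py  (hypothetical file name matching *_loader.py)
import csv
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict


class BaseLoader(ABC):
    """Stand-in for the real base class, whose interface is assumed here."""

    @property
    @abstractmethod
    def metadata(self) -> Dict[str, Any]:
        """Describe the plugin (ID, supported extensions, ...)."""

    @abstractmethod
    def load(self, path: Path) -> str:
        """Parse the file at `path` into plain text."""


class TsvLoader(BaseLoader):
    """Illustrative loader that flattens tab-separated files into text."""

    @property
    def metadata(self) -> Dict[str, Any]:
        # Key names are assumptions; a loader with the same ID as a
        # built-in one would override it.
        return {"id": "tsv", "extensions": [".tsv"]}

    def load(self, path: Path) -> str:
        with path.open(newline="", encoding="utf-8") as handle:
            rows = csv.reader(handle, delimiter="\t")
            # One output line per row, cells separated by " | ".
            return "\n".join(" | ".join(row) for row in rows)
```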

Tool Plugins

Workflow step implementations for the automation system.

| Plugin | Description |
| --- | --- |
| AI Extract JSON | Extract structured data using LLM |
| AI Generate Embedding | Generate vector embeddings |
| AI Prompt | Custom LLM prompt execution |
| AI Vector Search | Semantic similarity search |
| Data Extract | Extract data from structured sources |
| Data Merge | Merge multiple data sources |
| HTTP Request | Make HTTP requests |
| Logic Conditional | Branch workflow based on conditions |
| Logic Loop | Iterate over collections |
| Templates List | List available templates |

User Plugins

Place custom tool plugins in data/plugins/tools/.
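
For orientation, a minimal tool plugin could be shaped like the following. This is only a sketch: the `execute` method, its dictionary-in/dictionary-out contract, the metadata keys, and the `word_count_plugin.py` file name are hypothetical, since the actual tool plugin base class is not documented on this page.

```python
# data/plugins/tools/word_count_plugin.py  (hypothetical name matching *_plugin.py)
from typing import Any, Dict


class WordCountPlugin:
    """Hypothetical workflow step; the real base class and method names may differ."""

    @property
    def metadata(self) -> Dict[str, str]:
        # Assumed metadata keys, for illustration only.
        return {
            "id": "word_count",
            "name": "Word Count",
            "description": "Count words in a text input",
        }

    def execute(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Missing input is treated as empty text, which yields a count of 0.
        text = str(inputs.get("text", ""))
        return {"word_count": len(text.split())}
```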

Domain Plugins

Extraction domains configure how entities and relationships are extracted from different types of content. Each domain is a .jsonld file — no Python code required.

Built-in domains: biographical, cybersecurity, educational, financial, generic, historical, investigation, legal, literary, medical, news, philosophical, political, scientific, technical, theological.

Each domain defines: entity types, relationship types, detection rules, LLM guidance, quality scoring, extraction limits, and deduplication behavior.

User Plugins

Place custom domain configs in data/plugins/domains/. User domains override built-in domains with the same name.
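
The sketch below writes a custom domain file into the user plugin directory. Every key name in it (`entityTypes`, `detectionRules`, and so on) is an illustrative guess that mirrors the list of things a domain defines; the `@context` URL is a placeholder, and the `write_domain` helper is invented for this example. Consult the domain schema reference for the real field names before using anything like this.

```python
import json
from pathlib import Path

# All keys below are ASSUMED names, not the real schema.
domain = {
    "@context": {"@vocab": "https://example.org/domain#"},  # placeholder vocabulary
    "name": "recipes",
    "entityTypes": ["Recipe", "Ingredient"],
    "relationshipTypes": ["USES"],
    "detectionRules": {"keywords": ["recipe", "ingredients"]},
    "llmGuidance": "Extract dishes and the ingredients they use.",
    "qualityScoring": {"minConfidence": 0.5},
    "extractionLimits": {"maxEntities": 100},
    "deduplication": {"caseSensitive": False},
}


def write_domain(config: dict, plugin_dir: Path) -> Path:
    """Serialize a domain config to <plugin_dir>/<name>.jsonld and return the path."""
    plugin_dir.mkdir(parents=True, exist_ok=True)
    target = plugin_dir / f"{config['name']}.jsonld"
    target.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return target
```

Calling `write_domain(domain, Path("data/plugins/domains"))` would produce `data/plugins/domains/recipes.jsonld`. Naming the file after a built-in domain such as `legal` would, per the override rule above, replace that built-in.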

Full domain schema reference and examples

LLM Providers

| Provider | Features |
| --- | --- |
| Ollama | Local inference, multi-instance, load balancing |
| OpenAI | GPT models, cloud inference |
| Anthropic | Claude models |
| Gemini | Google AI models |

Embeddings are handled separately by a dedicated embedding provider (LocalEmbeddingProvider by default, running sentence-transformers on the CPU), not by LLM providers.

Registry Pattern

All registries extend BaseRegistry[T]:

class LoaderRegistry(BaseRegistry[BaseLoader]):
    """Auto-discovers and registers document loaders."""

    def discover(self) -> None:
        # Scans plugin directories for matching files
        # Loads and registers each plugin
        pass

Key behaviors:

  • Auto-discovery scans designated directories for files matching the pattern
  • User plugins (in data/plugins/) override built-in plugins with the same ID
  • Python plugins implement a metadata property and core methods
  • Config plugins (.jsonld) are loaded as data files
  • Registration is idempotent — re-registering replaces the previous plugin
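
The behaviors listed above can be sketched with a small generic registry. This is not the project's actual `BaseRegistry`: the constructor arguments, the `register`, `get`, and `ids` methods, and the "later directory wins" ordering are assumptions chosen to show pattern-based discovery, user overrides, and idempotent registration in one place.

```python
from pathlib import Path
from typing import Callable, Dict, Generic, Iterable, List, TypeVar

T = TypeVar("T")


class BaseRegistry(Generic[T]):
    """Illustrative registry; the real class may differ in every detail."""

    def __init__(
        self,
        pattern: str,
        load: Callable[[Path], T],
        plugin_id: Callable[[T], str],
    ) -> None:
        self._pattern = pattern      # e.g. "*_loader.py" or "*.jsonld"
        self._load = load            # turns a file into a plugin object
        self._plugin_id = plugin_id  # extracts the ID used for overrides
        self._plugins: Dict[str, T] = {}

    def register(self, plugin: T) -> None:
        # Same ID replaces the earlier entry, so re-registering is harmless.
        self._plugins[self._plugin_id(plugin)] = plugin

    def discover(self, directories: Iterable[Path]) -> None:
        # Pass built-in directories first and user directories last:
        # later registrations override earlier ones with the same ID.
        for directory in directories:
            if not directory.is_dir():
                continue
            for path in sorted(directory.glob(self._pattern)):
                self.register(self._load(path))

    def get(self, plugin_id: str) -> T:
        return self._plugins[plugin_id]

    def ids(self) -> List[str]:
        return sorted(self._plugins)
```

For config plugins, `load` could be as simple as `lambda p: json.loads(p.read_text())`, which matches the point that `.jsonld` files are loaded as data. Python plugins would need a loader that imports the module, for example via `importlib`.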

VRAM Presets

Pre-configured Ollama model selections optimized for different GPU memory sizes:

| Preset | VRAM | Typical Models |
| --- | --- | --- |
| 16GB | 16 GB | Smaller quantized models |
| 20GB | 20 GB | Medium models |
| 24GB | 24 GB | Standard models |
| 32GB | 32 GB | Larger models |
| 48GB | 48 GB | Full-size models |
| 96GB | 96 GB | Multiple large models |
| 128GB | 128 GB | Maximum capability |
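
One plausible way to choose among these presets is to take the largest one that fits the available GPU memory. The sketch below assumes that rule; the `choose_preset` function is hypothetical, and how Chaos Cypher actually selects a preset is not described here. Only the preset sizes come from the table.

```python
from typing import Optional

# Preset sizes in GB, taken from the table above.
PRESET_SIZES_GB = [16, 20, 24, 32, 48, 96, 128]


def choose_preset(available_vram_gb: float) -> Optional[str]:
    """Return the largest preset that fits, or None if even 16GB does not."""
    fitting = [size for size in PRESET_SIZES_GB if size <= available_vram_gb]
    return f"{max(fitting)}GB" if fitting else None
```

With this rule, a 40 GB card would fall back to the 32GB preset rather than the 48GB one.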

See also