Benchmark Model Cards

Generated from models_registry.yaml. Do not edit by hand; regenerate with `uv run python scripts/generate_model_cards.py`.

| Model | Provider | Open weight | Context | Price in/out ($/1M) | Why included |
| --- | --- | --- | --- | --- | --- |
| Gemma 4 12B (local) | ollama | yes | - | free (local) | Google Gemma 4 12B; strong instruction-following at 7.6 GB. |
| Gemma 4 26B (local) | ollama | yes | - | free (local) | Google Gemma 4 26B; best quality in the Gemma family that fits ≤24 GB. |
| GLM4 9B (local) | ollama | yes | - | free (local) | GLM4 9B; compact Chinese-lineage model, good entity coverage. |
| GPT-OSS 120B (workstation) | ollama | yes | - | free (local) | OpenAI OSS 120B; maximum-scale open-weight extractor for workstation benchmarking. |
| GPT-OSS 20B (local) | ollama | yes | - | free (local) | OpenAI OSS 20B; mid-tier open-weight baseline at 13 GB. |
| Llama 3.1 70B (workstation) | ollama | yes | - | free (local) | Meta Llama 3.1 70B; large-iron open-weight frontier baseline. |
| Qwen3 Embedding 4B (local) | ollama | yes | - | free (local) | Qwen3 embedding 4B; lightweight embedder for memory-constrained setups. |
| Qwen3 Embedding 8B (local) | ollama | yes | - | free (local) | Qwen3 embedding 8B; high-quality dense retrieval at 4.7 GB. |
| Qwen3.6 35B-A3B MoE (local) | ollama | yes | - | free (local) | Qwen3.6 35B MoE; near-frontier quality at 23 GB via sparse activation. |
| Qwen3 14B (local) | ollama | yes | - | free (local) | Qwen3 14B; quality step-up from 8B while staying ≤10 GB. |
| Qwen3 8B (local) | ollama | yes | - | free (local) | Qwen3 8B; efficient general-purpose local extractor at 5.2 GB. |
| Claude Haiku 4.5 | anthropic | no | 200,000 | $1.00 / $5.00 | Fastest, cheapest Anthropic tier. |
| Claude Opus 4.8 | anthropic | no | 1,000,000 | $5.00 / $25.00 | Frontier reasoning baseline; most capable Anthropic model. |
| Claude Sonnet 4.6 | anthropic | no | 1,000,000 | $3.00 / $15.00 | Best speed/intelligence balance from Anthropic. |
| Gemini 2.5 Flash | gemini | no | 1,000,000 | $0.15 / $0.60 | Cheap, fast Google tier to complement Haiku/GPT-4o-Mini in the small-model slot. |
| Gemini 2.5 Pro | gemini | no | 1,000,000 | $1.25 / $10.00 | Google frontier baseline; 1M context matches Anthropic/OpenAI frontier tier. |
| GPT-4o | openai | no | 128,000 | $2.50 / $10.00 | OpenAI frontier baseline. |
| GPT-4o Mini | openai | no | 128,000 | $0.15 / $0.60 | Cheap OpenAI tier for high-volume extraction. |
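
To make the price column concrete, here is a minimal sketch of how a per-run cost could be estimated from the listed input and output rates. The prices are copied from the rows above; the `estimate_cost` helper, the `PRICES_PER_MILLION` dictionary, and the token counts are hypothetical and are not part of the registry or the generator script.

```python
# Prices copied from the table above, in USD per 1M tokens: (input, output).
# Only a few hosted models are listed; local ollama models are free to run.
PRICES_PER_MILLION = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 2.5 Flash": (0.15, 0.60),
    "GPT-4o": (2.50, 10.00),
    "GPT-4o Mini": (0.15, 0.60),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one run (hypothetical helper)."""
    price_in, price_out = PRICES_PER_MILLION[model]
    # Rates are quoted per 1M tokens, so scale the token counts accordingly.
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    # Assumed workload for illustration: 2M input tokens, 500k output tokens.
    for model in PRICES_PER_MILLION:
        cost = estimate_cost(model, 2_000_000, 500_000)
        print(f"{model}: ${cost:.2f}")
```

Under these assumed token counts, input and output contribute equally for GPT-4o because its output rate is four times its input rate and the workload has four times as many input tokens.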