Goodfire launched Silico, a mechanistic interpretability platform that uses AI agents to map and modify LLM internals, spanning hallucination reduction, behavior tuning, and model auditing.
AI startup Goodfire launched Silico, a developer-facing platform built on mechanistic interpretability, the practice of reverse-engineering how neural networks actually process information. The tool automates interpretability work using AI agents, enabling users to audit trained models and steer behavior during training. Goodfire claims it has already used these methods to reduce hallucinations in LLMs. MIT Technology Review named mechanistic interpretability one of its 10 Breakthrough Technologies of 2026, lending the space additional credibility.
Mechanistic interpretability has so far been a research discipline inaccessible to most engineers; Silico wraps it in an agent-driven interface. That means developers can probe which internal 'features' are firing when a model hallucinates, refuses a prompt, or drifts in tone, without hand-crafting sparse autoencoders themselves. The agent-driven automation is the real unlock: it removes the bottleneck of needing a specialized researcher for every diagnostic task.
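To make that concrete, here is a minimal sketch of the core idea rather than Silico's actual API (Goodfire hasn't published the interface): pull a residual-stream activation from an open-weight model and pass it through a pretrained sparse autoencoder's encoder to see which features fire on a given prompt. The checkpoint name, layer index, feature count, and `sae_encoder.pt` weights file are all hypothetical stand-ins.

```python
# Minimal sketch of SAE-based feature probing -- NOT Silico's API.
# Checkpoint, layer, feature count, and the SAE weights file are
# hypothetical stand-ins for whatever your own setup provides.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINT = "your-org/your-open-weight-model"  # hypothetical checkpoint
LAYER = 16                                      # residual-stream layer to probe

tok = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForCausalLM.from_pretrained(CHECKPOINT, output_hidden_states=True)
model.eval()

prompt = "The capital of Australia is Sydney."  # a claim worth auditing
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).hidden_states[LAYER]  # shape: (1, seq_len, d_model)

# A sparse autoencoder's encoder is, at its core, a linear map plus ReLU
# trained so that each output dimension tracks one interpretable feature.
d_model, n_features = hidden.shape[-1], 65_536      # n_features is illustrative
sae_encoder = torch.nn.Linear(d_model, n_features)
sae_encoder.load_state_dict(torch.load("sae_encoder.pt"))  # hypothetical weights
feature_acts = torch.relu(sae_encoder(hidden))      # sparse feature activations

# The features most active at the final token are candidates for inspection.
top = feature_acts[0, -1].topk(10)
for score, idx in zip(top.values, top.indices):
    print(f"feature {idx.item():>6}  activation {score.item():.3f}")
```

An agentic platform like Silico would presumably automate the tedious parts around this loop: training the SAE, labeling what each feature means, and tracing which ones correlate with failure modes.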
Sign up for Silico's early access and run a hallucination audit on any open-weight model you're currently fine-tuning — compare the internal feature activations before and after your RLHF pass to confirm your training is actually suppressing the right circuits.
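As a rough sketch of what that before/after comparison could look like, under the same assumptions as the snippet above (hypothetical SAE encoder, placeholder checkpoint names), you could average feature activations over a fixed prompt set for each checkpoint and rank the features whose activation dropped the most:

```python
# Sketch of a before/after RLHF feature comparison, under the same
# assumptions as the previous snippet (hypothetical SAE, placeholder paths).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

LAYER = 16
PROMPTS = [  # prompts that tend to elicit hallucinations
    "Who won the 2019 Nobel Prize in Chemistry?",
    "Summarize the plot of the novel 'The Glass Cartographer'.",  # fictitious title
]

def mean_feature_acts(checkpoint: str, sae_encoder: torch.nn.Module) -> torch.Tensor:
    """Average SAE feature activations over PROMPTS for one checkpoint."""
    tok = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint, output_hidden_states=True)
    model.eval()
    total = None
    for p in PROMPTS:
        inputs = tok(p, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).hidden_states[LAYER]
        acts = torch.relu(sae_encoder(hidden)).mean(dim=(0, 1))  # (n_features,)
        total = acts if total is None else total + acts
    return total / len(PROMPTS)

d_model, n_features = 4096, 65_536                  # illustrative sizes
sae_encoder = torch.nn.Linear(d_model, n_features)
sae_encoder.load_state_dict(torch.load("sae_encoder.pt"))  # hypothetical weights

before = mean_feature_acts("your-org/model-pre-rlhf", sae_encoder)   # placeholder
after = mean_feature_acts("your-org/model-post-rlhf", sae_encoder)   # placeholder

# Features whose activation dropped the most: if the RLHF pass is doing its
# job, hallucination-linked features should dominate this list.
drop = (before - after).topk(10)
for delta, idx in zip(drop.values, drop.indices):
    print(f"feature {idx.item():>6}  activation drop {delta.item():.3f}")
```

The same diff, run in the other direction, would surface features the RLHF pass amplified, which is a quick sanity check that the training didn't strengthen circuits you meant to suppress.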
Go to goodfire.ai and request early access to Silico