We Gave AI a Mirror.
Now It Measures What It Believes.
Git-native coordination layer for AI agents. Track epistemic state, catch capability drift, and enable seamless multi-agent workflows, all stored in your repository.
Making Git Sexy
Git becomes efficient external memory for AI agents. Through epistemic measurement, your AI automatically maps every decision, uncertainty, and capability shift to git notes, turning your repository into a distributed cognitive state machine. No manual tracking, no context loss, just seamless version-controlled reasoning.
What Is Empirica?
Core capabilities that make AI agents smarter, more reliable, and easier to coordinate.
Epistemic Transactions
Every task follows a measured cycle. PREFLIGHT captures what the AI believes before starting. During investigation, noetic artifacts are logged (findings, unknowns, dead-ends, assumptions), building a map of what's known and what isn't. The Sentinel CHECK gate validates readiness before transitioning to action, where praxic artifacts track goals, decisions, and commits. POSTFLIGHT measures what was actually learned.
Context Continuity
Resume work without repetition. Each session builds on the last, preserving reasoning state, not just file diffs. Epistemic handoff reports carry what matters forward; compaction hooks recover context automatically.
Git-Native Coordination
Your repository is the source of truth. Epistemic checkpoints are stored in git notes: distributed, version-controlled, queryable. Multi-agent coordination without central servers.
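As a rough illustration of what attaching a checkpoint to a commit through git notes could look like, the sketch below builds a standard `git notes` command carrying a JSON payload. The notes ref `empirica-demo`, the payload fields, and the helper `build_note_command` are assumptions made for this example; they are not Empirica's actual schema or API.

```python
import json

def build_note_command(commit: str, checkpoint: dict, ref: str = "empirica-demo") -> list[str]:
    """Return the argv that attaches `checkpoint` as a git note on `commit`.

    Hypothetical helper: the ref name and payload shape are illustrative only.
    """
    payload = json.dumps(checkpoint, sort_keys=True)
    # `git notes --ref=<ref> add -f -m <msg> <object>` is standard git;
    # -f overwrites an existing note on the same object.
    return ["git", "notes", f"--ref={ref}", "add", "-f", "-m", payload, commit]

cmd = build_note_command("HEAD", {"phase": "PREFLIGHT", "uncertainty": 0.6})
```

Inside a repository, the list can be passed to `subprocess.run(cmd, check=True)`, and the note read back with `git notes --ref=empirica-demo show HEAD`.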
Dynamic Context Loader
Load exactly what you need, when you need it. Project bootstrap scales context depth based on uncertainty, from minimal to deep. Cross-session, cross-client, cross-provider knowledge transfer.
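One way to picture uncertainty-scaled loading is a simple mapping from an uncertainty score to a depth label. The cut-offs, the intermediate "moderate" level, and the function name below are assumptions for illustration; the page only states that depth ranges from minimal to deep.

```python
def context_depth(uncertainty: float) -> str:
    """Map uncertainty in [0, 1] to a context depth label (cut-offs are assumed)."""
    if not 0.0 <= uncertainty <= 1.0:
        raise ValueError("uncertainty must be in [0, 1]")
    if uncertainty < 0.3:
        return "minimal"    # confident: load only a short summary
    if uncertainty < 0.7:
        return "moderate"   # hypothetical middle tier
    return "deep"           # very uncertain: load the full history

low = context_depth(0.1)
high = context_depth(0.9)
```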
Sentinel Gate
Investigate first, act second: enforced by the system, not willpower. The Sentinel blocks destructive actions (file edits, code writes) until the AI has demonstrated sufficient understanding through a CHECK gate. No more "ready, fire, aim."
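The gating idea can be sketched as a small object that refuses state-changing actions until a check has passed. This is a toy model, not the Sentinel's implementation: the action names, the `Gate` class, and the 0.7 threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical set of actions treated as praxic (state-changing).
PRAXIC_ACTIONS = {"edit_file", "write_code", "commit"}

@dataclass
class Gate:
    """Toy gate: praxic actions stay blocked until check() passes."""
    threshold: float = 0.7   # assumed readiness threshold, not a real default
    passed: bool = False

    def check(self, understanding: float) -> bool:
        # Open the gate only when reported understanding clears the threshold.
        self.passed = understanding >= self.threshold
        return self.passed

    def allows(self, action: str) -> bool:
        # Read-only (noetic) actions are always allowed.
        return action not in PRAXIC_ACTIONS or self.passed

gate = Gate()
before = gate.allows("edit_file")    # blocked: no CHECK yet
reading = gate.allows("read_file")   # investigation is always allowed
gate.check(understanding=0.8)
after = gate.allows("edit_file")     # allowed once CHECK has passed
```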
Epistemic Awareness through Humility
AI that knows what it knows, and what it doesn't. Through self-assessment measured against real outcomes, agents develop calibrated epistemic awareness grounded in humility: honest uncertainty is more valuable than false confidence.
Grounded Calibration
Self-assessment alone isn't enough. After every transaction, the AI's beliefs about its own performance are compared against objective evidence: did the tests pass? How many files changed? Were the goals actually completed? The gap between belief and reality is the calibration signal.
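The calibration signal described above can be sketched as a per-metric difference between belief and evidence. The metric names, the [0, 1] scale, and the function below are assumptions for illustration rather than Empirica's actual scoring.

```python
def calibration_gap(believed: dict[str, float], observed: dict[str, float]) -> dict[str, float]:
    """Belief minus evidence per metric; positive values mean overconfidence."""
    return {k: round(believed[k] - observed[k], 3) for k in believed if k in observed}

# Hypothetical scores in [0, 1]: the agent's self-assessment vs. measured outcomes.
believed = {"tests_pass": 0.9, "goals_done": 1.0}
observed = {"tests_pass": 0.5, "goals_done": 1.0}   # e.g. 5 of 10 tests passed

gap = calibration_gap(believed, observed)
```

Here the agent overestimated test success by 0.4 and judged goal completion correctly; a persistent positive gap on one metric is the kind of bias a correction could target.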
Epistemic Memory
Not just storage: a cognitive immune system. Findings act as antigens; lessons act as antibodies. New discoveries challenge existing beliefs, reducing confidence in outdated knowledge. Memory decays naturally, keeping what matters fresh.
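The page does not say how decay or belief challenges are computed. Purely as an assumed model, the sketch below uses exponential decay with a half-life and a multiplicative penalty when a new finding contradicts a stored belief; both functions and the 30-day half-life are hypothetical.

```python
def decayed_confidence(confidence: float, age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: confidence halves every `half_life_days` (assumed model)."""
    return confidence * 0.5 ** (age_days / half_life_days)

def challenge(confidence: float, strength: float) -> float:
    """Lower confidence in a belief contradicted by a finding of `strength` in [0, 1]."""
    return confidence * (1.0 - strength)

fresh = decayed_confidence(0.8, age_days=0)        # unchanged
month_old = decayed_confidence(0.8, age_days=30)   # halved after one half-life
contested = challenge(month_old, strength=0.5)     # halved again by a contradiction
```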
Natural Language Workflow
No new syntax to learn. Empirica integrates through hooks, skills, and MCP tools into your existing AI workflow. Describe what you're doing in natural language; the framework handles measurement, gating, and artifact logging behind the scenes.
Why Empirica?
The Problem
- AI agents forget everything between sessions and across compaction
- AI acts before it understands: it edits code it hasn't read and commits changes it can't verify
- No way to measure whether AI reasoning is actually improving or drifting
- AI decisions leave no audit trail, so there is no way to trace what was known, assumed, or wrong
- Overconfidence goes unchecked: AI says "done" when tests fail and goals aren't met
With Empirica
- Epistemic state persisted in git, SQLite, and Qdrant, surviving sessions and compaction
- Sentinel gate enforces investigation before action: no more "ready, fire, aim"
- Grounded calibration compares AI self-assessment against real outcomes: tests, git, goals
- Every finding, assumption, dead-end, and decision logged for full epistemic provenance
- Dual-track measurement catches the gap between belief and reality before it compounds
How It Works
Simple workflow, powerful results. The CASCADE cycle tracks epistemic state throughout any task.
PREFLIGHT
Before starting work, AI assesses what it knows, what it can do, and how uncertain it is. Creates baseline for comparison.
Noetic → CHECK → Praxic
Investigate first, then act. The noetic phase builds understanding (findings, unknowns, dead-ends). The Sentinel CHECK gate validates readiness before transitioning to praxic action (edits, commits, implementations).
POSTFLIGHT
After completing work, AI re-assesses epistemic state. Measures actual learning via PREFLIGHT→POSTFLIGHT deltas. Triggers grounded verification against objective evidence (tests, git, goals).
Who Is Empirica For?
Empirica is AI-native: built for AI agents that need epistemic calibration. Any domain requiring accurate context, uncertainty tracking, and reliable reasoning benefits from agents that know what they know.
Software Engineering
- Code agents preventing capability drift
- Multi-agent teams with context continuity
- Production AI with real-time self-monitoring
Healthcare & Legal
- Medical diagnosis with evidence quality tracking
- Legal research assessing source reliability
- Misinformation detection in records
Finance & Analysis
- Trading agents with uncertainty-aware decisions
- Risk assessment with confidence bounds
- Financial report analysis with source validation
Communication & Knowledge
- Meeting assistants detecting misunderstandings
- Research agents tracking contradictory sources
- News verification with epistemic scoring
Multi-Project, Multi-AI Workflows
Run multiple AI agents across multiple projects simultaneously, each with isolated epistemic state.
Instance Isolation
Each tmux pane gets its own isolated instance, identified by TMUX_PANE.
Run Claude Code in three panes working on three different projects (or the same project with different AI models), each with its own sessions, transactions, and epistemic state. No cross-contamination, seamless switching.
tmux pane %2 → empirica-web (claude-code)
tmux pane %3 → client-project (qwen-testing)
Works with tmux, screen, or any terminal multiplexer. Falls back to TTY identification without tmux.
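The identification order described above (tmux pane first, then TTY) might be sketched as follows. `TMUX_PANE` is the environment variable tmux sets inside each pane; the function name, the `tmux:`/`tty:` prefixes, and the final PID fallback are assumptions added for this example.

```python
import os
import sys

def instance_id(env=None) -> str:
    """Identify this terminal instance: tmux pane first, then TTY, then PID."""
    env = os.environ if env is None else env
    pane = env.get("TMUX_PANE")   # set by tmux inside each pane, e.g. "%2"
    if pane:
        return f"tmux:{pane}"
    try:
        # Unix only: name of the terminal device attached to stdin.
        return f"tty:{os.ttyname(sys.stdin.fileno())}"
    except (OSError, ValueError, AttributeError):
        # No terminal attached (pipes, CI) or a platform without ttyname.
        # The PID fallback is an assumption, not documented behavior.
        return f"pid:{os.getpid()}"

in_tmux = instance_id({"TMUX_PANE": "%2"})
outside_tmux = instance_id({})
```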
Automatic Context Switching
Projects auto-detected from git remote URLs. Each project gets its own SQLite database, findings, goals, and calibration history. Switch projects within a pane and your full epistemic state travels with you, or run multiple projects simultaneously across panes.
✓ Session reattached, 12 findings loaded
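Detecting a project from its remote URL could be as simple as normalizing the URL to an `owner/repo` key. The sketch below handles SSH-style and HTTPS remotes; the function name, the lowercase normalization, and the example URLs are hypothetical, and Empirica's own detection may differ.

```python
import re

def project_slug(remote_url: str) -> str:
    """Derive 'owner/repo' from an SSH-style or HTTPS git remote URL."""
    url = remote_url.strip()
    if url.endswith(".git"):
        url = url[:-4]
    # Split on '/' and ':' so both 'git@host:owner/repo' and
    # 'https://host/owner/repo' end with the same two segments.
    parts = [p for p in re.split(r"[/:]", url) if p]
    if len(parts) < 2:
        raise ValueError(f"cannot parse remote URL: {remote_url!r}")
    return "/".join(parts[-2:]).lower()

ssh_slug = project_slug("git@example.com:Acme/Widget.git")
https_slug = project_slug("https://example.com/acme/widget")
```

Both forms resolve to the same key, so a per-project database could be looked up regardless of how the repository was cloned. The input would typically come from `git remote get-url origin`.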
Multi-AI Identity
Different AI models tracked under different identities. Each AI has its own calibration history, bias corrections, and learning trajectory. Handoffs between agents carry full epistemic context.
--ai-id qwen-testing # test specialist
--ai-id gemini-review # code review
Session Recovery
Crashes, restarts, and compaction don't lose state. Transactions can be adopted by new instances, sessions resumed across terminal restarts, and epistemic state recovered from git checkpoints.
✓ Transaction resumed, context intact
Platform note: Full plugin integration (hooks, Sentinel, skills) is production-ready on Claude Code. Other AI coding tools (Cursor, Cline, Gemini CLI) have experimental/unsupported integration. The core CLI and measurement system work on any platform.
Getting Started
Empirica runs inside AI coding tools; it's a plugin, not a standalone app. You'll need Claude Code or a compatible terminal AI (OpenCode is untested but architecturally compatible).
Install Empirica
# Core framework + CLI (Python 3.10+)
pip install empirica
# MCP server for AI tool integration (Python 3.11+)
pip install empirica-mcp
Set Up Claude Code
# Plugin, hooks, Sentinel, system prompt,
# MCP server, and statusline
empirica setup-claude-code
Sets up session lifecycle, the noetic firewall, compaction recovery, statusline, and natural language commands.
Onboard & Work
# Guided first-run setup
empirica onboard
# Then just work naturally;
# the framework measures behind the scenes
After onboarding, use /empirica status and empirica goals-list from within your AI session.
Want to Go Deeper?
Explore advanced features, architecture, and use cases.