Epistemic Measurement & AI Metacognition

We Gave AI a Mirror.
Now It Measures What It Believes.

Git-native coordination layer for AI agents. Track epistemic state, catch capability drift, and enable seamless multi-agent workflows, all stored in your repository.

🌳

Making Git Sexy

Git becomes efficient external memory for AI agents. Through epistemic measurement, your AI automatically maps every decision, uncertainty, and capability shift to git notes, turning your repository into a distributed cognitive state machine. No manual tracking, no context loss, just seamless version-controlled reasoning.

What Is Empirica?

Core capabilities that make AI agents smarter, more reliable, and easier to coordinate.

📝

Epistemic Transactions

Every task follows a measured cycle. PREFLIGHT captures what the AI believes before starting. During investigation, noetic artifacts are logged (findings, unknowns, dead-ends, assumptions), building a map of what's known and what isn't. The Sentinel CHECK gate validates readiness before transitioning to action, where praxic artifacts track goals, decisions, and commits. POSTFLIGHT measures what was actually learned.

How: Each transaction produces a verifiable delta between belief and outcome. Across transactions, the trajectory reveals calibration drift. Extensible to visual dashboards for team-level oversight.
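A minimal sketch of that belief-vs-outcome delta, assuming hypothetical vector names; Empirica's actual schema is richer than two scores.

```python
# Hypothetical sketch of a belief-vs-outcome delta; the vector names
# are illustrative, not Empirica's real schema.
def epistemic_delta(preflight: dict, postflight: dict) -> dict:
    """Per-vector change between pre-task belief and post-task measurement."""
    return {k: round(postflight[k] - preflight[k], 3)
            for k in preflight if k in postflight}

delta = epistemic_delta(
    {"knowledge": 0.4, "uncertainty": 0.7},
    {"knowledge": 0.8, "uncertainty": 0.3},
)
# Rising knowledge with falling uncertainty is the signature of genuine
# learning; a flat delta across many transactions suggests drift.
```

Tracking this delta per transaction is what turns single tasks into a measurable calibration trajectory.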
🔄

Context Continuity

Resume work without repetition. Each session builds on the last, preserving reasoning state, not just file diffs. Epistemic handoff reports carry what matters forward; compaction hooks recover context automatically.

How: Compact epistemic snapshots + git checkpoint recovery. Dramatic token reduction vs. full context replay.
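To make the size argument concrete, a handoff snapshot only needs distilled reasoning state; the field names below are hypothetical, not Empirica's internal format.

```python
import json

# Hypothetical shape of a compact epistemic handoff. The point: carry
# distilled reasoning state forward, not the raw transcript.
snapshot = {
    "findings": ["config is read from env vars, not the YAML file"],
    "unknowns": ["why the CI cache misses on main"],
    "dead_ends": ["patching the vendored HTTP client"],
    "next_step": "trace how the cache key is constructed",
}
compact = json.dumps(snapshot)
# A few hundred bytes resumes the thread; replaying the full conversation
# would cost thousands of tokens for the same information.
```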
🌳

Git-Native Coordination

Your repository is the source of truth. Epistemic checkpoints stored in git notes: distributed, version-controlled, queryable. Multi-agent coordination without central servers.

How: 4-layer memory (Hot + Warm + Search + Cold). Checkpoint discovery, branch-based workflows, crypto-ready.
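Under the hood, `git notes` can attach payloads to commits without rewriting history. A minimal sketch, assuming an illustrative `refs/notes/empirica` ref name (not necessarily the one Empirica uses):

```python
import subprocess

# Sketch of checkpointing via git notes; the ref name is hypothetical.
NOTES_REF = "refs/notes/empirica"

def checkpoint_cmd(payload: str, commit: str = "HEAD") -> list:
    """argv that attaches an epistemic checkpoint to a commit as a git note."""
    return ["git", "notes", f"--ref={NOTES_REF}", "add", "-f", "-m", payload, commit]

def write_checkpoint(payload: str, commit: str = "HEAD", cwd: str = ".") -> None:
    """Run the command inside the target repository."""
    subprocess.run(checkpoint_cmd(payload, commit), check=True, cwd=cwd)
```

Because notes live under their own ref, they push, pull, and merge like any other git data, which is what makes serverless multi-agent coordination possible.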
⚡

Dynamic Context Loader

Load exactly what you need, when you need it. Project bootstrap scales context depth based on uncertainty, from minimal to deep. Cross-session, cross-client, cross-provider knowledge transfer.

How: Uncertainty-driven bootstrap with findings, unknowns, dead-ends, and semantic search. Significant token reduction vs. manual reconstruction.
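The depth policy can be pictured as a simple threshold map; the cutoffs below are illustrative, not the framework's actual tuning.

```python
# Illustrative uncertainty-to-depth policy; thresholds are hypothetical.
def bootstrap_depth(uncertainty: float) -> str:
    """Choose how much context to load (0.0 = fully oriented, 1.0 = lost)."""
    if uncertainty < 0.3:
        return "minimal"   # recent findings only
    if uncertainty < 0.7:
        return "standard"  # findings + unknowns + open goals
    return "deep"          # full history plus semantic search over dead-ends
```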
💂

Sentinel Gate

Investigate first, act second: enforced by the system, not willpower. The Sentinel blocks destructive actions (file edits, code writes) until the AI has demonstrated sufficient understanding through a CHECK gate. No more "ready, fire, aim."

How: Noetic firewall validates epistemic readiness before allowing praxic tools. Readiness is assessed holistically; gaming the gate degrades calibration.
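Conceptually, the gate is a predicate in front of the tool dispatcher. A toy sketch, with hypothetical class, threshold, and tool names:

```python
# Toy noetic firewall; all names and the threshold are illustrative.
class Sentinel:
    def __init__(self, readiness_threshold: float = 0.7):
        self.threshold = readiness_threshold
        self.ready = False

    def check(self, readiness: float) -> bool:
        """CHECK gate: record whether demonstrated understanding suffices."""
        self.ready = readiness >= self.threshold
        return self.ready

    def allow(self, tool: str) -> bool:
        """Noetic tools (read, search) always pass; praxic tools need CHECK."""
        praxic = {"edit_file", "write_file", "git_commit"}
        return tool not in praxic or self.ready

gate = Sentinel()
gate.allow("read_file")  # investigation is always allowed
gate.allow("edit_file")  # refused until check() passes
```

The real gate assesses readiness holistically rather than from a single score, as noted above.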
🧬

Epistemic Awareness through Humility

AI that knows what it knows, and what it doesn't. Through self-assessment measured against real outcomes, agents develop calibrated epistemic awareness grounded in humility: honest uncertainty is more valuable than false confidence.

How: 13 epistemic vectors measuring knowledge, capability, uncertainty, and reasoning state. Dual-track calibration compares self-assessment against objective evidence: tests, git metrics, goal completion.
Validated across 1.19M+ Bayesian observations, 790+ sessions, and 210+ epistemic transactions.
🎯

Grounded Calibration

Self-assessment alone isn't enough. After every transaction, the AI's beliefs about its own performance are compared against objective evidence: did the tests pass? How many files changed? Were the goals actually completed? The gap between belief and reality is the calibration signal.

How: Dual-track system. Track 1 measures learning trajectory (PREFLIGHT vs POSTFLIGHT); Track 2 measures calibration accuracy (self-assessment vs objective evidence). When they diverge, Track 2 wins.
Evidence profiles (code quality, prose quality, goal completion) are extensible: add a domain-specific verifier and the system grounds against it.
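A toy version of the two tracks, assuming scores normalized to [0, 1]; real evidence profiles are richer than a test pass rate.

```python
# Illustrative dual-track calibration; names and scales are assumptions.
def learning_delta(preflight: float, postflight: float) -> float:
    """Track 1: self-reported learning across the transaction."""
    return postflight - preflight

def calibration_gap(self_assessment: float,
                    tests_passed: int, tests_total: int) -> float:
    """Track 2: self-assessment vs objective evidence. Positive = overconfident."""
    evidence = tests_passed / tests_total if tests_total else 0.0
    return self_assessment - evidence

gap = calibration_gap(self_assessment=0.9, tests_passed=6, tests_total=10)
# gap ≈ 0.3: the agent claimed more than the tests support, so Track 2 wins.
```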
🧠

Epistemic Memory

Not just storage: a cognitive immune system. Findings act as antigens; lessons act as antibodies. New discoveries challenge existing beliefs, reducing confidence in outdated knowledge. Memory decays naturally, keeping what matters fresh.

How: 4-layer architecture (Hot session state, Warm SQLite, Search via Qdrant semantic retrieval, Cold git archives). Eidetic facts with confidence scores, episodic narratives with natural decay.
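The immune-system metaphor can be made concrete with two tiny functions; the half-life and penalty are illustrative, not Empirica's tuning.

```python
# Toy confidence decay and antigen challenge; constants are hypothetical.
def decayed_confidence(confidence: float, age_days: float,
                       half_life_days: float = 30.0) -> float:
    """Older, unrefreshed facts gradually lose confidence."""
    return confidence * 0.5 ** (age_days / half_life_days)

def challenge(confidence: float, contradiction_strength: float) -> float:
    """A contradicting finding (antigen) knocks down a stored fact's confidence."""
    return max(0.0, confidence * (1.0 - contradiction_strength))

c = decayed_confidence(0.9, age_days=30.0)    # one half-life: 0.9 -> 0.45
c = challenge(c, contradiction_strength=0.5)  # strong contradiction: -> 0.225
```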
💬

Natural Language Workflow

No new syntax to learn. Empirica integrates through hooks, skills, and MCP tools into your existing AI workflow. Describe what you're doing in natural language; the framework handles measurement, gating, and artifact logging behind the scenes.

How: Plugin system with session hooks, Sentinel gate enforcement, compaction recovery, and slash commands. Production-ready on Claude Code; other platforms are experimental.

Why Empirica?

The Problem

  • ❌ AI agents forget everything between sessions and across compaction
  • ❌ AI acts before it understands: edits code it hasn't read, commits changes it can't verify
  • ❌ No way to measure whether AI reasoning is actually improving or drifting
  • ❌ AI decisions leave no audit trail: no way to trace what was known, assumed, or wrong
  • ❌ Overconfidence goes unchecked: AI says "done" when tests fail and goals aren't met

With Empirica

  • ✅ Epistemic state persisted in git, SQLite, and Qdrant; survives sessions and compaction
  • ✅ Sentinel gate enforces investigation before action; no more "ready, fire, aim"
  • ✅ Grounded calibration compares AI self-assessment against real outcomes: tests, git, goals
  • ✅ Every finding, assumption, dead-end, and decision logged: full epistemic provenance
  • ✅ Dual-track measurement catches the gap between belief and reality before it compounds

How It Works

Simple workflow, powerful results. The CASCADE cycle tracks epistemic state throughout any task.

📋

PREFLIGHT

Before starting work, AI assesses what it knows, what it can do, and how uncertain it is. Creates baseline for comparison.

🔬

Noetic → CHECK → Praxic

Investigate first, then act. The noetic phase builds understanding (findings, unknowns, dead-ends). The Sentinel CHECK gate validates readiness before transitioning to praxic action (edits, commits, implementations).

→ Noetic Artifacts: Findings, unknowns, dead-ends, assumptions, logged as you investigate
→ Praxic Artifacts: Goals, subtasks, commits, tracked as you implement
📊

POSTFLIGHT

After completing work, AI re-assesses epistemic state. Measures actual learning via PREFLIGHT→POSTFLIGHT deltas. Triggers grounded verification against objective evidence (tests, git, goals).

Who Is Empirica For?

Empirica is AI-native, built for AI agents that need epistemic calibration. Any domain requiring accurate context, uncertainty tracking, and reliable reasoning benefits from agents that know what they know.

🛠️
AVAILABLE NOW

Software Engineering

  • → Code agents preventing capability drift
  • → Multi-agent teams with context continuity
  • → Production AI with real-time self-monitoring
πŸ₯
FUTURE DOMAIN

Healthcare & Legal

  • → Medical diagnosis with evidence quality tracking
  • → Legal research assessing source reliability
  • → Misinformation detection in records
📊
FUTURE DOMAIN

Finance & Analysis

  • → Trading agents with uncertainty-aware decisions
  • → Risk assessment with confidence bounds
  • → Financial report analysis with source validation
🗣️
FUTURE DOMAIN

Communication & Knowledge

  • → Meeting assistants detecting misunderstandings
  • → Research agents tracking contradictory sources
  • → News verification with epistemic scoring

Multi-Project, Multi-AI Workflows

Run multiple AI agents across multiple projects simultaneously, each with isolated epistemic state.

🔒

Instance Isolation

Each tmux pane gets its own isolated instance, identified by TMUX_PANE. Run Claude Code in three panes working on three different projects, or the same project with different AI models, each with its own sessions, transactions, and epistemic state. No cross-contamination, seamless switching.

tmux pane %1 → empirica-core (claude-code)
tmux pane %2 → empirica-web (claude-code)
tmux pane %3 → client-project (qwen-testing)

Works with tmux, screen, or any terminal multiplexer. Falls back to TTY identification without tmux.
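A sketch of how per-pane identity can be derived; the fallback chain below is an assumption for illustration, not Empirica's exact logic.

```python
import os

# Hypothetical instance-id derivation: tmux pane, then TTY, then PID.
def instance_id() -> str:
    """Stable per-pane identifier for isolating epistemic state."""
    pane = os.environ.get("TMUX_PANE")  # e.g. "%1" inside tmux
    if pane:
        return "tmux_" + pane.lstrip("%")
    try:
        return "tty_" + os.ttyname(0).rsplit("/", 1)[-1]  # no tmux: use the TTY
    except OSError:
        return "pid_" + str(os.getpid())  # no TTY either (e.g. CI)
```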

🔄

Automatic Context Switching

Projects auto-detected from git remote URLs. Each project gets its own SQLite database, findings, goals, and calibration history. Switch projects within a pane and your full epistemic state travels with you, or run multiple projects simultaneously across panes.

empirica project-switch my-project
→ Session reattached, 12 findings loaded
🤖

Multi-AI Identity

Different AI models tracked under different identities. Each AI has its own calibration history, bias corrections, and learning trajectory. Handoffs between agents carry full epistemic context.

--ai-id claude-code # primary dev
--ai-id qwen-testing # test specialist
--ai-id gemini-review # code review
🔧

Session Recovery

Crashes, restarts, and compaction don't lose state. Transactions can be adopted by new instances, sessions resumed across terminal restarts, and epistemic state recovered from git checkpoints.

empirica transaction-adopt --from tmux_5
→ Transaction resumed, context intact

Platform note: Full plugin integration (hooks, Sentinel, skills) is production-ready on Claude Code. Other AI coding tools (Cursor, Cline, Gemini CLI) have experimental/unsupported integration. The core CLI and measurement system work on any platform.

Getting Started

Empirica runs inside AI coding tools: it's a plugin, not a standalone app. You'll need Claude Code or a compatible terminal AI (OpenCode is untested but architecturally compatible).

1

Install Empirica

# Core framework + CLI (Python 3.10+)
pip install empirica

# MCP server for AI tool integration (Python 3.11+)
pip install empirica-mcp
2

Set Up Claude Code

# Plugin, hooks, Sentinel, system prompt,
# MCP server, and statusline
empirica setup-claude-code

Sets up session lifecycle, the noetic firewall, compaction recovery, statusline, and natural language commands.

3

Onboard & Work

# Guided first-run setup
empirica onboard

# Then just work naturally;
# the framework measures behind the scenes

After onboarding, use /empirica status and empirica goals-list from within your AI session.

Want to Go Deeper?

Explore advanced features, architecture, and use cases.