
DEV Community

Ingero Team

Posted on • Originally published at ingero.io

MCP as Observability Interface: Connecting AI Agents to Kernel Tracepoints

TL;DR

MCP is becoming the interface between AI agents and infrastructure
data. Datadog shipped an MCP
Server connecting dashboards to AI agents.
Qualys flagged MCP servers as the new shadow IT risk.
We think both are right, and we think the architecture should
go further: the MCP server should not wrap an existing observability
platform. It should BE the observability layer. This post explores how
MCP can serve as a direct observability interface to kernel
tracepoints, bypassing traditional metric pipelines entirely.

MCP for Kernel and GPU Events

Three signals in one week

Three things happened in the same week of March 2026 that signal where
observability is headed.

Datadog shipped an MCP Server
Their implementation connects real-time observability data to AI agents for automated detection and remediation. An AI agent can now query Datadog dashboards, pull metrics, and trigger responses through the Model Context Protocol. This is a big company validating a small protocol.

Qualys published a security analysis of MCP servers.

Their TotalAI team called MCP servers “the new shadow IT for AI” and
found that over 53% of servers rely on static secrets for
authentication. They recommended adding observability to MCP servers:
logging capability discovery events, monitoring invocation patterns,
alerting on anomalies.
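Logging invocation patterns is straightforward to retrofit onto any MCP server. Here is a minimal sketch of the kind of audit trail Qualys recommends: a decorator that records every tool invocation before dispatching it. The decorator name, the log structure, and the `get_trace_stats` stub are all hypothetical illustrations, not part of any MCP SDK.

```python
import functools
import json
import time

# Hypothetical sketch: wrap MCP tool handlers so every invocation is
# recorded with a timestamp, tool name, and serialized arguments --
# the audit trail Qualys recommends for MCP servers.
invocation_log = []

def log_invocations(tool_name):
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(**kwargs):
            invocation_log.append({
                "ts": time.time(),
                "tool": tool_name,
                "args": json.dumps(kwargs, default=str),
            })
            return handler(**kwargs)
        return wrapper
    return decorator

@log_invocations("get_trace_stats")
def get_trace_stats():
    # Stand-in for a real handler that would query the trace database.
    return {"events": 12847, "causal_chains": 4}

stats = get_trace_stats()
print(stats["events"])            # 12847
print(invocation_log[0]["tool"])  # get_trace_stats
```

Alerting on anomalies then becomes a query over `invocation_log` (e.g., a tool suddenly invoked at 100x its usual rate) rather than a change to the tools themselves.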

Cloud Native Now covered eBPF for Kubernetes network observability.
Microsoft Retina deploys as a DaemonSet, captures network telemetry via
eBPF without application changes, and provides kernel-level drop reasons. The article draws a clear line between “monitoring” (predefined questions) and “observability” (asking questions nobody planned for).

The thread connecting all three: AI agents need direct access to
infrastructure telemetry, and MCP is becoming the way they get it.

Two approaches to MCP observability

There are two ways to connect observability data to AI agents via MCP.

Approach 1: Wrap existing platforms. Datadog’s strategy. Take
existing metrics, logs, and traces, already collected and aggregated,
and expose them through MCP tools. The AI agent queries the dashboard
API, gets pre-processed data, and acts on it. This makes sense for teams
with a mature observability stack that want to add AI-powered automation
on top.

Approach 2: Build MCP-native observability. This is what we did with
the tracer. Instead of wrapping an existing platform, we built an eBPF
agent that traces CUDA Runtime and Driver APIs via uprobes, stores the
results in SQLite, and exposes everything through 7 MCP tools. The MCP
interface is not an adapter layer; it is the primary interface.

Neither approach is wrong. They solve different problems.

The wrapper approach works well for aggregate analysis: “What was the
p99 latency for service X over the last hour?” The data is already
summarized, indexed, and queryable.

The native approach works better for root-cause investigation: “Why did
this specific GPU request take 14.5x longer than expected?” That
requires raw kernel events, CUDA call stacks, and causal chains – not
summaries. The AI agent needs to drill down, not roll up.

What MCP-native observability looks like in practice

Here is a concrete example. We traced a vLLM TTFT regression where the
first token took 14.5x longer than baseline. The trace database captured
every CUDA API call, every kernel context switch, every memory
allocation.

When Claude connects to the MCP server and loads this database, it can:

  1. get_trace_stats – See the full trace summary: 12,847 CUDA events, 4 causal chains, total GPU time
  2. get_causal_chains – Read the causal chains that explain why latency spiked, in plain English
  3. run_sql – Run custom queries against the raw event data (“show me all cudaMemcpyAsync calls over 100ms”)
  4. get_stacks – Inspect call stacks for any flagged event
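To make the `run_sql` step concrete, here is the kind of query an agent would issue for "show me all cudaMemcpyAsync calls over 100ms". The `cuda_events` table below is a hypothetical schema invented for this sketch; the real Ingero trace database layout is not shown in the post.

```python
import sqlite3

# Hypothetical schema standing in for the real trace database:
# a minimal table of CUDA API calls with durations.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cuda_events (
        id INTEGER PRIMARY KEY,
        api_call TEXT,
        duration_ms REAL
    )
""")
conn.executemany(
    "INSERT INTO cuda_events (api_call, duration_ms) VALUES (?, ?)",
    [
        ("cudaMemcpyAsync", 0.4),
        ("cudaMemcpyAsync", 142.7),  # a slow copy an agent would flag
        ("cudaLaunchKernel", 1.1),
        ("cudaMemcpyAsync", 310.2),
    ],
)

# The question from the post, as SQL an agent would pass to run_sql:
rows = conn.execute(
    "SELECT api_call, duration_ms FROM cuda_events "
    "WHERE api_call = 'cudaMemcpyAsync' AND duration_ms > 100 "
    "ORDER BY duration_ms DESC"
).fetchall()
print(rows)  # [('cudaMemcpyAsync', 310.2), ('cudaMemcpyAsync', 142.7)]
```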

Claude identified the root cause in under 30 seconds: logprobs
computation was blocking the decode loop, creating a 256x slowdown on
the critical path. That root cause was not visible in any aggregate
metric. It only appeared in the raw causal chain between specific CUDA
API calls.

A dashboard MCP adapter could not have found this. The data granularity
does not survive aggregation.
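A quick numerical illustration of why aggregation loses this signal, using synthetic durations (not the actual trace data): one long blocking call buried among thousands of fast ones barely moves the mean and is invisible at p99, yet it is exactly the event a causal chain surfaces.

```python
import statistics

# Synthetic data: 10,000 fast calls at 0.5ms plus one 128ms blocking
# call -- a 256x outlier on the critical path.
durations_ms = [0.5] * 10_000 + [128.0]

mean = statistics.mean(durations_ms)
p99 = statistics.quantiles(durations_ms, n=100)[98]

print(round(mean, 3))     # 0.513 -- the outlier barely moves the mean
print(p99)                # 0.5   -- p99 misses it entirely
print(max(durations_ms))  # 128.0 -- only the raw events reveal it
```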

The security angle matters too

Qualys raised valid concerns about MCP server security. Their finding
that 53% of servers rely on static secrets is alarming. Their
recommendation to log discovery and invocation events is exactly right.

For MCP servers that touch GPU infrastructure, the attack surface is
different. An MCP server with access to CUDA traces can expose timing
information, memory layouts, and model architecture details. The
security model needs to account for this.
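One concrete mitigation for a `run_sql`-style tool is to open the trace database read-only at the SQLite level, so no prompt-injected query can mutate the event store. This is a sketch of the SQLite mechanism, not Ingero's actual implementation; the function name and paths are illustrative.

```python
import os
import sqlite3
import tempfile

# Sketch of one mitigation: open the trace database with mode=ro so
# any INSERT/UPDATE/DELETE fails at the SQLite level.
# (Not Ingero's actual implementation -- just the mechanism.)
def open_trace_db_readonly(path):
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)

# Demo against a throwaway database file.
path = os.path.join(tempfile.mkdtemp(), "trace.db")
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE events (id INTEGER, api TEXT)")
rw.execute("INSERT INTO events VALUES (1, 'cudaMemcpyAsync')")
rw.commit()
rw.close()

ro = open_trace_db_readonly(path)
print(ro.execute("SELECT api FROM events").fetchone()[0])  # cudaMemcpyAsync

try:
    ro.execute("DELETE FROM events")
except sqlite3.OperationalError as e:
    print("write rejected:", e)
```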

In Ingero, the MCP server runs inside the same process as the eBPF tracing pipeline. There is no separate data layer between the AI agent and the kernel-level telemetry: the MCP tools query the same event store that the eBPF probes write to. This is why Ingero can answer causal questions in real time: the AI agent has direct access to raw kernel and CUDA events, not a pre-aggregated summary.

Try It Yourself

The project is open source. The investigation database from this post is available for download. Claude (or any MCP client) can connect to it and run an investigation:

```shell
git clone https://github.com/ingero-io/ingero.git
cd ingero && make build
./bin/ingero mcp --db investigations/pytorch-dataloader-starvation.db
```

Investigate with AI (recommended)

You can point any MCP-compatible AI client at the trace database and ask questions directly. No code required.

First, create the MCP config file at /tmp/ingero-mcp-dataloader.json:

```json
{
  "mcpServers": {
    "ingero": {
      "command": "./bin/ingero",
      "args": ["mcp", "--db", "investigations/pytorch-dataloader-starvation.db"]
    }
  }
}
```

With Ollama (local, free):

```shell
# Install ollmcp (MCP client for Ollama)
pip install ollmcp

# Investigate with a local model (no data leaves your machine)
ollmcp -m qwen3.5:27b -j /tmp/ingero-mcp-dataloader.json
```

With Claude Code:

```shell
claude --mcp-config /tmp/ingero-mcp-dataloader.json
```

Then type /investigate and let the model explore. Follow up with questions like “what was the root cause?” or “which processes were competing for CPU time?”

The MCP server exposes 7 tools. Claude will figure out the rest.


Ingero is free & open source software licensed under Apache 2.0 (user-space) + GPL-2.0/BSD-3 (eBPF kernel-space). One binary, zero dependencies, <2% overhead. Give us a star on GitHub!

