SEN LLC

Posted on Apr 16

PHPStan's JSON Output Is Noise. I Wrote a Formatter with a Baseline Diff.

#php #phpstan #cli #tutorial

PHPStan itself is fantastic. Its output is another story: the human format scrolls off your CI log and drops structure, the JSON format is shaped for storage rather than reporting, and the baseline is a .neon file that GitHub's diff viewer can't render in a useful way. This post is about a 500-line PHP CLI that sits in front of all three.

📦 GitHub: https://github.com/sen-ltd/phpstan-report

The problem

If you've ever run PHPStan in CI on a real project, you've seen this shape of log:

 ------ ---------------------------------------------------------
  Line   src/Service/UserService.php
 ------ ---------------------------------------------------------
  42     Call to an undefined method App\Service\UserService::fetchCurrentUser().
  58     Parameter $user of method App\Service\UserService::save() has invalid type App\Model\LegacyUser.
  71     Undefined variable: $currentTenant
 ------ ---------------------------------------------------------

It scrolls. A project with 400 findings produces 1600 lines of those pipe-decorated tables before the summary. You cannot paste it into a pull request comment, you cannot grep a useful signal out of it, and anyone who opens the raw log ends up reading the first screen's worth and then closing the tab.

The response is usually to switch to --error-format=json. That's half of a fix. The JSON shape looks like this:

{
  "totals": { "errors": 0, "file_errors": 7 },
  "files": {
    "/abs/path/to/src/Service/UserService.php": {
      "errors": 3,
      "messages": [
        {
          "message": "Call to an undefined method ...",
          "line": 42,
          "ignorable": true,
          "identifier": "method.notFound"
        }
      ]
    }
  },
  "errors": []
}

It is a storage shape, not a reporting shape. The errors are nested under a map keyed by absolute path, which means your report header contains your CI runner's home directory. There is no severity field — PHPStan classifies findings as "error or not" and calls it a day. There is no per-file rollup because, well, each file already has its own block, so rolling up means walking the map yourself. And the errors[] array at the top level is not what you think it is; it holds generic errors (like "ignored error pattern was not matched in reported errors") and is almost always empty.
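Walking the map isn't hard, just tedious to repeat in every CI script. A minimal sketch of that per-file rollup, assuming the raw JSON has been decoded with json_decode($json, true); rollupByFile() and its $root prefix-stripping parameter are hypothetical, not part of phpstan-report:

```php
<?php
declare(strict_types=1);

// Hypothetical helper: per-file finding counts straight from PHPStan's raw
// JSON. $root is an assumed parameter that strips the CI runner's absolute
// prefix so the report doesn't leak /home/runner/... paths.
function rollupByFile(array $raw, string $root = ''): array
{
    $counts = [];
    foreach ($raw['files'] ?? [] as $path => $block) {
        $path = (string) $path; // array keys may come back as ints
        $file = ($root !== '' && str_starts_with($path, $root))
            ? substr($path, strlen($root))
            : $path;
        $counts[$file] = count($block['messages'] ?? []);
    }
    arsort($counts); // most findings first, keys preserved
    return $counts;
}
```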

The other two existing options, also unhelpful for my goals:

--error-format=github already emits GitHub Actions workflow commands. Great for annotations, useless for a PR comment or a terminal summary.
phpstan-baseline.neon is PHPStan's own way to freeze an existing error set. It works, but it's a YAML-adjacent file of regex-escaped message patterns with per-file counts, which is hard to review meaningfully in GitHub's diff UI.

So I wrote phpstan-report. It eats --error-format=json and emits four different shapes for four different readers, plus a baseline diff built on top of a simpler matching algorithm than what .neon does.

Design

The tool is a reshaper, not an analyser. It never parses PHP, never loads PHPStan, and deliberately doesn't know anything about your phpstan.neon. That means it works with whatever PHPStan version your project is on, and a PHPStan upgrade is unlikely to break it, because the totals / files / messages shape has been stable for years.

Three components:

Parser: normalizes PHPStan's JSON shape into a flat errors[] list. Each entry has file, line, message, identifier, ignorable, and (here's the addition) a level field: one of error, warning, info.
Baseline: loads a previous run (same JSON shape; it's just PHPStan output from last time), compares by (file, message), and sorts findings into new, resolved, and shared.
Four formatters: HumanFormatter (colorized terminal output), MarkdownFormatter (GitHub table for PR comments), JsonFormatter (slim projection for downstream tooling), GithubFormatter (workflow commands for inline PR annotations).
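The MarkdownFormatter isn't shown in this post, so here is a rough sketch of what turning the parser's flat errors[] list into a GitHub table could involve. The function names and the Level | File | Line | Message column layout are assumptions, not the tool's actual output:

```php
<?php
declare(strict_types=1);

// Hypothetical sketch of a PR-comment table over the flat errors[] list.
// Column layout and the empty-report text are assumptions.
function formatMarkdown(array $errors): string
{
    if ($errors === []) {
        return "No PHPStan findings.\n";
    }
    $rows = [
        '| Level | File | Line | Message |',
        '| --- | --- | --- | --- |',
    ];
    foreach ($errors as $e) {
        $rows[] = sprintf(
            '| %s | %s | %s | %s |',
            $e['level'],
            escapeCell($e['file']),
            // Line 0 means "no line info" in the parser's output.
            $e['line'] > 0 ? (string) $e['line'] : '-',
            escapeCell($e['message'])
        );
    }
    return implode("\n", $rows) . "\n";
}

// A literal "|" would end the cell early; a newline would end the row.
function escapeCell(string $text): string
{
    return str_replace(['|', "\r", "\n"], ['\\|', ' ', ' '], $text);
}
```

Escaping the pipe matters in practice because PHPStan messages routinely contain union types such as int|string.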

The classification step

PHPStan's JSON output does not include a severity per message. Everything is "an error." This is fine when PHPStan is the only thing looking at it — the whole point of a static analyser is that everything it emits is something you should fix. But when you're building a report for humans in a hurry, "2 real bugs + 200 unused imports" needs to read differently from "202 real bugs."

My classifier is deliberately stupid:

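// Excerpt from class Parser. The level constants are public because the
// formatters reference them (e.g. Parser::LEVEL_ERROR).
public const LEVEL_ERROR   = 'error';
public const LEVEL_WARNING = 'warning';
public const LEVEL_INFO    = 'info';

private const FATAL_PATTERNS = [
    '/Parameter .+ of method .+ has invalid type/i',
    '/Call to an undefined method /i',
    '/Call to an undefined (static )?function /i',
    '/Undefined variable: /i',
    '/Access to an undefined property /i',
    '/Instantiated class .+ not found/i',
    '/Class .+ not found/i',
    // PHPStan phrases these as "... should return int but returns string."
    '/should return .+ but returns? /i',
];

private function classify(string $message, bool $ignorable): string
{
    // Any fatal pattern wins outright.
    foreach (self::FATAL_PATTERNS as $pat) {
        if (preg_match($pat, $message) === 1) {
            return self::LEVEL_ERROR;
        }
    }
    // Otherwise fall back to PHPStan's own ignorable flag.
    return $ignorable ? self::LEVEL_INFO : self::LEVEL_WARNING;
}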

Eight regex patterns, a fallthrough based on PHPStan's own ignorable flag. That's it. No config file, no plugin system, no "pattern packs." If you don't like the classification, you edit src/Parser.php and the three lines of test fixtures that lock the behavior in. The whole rationale is that a config-driven classifier would either be stupid (you'd need to duplicate the rules for every project) or a full replacement for PHPStan's rule system (in which case, why are you not just using PHPStan's rule system?).

What this gives you is a summary line you can actually paste into a standup:

"4 errors, 0 warnings, 3 info — 7 findings across 3 files"

vs the PHPStan-native version:

"Found 7 errors"

Same data. Different conversation.
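Producing that line from the flat list is a single pass. A minimal sketch, assuming the error / warning / info level names from the parser; summarize() is a hypothetical name, and the punctuation only approximates the tool's real summary:

```php
<?php
declare(strict_types=1);

// Sketch: build a one-line summary from the flat errors[] list.
// Assumes each entry carries 'file' and 'level' (error|warning|info).
function summarize(array $errors): string
{
    $counts = ['error' => 0, 'warning' => 0, 'info' => 0];
    $files = [];
    foreach ($errors as $e) {
        $counts[$e['level']]++;
        $files[$e['file']] = true; // used as a set: distinct files only
    }
    $plural = static fn (int $n, string $word): string =>
        $n . ' ' . $word . ($n === 1 ? '' : 's');

    return sprintf(
        '%s, %s, %d info (%s across %s)',
        $plural($counts['error'], 'error'),
        $plural($counts['warning'], 'warning'),
        $counts['info'],
        $plural(count($errors), 'finding'),
        $plural(count($files), 'file')
    );
}
```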

The normalization pass

Here's the entire parser loop, with some trimmings:

public function parseRaw(array $raw): array
{
    $errors = [];
    $files = $raw['files'] ?? [];
    if (is_array($files)) {
        foreach ($files as $path => $fileBlock) {
            $messages = $fileBlock['messages'] ?? [];
            foreach ($messages as $msg) {
                $text = (string)($msg['message'] ?? '');
                if ($text === '') continue;
                $line = (int)($msg['line'] ?? 0);
                $ignorable = (bool)($msg['ignorable'] ?? true);
                $errors[] = [
                    'file'       => (string)$path,
                    'line'       => max(0, $line),
                    'message'    => $text,
                    'identifier' => (string)($msg['identifier'] ?? ''),
                    'ignorable'  => $ignorable,
                    'level'      => $this->classify($text, $ignorable),
                ];
            }
        }
    }
    // ... generic top-level errors handled separately ...
    return ['errors' => $errors, 'totals' => [...]];
}

A few things worth mentioning:

line defaults to 0, not null. Downstream formatters can then treat 0 as "no line info," which happens for generic errors, class-level issues, and errors at the top of the file.
Generic top-level errors[] strings get a file of (generic) and a line of 0. This keeps the output shape uniform — every formatter can iterate a single flat list, no branching.
The ignorable flag is preserved even though we derive level from it. Downstream tools sometimes care about the raw PHPStan flag (e.g., "only gate on non-ignorable findings").
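The generic-error branch is elided in the parser excerpt above. One way it could look, as a sketch: the '(generic)' file and line 0 come from the description above, while parseGenericErrors() and the choice to mark these entries non-ignorable at the error level are assumptions:

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: fold PHPStan's top-level errors[] strings into the
// same flat shape as file-level findings. ignorable=false and level='error'
// are assumptions, not confirmed tool behavior.
function parseGenericErrors(array $raw): array
{
    $generic = $raw['errors'] ?? [];
    if (!is_array($generic)) {
        return []; // malformed input: nothing to fold in
    }
    $out = [];
    foreach ($generic as $text) {
        if (!is_string($text) || $text === '') {
            continue; // skip anything that isn't a non-empty message string
        }
        $out[] = [
            'file'       => '(generic)',
            'line'       => 0,
            'message'    => $text,
            'identifier' => '',
            'ignorable'  => false,
            'level'      => 'error',
        ];
    }
    return $out;
}
```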

Baseline diff without the line-shift problem

The naive way to diff two PHPStan JSON runs is to key each finding on file, message, and line number, and that's what produces phantom churn on refactors: you rename a variable, five lines shift by 1, and the diff reports five "new" errors and five "resolved" errors on the same day. (PHPStan's own .neon baseline avoids this by storing a message pattern, path, and count rather than line numbers.)

phpstan-report matches by (file, message) only. Line numbers are kept for display, but not used as a baseline key:

private function keyOf(array $e): string
{
    return $e['file'] . "\x00" . $e['message'];
}

public function compare(array $current, array $baseline): array
{
    $baselineRemaining = $this->keyCounts($baseline);
    $baselineByKey = [];
    foreach ($baseline as $e) {
        $baselineByKey[$this->keyOf($e)][] = $e;
    }

    $new = $shared = $resolved = [];
    foreach ($current as $e) {
        $key = $this->keyOf($e);
        if (($baselineRemaining[$key] ?? 0) > 0) {
            $baselineRemaining[$key]--;
            $shared[] = $e;
        } else {
            $new[] = $e;
        }
    }
    foreach ($baselineByKey as $key => $bucket) {
        $leftover = $baselineRemaining[$key] ?? 0;
        $consumed = count($bucket) - $leftover;
        for ($i = $consumed; $i < count($bucket); $i++) {
            $resolved[] = $bucket[$i];
        }
    }
    return ['new' => $new, 'resolved' => $resolved, 'shared' => $shared];
}

The decrement-based walk is the thing I wanted to highlight. My first implementation tried to compute "new" and "resolved" independently by comparing key counts, and it got the bookkeeping wrong when the same (file, message) pair appeared multiple times (e.g., two Unused use statement messages in the same file). The credit-consumption walk is simpler to reason about: for every current finding, you either consume a baseline credit or you don't. Whatever credits remain at the end are the resolved errors. It's linear in the size of both lists, and because "new" and "resolved" fall out of the same walk, they can never disagree with each other.

There's a test that locks this behavior in:

public function test_compare_ignores_line_shifts(): void
{
    $parser = new Parser();
    $current = $parser->parseRaw(['files' => ['x.php' => [
        'messages' => [['message' => 'Unused use statement foo.', 'line' => 42, 'ignorable' => true]],
    ]]]);
    $base = $parser->parseRaw(['files' => ['x.php' => [
        'messages' => [['message' => 'Unused use statement foo.', 'line' => 5, 'ignorable' => true]],
    ]]]);
    $diff = (new Baseline($parser))->compare($current['errors'], $base['errors']);
    $this->assertSame([], $diff['new']);
    $this->assertSame([], $diff['resolved']);
    $this->assertCount(1, $diff['shared']);
}

Same message, different line, no delta. That's the whole point of the tool.
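The line-shift test covers one case; the duplicate case mentioned above (the same (file, message) pair more than once) is where the credit walk earns its keep. Here is a self-contained sketch of the same walk as plain functions, with the keyCounts() step (not shown in the post) folded into the first loop. The function names are hypothetical:

```php
<?php
declare(strict_types=1);

// Baseline key: (file, message). A NUL separator keeps the two fields
// from blurring together when concatenated.
function keyOf(array $e): string
{
    return $e['file'] . "\x00" . $e['message'];
}

function compareBaseline(array $current, array $baseline): array
{
    // One credit per baseline occurrence of each key.
    $credits = [];
    $buckets = [];
    foreach ($baseline as $e) {
        $key = keyOf($e);
        $credits[$key] = ($credits[$key] ?? 0) + 1;
        $buckets[$key][] = $e;
    }

    $new = $shared = $resolved = [];
    foreach ($current as $e) {
        $key = keyOf($e);
        if (($credits[$key] ?? 0) > 0) {
            $credits[$key]--; // consume a credit: already known
            $shared[] = $e;
        } else {
            $new[] = $e;      // no credit left: genuinely new
        }
    }
    // Unspent credits are baseline findings that no longer occur.
    foreach ($buckets as $key => $bucket) {
        $leftover = $credits[$key];
        foreach (array_slice($bucket, count($bucket) - $leftover) as $e) {
            $resolved[] = $e;
        }
    }
    return ['new' => $new, 'resolved' => $resolved, 'shared' => $shared];
}
```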

The `--fail-on-new` gate

The baseline diff is interesting on its own, but what you actually want in CI is a gate: "don't let this PR merge if it introduced new findings, but don't block on the findings that were already there." That's the --fail-on-new flag, which exits 1 when the current run has more errors than the baseline, and exits 0 otherwise:

$regression = count($errors) > count($baselineReport['errors']);
// ...
if ($parsed['failOnNew'] && $regression) {
    $exit = self::EXIT_FINDINGS;
}

Two things I'd call out:

The comparison is on count(), not on diff['new'], because in practice what you want is "the total count must not increase." A PR that resolves 4 errors and introduces 3 would be caught by a strict new-only check but passes this one, and that's usually what you want, because otherwise people stop cleaning up old errors.
The default exit code (1 when there are any findings at all, like PHPStan itself) still applies when you don't pass --fail-on-new, so the existing behavior is preserved. --fail-on-new is strictly a narrowing of the failure condition for CI flows where you've accepted the baseline.
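The exit-code rule can be sketched as a pure function, which also makes the count-versus-new distinction easy to test. decideExit() and the constants are hypothetical names, and what happens when --fail-on-new is passed without --baseline is an assumption (here it falls back to the default rule):

```php
<?php
declare(strict_types=1);

const EXIT_OK = 0;
const EXIT_FINDINGS = 1;

// Sketch of the exit-code decision. Falling back to the default rule when
// no baseline count is available is an assumption.
function decideExit(int $currentCount, ?int $baselineCount, bool $failOnNew): int
{
    if ($failOnNew && $baselineCount !== null) {
        // Count-based gate: fail only if the total went up.
        return $currentCount > $baselineCount ? EXIT_FINDINGS : EXIT_OK;
    }
    // Default, PHPStan-style: any finding at all is a failure.
    return $currentCount > 0 ? EXIT_FINDINGS : EXIT_OK;
}
```

For example, a PR that resolves 4 of 11 baseline findings and introduces 3 new ones ends at 10: decideExit(10, 11, true) returns 0 even though diff['new'] would hold 3 entries, while decideExit(10, 11, false) still returns 1.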

The GitHub Actions formatter

This one is small enough to quote in full:

public function format(array $errors): string
{
    if ($errors === []) {
        return "::notice::phpstan-report: no errors\n";
    }
    $out = '';
    foreach ($errors as $e) {
        $command = match ($e['level']) {
            Parser::LEVEL_ERROR   => 'error',
            Parser::LEVEL_WARNING => 'warning',
            default               => 'notice',
        };
        $file = $this->escapeProperty($e['file']);
        $msg  = $this->escapeData($e['message']);
        $line = $e['line'] > 0 ? ",line={$e['line']}" : '';
        $out .= sprintf("::%s file=%s%s::%s\n", $command, $file, $line, $msg);
    }
    return $out;
}

The only non-obvious bit is the two different escape functions. GitHub's workflow command format distinguishes between property values (the file=... part, before the ::) and data values (the message, after the ::). Properties need %, \r, \n, :, and , escaped; data values only need %, \r, \n. It took me an embarrassing amount of time to find this in the docs the first time, so I've pinned it in tests:

public function test_escapes_file_property(): void
{
    $errors = [[ 'file' => 'some,weird:path.php', /* ... */ ]];
    $out = (new GithubFormatter())->format($errors);
    $this->assertStringContainsString('file=some%2Cweird%3Apath.php', $out);
}

Files with commas or colons in their paths are rare, but they exist (especially on Windows, where the drive letter puts a colon in every absolute path), and leaving them unescaped makes the annotation silently fail. The PHPStan source uses the same escaping for its own github format; mine matches it.
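Neither escaper is shown above. A minimal sketch of both, following the rules just described (they match what GitHub's own actions/toolkit does); the names escapeData() and escapeProperty() mirror the calls in format(), but the bodies and the annotation() helper are my assumptions:

```php
<?php
declare(strict_types=1);

// Data (the message after the final '::') escapes %, CR and LF.
function escapeData(string $s): string
{
    // strtr() with an array replaces in a single pass, so the '%25'
    // produced for '%' is never escaped a second time.
    return strtr($s, ['%' => '%25', "\r" => '%0D', "\n" => '%0A']);
}

// Properties (file=..., line=...) additionally escape ':' and ','.
function escapeProperty(string $s): string
{
    return strtr($s, [
        '%' => '%25', "\r" => '%0D', "\n" => '%0A',
        ':' => '%3A', ',' => '%2C',
    ]);
}

// Hypothetical helper assembling one workflow command line.
function annotation(string $command, string $file, int $line, string $message): string
{
    $lineProp = $line > 0 ? ",line={$line}" : '';
    return sprintf("::%s file=%s%s::%s\n", $command, escapeProperty($file), $lineProp, escapeData($message));
}
```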

Tradeoffs

A few things I deliberately did not build:

No SARIF output. SARIF is the "real" interchange format for static analysis results, and it's on the roadmap, but the initial cut targets the PR-comment / CI-log workflow, not IDE integration or enterprise security tooling. Adding SARIF is maybe 150 lines and I'll do it when someone asks.
No identifier-based ignore lists. PHPStan's JSON output includes a stable identifier field per message (e.g., method.notFound). phpstan-report passes that through in the json output but doesn't offer --ignore-identifier foo flags, because at that point you're rebuilding PHPStan's baseline system in a worse way.
Does not run PHPStan for you. You pipe its output in. This keeps the tool focused on one job, and it means phpstan-report keeps working as PHPStan evolves, as long as the JSON shape stays stable. The downside is one extra command in your CI config.
Line numbers are for display only. The baseline key is (file, message), which means if the same error genuinely occurs at a different location in a refactored file, we'll treat it as unchanged. In practice this is fine — PHPStan's error text usually includes the method or variable name, so "the same error in a different place" is almost always also a different message.
The classifier is a regex list. It's simple, it's fast, it handles the 80% case, and if it mis-classifies something in your codebase you can fix it in a pull request and we'll add a test. It is not clever, and it will never be clever.

One thing that's not a tradeoff but gets asked anyway: there's no config file. I looked at the PHPStan extension system and a few similar PHP tools and decided that the right answer was "no configuration at all in v1." Every flag has a sensible default, and the thing the tool does is small enough that you can read the whole source in twenty minutes if you want to know what it's doing.

Try it in 30 seconds

git clone https://github.com/sen-ltd/phpstan-report.git
cd phpstan-report
docker build -t phpstan-report .

# Run against a committed fixture
docker run --rm -v "$(pwd)/tests/fixtures:/work" phpstan-report /work/mixed.json

# Markdown table for PR comments
docker run --rm -v "$(pwd)/tests/fixtures:/work" phpstan-report /work/mixed.json --format markdown

# GitHub Actions annotations
docker run --rm -v "$(pwd)/tests/fixtures:/work" phpstan-report /work/mixed.json --format github

# Regression gate against a baseline
docker run --rm -v "$(pwd)/tests/fixtures:/work" phpstan-report \
    /work/mixed.json --baseline /work/baseline.json --fail-on-new
echo "exit=$?"   # 1, because mixed has 3 more errors than baseline

Or, against your own project:

./vendor/bin/phpstan analyse src --error-format=json | phpstan-report -

The runtime image is ~51 MB (multi-stage Alpine PHP 8.2), the test suite is 56 tests covering the parser / baseline / four formatters / every CLI exit branch, and the entire source tree is under 1,000 lines of PHP. It's MIT-licensed and I'd be happy to take PRs — especially for SARIF and for the classifier regex list.

This is entry 175 in SEN's "100+ small shippable open-source tools" program. If you're interested in the rest of the series, the full index lives at https://sen.ltd/portfolio/.

DEV Community
