DEV Community

When

GrimLabs — Thu, 16 Apr 2026 14:00:03 +0000

Our procurement team was reconciling a batch of vendor invoices against purchase orders. They used a matching tool that returned two categories: Match and No Match. Simple. Binary. Clean.

The problem was that about 800 records came back as No Match. The team started reviewing them manually, and within the first hour they realized something frustrating. About 300 of those "no matches" were obviously the same transaction with minor variations. "Amazon Web Services" vs "AWS." Invoice amount $4,999.50 vs PO amount $5,000. Date off by one day.

These weren't really mismatches. They were near-matches that fell just below whatever threshold the tool was using. And the tool gave zero indication of how close they were. A record that missed by 0.1% looked exactly the same as a record that missed by 90%. Both just said "No Match."

So the team had to review all 800 equally. No prioritization. No way to triage. Just a flat list of failures. It took three days when it should have taken one.

The binary matching problem

Most data matching tools, including Excel's VLOOKUP and many dedicated platforms, give you a binary answer. Either the records match or they don't. There's no in-between.

This makes sense when you're matching on exact identifiers. If two records share the same Social Security number, they match. Period. No confidence needed.

But most real-world matching isn't like that. You're matching on names that have variations, amounts that differ due to rounding or tax, dates that shift depending on which event they represent. In these cases, the line between "match" and "no match" is fuzzy. And a binary tool forces you to pick a threshold that will inevitably be wrong for some records.

Set the threshold too strict and you get false negatives (real matches classified as no-match). Set it too loose and you get false positives (different records classified as matches). There is no threshold that works perfectly for all records.

This is where confidence scores change everything.

What confidence scores actually are

A confidence score is a number (usually 0-100% or 0.0-1.0) that represents how likely it is that two records are the same entity. Instead of "match" or "no match," you get a spectrum.

95-100%: Almost certainly the same. Auto-approve these.
80-94%: Probably the same but worth a quick human check.
60-79%: Might be the same. Needs careful human review.
Below 60%: Probably not the same. Low priority or skip.

The score is calculated from multiple factors. Name similarity might contribute 40% of the score. Amount closeness might contribute 30%. Date proximity might contribute 20%. Other fields might contribute 10%.

A record where the names are 95% similar and amounts match exactly might get a 96% confidence score. A record where names are 70% similar and amounts differ by 15% might get a 55% confidence score. Both are "near matches" but they require very different handling.
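To make the weighting concrete, here is a minimal Python sketch of a weighted score. The 40/30/20/10 weights come from the paragraph above. The similarity functions (difflib for names, a linear falloff for amounts and dates, a 10-day date window) are assumptions chosen for illustration, not how any particular tool computes its scores.

```python
from datetime import date
from difflib import SequenceMatcher

# Illustrative weights taken from the 40/30/20/10 split described above.
WEIGHTS = {"name": 0.40, "amount": 0.30, "date": 0.20, "other": 0.10}


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (one of many possible measures)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def amount_closeness(invoice: float, po: float) -> float:
    """1.0 for identical amounts, falling linearly to 0.0 at a 100% difference."""
    if invoice == po:
        return 1.0
    relative_diff = abs(invoice - po) / max(abs(invoice), abs(po))
    return max(0.0, 1.0 - relative_diff)


def date_proximity(a: date, b: date, window_days: int = 10) -> float:
    """1.0 for the same day, falling linearly to 0.0 at `window_days` apart (assumed window)."""
    return max(0.0, 1.0 - abs((a - b).days) / window_days)


def confidence(field_scores: dict[str, float]) -> float:
    """Weighted sum of per-field similarities, each in [0, 1]."""
    return sum(WEIGHTS[field] * field_scores[field] for field in WEIGHTS)


scores = {
    "name": name_similarity("Acme Corp", "Acme Corporation"),
    "amount": amount_closeness(4999.50, 5000.00),
    "date": date_proximity(date(2026, 3, 1), date(2026, 3, 2)),
    "other": 1.0,  # assumed: the remaining fields agree
}
print(f"confidence: {confidence(scores):.1%}")
```

Swapping in a different name-similarity measure or different weights changes the numbers but not the shape of the idea: every field contributes a partial score, and the total tells you how close the pair is.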

Why this changes the workflow

With binary matching, your review workflow looks like this:

Run matching
Get a pile of "no match" records
Review all of them, in whatever order they happen to appear
Spend equal time on each one regardless of how close or far the match was

With confidence scores, your workflow becomes:

Run matching
Auto-approve everything at 95% confidence or above
Quick-review the 80-94% tier (most of these are valid matches with minor variations)
Careful review of the 60-79% tier (these need actual judgment)
Batch-reject everything below 60%
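The tiered workflow above can be sketched as a small routing function. The cutoffs are the ones used in this article; the function names, tier labels, and invoice IDs are made up for the example.

```python
def route(confidence: float) -> str:
    """Map a confidence score in [0, 1] to a review tier.

    Thresholds mirror the tiers in the text; they are meant to be tuned.
    """
    if confidence >= 0.95:
        return "auto_approve"
    if confidence >= 0.80:
        return "quick_review"
    if confidence >= 0.60:
        return "careful_review"
    return "reject"


def build_review_queue(matches: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Records that need a human, closest matches first."""
    needs_review = [m for m in matches if route(m[1]) in ("quick_review", "careful_review")]
    return sorted(needs_review, key=lambda m: m[1], reverse=True)


# Hypothetical (record id, confidence) pairs.
matches = [("INV-1001", 0.97), ("INV-1002", 0.62), ("INV-1003", 0.88), ("INV-1004", 0.41)]
queue = build_review_queue(matches)
print(queue)
```

Sorting the queue by score is what a binary tool cannot offer: the reviewer starts with the records most likely to be quick approvals.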

In practice, the distribution usually looks something like this:

60-70% of records match at 95%+ confidence (auto-approve)
15-20% match at 80-94% (quick review)
10-15% match at 60-79% (careful review)
5-10% fall below 60% (likely non-matches)

This means that instead of manually reviewing every uncertain record at the same depth, you're really only doing careful review on 10-15% of all records. The rest are either auto-approved or quickly triaged by the confidence score.

According to research from the MIT Sloan Management Review, organizations that implement confidence-based decision workflows see 40-60% reductions in manual review time compared to binary decision systems.

Real example: vendor reconciliation

Let me walk through how this works in a real reconciliation scenario.

You have 3,000 invoices to match against purchase orders this month. A confidence-scoring tool processes them and returns:

2,100 matches at 95%+ confidence. These are clean. Names match closely, amounts are within $1, dates align. Auto-approved in bulk.
450 matches at 80-94% confidence. Quick scan shows most are legitimate matches with abbreviation differences ("Corp" vs "Corporation") or small amount variations (tax rounding). Takes about 2 hours to review.
300 matches at 60-79% confidence. These need actual investigation. Maybe the vendor name is significantly different but the amount and date match. Or the name matches but the amount is off by 10%. Each one takes 2-3 minutes. About 10-15 hours of work.
150 non-matches below 60%. Bulk reject or set aside for exception processing.

Total review time: about 14 hours. Without confidence scores and with binary matching, you'd be reviewing all 900 uncertain records (450 + 300 + 150) at equal depth. Probably 30+ hours.

That's a 50%+ reduction in review time. Every month. Just from better information about match quality.
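The arithmetic behind that comparison is easy to reproduce. This sketch uses only the figures from the example (2-3 minutes per careful review, about 2 hours for the quick-review tier) and assumes a flat review spends the same 2-3 minutes on every uncertain record.

```python
MINUTES_PER_CAREFUL_REVIEW = (2, 3)   # from the text: 2-3 minutes per record
QUICK_REVIEW_HOURS = 2                # from the text: about 2 hours for the 80-94% tier

careful_tier = 300
uncertain_total = 450 + 300 + 150     # everything below the auto-approve tier


def hours(records: int, minutes_each: int) -> float:
    """Total review time in hours for `records` at `minutes_each` per record."""
    return records * minutes_each / 60


# Tiered workflow: quick scan of the 80-94% tier plus careful review of the 60-79% tier.
tiered = [QUICK_REVIEW_HOURS + hours(careful_tier, m) for m in MINUTES_PER_CAREFUL_REVIEW]
# Flat workflow (assumption): all 900 uncertain records reviewed at the careful rate.
flat = [hours(uncertain_total, m) for m in MINUTES_PER_CAREFUL_REVIEW]

print(f"tiered review: {tiered[0]:.0f}-{tiered[1]:.0f} hours")
print(f"flat review:   {flat[0]:.0f}-{flat[1]:.0f} hours")
```

Under these assumptions the tiered workflow lands at 12-17 hours against 30-45 hours for the flat one, which is where the "about 14 hours" and "30+ hours" figures sit.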

The human-in-the-loop principle

Confidence scores implement what AI researchers call "human-in-the-loop" design. The system handles the decisions it can make confidently and routes the uncertain ones to humans.

This is better than full automation (which makes mistakes on edge cases) and better than full manual review (which wastes human time on obvious cases). It's the best of both worlds.

The key insight is that not all uncertain records are equally uncertain. A 91% confidence match and a 62% confidence match are both "uncertain" in a binary system, but they require very different levels of human attention. Confidence scores let you allocate human time proportionally to actual uncertainty.

A Harvard Business Review article on human-AI collaboration found that the most effective AI-human workflows are ones where AI handles routine decisions and escalates ambiguous ones to humans with context. Confidence scores are that context.

Beyond simple matching

Confidence scores aren't just useful for data matching. They apply to any classification or decision problem where you want to combine automation with human judgment.

Fraud detection. A 98% confidence fraud score means block the transaction automatically. A 70% score means flag for human review. A 30% score means let it through.

Lead scoring. A lead with 90% conversion confidence gets immediate sales follow-up. A lead at 60% gets nurture marketing. Below 40% gets deprioritized.

Document classification. An invoice classified as "utilities" with 95% confidence gets auto-routed. One classified with 65% confidence gets human verification.

The principle is the same everywhere: use the confidence score to determine the appropriate level of human involvement. High confidence means low human involvement. Low confidence means high human involvement.

What to look for in matching tools

If you're evaluating data matching or reconciliation tools, here's what to look for regarding confidence scoring:

Transparent scoring. Can you see why a match got the score it did? Which fields contributed and how much? Black-box scores are better than no scores, but transparent scores let you tune your thresholds.

Adjustable thresholds. Can you change what counts as "auto-approve" vs "review" vs "reject"? Different use cases need different thresholds. Financial reconciliation might need 98% confidence for auto-approval. Marketing list dedup might be fine at 85%.

Field weighting. Can you tell the system that name similarity matters more than date proximity for your specific use case? Weighting lets you encode domain knowledge into the scoring.

Exportable results with scores. Can you get the confidence scores in your export file, not just the match/no-match decision? This lets you do additional analysis or apply different thresholds later.

DataReconIQ provides confidence scores with field-level breakdowns, so you can see exactly why each match was scored the way it was and adjust your review process accordingly.

The bottom line

Binary matching made sense when computing power was expensive and the only realistic option was a simple threshold. But we're past that now. The algorithms for confidence scoring exist, they're not computationally expensive, and they dramatically improve the efficiency of any matching or reconciliation workflow.

If your current tool gives you "match" or "no match" with nothing in between, you're spending unnecessary hours reviewing records that a confidence score would have triaged for you. Tbh, once you work with confidence scores, going back to binary matching feels like going back to a world where traffic lights only had red and green with no yellow.

The yellow light is the whole point. It tells you to slow down and pay attention, but only when it's actually needed. Everything else, you can handle on autopilot.

We Scanned 200 SMB Domains. Here's What We Found.

ComplianceLayer — Thu, 16 Apr 2026 14:00:02 +0000

Published by the ComplianceLayer team | March 2026

Last quarter, we ran ComplianceLayer against 200 small and medium business domains — companies with 10 to 500 employees across industries including professional services, healthcare-adjacent (no PHI), retail, and technology. No one paid us to do this. We wanted to know: how is the average SMB actually doing on the fundamentals of external security?

The results were worse than we expected. And we expected bad.

Here's what we found.

Methodology

We used our own tool — ComplianceLayer — to run a full external security scan on each domain. Each scan checks four categories: SSL/TLS configuration, DNS/email authentication (SPF, DMARC, DKIM), HTTP security headers, and open port exposure. Domains were sourced from a mix of public business directories and submitted by MSP partners who gave permission to aggregate anonymized findings. No internal systems were tested. All scans were passive external assessments.

Domains were graded A through F per category.

SSL/TLS: Better Than Expected, But Fragile

The SSL picture was the most encouraging of the four categories — but the details tell a more complicated story.

71% of domains earned an A or B grade on SSL. The widespread adoption of Let's Encrypt and auto-renewing certificate providers has pushed basic SSL hygiene into the mainstream. Most domains had valid certificates.

But dig one layer deeper:

23% were running TLS 1.0 or TLS 1.1 alongside modern TLS 1.3. Both older protocol versions have known vulnerabilities and were officially deprecated by the IETF in 2021. Supporting them for "compatibility" is a real risk.
11% had certificates expiring within 30 days. These aren't companies that forgot to renew — they're companies where nobody is watching. For an MSP, that's a 2 AM emergency call waiting to happen.
6% had expired certificates entirely. Fully expired. In production.
4% were using SHA-1 signed certificates — an algorithm considered broken for over a decade.

The headline SSL number looks fine. The tail is ugly.
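Watching for the "expiring within 30 days" case does not take much code. The following is a minimal sketch using only Python's standard library; it is not how ComplianceLayer performs its checks, and the function names are invented for the example.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter string.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    e.g. "Jan  5 09:34:43 2018 GMT". A negative result means already expired.
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Fetch the notAfter field of the certificate served by host:port."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


# Example (needs network access; example.com is a placeholder):
# remaining = days_until_expiry(fetch_not_after("example.com"), datetime.now(timezone.utc))
# if remaining < 30:
#     print(f"certificate expires in {remaining:.0f} days")
```

With the default context, a certificate that has already expired fails the TLS handshake with an `ssl.SSLCertVerificationError`, which is itself a useful signal for a monitoring script.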

DNS & Email Security: The Worst Category by Far

If there's one finding we'd highlight in a conference talk, it's this: SMB email authentication is a disaster.

Email spoofing — where an attacker sends email pretending to be from your domain — is one of the most effective phishing vectors in existence. Three DNS records prevent it: SPF, DMARC, and DKIM. All three are free to configure. All three have been industry best practice for years.

Here's where 200 SMB domains stood:

SPF present: 64% ✓ — The best-adopted of the three, but still 36% missing.
DMARC present: 31% ✓ — Over two-thirds of SMBs have no DMARC record.
DKIM present: 44% ✓ — Less than half have DKIM signing configured.
All three configured correctly: 18% ✓ — Fewer than 1 in 5 SMBs has complete email authentication.

To be clear about what a missing DMARC record means for that 69% of domains: anyone on the internet can send email that appears to come from their domain, and receiving mail servers have no policy-based mechanism to reject or quarantine it. That's the setup for CEO fraud, vendor impersonation, and credential phishing.

The fix is a DNS record. It takes 10 minutes. But without active monitoring, most SMBs will never notice it's missing.
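For reference, a DMARC policy is a TXT record published at `_dmarc.<your domain>`, for example `v=DMARC1; p=none; rua=mailto:dmarc-reports@example.com` (the report address is a placeholder). The sketch below classifies a record's posture from its text. It assumes you have already fetched the TXT value with whatever DNS tool you use, and the status labels are made up for this example.

```python
from typing import Optional


def parse_dmarc(txt_record: str) -> dict[str, str]:
    """Split a DMARC TXT record into its tag=value pairs."""
    tags = {}
    for part in txt_record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def dmarc_status(txt_record: Optional[str]) -> str:
    """Classify a domain's DMARC posture from its _dmarc TXT record (or None)."""
    if not txt_record:
        return "missing"
    tags = parse_dmarc(txt_record)
    if tags.get("v") != "DMARC1":
        return "invalid"
    policy = tags.get("p", "").lower()
    if policy == "none":
        return "monitoring only"   # reports are collected, nothing is blocked
    if policy in ("quarantine", "reject"):
        return "enforcing"
    return "invalid"


print(dmarc_status("v=DMARC1; p=none; rua=mailto:dmarc-reports@example.com"))
```

A record with `p=none` still counts as "present" in a scan, but it only collects reports. Spoofed mail is not quarantined or rejected until the policy moves to `quarantine` or `reject`.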

HTTP Security Headers: Low-Hanging Fruit, Widely Missed

HTTP security headers are configurations added to web server responses that instruct browsers to enforce security policies. Most don't require application changes — just a web server configuration tweak. Yet the adoption rate among SMBs is remarkably low.

Results across our 200-domain sample:

Header	Present	Missing
HSTS (HTTP Strict Transport Security)	47%	53%
X-Frame-Options	38%	62%
X-Content-Type-Options	41%	59%
Content-Security-Policy (CSP)	19%	81%
Referrer-Policy	29%	71%
Permissions-Policy	11%	89%

Only 8% of domains had all six headers configured correctly.

Content-Security-Policy is the most complex to implement — it requires understanding what scripts your site loads — and its 19% adoption reflects that complexity. But HSTS, X-Frame-Options, and X-Content-Type-Options are one-line nginx or Apache config changes. There's no good reason for 53–62% of SMBs to be missing them.

The absence of X-Frame-Options leaves sites vulnerable to clickjacking. Missing X-Content-Type-Options can enable MIME-type sniffing attacks. These aren't theoretical — they show up in penetration test reports as exploitable issues.
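Checking for the six headers in the table is a simple set comparison. This sketch only tests for presence, not for whether each value is configured well, and it is an illustration rather than ComplianceLayer's actual check.

```python
REQUIRED_HEADERS = (
    "Strict-Transport-Security",
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Content-Security-Policy",
    "Referrer-Policy",
    "Permissions-Policy",
)


def missing_headers(response_headers: dict[str, str]) -> list[str]:
    """Return the headers from the list above that a response does not set.

    Header names are compared case-insensitively, since HTTP header names
    are not case-sensitive.
    """
    present = {name.lower() for name in response_headers}
    return [h for h in REQUIRED_HEADERS if h.lower() not in present]


# Hypothetical response headers from a site that has only done the easy two.
example = {
    "strict-transport-security": "max-age=31536000",
    "X-Content-Type-Options": "nosniff",
}
print(missing_headers(example))
```

To run it against a live site, the headers from `urllib.request.urlopen(url)` can be passed in as `dict(response.headers.items())`.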

Open Ports: A Few Alarming Findings

Open port analysis checks which network services are reachable from the public internet. Some open ports are expected (80/HTTP, 443/HTTPS). Others are not.

Unexpected open ports found across the dataset:

RDP (port 3389) exposed to internet: 14% of domains
SMB (port 445) exposed to internet: 7% of domains
Telnet (port 23) open: 3% of domains
FTP (port 21) open: 9% of domains
SSH on default port 22: 31% (elevated risk if using password auth)

RDP exposed to the internet is a well-documented ransomware entry point. The 14% figure is consistent with external research — RDP brute force has been the leading initial access vector in ransomware incidents for several consecutive years according to multiple incident response firm reports.

SMB exposed to the internet raises WannaCry-era memories. It should not be reachable from the public internet in any SMB deployment.
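A basic reachability check for the ports listed above can be written with a plain TCP connect. This is an active probe and only a sketch, so it is not how the passive scans in this report were done, and it should only be pointed at hosts you are authorized to test.

```python
import socket

# Ports the findings above treat as risky when reachable from the internet.
RISKY_PORTS = {3389: "RDP", 445: "SMB", 23: "Telnet", 21: "FTP"}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def exposed_services(host: str) -> list[str]:
    """Names of risky services that accept TCP connections on `host`."""
    return [name for port, name in RISKY_PORTS.items() if is_port_open(host, port)]


# Example (run from outside the network, against a host you own):
# print(exposed_services("203.0.113.10"))  # documentation-range placeholder address
```

A successful connect only shows the port is reachable. Whether the service behind it is patched or uses strong authentication is a separate question.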

The good news: 62% of domains earned an A or B on port exposure, meaning most SMBs have at least the basics of network perimeter hygiene. The remaining 38% have at least one significant finding.

The Overall Picture

Scoring each domain across all four categories:

A overall: 4%
B overall: 23%
C overall: 38%
D overall: 27%
F overall: 8%

More than one-third of SMBs scored D or F on overall external security posture. The most common failure pattern was: decent SSL, missing email authentication, no security headers, one or two problematic open ports.

This isn't a technology problem. It's a visibility problem. MSPs managing these companies often don't have an automated way to track this across their client base. The clients themselves have no idea. Nobody is watching the dashboard that doesn't exist.

What We Recommend

Based on these findings, here's the priority order for any SMB or MSP addressing the gap:

Fix DMARC immediately. It's free, it takes 10 minutes, and the blast radius of not having it is enormous. Start with p=none if you need to monitor before enforcing.
Audit open ports. RDP should never be internet-facing. Use a VPN or jump host.
Add HSTS and X-Content-Type-Options. Two header lines in your web server config.
Check SSL expiry. Set up monitoring or use a cert provider with auto-renewal.
Add CSP. More complex, but important for any site loading third-party scripts.

Try It Yourself

If you're an MSP or sysadmin who wants to know where your clients or your own domains stand, ComplianceLayer's free tier lets you run 10 domain scans per month with no credit card required. You'll get an A-F grade per category and a specific remediation checklist for every failing check.

We didn't write this to sell subscriptions (though we're happy if you upgrade). We wrote it because someone needs to show the actual numbers — and the numbers say most SMBs are one missed DMARC record away from a convincing phishing campaign.

Data collected Q1 2026. N=200 SMB domains. External passive scanning only. No internal systems accessed.

Built by ComplianceLayer — scan any domain for security compliance in seconds. Get your free API key.

Benchmarking Memoria on LongMemEval: Strong Memory Retrieval, Clear Reader Separation

MatrixOrigin — Thu, 16 Apr 2026 14:00:00 +0000

Memory systems for AI agents are easy to demo and much harder to evaluate.

A good anecdote can make any memory layer look impressive. A real benchmark has to answer a tougher question: when an agent is asked to recover user facts, track updates, reason over time, and connect information across sessions, does the memory system consistently surface the right context?

That is what we tested with Memoria on LongMemEval_s.

Using a single frozen retrieval snapshot from Memoria, we evaluated three different reader models on exactly the same retrieved memories, then scored all answers with a unified judge using the official LongMemEval task rules. The result is a cleaner measurement than a typical end-to-end benchmark: retrieval stays fixed, so differences in final accuracy mostly reflect how well a reader can use the memory Memoria provides.

The headline result: Memoria retrieval supported up to 88.78% overall accuracy on LongMemEval_s, with near-perfect performance on single-session recall and strong results on knowledge updates, temporal reasoning, and multi-session synthesis.

Why this benchmark matters

LongMemEval is a useful stress test because it goes beyond simple fact lookup. It evaluates whether a system can handle six distinct memory behaviors:

Category	What it tests	Count
SSU	Single-session user facts	70
SSA	Single-session assistant facts	56
SSP	Single-session personalization	30
KU	Knowledge updates and conflict resolution	77
TR	Temporal reasoning	133
MS	Multi-session synthesis	133

There is also an Abstention subset of 30 questions, where the correct behavior is to recognize that the answer is not available from memory.

That makes LongMemEval a strong fit for evaluating an agent memory layer. A production memory system is not just supposed to store text. It needs to retrieve the right facts, preserve recency, support reasoning over timelines, and avoid pushing the model into confident hallucinations when the answer is unavailable.

Experimental setup

This run used:

Dataset: LongMemEval_s
Data file: benchmarks/longmemeval/data/longmemeval_s_cleaned.json
Retrieval backend: Memoria
Retrieval snapshot: benchmarks/longmemeval/results/retrieval_results.json

The snapshot covered 500 retrieval records, with 10 memories returned per question. One retrieval timed out (db467c8c), leaving 499 judged examples.

The evaluation pipeline was straightforward:

Historical sessions were ingested into Memoria.
Memoria retrieved relevant memories for each question.
A unified reader prompt was generated from the same retrieval snapshot.
Three reader models answered using identical retrieved context.
All hypotheses were scored by a single GPT-5.4 judge using the official LongMemEval task-specific rubric.

The three readers were:

gpt-5.4
claude-opus-4.6
claude-sonnet-4.5

The important design choice here is that retrieval was frozen across all readers. This isolates the effect of downstream reasoning from the quality of the memory backend itself.
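The frozen-retrieval design can be sketched in a few lines. The record fields, the reader and judge signatures, and the toy readers below are all hypothetical stand-ins; the real pipeline calls LLMs and applies the official LongMemEval rubric.

```python
from typing import Callable

# Hypothetical shapes: a reader maps (question, memories) to an answer string,
# and a judge maps (question, reference, hypothesis) to True/False.
Reader = Callable[[str, list[str]], str]
Judge = Callable[[str, str, str], bool]


def evaluate(snapshot: list[dict], readers: dict[str, Reader], judge: Judge) -> dict[str, float]:
    """Score every reader against the same frozen retrieval snapshot."""
    accuracy = {}
    for name, reader in readers.items():
        correct = 0
        for record in snapshot:
            # Retrieval is never re-run here: each reader sees identical memories.
            hypothesis = reader(record["question"], record["memories"])
            correct += judge(record["question"], record["answer"], hypothesis)
        accuracy[name] = correct / len(snapshot)
    return accuracy


# Toy data and stand-in readers, only to show the control flow.
snapshot = [
    {"question": "Favorite drink?", "memories": ["User likes tea"], "answer": "tea"},
    {"question": "Home city?", "memories": ["User moved to Oslo"], "answer": "Oslo"},
]
readers = {
    "echo_last_word": lambda question, memories: memories[0].split()[-1],
    "always_idk": lambda question, memories: "I don't know",
}
exact_match_judge = lambda question, reference, hypothesis: reference.lower() == hypothesis.lower()

results = evaluate(snapshot, readers, exact_match_judge)
print(results)
```

Because the snapshot and the judge are shared, any difference between two readers' scores comes from how they used the same memories.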

Overall results

Reader run	Reader	Judge	Correct	Overall	IDK Count
gpt-5.4	gpt-5.4	gpt-5.4	424/499	84.97%	3
opus-4.6	claude-opus-4.6	gpt-5.4	443/499	88.78%	0
sonnet-4.5	claude-sonnet-4.5	gpt-5.4	353/499	70.74%	79

The top-line takeaway is simple: Memoria retrieved enough useful context for a strong reader to answer nearly 89% of LongMemEval_s correctly.

That matters because this was not a jointly optimized stack. The retrieval snapshot was fixed. The judge was fixed. The only thing that changed was the reader. In other words, Memoria was already surfacing a context set strong enough to support high accuracy without changing the underlying memory results.

Category breakdown

Reader	SSU	SSA	SSP	KU	TR	MS	Abstention
gpt-5.4	98.57	100.00	86.67	77.92	88.72	71.43	56.67
opus-4.6	100.00	100.00	76.67	89.61	90.23	78.95	93.33
sonnet-4.5	95.71	100.00	43.33	58.44	64.66	64.66	86.67

These numbers tell a more interesting story than the overall score alone.
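As a consistency check, the overall scores can be recovered from the per-category numbers. This sketch weights each category's accuracy by its question count. It assumes the six categories partition the 499 judged examples and that Abstention is an overlapping subset (the six counts already sum to 499), so that column is left out.

```python
# Question counts per category, from the table earlier in the post.
COUNTS = {"SSU": 70, "SSA": 56, "SSP": 30, "KU": 77, "TR": 133, "MS": 133}

# Per-category accuracy (%) for each reader, from the breakdown table.
ACCURACY = {
    "gpt-5.4": {"SSU": 98.57, "SSA": 100.00, "SSP": 86.67, "KU": 77.92, "TR": 88.72, "MS": 71.43},
    "opus-4.6": {"SSU": 100.00, "SSA": 100.00, "SSP": 76.67, "KU": 89.61, "TR": 90.23, "MS": 78.95},
    "sonnet-4.5": {"SSU": 95.71, "SSA": 100.00, "SSP": 43.33, "KU": 58.44, "TR": 64.66, "MS": 64.66},
}


def correct_answers(reader: str) -> int:
    """Recover the number of correct answers from the rounded percentages."""
    return sum(round(ACCURACY[reader][cat] / 100 * n) for cat, n in COUNTS.items())


total = sum(COUNTS.values())
for reader in ACCURACY:
    hits = correct_answers(reader)
    print(f"{reader}: {hits}/{total} = {hits / total:.2%}")
```

The recovered totals (424, 443, and 353 out of 499) agree with the Correct column in the overall results table.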

1. Memoria is extremely strong on direct factual recall

Single-session recall is close to saturated.

All three readers reached 100% on SSA, and SSU ranged from 95.71% to 100%. That suggests Memoria is consistently retrieving the right evidence for straightforward factual questions, whether the fact originated from the user or from the assistant.

This is exactly the baseline behavior a memory layer has to get right. If retrieval is weak, these categories usually collapse first. Here, they are effectively solved.

2. The real separator is not retrieval, but reasoning over retrieved memory

The largest gaps appear in:

Knowledge Update
Temporal Reasoning
Multi-Session
Personalization

Those are the categories that require more than locating a fact. The reader has to decide which memory is most recent, reconcile conflicting evidence, infer ordering, or synthesize information across sessions.

That distinction is important. A weaker result in these categories does not necessarily mean the memory backend failed. In many cases, it means the model failed to use the retrieved evidence correctly.

The strongest example is Knowledge Update. With the same Memoria retrieval snapshot, performance ranged from 58.44% to 89.61% depending on the reader. That is a large spread, and it strongly suggests the retrieved context often contained the necessary evidence, but not every model was equally good at choosing the latest valid fact.

3. Memoria supports strong temporal and cross-session reasoning

Two of the hardest LongMemEval categories are Temporal Reasoning (TR) and Multi-Session (MS).

On the frozen Memoria retrieval snapshot, the best reader reached:

90.23% on TR
78.95% on MS

That is a strong result for a memory benchmark. These are not simple quote-retrieval tasks. They require a model to read multiple memory items, track dates or ordering, and compose an answer that reflects the correct timeline or session-level synthesis.

In practice, this is much closer to how agent memory is actually used. Real agents do not just need to remember that a user likes tea. They need to remember what changed, when it changed, and how different conversations relate.

4. Abstention reveals calibration, not just recall

The abstention subset is especially useful because it tests whether a model can recognize when memory is insufficient.

Here the best result came from claude-opus-4.6, which achieved 93.33% on abstention. claude-sonnet-4.5 was also relatively strong at 86.67%, while gpt-5.4 lagged at 56.67%.

The IDK Count helps explain why. GPT-5.4 emitted the exact string "I don't know" only 3 times, while Sonnet did so 79 times. GPT-5.4 was much more aggressive; Sonnet was much more conservative.

That does not change the core result for Memoria, but it does show something valuable: the same retrieval layer can support very different downstream answer behaviors depending on the reader. A memory stack should be evaluated not only on what it retrieves, but also on how different readers convert that retrieval into answers or refusals.

What this says about Memoria

Taken together, these results show three things.

First, Memoria can retrieve the right evidence at high frequency. Near-perfect SSU and SSA performance across readers is difficult to achieve if the memory layer is not consistently surfacing the correct context.

Second, Memoria preserves enough structure for complex downstream use. Strong scores in knowledge updates, temporal reasoning, and multi-session tasks indicate that the retrieved memories are not just loosely relevant. They are sufficiently precise and ordered to support harder reasoning.

Third, Memoria has a high performance ceiling. When paired with a stronger reader, the same retrieval snapshot reaches 88.78% overall. That is a strong signal that the backend is doing real work and that additional gains can come from improving reader behavior rather than rethinking the memory layer from scratch.

In other words, Memoria is not just storing conversation history. It is producing retrieval outputs that are good enough for advanced models to answer difficult memory questions accurately.

Why the unified-judge setup matters

One reason benchmark results are often hard to interpret is that too many variables change at once. Retrieval changes. Prompting changes. Judging changes. The final number reflects the whole stack, but it is hard to know what actually improved.

This evaluation reduces that ambiguity:

one dataset
one retrieval snapshot
one judge
one official rubric
three readers

That makes the result more legible. The benchmark is not asking whether one model is “best.” It is asking what Memoria retrieval enables under controlled conditions.

And under those conditions, the answer is clear: Memoria provides a strong enough memory substrate for high-accuracy long-horizon QA, especially when paired with a capable reader.

Limitations

A fair reading should note what this benchmark does not prove.

This is one dataset, one retrieval configuration, and one judge. The benchmark also fixes retrieval at 10 memories per question, which may not be optimal for every reader. And because the final metric is answer accuracy rather than retrieval recall in isolation, some portion of the variance still belongs to the model, not the memory layer.

But those caveats do not weaken the main finding. They sharpen it.

The result here is not that Memoria solves memory in the abstract. It is that on a realistic long-memory benchmark, Memoria retrieval is strong enough to support state-of-the-art answer quality under a controlled unified evaluation.

Conclusion

Agent memory should be judged on more than demos.

In this LongMemEval evaluation, Memoria retrieved a fixed context snapshot that enabled:

88.78% overall accuracy at best
100% SSA
100% SSU with the strongest reader
89.61% Knowledge Update
90.23% Temporal Reasoning
78.95% Multi-Session
93.33% Abstention on unanswerable questions

That is a compelling profile for a production memory layer.

The most important conclusion is not just that one reader scored higher than another. It is that Memoria consistently surfaced the evidence needed for strong readers to perform well across direct recall, updates, timelines, and cross-session synthesis.

For teams building AI agents, that is what memory infrastructure is supposed to do.

And on this benchmark, Memoria does it.

— — —

Experience the power of persistent memory for AI Agents. 🧠
💻 GitHub (Star us!): [https://github.com/matrixorigin/Memoria]
🌐 Website: [https://thememoria.ai/]
👾 Discord: [https://discord.com/]

Stop Googling 'Can I Use That AI Feature on the Free Plan'

snapsynapse — Thu, 16 Apr 2026 14:00:00 +0000

Someone raises their hand: "Can I actually use that on the free plan?"

You're 90% sure the answer is yes. But the vendor changed their pricing page last month. And the feature you demoed in January got moved to a higher tier a couple weeks ago. And you're not going to pull up six different pricing pages in front of a live audience to find out.

This is the problem. Not that AI tools are hard to understand. That the ground keeps moving, and nobody maintains a single place where the current answers live.

What this is

AI Tool Watch is an open-source, static-site reference that tracks capabilities, pricing tiers, platform support, and availability across the major consumer AI products: ChatGPT, Claude, Gemini, Copilot, Perplexity, Grok, plus self-hosted options like Ollama and LM Studio.

Live site: AITool.watch

It answers the specific, annoying questions that come up when you're facilitating, teaching, or advising:

Which plan unlocks Agent Mode in ChatGPT?
Can I use Claude Cowork on Windows?
Is Gemini Live free or paid?
What open models can I realistically run locally?

Every answer links to an evidence source. Nothing is vibes-based.

How the data stays current

This is the part I'm most proud of, and probably the part that matters most for whether you'd trust it.

Every Monday & Thursday, a four-model verification cascade runs. Gemini, Perplexity, Grok, and Claude each cross-check every tracked feature: pricing, platform availability, status, gating, regional restrictions. To prevent provider bias, models are skipped when verifying their own platform's features (Gemini doesn't check Google features, etc.). A change only gets flagged when at least three models agree on a discrepancy. Flagged changes become GitHub issues or PRs for human review. Nothing auto-merges.
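The agreement rule is simple enough to sketch. This is an illustration in Python, not the project's code (the repo itself is JavaScript), and the verifier-to-provider mapping and function name are assumptions.

```python
# Which provider each verifier model belongs to (assumed mapping for this sketch).
VERIFIER_PROVIDER = {
    "gemini": "google",
    "perplexity": "perplexity",
    "grok": "xai",
    "claude": "anthropic",
}
MIN_AGREEMENT = 3  # from the text: at least three models must agree


def should_flag(feature_provider: str, reports: dict[str, bool]) -> bool:
    """Decide whether a tracked feature gets flagged for human review.

    `reports` maps verifier name -> True if that model saw a discrepancy.
    Verifiers are ignored for features on their own provider's platform.
    """
    eligible = {
        model: saw_discrepancy
        for model, saw_discrepancy in reports.items()
        if VERIFIER_PROVIDER[model] != feature_provider
    }
    return sum(eligible.values()) >= MIN_AGREEMENT


reports = {"gemini": True, "perplexity": True, "grok": True, "claude": False}
print(should_flag("openai", reports))   # all four verifiers are eligible
print(should_flag("google", reports))   # gemini is skipped for Google features
```

One consequence of the rule as described: when a verifier is skipped for its own platform, only three models remain, so flagging that feature requires all three to agree.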

Features are also stamped with a Checked date. Anything not re-verified within seven days is treated as stale and gets prioritized in the next run.
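The staleness rule can be expressed as a date comparison. Again, this is a Python sketch with invented names, not the project's own implementation.

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=7)  # from the text: seven days


def is_stale(checked: date, today: date) -> bool:
    """True if a feature was last verified more than seven days ago."""
    return today - checked > STALE_AFTER


def prioritize(features: dict[str, date], today: date) -> list[str]:
    """Stale features first (oldest check first), then the rest."""
    return sorted(features, key=lambda name: (not is_stale(features[name], today), features[name]))


# Hypothetical feature names mapped to their Checked dates.
checked_dates = {
    "feature-a": date(2026, 4, 15),
    "feature-b": date(2026, 4, 1),
    "feature-c": date(2026, 4, 8),
    "feature-d": date(2026, 4, 9),
}
print(prioritize(checked_dates, today=date(2026, 4, 16)))
```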

How it's built

There is no database. Every piece of data lives in plain markdown files under data/. A single build script reads those files and renders the static site into docs/.

That's the whole stack: markdown, JavaScript, and Git.

Contributing doesn't require a dev environment, an ORM, or a running database. Edit a .md file, open a PR, CI rebuilds the site. If you can read a markdown table, you can read and fix the data.

# Clone, build, open
git clone https://github.com/snapsynapse/ai-tool-watch.git
cd ai-tool-watch
node scripts/build.js
open docs/index.html
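Because the data is plain markdown, reading it needs little more than string splitting. The sketch below parses a simple pipe-delimited table. The column names and rows are hypothetical and do not reflect the repo's actual schema, and the real build script is JavaScript.

```python
def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Parse a simple pipe-delimited markdown table into a list of row dicts.

    Assumes every row starts and ends with a pipe and has no empty edge cells.
    """
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
        if line.strip().startswith("|")
    ]
    header, body = rows[0], rows[2:]  # rows[1] is the |---|---| separator
    return [dict(zip(header, row)) for row in body]


# Hypothetical record; real files under data/ may be laid out differently.
sample = """
| Plan | Available | Checked |
|------|-----------|------------|
| Free | no | 2026-04-13 |
| Plus | yes | 2026-04-13 |
"""
records = parse_markdown_table(sample)
print(records[1]["Available"])
```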

What the ontology looks like

The reference is organized capability-first, not product-first. Instead of "here's everything ChatGPT does," it asks "which products let me do X?"

Capabilities are grouped into plain-language categories:

Understand
Respond
Create
Work With My Stuff
Act for Me
Connect
Access Context

Each capability maps to specific implementations across products, with plan-level availability for each.

The data also includes ready-to-use talking points for presentations (click-to-copy), category and price tier filtering, provider toggles, permalinks, and shareable URLs with filter state preserved in parameters.
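Keeping filter state in the URL is what makes a filtered view shareable. Here is a minimal sketch of the round trip using Python's standard library; the parameter names and base URL are placeholders, not the site's actual query format.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_share_url(base: str, filters: dict[str, list[str]]) -> str:
    """Encode filter selections as query parameters (names are hypothetical)."""
    query = urlencode({key: ",".join(values) for key, values in sorted(filters.items()) if values})
    return f"{base}?{query}" if query else base


def read_filters(url: str) -> dict[str, list[str]]:
    """Recover the filter selections from a shared URL."""
    params = parse_qs(urlsplit(url).query)
    return {key: values[0].split(",") for key, values in params.items()}


filters = {"category": ["Create", "Act for Me"], "tier": ["free"]}
url = build_share_url("https://example.com/", filters)
print(url)
print(read_filters(url))
```

The design choice is that the URL alone carries the view: opening the link restores the same filters without any stored session.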

Scope decisions

Covered: major consumer-facing AI products with meaningful public usage. Commercially available systems that ordinary people can sign up for and use. Important self-hosted model families and runtimes.

Not covered: every enterprise AI vendor, infrastructure platform, or niche model release. The roughly 1% market share heuristic is a practical inclusion filter, not a strict cutoff.

Prices are in USD. Feature availability reflects the US region by default.

Accessibility

WCAG 2.1 AA target. Full keyboard navigation (arrow keys, j/k, Enter to copy), skip links, focus indicators, 4.5:1 contrast minimums in both themes, reduced-motion support, 44px minimum touch targets, ARIA live regions, semantic HTML throughout.

If you care about accessibility tooling, there's a companion project: skill-a11y-audit that automates WCAG audits as a reusable AI skill.

Who this is for

Facilitators and AI educators who need current, accurate answers during live sessions
Professionals building AI literacy programs who need to know what's actually available at each price point
Designers and developers evaluating which AI capabilities exist on which platforms
Anyone tired of checking six different pricing pages to answer one question

Get involved

The repo is MIT-licensed. If you spot outdated info or want to add a feature:

Edit the relevant record under data/
Preserve the evidence source link
Run node scripts/validate-ontology.js
Open a PR

GitHub repo | Live site | Contributing guide | Knowledge-as-Code

Most LCP Fixes Come Down to One Image

nosyos — Thu, 16 Apr 2026 14:00:00 +0000

A Next.js app with next/image on every image component. Lighthouse image audit: no issues. LCP: 4.2 seconds. The hero was a CSS background-image. next/image doesn't touch those. Nobody had checked what the LCP element actually was.

Find your LCP element before you do anything else

This is the step most people skip. They add next/image, run Lighthouse, see green checkmarks on the image audit, and wonder why LCP is still slow.

Open Chrome DevTools, run Lighthouse, and look at what it marks as the LCP element. If it's a background-image set via CSS, the browser can't preload it the same way it handles a real <img> tag, and it won't get early fetch priority. Move it to an <img> element. This one change has fixed more LCP problems than anything else I've seen.

`fetchpriority="high"` is doing more work than most developers realize

The browser assigns fetch priority based on what it finds during the initial HTML parse. Images discovered late — inside components that render after hydration, or below the fold at first scan — get normal or low priority. By the time the browser decides to fetch them, the LCP window is already closing.

For your LCP image, you want the fetch to start as early as possible.

<img
  src="/hero.webp"
  fetchpriority="high"
  width={1200}
  height={600}
  alt="..."
/>

In Next.js, the priority prop on next/image sets this automatically and also injects a preload link into <head>:

<Image
  src="/hero.webp"
  priority
  width={1200}
  height={600}
  alt="..."
/>

Don't use priority on more than one or two images per page. Telling the browser everything is urgent means nothing is.

next/image defaults will silently hurt your LCP

next/image lazy-loads by default. That means if your LCP image is rendered via next/image without priority, the browser is intentionally delaying its fetch until the image is about to enter the viewport.

I've seen this cause regressions on otherwise well-optimized pages. The format is correct, the dimensions are explicit, Lighthouse scores are green — but LCP is 300ms slower than it should be because someone forgot priority. It doesn't throw a warning. It just quietly loads late.

For any image that could be the LCP element on any viewport — hero images, above-the-fold product shots, article cover images — set priority. Default to it rather than remembering to add it.

Explicit dimensions are non-negotiable

A browser that doesn't know an image's dimensions reserves no space for it. When the image loads, content shifts. That's a CLS problem, not just a performance one — it makes the page feel broken to users even if the load time is acceptable.

next/image will warn you when dimensions are missing. Every other <img> in your codebase that doesn't go through next/image should have explicit width and height set too. It takes ten seconds and prevents a class of layout bugs entirely.

Format: stop overthinking it

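WebP is 25–35% smaller than JPEG at equivalent quality. AVIF is another 20–30% on top of that. next/image serves WebP by default to browsers that support it. AVIF is opt-in: list 'image/avif' ahead of 'image/webp' in images.formats in next.config.js, and Next.js serves AVIF where the browser accepts it and falls back to WebP otherwise.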

The format switch matters, but once you're serving WebP, the gains from AVIF are marginal compared to getting fetchpriority right on the LCP element. Fix the priority first.

Optimizing once isn't enough

Lighthouse confirms the fix on your machine. It doesn't tell you whether it holds under real conditions — actual devices, varied networks, CDN behavior on cold loads.

Measuring from real users is the only way to know:

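// sendMetric is your own reporting function, not a browser API.
new PerformanceObserver((list) => {
  // The last entry is the current LCP candidate; earlier ones were superseded.
  const lcp = list.getEntries().at(-1)?.startTime;
  if (lcp) sendMetric({ metric: 'LCP', value: lcp, page: location.pathname });
}).observe({ type: 'largest-contentful-paint', buffered: true });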

The harder problem is that optimizations regress. A new developer adds a hero image without priority. Someone replaces an <img> with a CSS background. The LCP you fixed at 1.6s quietly climbs back to 3.2s after the next deploy, and nobody notices until a user mentions it. If you want to catch that within minutes rather than days, I built RPAlert for exactly this — it handles the LCP monitoring and alerting layer for React apps, collecting field data from real browsers and posting to Slack or Discord when thresholds are crossed. Worth setting up after you've done the optimization work, so the gains actually stick.
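One way to notice a regression like this is to aggregate the collected field samples on the server and compare the 75th percentile against a budget. The sketch below is a minimal illustration and not how RPAlert works; the function names and the nearest-rank percentile are assumptions, while the 2.5 s budget is the "good" LCP threshold from Core Web Vitals.

```python
import math

GOOD_LCP_MS = 2500  # Core Web Vitals "good" LCP threshold, evaluated at p75


def p75(samples_ms):
    """Nearest-rank 75th percentile of LCP samples in milliseconds."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = math.ceil(0.75 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]


def lcp_regressed(samples_ms, budget_ms=GOOD_LCP_MS):
    """True when the p75 of the collected samples exceeds the budget."""
    return p75(samples_ms) > budget_ms


before_deploy = [1400, 1500, 1600, 1700]
after_deploy = [1500, 2900, 3200, 3400]
print(lcp_regressed(before_deploy), lcp_regressed(after_deploy))
```

Comparing a percentile rather than the mean keeps a handful of very slow sessions from hiding or exaggerating the trend.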

The fix is almost always the same: find the LCP element, make it a real <img> tag, set fetchpriority="high", give it explicit dimensions. Everything else is secondary.

CodeClone b5: structural review that finally knows what your tests cover

orenlab — Thu, 16 Apr 2026 13:52:28 +0000

In earlier posts, I wrote about why I built CodeClone, why I exposed it through MCP for AI agents, and how b4 turned it into a real review surface for VS Code, Claude Desktop, and Codex.

b5 is the release where structural review stops being a parallel universe to your test suite.

Until now, CodeClone could tell you that a function is long, complex, duplicated, or coupled to everything — but it had no idea whether that function was covered by a single unit test. That mattered more than I wanted to admit. A complex function with a 0.98 coverage ratio is not the same risk as the identical function with 0.0. A reviewer knows this. An AI agent reading an MCP response doesn't — unless the tool tells it.

So b5 fixes that, and while doing it, also lifts a few other things that kept getting in the way:

typing and docstring coverage as first-class review facts
public API drift as a baseline-governed signal
intentionally-duplicated test fixtures stop polluting health and CI gates
a much clearer triage story for MCP and IDE clients
a rebuilt HTML report with unified filters and cleaner empty states
a Claude Desktop launcher that actually picks the right Python
a warm-path benchmark that now tells the truth

Let me walk through what changed and why.

1. Bring your `coverage.xml` into the review

The headline feature of b5 is Coverage Join. Point CodeClone at any Cobertura XML produced by coverage.py, pytest-cov, or your CI and it fuses test coverage into the same run that produces clone groups, complexity, cohesion, and dead code:

codeclone . --coverage coverage.xml --coverage-min 50 --html

What comes out is not "new coverage tool, please delete the old one." It's coverage used as a modifier on structural review:

Each function in the current run gets a factual coverage ratio.
Functions below the threshold show up as coverage hotspots with their complexity and caller count alongside.
High-risk findings can now read "complex + uncovered + new vs baseline" instead of just "complex."
A new gate, --fail-on-untested-hotspots, fails CI on below-threshold functions only where the coverage report actually measured them.

That last distinction is the part I care about most.

2. Honest about scope: measured vs out-of-scope

The easy mistake when bolting coverage onto a second tool is to silently treat "function missing from coverage.xml" as "function is uncovered." It makes the dashboard look busier, but it's a lie — the function might be covered by a coverage run that was filtered to a different package, or it might be a module the coverage config excluded on purpose.

b5 keeps these two cases cleanly separate:

Coverage hotspots — code that coverage.xml measured and reported below threshold. This is a hard signal.
Coverage scope gaps — code present in your repo but not in the coverage XML at all. This is a scoping observation, not a verdict.

Both show up in the report and through MCP, but with different meanings. In mixed monorepos this stops being cosmetic very fast.

None of this changes clone identity, fingerprints, or NEW-vs-KNOWN semantics — the baseline model is untouched. Coverage Join is a current-run fact, not baseline truth.
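To make the measured vs out-of-scope distinction concrete, here is a minimal sketch of the idea. It is not CodeClone's implementation: the `load_hits` and `classify` helpers, the function tuples, and the percent-based threshold are all assumptions for illustration. Only the Cobertura layout (`class` elements with a `filename` and `line` children carrying `number` and `hits`) comes from the report format itself.

```python
import xml.etree.ElementTree as ET

COBERTURA = """<coverage><packages><package><classes>
<class filename="app/core.py"><lines>
<line number="1" hits="1"/><line number="2" hits="0"/>
<line number="3" hits="0"/><line number="4" hits="0"/>
</lines></class>
</classes></package></packages></coverage>"""


def load_hits(xml_text):
    """Map filename -> {line number: hits} from Cobertura XML."""
    hits = {}
    for cls in ET.fromstring(xml_text).iter("class"):
        per_file = hits.setdefault(cls.get("filename"), {})
        for line in cls.iter("line"):
            per_file[int(line.get("number"))] = int(line.get("hits"))
    return hits


def classify(functions, hits, min_percent):
    """Split functions into measured hotspots and unmeasured scope gaps."""
    hotspots, scope_gaps = [], []
    for name, path, start, end in functions:
        measured = [n for n in hits.get(path, {}) if start <= n <= end]
        if not measured:
            scope_gaps.append(name)  # absent from the report: no verdict
            continue
        covered = sum(1 for n in measured if hits[path][n] > 0)
        if 100 * covered / len(measured) < min_percent:
            hotspots.append(name)    # measured and below the threshold
    return hotspots, scope_gaps


# Hypothetical functions: (name, file, first line, last line)
functions = [("parse", "app/core.py", 1, 4), ("render", "app/ui.py", 1, 9)]
hotspots, scope_gaps = classify(functions, load_hits(COBERTURA), 50)
print(hotspots, scope_gaps)
```

The important branch is the early `continue`: a function the report never measured is recorded as a scope gap and never reaches the threshold comparison.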

3. Typing and docstring coverage are now part of the picture

I used to expose "typing coverage" and "docstring coverage" as optional toggles. In practice, nobody turned them on, and they kept hiding behind flags that felt vestigial.

b5 removes the toggles and just collects adoption coverage whenever you run in metrics mode:

parameter annotation coverage
return annotation coverage
public docstring coverage
explicit Any count

They land in the main CLI Metrics block, in the HTML Overview, in MCP summaries, and in the baseline. And they get their own CI gates:

codeclone . \
  --min-typing-coverage 80 \
  --min-docstring-coverage 60 \
  --fail-on-typing-regression \
  --fail-on-docstring-regression

The regression gates are the interesting pair: they don't force you to reach a specific threshold, they just fail CI when adoption drops compared to your trusted baseline. That tends to be more realistic for real codebases where you're migrating gradually.
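The difference between the two kinds of gate is easy to see in a toy model. This is a sketch of the concept only, with hypothetical function names; it says nothing about how CodeClone evaluates the flags internally.

```python
def passes_threshold(current_percent, minimum_percent):
    """Absolute gate: adoption must reach a fixed minimum."""
    return current_percent >= minimum_percent


def passes_regression(current_percent, baseline_percent):
    """Relative gate: adoption must not drop below the trusted baseline."""
    return current_percent >= baseline_percent


# A codebase mid-migration: typing coverage is low but improving.
typing_now, typing_baseline = 42.0, 40.0
print(passes_threshold(typing_now, 80))             # absolute gate fails
print(passes_regression(typing_now, typing_baseline))  # relative gate passes
```

A gradually migrating project can turn on the relative gate today and add the absolute one once the numbers are within reach.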

4. Public API drift becomes a first-class signal

Another thing that used to live outside the review surface: "did this PR break the public API?"

b5 adds an opt-in API Surface layer that takes a snapshot of your public symbols — modules, classes, functions, their parameters and return types — into the metrics baseline. Subsequent runs produce a baseline diff with explicit categories: additions, breaking changes, everything else.

# Record the snapshot
codeclone . --api-surface --update-metrics-baseline

# Guard PRs
codeclone . --fail-on-api-break

It's not a type checker and it's not SemVer enforcement. It's "the set of externally-callable names in this package just changed in a way that is likely to break downstream users, please confirm." For libraries that's the thing you want CI to block on.

Private symbols are classified separately from public ones, so moving an internal helper around doesn't pollute the diff.
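A rough sketch of what a baseline diff over public symbols can look like, using only the standard library. The `snapshot` and `diff_surface` helpers are hypothetical, and treating every signature change as breaking is a deliberate simplification; CodeClone's actual classification rules are not described here.

```python
import inspect


def snapshot(namespace):
    """Map public callable names to their signature strings."""
    return {
        name: str(inspect.signature(obj))
        for name, obj in namespace.items()
        if callable(obj) and not name.startswith("_")
    }


def diff_surface(baseline, current):
    """Classify changes between two snapshots."""
    added = current.keys() - baseline.keys()
    removed = baseline.keys() - current.keys()
    changed = {n for n in baseline.keys() & current.keys()
               if baseline[n] != current[n]}
    return {"additions": sorted(added), "breaking": sorted(removed | changed)}


old = snapshot({
    "load": lambda path: None,
    "save": lambda path, data: None,
    "_helper": lambda: None,          # private: ignored by the snapshot
})
new = snapshot({
    "load": lambda path, strict: None,  # new required parameter
    "dump": lambda obj: None,           # new public function
})
result = diff_surface(old, new)
print(result)
```

Because `_helper` never enters the snapshot, moving or deleting it cannot show up in the diff.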

5. Golden fixtures stop showing up as debt

Some repositories — including CodeClone itself — intentionally keep duplicated golden fixtures to lock report contracts and parser behavior. Those clones are real. They are also not live review debt.

b5 adds a project-level policy for exactly that case:

[tool.codeclone]
golden_fixture_paths = ["tests/fixtures/golden_*"]

Clone groups fully contained in those paths are:

excluded from the health score
excluded from CI gates
excluded from active findings
still carried in the report as suppressed facts

So the tool stays honest — you can still see the suppressed groups in the HTML Clones tab and in the canonical JSON — without making CI noisier than it needs to be. If a group stops being "fully inside the fixture paths," it stops being suppressed automatically.
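The "fully contained" rule can be sketched in a few lines. This assumes shell-style pattern matching via `fnmatchcase`, which is a guess at the semantics rather than a statement about how CodeClone interprets `golden_fixture_paths`; the helper name is hypothetical.

```python
from fnmatch import fnmatchcase

GOLDEN_PATTERNS = ["tests/fixtures/golden_*"]


def is_suppressed(group_paths, patterns=GOLDEN_PATTERNS):
    """Suppress a clone group only if every member matches a golden pattern."""
    return bool(group_paths) and all(
        any(fnmatchcase(path, pattern) for pattern in patterns)
        for path in group_paths
    )


inside = ["tests/fixtures/golden_v1/report.json",
          "tests/fixtures/golden_v2/report.json"]
leaking = ["tests/fixtures/golden_v1/report.json", "src/report.py"]
print(is_suppressed(inside), is_suppressed(leaking))
```

Using `all` over the group is what makes suppression drop away as soon as one copy of the clone lives outside the fixture paths.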

6. Triage that says what it's actually looking at

MCP summary and triage payloads in b5 include a few compact interpretation fields that turned out to matter a lot for both AI agents and humans:

health_scope — is this number repository-wide, production-only, or for a specific focus?
focus — what does "new findings" actually mean for this run?
new_by_source_kind — of the new findings, how many are in production code vs tests vs tooling?

The net effect is that an agent asking "is this PR risky?" no longer has to guess whether "3 new findings" means "three new bugs in production" or "three new flake-prone tests." The payload tells it directly. The VS Code extension uses the same fields to explain repository-wide health, production focus, and outside-focus debt without widening the review flow.

The extension also now surfaces Coverage Join facts in its overview when the connected server supports them, and the optional in-IDE help topics are gated by server version so they stay honest about what's actually available.

7. The HTML report got a proper rebuild

b4 made the HTML report useful. b5 makes it feel finished.

Unified filters popover — Clones and Suggestions share the same filter UX: one button, one menu, an active-filter count, keyboard dismiss. Every control lives in the same place on every tab that has filters. No more two-row filter strips that wrap on narrow screens.
Cleaner empty states — instead of empty tables, sections with no findings now render a single reassuring row with an explicit "no issues detected" message and an icon. Silence has meaning now.
Coverage Join subtab — Quality gets a dedicated Coverage Join view with per-function rows: coverage %, complexity, callers, source kind, and a clear marker for scope gaps.
Adaptive theme toggle — the theme button shows a sun in light mode and a moon in dark mode, resolved at paint time so you don't flash the wrong icon on first load.
Refreshed palette — the whole report moved to a chromatic neutral scale tinted toward the brand indigo, so surfaces, borders, and text live on the same hue axis instead of looking like "grayscale + one purple button."
Better provenance — the meta block makes it explicit which python tag the baseline was built for, and calls out baseline mismatches instead of hiding them.
Stat-card rhythm — KPI cards across Overview, Quality, Dependencies, and Dead Code share one card component now. Same padding, same typography, same tone variants.

None of that changes a single report contract. It's pure render-layer work.

8. Claude Desktop launches the right Python

A boring but high-impact b5 change: the Claude Desktop bundle now resolves your project's runtime before falling back to a global one. Poetry's .venv, workspace .venv, and an explicit workspace_root override all come before anything on PATH.

Before: installing CodeClone into your project, then launching it via Claude Desktop, would often run some other CodeClone from /usr/local/bin because that happened to be first on PATH. That's fixed.

If you've been getting subtly wrong results through Claude Desktop and couldn't explain why, this is the one to pull.

9. Safer and more deterministic under the hood

Two changes that are unglamorous but worth noting:

Git diff ref validation. When you use --diff-against, the supplied revision is now validated as a safe single-revision expression before being passed to git. No shell surprises, no accidental multi-ref expressions.
Canonical segment digests. Segment clone digests no longer use repr() — they're computed from canonical JSON bytes. This closes a subtle determinism hole where two runs on different interpreters could, in rare cases, produce different segment digests for the same input.
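The determinism issue with `repr()` is easiest to see with two dictionaries that hold the same data in a different key order. The sketch below shows the general technique of hashing canonical JSON bytes; the choice of SHA-256 and the example payload are assumptions, not details taken from CodeClone.

```python
import hashlib
import json


def canonical_digest(segment):
    """Hash canonical JSON bytes: sorted keys, fixed separators, UTF-8."""
    payload = json.dumps(
        segment, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


a = {"kind": "segment", "tokens": ["if", "x"], "size": 2}
b = {"size": 2, "tokens": ["if", "x"], "kind": "segment"}  # same data, other order

print(repr(a) == repr(b))                          # repr depends on key order
print(canonical_digest(a) == canonical_digest(b))  # canonical bytes do not
```

Sorting keys and pinning separators removes every formatting decision from the input to the hash, so the digest depends only on the data.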

Neither changes clone identity or fingerprint semantics.

10. The warm path is actually warm

One of the more satisfying b5 fixes wasn't a feature at all.

I'd been quietly suspicious of the benchmark numbers for a while — warm runs were looking too good, and I couldn't make the shape of the curve match what the pipeline was actually doing. Turns out the benchmark harness had a bug that broke process-pool execution on warm runs, so the cache was being credited for work it wasn't doing.

After fixing the harness and tightening gating around benchmark runs so repo quality gates don't interfere, the numbers are now both fast and trustworthy. From the Linux smoke benchmark:

cold_full: 6.58s
warm_full: 0.95s
warm_clones_only: 0.86s

About 6.9× speedup on warm runs. The cache is no longer "probably helping" — it is clearly doing useful work, and now I can say that with a straight face.

Wrapping up

If b4 made CodeClone a real review surface, b5 is the release where that surface learned to ask useful second-order questions:

Is this complex function actually tested?
Is this low-coverage number a hard signal or a scope gap?
Is this new finding in production code or in fixtures?
Did this PR break the public API?
Is this duplication intentional test scaffolding or real debt?

Every one of those used to require me to eyeball two dashboards and a coverage report. Now there's a single canonical answer, and it ships consistently through CLI, HTML, JSON, SARIF, MCP, the VS Code extension, the Claude Desktop bundle, and the Codex plugin.

Try it

# Base install
uv tool install --pre codeclone

# With MCP for AI agents (Claude Desktop, Codex, VS Code, Cursor, ...)
uv tool install --pre "codeclone[mcp]"

A one-liner to feel the new shape on your own repo:

codeclone . \
  --coverage coverage.xml --coverage-min 70 \
  --min-typing-coverage 80 --fail-on-typing-regression \
  --api-surface --fail-on-api-break \
  --html

Open the HTML report, watch the Coverage Join tab populate, and check whether your "risky" functions really were the risky ones.

Feedback, issues, and PRs welcome on GitHub.

System Design: Designing a Ride-Hailing Service

NowInterview — Thu, 16 Apr 2026 13:51:55 +0000

Translated from the article Design Uber

A video walkthrough of this problem (in Russian) is available here: https://youtu.be/R9B90ewl9EY

Problem statement

🚗 What is Uber?

Uber is a ride-hailing platform that connects riders with drivers. It lets riders request a ride from their smartphone and matches them with a nearby driver, who takes them from their current location to their destination.

Functional requirements

At the start of the interview, define the functional and non-functional requirements. For user-facing applications, functional requirements are statements of the form "Users can…", while non-functional requirements are system qualities of the form "The system should…".

Prioritize the 3-4 key functional requirements. Any further requirements show that you have product thinking, but mark them explicitly as "out of scope" so the interviewer knows they are not part of the design. Ask whether the interviewer wants to raise or lower the priority of any requirement. Picking only 3-4 requirements helps you stay focused and fit within the interview's time limit.

Core requirements

Riders can enter a pickup location and a destination and get a fare estimate.
Riders can request a ride.
After a rider's request, the system matches them with an available driver nearby.
Drivers can accept or decline a request.

Out of scope

Riders can rate a ride after it ends, and drivers can rate riders.
Riders can schedule rides in advance.
Riders can choose a ride category (for example, Economy or Comfort).

Non-functional requirements

Core requirements

The system should match drivers quickly (< 1 minute until a request is accepted or declined).
The system should guarantee strong consistency in matching so that one driver is never assigned several rides at the same time.
The system should handle high load, especially during peak periods or popular events (100k requests per second from a single location).
Scale: 100M DAU, 15M rides per day

Out of scope

The system should protect the security and privacy of rider and driver data and comply with government regulations.
The system should be fault tolerant, with a disaster recovery mechanism.
The system should have monitoring, logging, and alerting so that problems are detected quickly.

On the whiteboard, it might look something like this:

Listing out-of-scope requirements shows product thinking and gives the interviewer a chance to reprioritize. It is still optional: if no extra ideas come to mind right away, don't spend time on it and move on.

The setup

Planning the approach

Before moving on to the system design, it is worth pausing for a moment to think through your strategy. Fortunately, for "product" problems the plan is usually simple: build the design step by step, working through the functional requirements one at a time. That keeps you focused and stops you from drowning in details.

Once the functional requirements are satisfied, use the non-functional requirements to decide where to go deeper, as needed.

API design

Let's start by defining the core entities, which will help us design the API. You don't need to know every field or column yet, but if you already have an idea of what will be there, you can write it down.

The core functional requirements need the following entities:

Rider: a user who requests a ride. Holds personal details, contact information, payment methods, and so on.
Driver: a user registered as a driver. Holds personal details, vehicle information (make, model, year), preferences, and availability status.
Fare: a fare estimate for a ride. Holds the pickup and destination points, the price, and the expected trip time. This could also simply live on the Ride entity, but we will keep it separate for now (there is no right or wrong answer here).
Ride: a record of a ride from the moment the fare is requested until completion. Holds the rider and driver, the vehicle, the ride state, the route, the final fare, and the pickup and drop-off timestamps.
Location: the current position of each driver, with coordinates and the time of the last update. This entity is central to driver matching and ride tracking.

In a real interview a short list like the one above is enough. The main thing is to say the entities out loud and make sure you and the interviewer understand them the same way.

The API for getting a fare estimate is fairly simple. We define a POST endpoint that takes the current location and the destination and returns a Fare object with the estimated price and trip time. We use POST because we create a new Fare record in the database.

POST /fares -> Fare
Body: {
  pickupLocation,
  destinationLocation
}

Ride request endpoint: once the user has seen the estimate, they confirm the ride. This endpoint starts the driver matching process and creates a new Ride record.

POST /rides -> Ride
Body: {
  fareId
}

At this stage we match the rider with an available driver nearby. This happens on the server side, so no separate endpoint is needed.

Driver location update endpoint: to match a driver we need to know where they are right now. The driver client calls this endpoint regularly to keep the driver's location current in the database.

POST /drivers/location -> Success/Error
Body: {
  lat, long
}

// note that driverId comes from the session or auth token and is not
// passed in the request body or path parameters

Always consider API security. Candidates often put userId, timestamps, or even the fare estimate in the request body. That is a red flag for the interviewer: any data coming from the client can be forged. User identity should come from the session or auth token, timestamps should be generated on the server, and the fare estimate should be read from the database.
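The rule above can be sketched as a handler that ignores anything security-relevant in the request body. Everything here is hypothetical (the in-memory `SESSIONS` table, the handler name, the response shape); it only illustrates that the driver identity comes from the token and the timestamp from the server clock.

```python
import time

SESSIONS = {"token-abc": "driver-42"}  # hypothetical token -> driverId lookup


def update_driver_location(auth_token, body, now=time.time):
    """Handle POST /drivers/location with server-side identity and time."""
    driver_id = SESSIONS.get(auth_token)
    if driver_id is None:
        return {"status": 401}
    return {
        "status": 200,
        "driverId": driver_id,        # from the token, never from the body
        "lat": float(body["lat"]),
        "long": float(body["long"]),
        "updatedAt": now(),           # server clock, not a client timestamp
    }


# A client that tries to impersonate another driver and backdate the update.
spoofed = {"lat": 40.7, "long": -74.0, "driverId": "driver-1", "updatedAt": 0}
result = update_driver_location("token-abc", spoofed, now=lambda: 1000.0)
print(result["driverId"], result["updatedAt"])
```

The forged `driverId` and `updatedAt` fields are simply never read, so there is nothing for the client to tamper with.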

Ride acceptance endpoint: the driver accepts the request, after which the system updates the ride status and returns the coordinates of the pickup point.

PATCH /rides/:rideId -> Ride
Body: {
  accept/reject
}

The Ride object should contain the pickup and destination points so that the driver client can display them in its interface.

High-level design

1. Riders can enter a pickup location and a destination and get a fare estimate

The first thing a rider does is send a fare request that specifies the destination.

Let's assemble the minimal set of components needed to calculate a fare, adding our first service, the ride service:

The core components for fare estimation:

Rider client: the mobile app on the rider's smartphone, which talks to the backend.
API gateway: the entry point for client requests, responsible for routing, authentication, rate limiting, and so on.
Ride service: manages ride state, starting with the fare calculation. It calls third-party mapping APIs to determine the distance and travel time between the two points and applies the company's pricing model to compute the fare. For the purposes of this interview we abstract away the details of that algorithm.
Third-party Maps API: a third-party mapping service (for example, Google Maps) used to calculate distance and travel time.
Database: stores Fare objects.

Here is how these components interact when a rider requests a fare estimate:

The user enters the pickup location and destination and sends a POST request to /fares.
The API gateway receives the request, checks authentication and rate limits, and routes it to the ride service.
The ride service calls the mapping API to get the distance and travel time and computes the fare.
The ride service stores the Fare object in the database.
The Fare is returned through the API gateway, and the user decides whether to request the ride.

2. Riders can request a ride

After seeing the price and trip time, the user requests a ride. This simply extends the existing design: we add a rides table.

When a ride request arrives, we handle it as follows:

The user requests a ride by sending a POST request with the fareId.
After its checks, the API gateway forwards the request to the ride service.
The ride service creates a Ride record that references the Fare estimate and sets the ride status to requested.
The driver matching process then starts (see below).

3. After a rider's request, the system matches them with an available driver nearby

To implement driver matching, we need to add several new components to the design:

Driver client: receives ride requests and sends location updates to the location service.
Location service: receives location updates and stores them in the database.
Matching service: handles new ride requests and picks the best driver (by proximity, rating, and other factors).

Drivers continuously send their current location to the location service (for example, once every 5 seconds), and we update the database with their latest latitude and longitude. When a new ride request arrives, the matching service uses this data to find the best match.

4. Drivers can accept or decline a request

Once a driver has been matched with a rider, they can accept the ride request. Let's add a new component to the design:

Notification service: sends real-time notifications to drivers when a new ride request has been matched to them. Notifications go out through APNs (Apple Push Notification service) for iOS devices and FCM (Firebase Cloud Messaging) for Android devices.

The sequence of events is as follows:

The matching service builds a list of suitable drivers and sends a notification to the first one on the list through APNs/FCM.
The driver opens the app and accepts the request by sending a PATCH request with the rideId. If the driver declines, the service notifies the next driver.
The API gateway routes the request to the ride service.
The ride service updates the ride status to accepted, sets the driverId on the ride, and returns the pickup coordinates to the driver.
The driver uses their client's GPS to navigate to the pickup point.
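The offer sequence above can be sketched as a loop over the ranked candidates. This is a simplified illustration: the `dispatch` function, the `send_offer` callback, and the `unmatched` status are hypothetical, and a real system would also need timeouts and locking so that a driver is not offered two rides at once.

```python
def dispatch(ride_id, candidates, send_offer):
    """Offer the ride to candidates in order until one accepts."""
    for driver_id in candidates:
        # send_offer returns True on accept, False on decline or timeout
        if send_offer(driver_id, ride_id):
            return {"rideId": ride_id, "status": "accepted", "driverId": driver_id}
    return {"rideId": ride_id, "status": "unmatched", "driverId": None}


# Stand-in for the push notification plus the driver's PATCH response.
responses = {"d1": False, "d2": True, "d3": True}
offered = []


def fake_offer(driver_id, ride_id):
    offered.append(driver_id)
    return responses[driver_id]


result = dispatch("ride-7", ["d1", "d2", "d3"], fake_offer)
print(result, offered)
```

Offering to one driver at a time keeps the flow simple; the third candidate is never notified because the second one accepted.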

Does the interviewer expect push notifications to drivers? The Real-time Updates pattern breakdown covers the options, from long polling to SSE and WebSockets.

Potential deep dives

Once the core functional requirements are covered, we can move on to the non-functional requirements and deepen the design where needed.

How deep a candidate should go depends on their level. For mid-level candidates it is fine if the interviewer leads most of the discussion. For Senior and Staff+ candidates more initiative is expected: the candidate spots problems in the design on their own and proposes solutions.

1. How do we handle frequent driver location updates and efficient proximity searches?

Managing the stream of location updates and running fast location-based queries is hard, and the current high-level design cannot cope with it. There are two main problems:

High write rate: with about 5 million drivers sending their location every 5 seconds, we get roughly 1 million updates per second. Whether we pick something like DynamoDB or PostgreSQL (both are great choices for the rest of the system), they will either fail under that load or have to be scaled so far that they become too expensive.
Query efficiency: without optimizations, proximity searches over coordinates require a full table scan and a distance calculation for every driver. Even B-tree indexes work poorly for multidimensional data such as coordinates.
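The write-rate estimate is simple arithmetic, and it is worth comparing it with the ride volume from the requirements. The numbers are the ones given in this article; the variable names are just for illustration.

```python
drivers = 5_000_000
update_interval_s = 5
rides_per_day = 15_000_000
seconds_per_day = 24 * 60 * 60

updates_per_second = drivers / update_interval_s
rides_per_second = rides_per_day / seconds_per_day

print(f"{updates_per_second:,.0f} location updates per second")
print(f"{rides_per_second:,.0f} ride requests per second on average")
```

Location writes outnumber ride requests by several thousand to one, which is why the location path gets its own storage design.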

What can we do about these problems?

Bad solution: direct database writes and proximity search

Approach

The bad solution is our current high-level design: write every location update to the database and run proximity searches over that raw data. This approach scales poorly because of the high update rate and makes proximity searches slow and inefficient. It would overload the system, raise latency, and degrade the user experience, which makes it unsuitable for an application at Uber's scale.

Good solution: batch processing and a specialized geospatial database

Approach

Instead of writing every update directly to the database, we aggregate updates over a short interval and write them in batches. This reduces the number of write operations, improves write throughput, and reduces contention.

To find nearby drivers, we use a specialized geospatial database with spatial indexes, for example ones based on quadtrees.
Quadtrees suit two-dimensional spatial data such as geographic coordinates particularly well because they recursively divide space into quadrants, which makes proximity searches much faster.
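To show why recursive subdivision helps, here is a small point quadtree. It is a teaching sketch, not what any particular geospatial database does: the class, the capacity of 4, and the depth limit are assumptions, the query is a bounding box rather than a true radius, and it ignores wrap-around at the antimeridian.

```python
class QuadTree:
    """Point quadtree over a box given as (min_lat, min_lon, max_lat, max_lon)."""

    MAX_DEPTH = 16  # stop splitting so identical points cannot recurse forever

    def __init__(self, bounds, capacity=4, depth=0):
        self.bounds, self.capacity, self.depth = bounds, capacity, depth
        self.points = []      # (lat, lon, driver_id) tuples held by a leaf
        self.children = None  # four sub-quadrants once this node has split

    def _mid(self):
        lat0, lon0, lat1, lon1 = self.bounds
        return (lat0 + lat1) / 2, (lon0 + lon1) / 2

    def _child_for(self, lat, lon):
        mlat, mlon = self._mid()
        return self.children[(2 if lat >= mlat else 0) + (1 if lon >= mlon else 0)]

    def insert(self, lat, lon, driver_id):
        if self.children is not None:
            self._child_for(lat, lon).insert(lat, lon, driver_id)
            return
        self.points.append((lat, lon, driver_id))
        if len(self.points) > self.capacity and self.depth < self.MAX_DEPTH:
            self._split()

    def _split(self):
        lat0, lon0, lat1, lon1 = self.bounds
        mlat, mlon = self._mid()
        boxes = [(lat0, lon0, mlat, mlon), (lat0, mlon, mlat, lon1),
                 (mlat, lon0, lat1, mlon), (mlat, mlon, lat1, lon1)]
        self.children = [QuadTree(b, self.capacity, self.depth + 1) for b in boxes]
        points, self.points = self.points, []
        for lat, lon, driver_id in points:
            self._child_for(lat, lon).insert(lat, lon, driver_id)

    def query(self, box):
        """Return driver ids inside box, pruning quadrants that cannot overlap it."""
        lat0, lon0, lat1, lon1 = self.bounds
        if box[2] < lat0 or box[0] > lat1 or box[3] < lon0 or box[1] > lon1:
            return []
        if self.children is None:
            return [d for lat, lon, d in self.points
                    if box[0] <= lat <= box[2] and box[1] <= lon <= box[3]]
        return [d for child in self.children for d in child.query(box)]


tree = QuadTree((-90.0, -180.0, 90.0, 180.0))
drivers = [(40.71, -74.00, "nyc-1"), (40.73, -73.99, "nyc-2"),
           (40.76, -73.98, "nyc-3"), (34.05, -118.24, "la-1"),
           (51.51, -0.13, "lon-1"), (48.86, 2.35, "par-1")]
for lat, lon, driver_id in drivers:
    tree.insert(lat, lon, driver_id)

nearby = sorted(tree.query((40.6, -74.1, 40.8, -73.9)))
print(nearby)
```

The pruning check at the top of `query` is the whole point: quadrants that cannot overlap the search box are skipped without looking at a single driver inside them.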

If you use PostgreSQL, its PostGIS extension provides geospatial types and functions without the need for a separate data store.

Challenges

The batching interval introduces a delay: location data becomes slightly stale, which lowers the quality of driver matching.
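The trade-off is visible in even a tiny batcher. The sketch below keeps only the latest position per driver and flushes once per interval; the class name, the `write_batch` callback, and the one-second interval are assumptions for illustration.

```python
class LocationBatcher:
    """Keep the latest location per driver and write them out in batches."""

    def __init__(self, write_batch, flush_interval_s, start_ts):
        self.write_batch = write_batch      # hypothetical bulk-write callback
        self.flush_interval_s = flush_interval_s
        self.last_flush = start_ts
        self.pending = {}                   # driver_id -> (lat, lon, ts)

    def add(self, driver_id, lat, lon, ts):
        self.pending[driver_id] = (lat, lon, ts)  # newer update replaces older
        if ts - self.last_flush >= self.flush_interval_s:
            self.flush(ts)

    def flush(self, ts):
        if self.pending:
            self.write_batch(dict(self.pending))
            self.pending.clear()
        self.last_flush = ts


writes = []
batcher = LocationBatcher(writes.append, flush_interval_s=1.0, start_ts=100.0)
batcher.add("d1", 40.0, -74.0, 100.2)
batcher.add("d2", 41.0, -73.0, 100.5)
batcher.add("d1", 40.1, -74.1, 100.9)  # overwrites d1's earlier position
batcher.add("d3", 42.0, -72.0, 101.1)  # interval elapsed: triggers the flush
print(len(writes), writes[0])
```

Four updates become one write of three rows, but the first update for d1 sat in memory for almost a second before anything reached the database, which is exactly the staleness described above.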

Great solution: a real-time in-memory geospatial store

Approach

We can remove the limitations of the previous solutions by using an in-memory store such as Redis, which supports geospatial data types and commands. This lets us process driver location updates in real time and run proximity searches efficiently, while keeping storage costs down by letting stale data expire automatically.

Redis uses geohashing to encode latitude and longitude into a single value (a 52-bit integer), which is stored as a member's score in a sorted set.
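To get a feel for geohashing, here is the textbook base-32 string encoding. Redis itself stores an integer form of the hash as the sorted-set score, so this sketch illustrates the bit interleaving rather than Redis internals; the function name and default length are assumptions.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat, lon, length=8):
    """Encode a coordinate as a base-32 geohash string."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, value, use_lon = [], 0, 0, True  # bits alternate, longitude first
    while len(chars) < length:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1   # upper half of the current range
            rng[0] = mid
        else:
            value <<= 1                # lower half of the current range
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:                  # every 5 bits become one character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


print(geohash(57.64911, 10.40744, 4))
```

Each extra character narrows the cell, and a longer hash of the same point always starts with the shorter one. Points that share a prefix fall in the same cell, which is what makes range scans over a sorted structure useful for proximity search.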

Redis provides specialized commands such as GEOADD and GEOSEARCH, which handle real-time updates and proximity searches efficiently. GEOSEARCH, introduced in Redis 6.2, replaces and extends the older GEORADIUS and GEORADIUSBYMEMBER commands, offering more flexibility and better performance.

Batching is no longer needed: Redis can keep up with a large stream of real-time updates. In addition, Redis can delete data automatically based on a configured time to live (TTL), which lets us keep only the most recent location updates and avoid unnecessary storage costs. Note that a TTL applies to a whole key, not to individual members of a geo set, so expiring stale drivers one by one needs some extra design work (for example, time-bucketed keys).

Challenges

The main problem with this approach is durability. Because Redis keeps all data in memory, data can be lost on failure. These risks can be mitigated in several ways:

Redis persistence: we can enable Redis persistence mechanisms such as RDB (Redis Database) or AOF (append-only file) to periodically save the in-memory data to disk.
Redis Sentinel: we can use Redis Sentinel for high availability. If the primary node fails, Sentinel fails over to a replica automatically.

Even if data is lost, the damage is minimal: locations are updated every 5 seconds, so the system quickly recovers its state.

2. How do we reduce the load from frequent location updates without losing accuracy?

Frequent location updates put load on the network and servers, which can slow the system down and degrade the user experience. Most candidates suggest updating the driver's location every 5 seconds or so. Can we sensibly reduce the number of updates while preserving accuracy?

Great solution: Adaptive update intervals

Approach

We can address this with adaptive location update intervals that dynamically adjust the update frequency based on factors such as speed, direction of travel, proximity to pending ride requests, and driver status.

The driver app uses on-device sensors and algorithms to determine the optimal interval. If the driver is stationary or moving slowly, updates can be sent less often. Conversely, if the driver is moving fast or changing direction frequently, updates are sent more often.
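
As a rough illustration of such client-side logic, the Python sketch below picks the next update interval from a few inputs. Every threshold and interval in it is a made-up placeholder, not a value from the article; a real app would tune them through testing.

```python
def next_update_interval(speed_mps, heading_change_deg, near_pending_request,
                         on_trip, min_interval=2.0, max_interval=30.0):
    """Return how many seconds to wait before sending the next location update.

    All thresholds here are hypothetical and exist only to show the shape
    of the decision.
    """
    if speed_mps < 0.5:        # effectively stationary
        interval = max_interval
    elif speed_mps < 5.0:      # slow traffic
        interval = 10.0
    else:                      # moving fast
        interval = 5.0

    if heading_change_deg > 30.0:   # turning: position changes less predictably
        interval /= 2

    if near_pending_request or on_trip:
        # Matching and trip tracking need fresh data, so cap the interval.
        interval = min(interval, 5.0)

    return max(min_interval, min(max_interval, interval))


parked = next_update_interval(0.0, 0.0, False, False)
fast_turning = next_update_interval(15.0, 45.0, False, False)
parked_near_rider = next_update_interval(0.0, 0.0, True, False)
```

A parked driver with no nearby demand reports rarely, while a fast-moving or turning driver reports often, which is the trade-off the section describes.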

Challenges

The main difficulty with this approach is designing an effective algorithm for choosing the optimal update frequency. It may take careful testing over several iterations. Done well, though, it significantly reduces the number of updates and makes the system more efficient.

Don't neglect the client when thinking about your design. Many candidates get into the habit of drawing a small "client" box and moving on. In many cases we need client-side logic to make the system more efficient and scalable. As shown here, we can reduce the number of updates by using built-in sensors and algorithms to determine the optimal interval for sending them. Similarly, for a file upload service, the client is responsible for chunking and compression.

3. How do we prevent multiple rides from being assigned to the same driver?

We identified strong consistency in driver matching as a key non-functional requirement. This means each ride request is offered to only one driver at a time, and each driver has at most one outstanding request at any moment. The driver has 10-15 seconds to accept or decline the request, after which the system moves on to the next driver. If you have worked through the ticket booking design problem, this is very similar: there we guarantee that a ticket is sold only once and is reserved for a limited time during checkout.

Bad solution: Application-level locking with a timeout check

Approach

The core idea is that we need to lock drivers to prevent multiple ride requests from being sent to the same driver at once. One approach is application-level locking, where each instance of the matching service marks a ride request as "locked" when it sends it to a driver. It then starts a timer for the lock duration. If the driver does not accept the ride within that period, the instance releases the lock and makes the request available to other drivers.

Challenges

This approach has several problems:

Lack of coordination: with multiple instances of the matching service running, there is no central coordination, which creates potential race conditions where two instances try to lock the same ride request at the same time.
Inconsistent lock state: if an instance sets a lock and then fails before releasing it (because of a crash or a network problem), the other instances have no way of knowing, which can leave the ride request locked indefinitely.
Scaling problems: as the number of instances grows, coordinating locks between them becomes harder, which raises the likelihood of errors and inconsistencies.

Good solution: Database status lock with a timeout

Approach

To solve the coordination problem, we can move the lock into the database. This lets us rely on the database's built-in transactional guarantees to ensure that only one instance can lock a ride request at a time. When we send a request to a driver, we update that driver's status to "outstanding_request". If the driver accepts the request, we update the status to "accepted"; if the driver declines, we set it back to "available". We can then use a simple timeout mechanism in the ride service to ensure the lock is released if the driver does not respond within 10 seconds.
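
One way to express this status-based lock is a conditional update that only succeeds when the driver is still available. The sketch below uses Python's built-in sqlite3 module so that it runs anywhere; the table layout, column names, and function name are assumptions for illustration, and a production system would use its own database and schema.

```python
import sqlite3


def try_lock_driver(conn, driver_id):
    """Atomically move a driver from 'available' to 'outstanding_request'.

    Returns True only for the caller whose UPDATE actually changed the row.
    """
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE drivers SET status = 'outstanding_request' "
            "WHERE id = ? AND status = 'available'",
            (driver_id,),
        )
    return cur.rowcount == 1


# Hypothetical schema with a single driver row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drivers (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO drivers VALUES ('driver-1', 'available')")
conn.commit()

first_attempt = try_lock_driver(conn, "driver-1")   # acquires the lock
second_attempt = try_lock_driver(conn, "driver-1")  # driver is already taken
```

Because the status check and the write happen in one statement, two service instances cannot both see the driver as available and both claim it.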

Challenges

Although we have solved the coordination problem, we still depend on an in-memory timeout to release the lock when the driver does not respond. If the ride service crashes or restarts, the timeout is lost and the lock stays in place indefinitely. This is a common problem with in-memory timeouts and a reason to avoid them where possible. One fix is a cron job that runs periodically, looks for expired locks, and releases them. That works, but it adds unnecessary complexity and delays the release of the ride request.

Great solution: Distributed lock with a TTL

Approach

To solve the timeout problem, we can use a distributed lock implemented with an in-memory store such as Redis. When a ride request is sent to a driver, a lock is created with a unique identifier (for example, the driverId) and a TTL of 10 seconds. The matching service tries to acquire the lock on driverId in Redis. If it succeeds, no other service instance can send a ride request to the same driver until the lock expires or is released. If the driver accepts the ride within 10 seconds, the matching service updates the ride status to "accepted" in the database and releases the lock in Redis. If the driver declines, the lock is released immediately; if the driver does not respond at all, the lock expires on its own when the TTL runs out. Either way, the driver becomes available for new ride requests.
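
The lock semantics can be sketched without a running Redis server. The class below is an in-memory, single-process stand-in that mimics "set if absent, with expiry", which in Redis corresponds to a command of the form `SET key value NX EX 10`. The class name, key format, and injected clock are assumptions for the example; it is not a substitute for a real distributed lock.

```python
import time


class TTLLockStore:
    """In-memory stand-in for a key-value store with set-if-absent plus expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._locks = {}  # key -> (owner, expires_at)

    def acquire(self, key, owner, ttl_seconds):
        now = self._clock()
        current = self._locks.get(key)
        if current is not None and current[1] > now:
            return False  # someone else holds an unexpired lock
        self._locks[key] = (owner, now + ttl_seconds)
        return True

    def release(self, key, owner):
        # Only the owner may release, so a late caller cannot drop a newer lock.
        current = self._locks.get(key)
        if current is not None and current[0] == owner:
            del self._locks[key]
            return True
        return False


# A fake clock makes the expiry behaviour visible without sleeping.
now = [0.0]
store = TTLLockStore(clock=lambda: now[0])

got_first = store.acquire("lock:driver:42", "ride-1", ttl_seconds=10)
got_second = store.acquire("lock:driver:42", "ride-2", ttl_seconds=10)
now[0] = 11.0  # the driver never answered; the TTL has passed
got_after_expiry = store.acquire("lock:driver:42", "ride-2", ttl_seconds=10)
stale_release = store.release("lock:driver:42", "ride-1")
```

The important property is that no process has to remember to unlock: if the service that took the lock crashes, the expiry still frees the driver.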

Challenges

The main challenge with this approach is that the system depends on the availability and performance of Redis. We need robust monitoring and failover strategies to ensure that the system can recover quickly from failures and that locks are not lost.

4. How do we make sure ride requests are not lost during peak periods?

During peak load the system may receive more ride requests than it can process, and some would be dropped. This often happens during special events or holidays, when demand spikes. We also need to guard against the case where a matching service instance crashes or restarts; that must not cause ride requests to be lost.

Bad solution: No queue

Approach

The simplest approach is to process ride requests as they arrive, with no queueing system (as in the current design).

Challenges

The main problem with this approach is that it scales poorly under high load. As incoming requests increase and the system becomes overloaded, it starts dropping requests it cannot handle, which degrades the user experience. We can scale the matching service horizontally, but during a sudden demand spike we cannot scale fast enough to fully prevent request loss.

In addition, if a service instance fails, every ride request it was processing is lost. Riders would then wait indefinitely for a match that never comes.

Great solution: Queue with dynamic scaling

Approach

To solve this, we can add a queue that ride requests are placed on. The matching service processes requests from the queue in arrival order and can scale horizontally based on queue size. This approach also ensures that no request is dropped or lost. We can additionally partition the queues by geographic region for further efficiency.

We could use a distributed message queue such as Kafka, which lets us acknowledge a message (commit its offset) only after we have successfully matched a driver. That way, if a matching service instance fails, the ride request is still in the queue and another instance picks it up. This ensures that no ride request is lost on failure.

Challenges

The main challenge with this approach is the added complexity. We need the queue itself to be scalable, fault-tolerant, and highly available. We can address this with a managed queueing service, such as Amazon SQS or a managed Kafka offering, which provides these properties out of the box. That lets us focus on the business logic of our system rather than the infrastructure.

Another problem is that some requests may take a long time to process, blocking other "faster" requests behind them. This is a common problem with FIFO queues, and it can be addressed with a priority queue. That lets us prioritize requests based on factors such as driver proximity, driver rating, ride class, and so on.
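
A priority queue of this kind can be sketched with Python's heapq module. The priority values and request ids below are invented for the example; how priority is actually computed (driver proximity, rating, ride class) is left out.

```python
import heapq
import itertools


class RideRequestQueue:
    """Priority queue: lower priority number is served first, FIFO within a level."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker that preserves arrival order

    def push(self, request_id, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), request_id))

    def pop(self):
        _, _, request_id = heapq.heappop(self._heap)
        return request_id

    def __len__(self):
        return len(self._heap)


queue = RideRequestQueue()
queue.push("ride-standard-1", priority=2)
queue.push("ride-premium", priority=1)
queue.push("ride-standard-2", priority=2)
served_order = [queue.pop() for _ in range(len(queue))]
```

The counter matters: without it, two requests with the same priority would be ordered by comparing their ids rather than by arrival time.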

5. What if a driver does not respond in time?

Our system works well when drivers either accept or decline a ride request. But if a driver has taken a break and is not responding, we must make sure the ride request keeps moving by forwarding it to the next driver.

Processes that require a response or action from a human are often a sign that we are dealing with the Multi-step Processes pattern. In fact, Uber is the original author of the open-source project Cadence, which became the basis for Temporal, a durable execution system built specifically for cases like this.

Good solution: Delay queue

Approach

We can implement a delay queue to automatically retry the ride request with the next available driver if the current driver does not respond within the timeout. When a ride request is sent to a driver, we also schedule a delayed message on the queue (for example, Amazon SQS can hold a message back using a delivery delay or a visibility timeout, 10 seconds in our case). The delayed message contains the request details and the driver who was originally contacted. When the delayed message is processed, the system checks whether the ride is still unassigned. If it is, the request automatically moves on to the next driver, and another delayed message is scheduled for the new driver, and so on.
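
The flow can be sketched with an in-memory delay queue in Python. This is a simulation with explicit timestamps, not a client for SQS or any other broker; the class, function names, and message fields are assumptions made for the example.

```python
import heapq


class DelayQueue:
    """Messages become visible only once their due time has passed."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so messages themselves are never compared

    def schedule(self, due_at, message):
        heapq.heappush(self._heap, (due_at, self._seq, message))
        self._seq += 1

    def pop_due(self, now):
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


def offer_ride(queue, ride, drivers, index, now, timeout=10):
    """Offer the ride to drivers[index] and schedule a timeout check."""
    ride["offered_to"] = drivers[index]
    queue.schedule(now + timeout, {"ride_id": ride["id"], "driver_index": index})


def handle_timeout(queue, ride, drivers, message, now):
    """Process a delayed message: move on only if the ride is still unassigned."""
    if ride["assigned_driver"] is not None:
        return  # the driver accepted in the meantime; nothing to do
    next_index = message["driver_index"] + 1
    if next_index < len(drivers):
        offer_ride(queue, ride, drivers, next_index, now)


queue = DelayQueue()
ride = {"id": "ride-1", "assigned_driver": None, "offered_to": None}
drivers = ["driver-a", "driver-b", "driver-c"]

offer_ride(queue, ride, drivers, 0, now=0)
for message in queue.pop_due(now=10):        # driver-a never answered
    handle_timeout(queue, ride, drivers, message, now=10)
offered_after_timeout = ride["offered_to"]

ride["assigned_driver"] = "driver-b"         # driver-b accepts
for message in queue.pop_due(now=20):        # stale timeout for driver-b
    handle_timeout(queue, ride, drivers, message, now=20)
offered_after_accept = ride["offered_to"]
```

The check at the top of `handle_timeout` is what keeps a stale delayed message from reassigning a ride that has already been accepted.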

Challenges

Once again, complexity is the main problem with this approach. If a driver accepts the ride, we need to make sure the delayed message is handled correctly and does not cause the ride to be wrongly reassigned. This approach also requires careful coordination between the queue and the matching service to keep things consistent and avoid race conditions.

Great solution: Durable execution

Approach

Durable execution systems such as Temporal provide built-in support for timeouts, retries, and state management in a way that survives failures and service restarts. The whole matching process is modeled as a workflow that can handle complex business logic while continuously persisting its state, so even after a crash the process can resume from where it stopped.

For example, a Temporal workflow might look like this:

Send the request to the first driver.
Set a 10-second timeout.
If the driver accepts, complete the workflow.
If the driver declines or the timeout expires, automatically move on to the next driver.
Continue until a driver is found or the list of drivers is exhausted.
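
The steps above can be sketched as plain Python control flow. This is deliberately not Temporal SDK code: it only shows the logic a workflow would contain, and the durability (persisting progress so the loop resumes after a crash) is exactly the part the workflow engine would add. The function names and the response values are assumptions for the example.

```python
def match_ride(drivers, ask_driver, timeout_seconds=10):
    """Offer the ride to each driver in turn until one accepts.

    ask_driver(driver_id, timeout_seconds) is a hypothetical callback that
    returns "accept", "decline", or None when the driver does not answer
    within the timeout.
    """
    for driver_id in drivers:
        response = ask_driver(driver_id, timeout_seconds)
        if response == "accept":
            return driver_id      # workflow completes
        # Declined or timed out: fall through to the next driver.
    return None                   # driver list exhausted


# Scripted responses stand in for real drivers.
responses = {"driver-a": None, "driver-b": "decline", "driver-c": "accept"}
matched = match_ride(["driver-a", "driver-b", "driver-c"],
                     lambda driver_id, timeout: responses[driver_id])
unmatched = match_ride(["driver-a", "driver-b"],
                       lambda driver_id, timeout: responses[driver_id])
```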

Challenges

Again, we add complexity by introducing a workflow orchestration system. Engineers have to learn new concepts and tools, and the system gains one more component that must be monitored and maintained.

However, the benefits of guaranteed execution, built-in fault tolerance, and simpler business logic often outweigh these costs, especially for critical systems where dropped requests directly affect revenue and user experience.

6. How do we scale the system further while reducing latency and increasing throughput?

Bad solution: Vertical scaling

Approach

The simplest path is vertical scaling, where we increase the capacity of existing servers by adding more CPU, memory, or disk. It is a quick and easy way to add capacity, but it has a number of limitations.

Challenges

This is a bad solution for several reasons. First, it is expensive and requires downtime to upgrade servers. Second, we cannot scale vertically forever. Finally, it is not fault tolerant: if the server fails, the whole system fails. It is hardly worth discussing this option in an interview, because it is impractical for a system of this scale.

Great solution: Geo-sharding and read replicas

Approach

A better approach is horizontal scaling by adding more servers. We can do this by partitioning our data geographically and using read replicas to increase read throughput. Importantly, this not only lets us scale but also reduces latency by shortening the distance between client and server. Every component of the system (services, message queues, and databases) can be sharded geographically. The only time we need cross-region computation (for example, a query across several shards) is when we run a proximity search on the boundary between shards.

Challenges

The main difficulty is managing the sharding correctly. We need to ensure that data is distributed evenly across shards and that the system can handle failures and rebalance. We can address this by using consistent hashing to distribute data across shards and by implementing a replication strategy for fault tolerance.
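
A consistent hash ring can be sketched in Python as below. The shard names, the number of virtual nodes, and the choice of SHA-256 are assumptions for the example rather than anything the article prescribes.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to nodes so that adding a node moves only a share of the keys."""

    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring many times (virtual nodes)
        # to spread its share of the key space more evenly.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key):
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"driver-{i}" for i in range(1000)]
before = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
after = ConsistentHashRing(["shard-a", "shard-b", "shard-c", "shard-d"])
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
```

When `shard-d` joins, every key that changes owner moves to the new shard and nothing is reshuffled between the existing shards. That limited movement is what makes rebalancing cheaper than with a plain modulo scheme.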

The final architecture of our system might look something like this:

What is expected at each level?

OK, we have covered a lot. A reasonable question is: "how much of this am I actually expected to know in an interview?" Let's break it down by level.

Middle

Breadth vs depth: a mid-level candidate is mostly expected to show breadth of knowledge (roughly 80% breadth vs 20% depth). You should put together a clear high-level design that covers all the functional requirements, but many components can remain abstractions that you have worked through and discussed with the interviewer only at a surface level.

Checking the basics: the interviewer will probe the fundamentals to make sure you understand what each component does. For example, if you add an API Gateway, expect to be asked what it does and how it works.

Mixed lead: you should drive the early stages of the interview confidently, but you do not have to find every design problem proactively. It is fine if the interviewer leads the discussion later on by asking questions and posing additional challenges.

The Uber problem: a mid-level candidate is expected to produce a clearly defined API and data model, plus a high-level design that covers the functional requirements. The candidate should point out the need for a geospatial index to speed up location searches and should implement at least the "good solution" to the ride request locking problem.

Senior

Depth of expertise: for a senior candidate, expectations shift toward depth, roughly 60% breadth and 40% depth. You need to be able to go into detail in the areas where you have hands-on experience.

Advanced system design: you should be familiar with modern system design principles: the different technologies, their use cases, and how they fit together.

Justifying decisions: you should be able to explain clearly the pros and cons of architectural decisions and their impact on scalability, performance, and maintainability, spelling out the trade-offs.

Proactivity and problem solving: you should demonstrate strong problem-solving skills and a proactive approach. That means spotting potential problems in your designs and proposing improvements. You need to be able to identify and remove bottlenecks, optimize performance, and ensure system reliability.

The Uber problem: a senior candidate is expected to move quickly through the high-level design and spend time on a detailed discussion of at least two of these problems: speeding up proximity search, locking the ride request, or handling peak load. You should also be able to discuss the pros and cons of different architecture options, especially how they affect scalability, performance, and maintainability.

Staff+

Emphasis on depth: a Staff+ candidate is expected to dig deep into the nuances, roughly 40% breadth and 60% depth. What matters is showing that, even if you have not solved this exact problem before, you have solved enough similar real-world problems to design a solution confidently from experience.

The interviewer knows that you understand the basics (REST, data normalization, and so on), so you can move quickly through the high-level design and get to the interesting parts.

High proactivity: at this level you are expected to identify and solve problems on your own. That means not only reacting to problems as they arise but also anticipating them and putting preventive solutions in place.

Practical application of technology: it is important to be able to talk about the technologies you use not just in theory but in practice: configuration, operational details, and typical problems.

Problem solving: strong problem-solving skills are expected, taking scalability, performance, reliability, and maintainability into account.

The Uber problem: a Staff+ candidate is expected to deliver high-quality solutions to the hard problems discussed above. Strong candidates go deep in at least three key areas, demonstrating not only professional skill but also innovative thinking and the ability to find optimal solutions. A good sign of your expertise is that the interviewer leaves the discussion with a new insight or perspective.

System design problem breakdowns:

Designing Ticketmaster, a ticket booking service

AI Doesn't Fix Weak Engineering. It Just Speeds It Up.

Jono Herrington — Thu, 16 Apr 2026 13:50:47 +0000

"Weak engineers with AI still produce weak output. Just faster." That was the whole point. AI changes speed. Not judgment. If your team already struggled to make sound architectural decisions, the tool doesn't rescue them. It just helps them make more bad decisions faster. The same gaps. Compressed into a tighter window.

I was on the phone with a friend who runs a CMS platform. We were talking about AI adoption across his customer base when he cut through the hype in ten seconds.

"Sh*t in, sh*t out," he said. "AI doesn't solve the decades of issues that distributed teams present."

That was it. The conversation shifted. He'd been watching companies make the same bet ... ship work to lower-rate markets with the expectation that AI would cover the gap. The tool doesn't fix coordination problems. It doesn't fix unclear ownership. It doesn't fix architectural decisions that get revisited every three months because nobody ever aligned on the tradeoffs.

AI just produces output faster. Good or bad, it comes out faster.

Speed Without Foundation

My friend sees the pattern across his customer base. Companies that struggled with architectural decisions before AI haven't found a shortcut. They've found a way to compress the same gaps into a tighter window. The teams that were already shipping inconsistent patterns, unclear ownership boundaries, and technical debt that accumulates silently ... those teams are now doing all of that faster.

I've seen this pattern enough times now to recognize it. Teams adopt the tooling, see initial velocity gains, and mistake speed for health. The metrics look good for a sprint or two. Then the accumulated weight of unchecked decisions starts showing up. Refactors that should have been caught in review. Patterns that diverged across the codebase. Technical debt that formed silently because everyone was moving too fast to notice.

The tool didn't create the problem. It revealed how little structure was there to begin with.

The Judgment Gap

What separates teams that thrive with AI from teams that struggle isn't the AI. It's judgment.

Teams with strong judgment can evaluate what the model produces. They know their patterns. They understand their tradeoffs. They can look at generated code and recognize when it fits and when it's a mismatch. AI becomes a force multiplier for people who already know what good looks like.

Teams without that judgment can't evaluate what they're getting. They're outsourcing decisions they never learned to make themselves. The result isn't better engineering. It's faster execution of uncertain choices.

This is the uncomfortable truth about AI tooling in engineering. It doesn't level the playing field. It steepens the curve. The gap between teams with strong technical judgment and teams without it gets wider, not narrower. The strong teams move faster and build better. The weak teams move faster and build more of what they already had.

The Oracles We Build

I was the oracle on a team once.

Decisions ran through me. The projects that worked were the ones I was close to. I read that as signal that I was adding value. It was actually proof that I'd built dependency, not capability. The engineers weren't deferring to me because my judgment was better. They were deferring because I had never built a culture where their judgment was tested. When I stepped back, the decisions didn't get easier. They just got slower and more uncertain.

That same pattern is what worries me about AI tooling in weak engineering cultures. When you stop making decisions yourself, you stop building the judgment that lets you evaluate decisions made by others. Including decisions made by models.

A senior engineer told me a story that still sits with me. He had spent years building systems, switched to mostly directing AI agents, then later hit a production memory issue and realized the instinct to debug was gone. Not degraded. Gone.

When ChatGPT arrived, teams like the one I used to run had an obvious replacement oracle. Different interface. Same problem underneath.

What Actually Matters

The teams that thrive with AI have done the work before the tool arrived. They don't need AI to tell them what good looks like. They already know.

They have clear standards. Not just lint rules and style guides ... real standards that describe how decisions get made, what tradeoffs matter, when to follow the pattern and when to break it. Standards that live in documentation and in practice. The same person can explain why something was built that way and why it shouldn't have been. That's the sign of a healthy standard.

They have review culture that interrogates before approving. Reviews that ask "why" before checking the boxes. That create space for pushback without making it personal. Where junior engineers can question senior decisions and senior engineers can admit when they missed something. The authority isn't in the title. It's in the reasoning.

They have engineers who can defend decisions in their own words. Not quote a recommendation. Not cite a benchmark someone else ran. Construct the argument. Weigh the tradeoffs. Say "here's what I considered, here's what I chose, here's what I'm watching to know if I was wrong." That capability is what makes AI output useful instead of dangerous.

The Work Before The Tool

If you're leading a team that's adopting AI tooling, the question to ask isn't about usage rates or productivity metrics. It's about judgment.

Can your engineers evaluate what the model produces? Do they have the framework to recognize a good recommendation from a bad one? Can they explain why they're accepting or rejecting what AI suggests, or are they just accepting what looks plausible?

The work that matters happens before anyone opens the tool. It's the standards you set. The review culture you build. The time you spend teaching engineers to think instead of just execute. AI doesn't replace any of that. It requires it.

I had that moment myself with Cursor. Opened it, used it for ten minutes, shut it down. The suggestions arrived faster than I could evaluate them. Every keystroke generated a new option to consider, a new pattern to question, a new decision to make. It wasn't helping. It was flooding.

Later I recognized what that was. Not that AI was bad. That I needed to be clearer about what I was looking for before I could use it well. The teams that will thrive in this transition are the ones who recognize that same signal.

That's The Real Question

My friend on the phone wasn't worried about whether companies were using AI. He was worried about what they were expecting it to fix. Decades of coordination problems don't disappear because the tool got better.

AI doesn't fix weak engineering. It just speeds it up.

The question for every team is whether that's something you want. Whether your foundation can handle the acceleration. Whether your engineers can evaluate faster without losing the thread of what actually matters.

If they can, AI is a multiplier. If they can't, it's just faster output of the same problems you already had.

That's the conversation worth having. Not whether to use AI. Whether you're ready for what it will amplify.

The Setup Is the Strategy: How I Orchestrated a Product Migration with Claude Code

Karthik Subramanian — Thu, 16 Apr 2026 13:49:55 +0000

Most engineers using Claude Code are getting a fraction of its value. Not because the tool isn't capable — but because they're using it out of the box, unconfigured, the way you'd use a new IDE without installing extensions or setting up your build system. The default experience is decent. The configured experience is transformative.

I'm a Senior Software Engineering Manager leading a team focused on using AI to find acceleration opportunities across the software development lifecycle. I've been experimenting with AI tooling for a while and could see its potential, so I proposed a proof of concept: take a real product migration, the kind of project that would normally require a team of engineers across multiple sprints, and attempt it solo, using Claude Code as my primary development platform. Not "AI-assisted development." An AI-first model where every phase of the SDLC runs through Claude Code's feature set.

The migration itself was substantial:

4 legacy repositories consolidated into 2,
a database migration from MySQL to PostgreSQL,
framework upgrades across the full stack (Spring Boot 2 to 3, Java 17 to 21, React 17 to 18),
an authentication model replacement,
and complete test suites for everything.

What made this work wasn't Claude Code itself. It was how I set it up.

The skills that made me effective at leading engineering teams — providing clear context, delegating with specificity, reviewing rigorously, building repeatable processes — turned out to be exactly the skills that make Claude Code most effective. Engineers whose day-to-day revolves around writing code by hand sometimes struggle with this shift because the instinct is to do the work yourself, not to set up the system and direct it. I'd spent years not writing the code myself. I was already an orchestrator. The medium changed, the model didn't.

This post walks through how I configured Claude Code for a real migration, organized as building blocks. Each layer depends on the one below it. Skip the foundation and the rest falls apart.

Building Block 1: Planning — Laying the Foundation

Before any code was written, I built the context and planning infrastructure. This is the layer most people skip, and it's the layer that matters most.

CLAUDE.md — The Master Context

Claude Code reads CLAUDE.md files on every session start. They're your project memory — the equivalent of onboarding documentation for a new team member. I built a multi-level hierarchy:

Workspace-level (181 lines): overview of all 25+ repositories, tech stack summary, cross-service architecture, shared build commands
Domain-level: 18-service catalog, glossary of domain terms, JWT authentication architecture, service communication patterns
Per-repo: service-specific conventions, module structure, testing standards, local dev setup

Each level inherits from its parent. When Claude Code opens a session in any repo, it automatically has the full context stack — from the broadest architectural view down to the specific service conventions.

This is the single highest-leverage thing you can configure. A well-written CLAUDE.md prevents Claude from re-exploring your codebase every session, asking questions you've already answered, or making assumptions that contradict your architecture. It's free, it's immediate, and it compounds over time.

The Memory System

Claude Code has a persistent memory system — files that survive across sessions. I built 17 memory files organized by type:

User: who I am, my role, what I'm working on
Feedback: corrections that become permanent rules
Project: active work context, ticket maps, architectural decisions
Reference: pointers to external systems (Confluence pages, Jira boards, SonarQube dashboards)

The feedback memories are the most powerful. Every time I corrected Claude — "don't amend commits, it breaks CI and MR reviews" or "always fix all test failures before committing, even seemingly pre-existing ones" or "copy application-local.yml to worktrees because it's gitignored" — that correction became a permanent rule. One-time mistakes became permanent automation. After a few weeks, the memory system had captured dozens of workflow-specific rules that would take a new team member months to internalize.

The Context Funnel

For the planning phase itself, I fed Claude Code everything it would need to design the migration:

The four legacy repositories (full source access)
Dev database connections (read-only) to both MySQL and PostgreSQL
A reference implementation — a similar product that had already been migrated to the platform
Live Swagger documentation from upstream services (router, rostering, authentication APIs)

With this context loaded, I used the brainstorming skill (from the superpowers plugin) to generate the migration design across all eight repositories simultaneously. The skill enforces a structured process: explore context, ask clarifying questions, propose approaches with trade-offs, present the design for approval, then write a spec document.

I also used agent teams — an experimental Claude Code feature that runs parallel reviewers with independent context windows — to stress-test the design. Three independent agents reviewed the same architecture and caught issues a single pass missed:

Missing resume logic for interrupted user flows (a legacy endpoint had been removed without accounting for in-progress sessions)
Frontend state invalidation gaps in the data fetching layer
Unnecessary network hops that could be eliminated now that previously separate services lived in the same JVM

Giving AI direct database access allowed exact column-by-column mapping between the legacy and target schemas. It caught DDL mismatches — timestamp type differences, nullable column discrepancies, default value conflicts — that ORM annotations hide. Without this, the migration would have hit runtime errors that are painful to debug after the fact.
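A column-by-column comparison of this kind can be sketched in a few lines. The example assumes each schema has already been read into a mapping of column name to type, nullability, and default (for instance from `information_schema.columns`); the `Column` class and `diff_schemas` function are illustrative names, not part of the original tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    data_type: str
    nullable: bool
    default: Optional[str] = None


def diff_schemas(legacy: dict[str, Column], target: dict[str, Column]) -> list[str]:
    """List human-readable differences between two column maps."""
    findings = []
    for name in sorted(legacy.keys() | target.keys()):
        old, new = legacy.get(name), target.get(name)
        if old is None or new is None:
            side = "legacy" if old is None else "target"
            findings.append(f"{name}: missing in {side}")
            continue
        # Compare the attributes that ORM annotations tend to hide.
        for attr in ("data_type", "nullable", "default"):
            a, b = getattr(old, attr), getattr(new, attr)
            if a != b:
                findings.append(f"{name}: {attr} differs ({a!r} vs {b!r})")
    return findings
```

Run against both databases before the migration, a report like this surfaces type and nullability conflicts as a reviewable list rather than as runtime errors.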

Atlassian MCP + Custom Jira Skill — From Design to Tickets

This is where the Atlassian MCP enters the story. It connects Claude Code directly to Jira and Confluence — no browser, no context switching.

First, the design became documentation: 20+ Confluence pages generated directly from Claude Code via MCP. Design documents, use case specifications, system architecture diagrams — all created and published without leaving the terminal. That said, this is where I hit my first major failure. The Atlassian MCP's updateConfluencePage tool silently truncates content beyond ~5KB. I asked Claude to update two design documents — 37KB and 46KB — and both were overwritten with partial content. I had to manually restore them from Confluence's page history. The data loss was real. I immediately encoded a memory rule: never update large Confluence pages via MCP, only add comments. Lesson learned the hard way.
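The memory rule can also be expressed as a small guard that runs before any page write. This is a sketch under the assumption that the truncation threshold is roughly 5 KB, as observed above; the function name and the two action labels are hypothetical.

```python
SAFE_UPDATE_BYTES = 5 * 1024  # approximate limit observed in the text above


def choose_confluence_action(new_body: str, limit: int = SAFE_UPDATE_BYTES) -> str:
    """Return "update" for small pages and "comment" for anything larger."""
    # Measure encoded bytes, not characters, since non-ASCII text is larger.
    size = len(new_body.encode("utf-8"))
    return "update" if size <= limit else "comment"
```

With this check in place, the 37KB and 46KB documents would have been routed to comments instead of a full page overwrite.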

Then came the decomposition. This was one of the most powerful things I did: I tasked Claude with breaking the architecture into Jira tickets scoped for three constraints:

Reviewable code reviews: no 2,000-line merge requests that reviewers rubber-stamp. Each ticket's scope had to produce a merge request a human could meaningfully assess.
QA throughput: QA can't test a monolithic "migrate everything" ticket. Each ticket needed to be independently testable with clear acceptance criteria.
Parallel development: tickets needed clean boundaries so multiple could be in-flight simultaneously without merge conflicts.

The result: 19 Jira tickets created in a single session from the design docs. Each with acceptance criteria in Atlassian Document Format, story points, sprint assignment, and a parent epic link. But it couldn't link them — the MCP tool for creating issue links between tickets throws a "not found" error. I had to go into Jira manually and add the "is blocked by" relationships myself. Not everything is automatable yet.

The Jira API has other quirks that would bite you every session without the right setup. So I built the my-jira skill — a custom skill file that encodes all the workarounds:

createJiraIssue renders newlines as literal \n text. The skill enforces a follow-up editJiraIssue call to fix formatting.
Story points live in customfield_10058, not the obvious-looking field. The wrong field silently saves to the wrong place — you'd never know until someone checks the sprint board.
QA testing note templates, project constants, sprint IDs, assignee account IDs — all encoded in one place.

One skill file eliminated an entire class of silent failures.
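To make the two workarounds concrete, here is a minimal sketch of what the encoded knowledge might look like as code. It assumes the description is sent as an Atlassian Document Format document and that story points belong in `customfield_10058`, which is specific to the Jira instance described here; `text_to_adf` and `build_issue_fields` are hypothetical helpers, not the skill's actual contents.

```python
STORY_POINTS_FIELD = "customfield_10058"  # instance-specific ID quoted in the text


def text_to_adf(text: str) -> dict:
    """Convert plain text into a minimal Atlassian Document Format document."""
    # Treat both real newlines and literal backslash-n sequences as breaks.
    lines = text.replace("\\n", "\n").split("\n")
    paragraphs = [
        {"type": "paragraph", "content": [{"type": "text", "text": line}]}
        for line in lines if line.strip()
    ]
    return {"type": "doc", "version": 1, "content": paragraphs}


def build_issue_fields(summary: str, description: str, story_points: float) -> dict:
    """Assemble issue fields with story points in the correct custom field."""
    return {
        "summary": summary,
        "description": text_to_adf(description),
        STORY_POINTS_FIELD: story_points,
    }
```

Keeping the field ID in one constant means a wrong-field mistake can only happen in one place.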

Building Block 2: Execution — The Development Loop

With the plan in place and tickets created, execution begins. Each ticket follows the same cycle: design doc, plan doc, execute, review, iterate, merge, move to QA. The tools enter the story as the workflow demands them.

Per-Ticket Planning

Every ticket — no matter how small — gets two documents before any code is written:

Design doc (what + why): the problem being solved, the approach, the constraints
Plan doc (how + steps): every file to change, every migration rule, every commit interval

I review the plan before execution starts. This is the gate. The AI generates, I validate. I know the domain, the constraints, the edge cases that don't show up in code. This is where the orchestrator model is most visible: I'm not writing plans by hand, but I'm reading every one and catching the things that only domain knowledge reveals.

Worktrees — Parallel Execution

Each ticket executes in a dedicated git worktree. Claude Code's /execute-plan skill runs the plan step-by-step in an isolated working directory.

At peak, I had 8 active worktrees across 2 repositories — 4 tickets developed concurrently. The tickets depended on each other, but the worktree model let me develop them in parallel and rebase with --onto as dependencies merged upstream. All four hit QA in the same sprint.
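For readers unfamiliar with the two git operations involved, the sketch below builds the argument lists for them. The branch and path names are made up for illustration; only the `git worktree add -b` and `git rebase --onto` forms themselves are standard git.

```python
def worktree_add_cmd(path: str, branch: str, start_point: str) -> list[str]:
    """Create a new branch checked out in its own working directory."""
    return ["git", "worktree", "add", "-b", branch, path, start_point]


def rebase_onto_cmd(new_base: str, old_base: str, branch: str) -> list[str]:
    """Replay the commits in old_base..branch on top of new_base."""
    return ["git", "rebase", "--onto", new_base, old_base, branch]


# Hypothetical stacked tickets: B was branched from A, and A has now merged.
setup = worktree_add_cmd("../wt-ticket-b", "feature/ticket-b", "feature/ticket-a")
restack = rebase_onto_cmd("origin/main", "feature/ticket-a", "feature/ticket-b")
```

The `--onto` form moves only ticket B's own commits onto the updated main branch, which is what keeps dependent tickets rebasable after their parent merges.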

The practical limit: about 4 active Claude Code sessions at a time, depending on how many contexts you can keep in your head. You're reviewing output from multiple streams, making judgment calls, and keeping the overall architecture coherent. It's project management, not coding.

GitLab CLI + The `manage-mr` Skill

Merge requests aren't just git push. There's the strategy description, the pipeline to monitor, quality gates to check, and the fix-push-recheck cycle when something fails.

The manage-mr skill wraps the full lifecycle:

Create the MR with a description derived from the plan doc
Monitor the CI pipeline with /loop on a recurring interval
Check SonarQube quality gates
If anything fails: read the failure, trace to source, fix, re-push

The /loop skill deserves its own mention. It runs a command on a configurable interval — I used it to poll CI pipelines. Pipeline fails? Claude reads the build log, traces the error to the source file, applies a fix, pushes, and the loop continues. No browser, no manual checking.
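The control flow of that polling cycle is roughly the following. This is a sketch of the pattern, not of how /loop is implemented: the status strings are placeholders, and `get_status` and `on_failure` stand in for whatever queries the pipeline and applies a fix.

```python
import time
from typing import Callable


def poll_pipeline(get_status: Callable[[], str],
                  on_failure: Callable[[], None],
                  interval_seconds: float = 60.0,
                  max_polls: int = 30,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the pipeline succeeds or the poll budget runs out."""
    for _ in range(max_polls):
        status = get_status()
        if status == "success":
            return status
        if status == "failed":
            # In the workflow described above, this is where the build log is
            # read and a fix is pushed, which starts a new pipeline run.
            on_failure()
        sleep(interval_seconds)
    return "timeout"
```

The `max_polls` budget is a deliberate addition: an unattended fix-and-push loop needs a stopping condition so a persistent failure does not run indefinitely.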

One recurring failure pattern worth mentioning: AI would sometimes aggressively remove "unused" state variables without checking the callbacks that referenced them, breaking CI. It also missed secondary integration tests that asserted on the removed behavior. The fix was straightforward each time, but the pattern recurred enough that I added a memory rule: "verify all references before removing anything." The pipeline loop caught these quickly, but they shouldn't have happened in the first place.
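A coarse version of the "verify all references" check can be done with a whole-word text search before anything is deleted. The sketch assumes file contents are already loaded into a dictionary; `count_references` is a hypothetical helper, and a plain text search is only a heuristic compared with a real language-aware reference finder.

```python
import re


def count_references(identifier: str, sources: dict[str, str]) -> dict[str, int]:
    """Count whole-word occurrences of identifier in each source file's text."""
    pattern = re.compile(rf"\b{re.escape(identifier)}\b")
    counts = {path: len(pattern.findall(text)) for path, text in sources.items()}
    # Report only the files that actually mention the identifier.
    return {path: n for path, n in counts.items() if n > 0}
```

If the result includes anything beyond the declaration itself, such as a callback or a test file, the variable is not actually unused.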

SonarQube MCP

Connected via MCP server, Claude can query pull request issues, check quality gates, and fix vulnerabilities directly from the terminal. The migration shipped with 95%+ API coverage, 91% frontend coverage, zero bugs, zero vulnerabilities, and zero security hotspots at the time of QA handoff.

Chrome DevTools MCP

For frontend work, Claude needs to see the actual rendered application — not just the code. The Chrome DevTools MCP connects Claude to a live browser session. I log in, navigate to the page, and Claude inspects the live DOM, console errors, and network requests.

This is a game-changer for UI work. It finds CSS/layout bugs, missing state updates, and rendering issues that code-level analysis and screenshots could never surface. Claude can see what the user sees.

Figma MCP

During frontend porting, Claude references Figma designs directly via MCP. No screenshotting, no describing layouts in words. It reads the design context — component structure, spacing, colors, typography — and translates to code. This kept the ported UI faithful to the design without the constant back-and-forth of "does this match the mockup?"

Postman MCP

The API collection stays in sync with endpoint changes. Test scripts auto-chain with dynamic JWT extraction. This matters because QA depends on Postman to validate the API — if the collection is stale, they're blocked. The MCP integration ensures the collection reflects the latest API state at all times.

The `move-to-qa` Skill

When a ticket is ready for QA, there's a ritual: add structured testing notes (environment, credentials, test steps, caveats), transition the ticket to the QA column, notify the QA channel. Getting any step wrong means the QA engineer wastes time asking clarifying questions.

The move-to-qa skill encodes the entire handoff as a single invocation. One command handles the comment (in the exact template QA expects), the Jira transition, and the notification. Consistent handoff, every time, no steps skipped.

Keeping Everything in Sync

Here's where the orchestration model really pays off. When implementation changes a design decision or uncovers a requirement gap, Claude updates Confluence and Jira in the same session — with the caveat that large page edits go through comments, not full page updates (see the Confluence truncation lesson above). The old friction — finish code, open browser, update Jira, update Confluence — is the kind of manual chore that developers frequently skip. Now it's one prompt: "Update the design doc to reflect that we're using a materialized view instead of a join, and comment the change on the Jira ticket."

Documentation stays in sync because it's part of the workflow, not an afterthought.

Building Block 3: Review & Iteration

Code Review Plugins

The code-review and pr-review-toolkit plugins run multi-agent PR reviews with confidence-based scoring. They're effective as a first-pass filter:

Syntax and formatting bugs
Off-by-one errors in date filters
Missing transactional annotations
Raw data leakage in error responses

AI caught roughly 30-40% of issues — the low-to-mid-level stuff that humans miss under time pressure.

But the high-level stuff still needed a human reviewer: permission architecture that needed refactoring, structural design decisions for domain enums, Java stream filtering optimizations, missing API documentation annotations. These aren't bugs — they're design-level improvements that require understanding the system's intent, not just its syntax. AI review is a floor-raiser, not a ceiling-raiser. It catches what slips through, but it doesn't replace architectural judgment.

Responding to Review Comments

When human reviewers leave comments on merge requests, Claude reads and responds via the GitLab integration. Implement the requested changes, push, and the review loop continues — all from the terminal. The reviewer doesn't know or care that the fixes were AI-assisted. They just see responsive, well-reasoned changes.

The Cost

Honest accounting matters. Over the course of the migration — roughly 15 work days from planning through QA handoff — I spent close to $5,000 in API token costs running Claude Code through AWS Bedrock.

For a migration of this scope — four repos, database migration, full-stack framework upgrades, auth model replacement, ~50 tickets, ~580 tests — that's a fraction of what the engineering time alone would cost with a traditional team across multiple sprints.
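To put the total in perspective, the approximate figures quoted above can be turned into rough unit costs. All inputs are the rounded numbers from the text, so the results are order-of-magnitude estimates rather than precise accounting.

```python
total_cost_usd = 5_000   # "close to $5,000"
work_days = 15           # "roughly 15 work days"
tickets = 50             # "~50 tickets"
tests = 580              # "~580 tests"

cost_per_day = total_cost_usd / work_days
cost_per_ticket = total_cost_usd / tickets
cost_per_test = total_cost_usd / tests

print(f"per day:    ${cost_per_day:,.0f}")
print(f"per ticket: ${cost_per_ticket:,.0f}")
print(f"per test:   ${cost_per_test:,.2f}")
```

That works out to about $333 per work day, $100 per ticket, and under $9 per test, with the caveat that the whole spend is attributed to each unit in turn.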

Lessons learned on cost management:

Clear your context frequently. This is the single biggest cost lever. Claude Code caches your conversation context and re-reads it on every turn. Long marathon sessions accumulate enormous cache read and cache write charges that dwarf the actual input/output token costs. Use /compact aggressively, and prefer multiple shorter focused sessions over all-day marathons.
Use the right model for the job. Opus for planning, architecture, and complex reasoning. Sonnet for routine execution, test generation, and boilerplate. Haiku for quick lookups. The cost difference between models is significant and most execution work doesn't need the most powerful model.
The cost is front-loaded. The initial setup — CLAUDE.md files, skills, MCPs, memory rules, the design phase — was the most token-intensive period. Once configured, subsequent tickets were dramatically cheaper because the context was already built and the workflows were encoded.

Closing

By the end of the POC, the migration had produced roughly 50 tickets, 580 passing tests, 95%+ API coverage, and zero bugs, vulnerabilities, or security hotspots at QA handoff. One engineer, half a month, one tool — configured deliberately for every phase of the work.

The throughput increase didn't come from AI writing better code. It came from iterating faster, verifying more thoroughly, and managing parallel execution streams.

The building blocks made this possible:

Planning: CLAUDE.md gave the AI context. Memory gave it institutional knowledge. The brainstorming skill gave it structure. The Atlassian MCP and custom Jira skill turned designs into documentation and actionable, well-scoped tickets.
Execution: worktrees enabled parallel development. Skills encoded repeatable workflows. MCPs connected Claude Code to every system in the SDLC — version control, CI/CD, code quality, design tools, project management, documentation.
Review: plugins raised the floor on code review quality. Human reviewers caught the architectural and design-level issues that AI can't.

Each layer depends on the one below it. Skip the foundation — the CLAUDE.md files, the memory, the skills — and the execution layer produces mediocre results. That's what most people experience. They skip straight to "write me some code" without investing in the setup, get underwhelming output, and conclude the tool isn't useful.

The setup is the strategy.

If you're starting from zero, here's my recommendation: write your CLAUDE.md first. Just the basics — tech stack, project structure, build commands, conventions. Then add one MCP integration for the system you context-switch to most often (probably your issue tracker). Then build one custom skill for your most repeated workflow. Build up from there. Each layer makes the next one more effective.

What UK Businesses Get Wrong About GDPR and Phone Calls

Dialphone Limited — Thu, 16 Apr 2026 13:49:53 +0000

Most UK businesses think GDPR applies to emails and websites. They forget that phone calls generate personal data too — and the compliance gaps are enormous.

After auditing GDPR telephony compliance for 40 UK organisations, here are the violations I find in almost every one.

Violation 1: Recording Calls Without Proper Legal Basis (85% of businesses)

You cannot record calls just because you want to. You need a legal basis under Article 6 of UK GDPR:

Legal Basis	When It Applies	What You Must Do
Consent	Customer agrees to recording	Play announcement AND get explicit consent
Legitimate interest	Training, quality, dispute resolution	Document your LIA, offer opt-out
Legal obligation	Financial services (FCA requirement)	Document the specific regulation
Contract performance	Recording needed to fulfil contract	Document how recording serves the contract

The common mistake: Playing "this call may be recorded" and assuming that is consent. It is not. An announcement is notification, not consent. For consent, you need affirmative agreement ("press 1 to agree to recording").

The better approach: Use legitimate interest as your legal basis. Document a Legitimate Interest Assessment (LIA) covering: purpose (training, quality, disputes), necessity (cannot achieve purpose without recording), balance (your interest vs caller's privacy). Most businesses can justify recording under legitimate interest without needing per-call consent.

Violation 2: No Data Retention Policy for Recordings (72% of businesses)

Recordings are personal data. Under GDPR Article 5(1)(e), personal data must not be kept longer than necessary.

Industry	Recommended Retention	Regulatory Requirement
General business	6-12 months	None (but justify your choice)
Financial services	5-7 years	FCA MiFID II
Healthcare	8 years	NHS records management
Legal	6 years after matter closes	SRA guidelines
Insurance	3 years	FCA guidelines

The common mistake: Keeping recordings forever because storage is cheap. This violates the storage limitation principle. You must define a retention period and automatically delete recordings after it expires.
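The automatic deletion step can be sketched as a scheduled job that selects everything older than the retention period. The example assumes recordings are tracked as a mapping of ID to recording timestamp, which is a hypothetical structure and not any particular phone system's API.

```python
from datetime import datetime, timedelta, timezone


def is_expired(recorded_at: datetime, retention_days: int, now: datetime) -> bool:
    """True when a recording is older than the retention period."""
    return now - recorded_at > timedelta(days=retention_days)


def select_for_deletion(recordings: dict[str, datetime], retention_days: int,
                        now: datetime) -> list[str]:
    """Return IDs of recordings that should be deleted, oldest first."""
    expired = [(ts, rid) for rid, ts in recordings.items()
               if is_expired(ts, retention_days, now)]
    return [rid for ts, rid in sorted(expired)]


# Illustrative data: a 12-month policy evaluated on a fixed date.
example_now = datetime(2026, 4, 16, tzinfo=timezone.utc)
example_recordings = {
    "rec-a": datetime(2025, 1, 1, tzinfo=timezone.utc),
    "rec-b": datetime(2026, 3, 1, tzinfo=timezone.utc),
    "rec-c": datetime(2024, 6, 1, tzinfo=timezone.utc),
}
to_delete = select_for_deletion(example_recordings, 365, example_now)
```

Passing `now` in explicitly keeps the rule testable, and the retention period itself should come from the documented policy rather than a hard-coded default.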

Violation 3: No Process for Subject Access Requests (68% of businesses)

Anyone can request copies of their call recordings under Article 15. You have one calendar month to respond.

The test: Call your phone system administrator and say: "A customer has requested all recordings of calls with them from the past 12 months. How quickly can you produce them?"

If the answer involves manually searching through thousands of recordings, you are not compliant. You need:

Search by phone number
Search by date range
Export in standard format (MP3/WAV)
Ability to redact third-party data from multi-party calls
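The first two requirements amount to a filter over recording metadata. The sketch below assumes a simple in-memory list of records with a caller number and call date; the `Recording` class and function names are illustrative, and a real archive would run the equivalent query against its own index.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Recording:
    recording_id: str
    caller_number: str
    call_date: date


def normalise_number(number: str) -> str:
    """Keep digits only so '+44 20 7946 0000' and '442079460000' compare equal."""
    return "".join(ch for ch in number if ch.isdigit())


def find_recordings(recordings: list[Recording], phone_number: str,
                    start: date, end: date) -> list[Recording]:
    """Return recordings for one caller within an inclusive date range."""
    wanted = normalise_number(phone_number)
    return [r for r in recordings
            if normalise_number(r.caller_number) == wanted
            and start <= r.call_date <= end]
```

Note that digit-only normalisation does not reconcile national and international forms of the same number (for example 020 versus +44 20), so a production search would need proper number parsing.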

Violation 4: Voicemail Transcriptions Not Treated as Personal Data (55% of businesses)

Voicemail-to-email transcriptions contain personal data (caller's name, phone number, message content). They are stored in email servers, potentially backed up to multiple locations, and rarely covered by the data retention policy.

The fix: Include voicemail transcriptions in your data retention policy. Auto-delete transcription emails after the defined retention period.

Violation 5: Call Data Shared Without DPA (48% of businesses)

Your VoIP provider processes personal data on your behalf (call recordings, CDRs, voicemail). Under Article 28, you must have a Data Processing Agreement (DPA) in place.

Check	Compliant	Non-Compliant
DPA signed with VoIP provider	Document on file	No DPA exists
DPA covers all data types	Recordings, CDRs, voicemail, transcriptions	Only mentions "calls" vaguely
Sub-processor list provided	Provider discloses all sub-processors	"We use cloud infrastructure" (no specifics)
Breach notification clause	Provider notifies within 72 hours	No breach notification terms

The GDPR Telephony Checklist

[ ] Legal basis for recording documented (LIA or consent mechanism)
[ ] Recording announcement configured and playing
[ ] Data retention policy includes call recordings AND voicemail transcriptions
[ ] Automatic deletion after retention period
[ ] SAR process documented and tested (can you find recordings by phone number?)
[ ] DPA signed with VoIP provider
[ ] Sub-processor list obtained from provider
[ ] Breach notification clause in DPA (72-hour timeline)
[ ] Staff trained on handling recording-related SARs
[ ] Privacy notice updated to mention call recording

DialPhone provides a signed DPA with every UK customer, configurable retention policies with automatic deletion, searchable recording archives for SAR compliance, and a sub-processor list published transparently. Because GDPR compliance should be built into the phone system, not bolted on afterwards.

Generate Realistic Mock JSON Data for API Testing (No Backend, No Setup)

Jean Carlo - Dev — Thu, 16 Apr 2026 13:48:33 +0000

When you're building or testing an API, you hit the same wall every time:

"I just need realistic JSON data... fast."

You don't want to:

create seed scripts
install faker libraries
spin up a backend
manually type fake users, products, or orders

You just want JSON.

https://quickeasy.tools/en/tools/json-mock-generator

The problem with most mock data approaches

Typical flow:

npm install faker
create script
run script
adjust fields
run again

Or worse... copy/paste from old projects.

This breaks your flow when you're:

prototyping a frontend
testing API contracts
mocking responses
creating Postman collections
writing automated tests

You don't need a project.
You need data.

What this JSON Mock Generator does

You define fields like:

id (UUID)
full_name
username
email
phone
address
company
job_title
active (boolean)

And instantly get:

{
  "id": "e24e0d8f-a5e-4c13-8ec5-0af008884f",
  "full_name": "Kevin Wilson",
  "username": "kevin.wilson",
  "email": "kevin.wilson@email.com",
  "phone": "+1 (555) 709-795",
  "address": {
    "street": "497 Maple Dr",
    "city": "Dallas",
    "zip": "52278"
  },
  "company": "Lorem ipsum dolor sit amet",
  "job_title": "DevOps",
  "active": false
}

No setup. No login. No libraries.

Perfect for

Frontend developers mocking API responses
Backend developers testing serializers
QA creating test payloads
Postman / Insomnia testing
Writing unit tests
Prototyping dashboards
Generating seed-like data instantly

Custom schemas (this is the killer feature)

You're not stuck with "users".

You can build any structure:

products
orders
invoices
customers
logs
nested objects

Add fields. Choose types. Generate.
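For readers curious what schema-driven generation involves, here is a minimal standard-library sketch of the idea. It is not the tool's implementation: the schema format, type names, and name lists are all assumptions made for illustration.

```python
import json
import random
import uuid

FIRST_NAMES = ["Kevin", "Maria", "Priya", "Tom"]
LAST_NAMES = ["Wilson", "Lopez", "Shah", "Baker"]


def generate_record(schema: dict, rng: random.Random) -> dict:
    """Build one record from a {field_name: type_name} schema."""
    first, last = rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)
    generators = {
        "uuid": lambda: str(uuid.UUID(int=rng.getrandbits(128), version=4)),
        "full_name": lambda: f"{first} {last}",
        "username": lambda: f"{first}.{last}".lower(),
        "email": lambda: f"{first}.{last}@example.com".lower(),
        "boolean": lambda: rng.random() < 0.5,
    }
    record = {}
    for field, type_name in schema.items():
        if isinstance(type_name, dict):        # nested object
            record[field] = generate_record(type_name, rng)
        elif isinstance(type_name, list):      # enum: pick one listed value
            record[field] = rng.choice(type_name)
        else:
            record[field] = generators[type_name]()
    return record


order_schema = {"id": "uuid", "status": ["new", "paid", "shipped"],
                "customer": {"full_name": "full_name", "email": "email"}}
print(json.dumps(generate_record(order_schema, random.Random(42)), indent=2))
```

Seeding the random generator makes the output repeatable, which is useful when the generated JSON feeds snapshot tests.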

Why browser-based is better

Because you don't leave your flow.

No:

npm
docker
scripts
dependencies

Just open → configure → copy JSON.

Example use cases

Mock API response for frontend

fetch('/api/users')
  .then(res => res.json())
  .then(users => console.log(users)) // the mocked users arrive here

Paste generated JSON into your mock server.

Unit tests

const fakeUsers = require('./mock.json')

Done.

Postman collection

Paste into response body simulation.

Types supported

UUID
Full name
Username
Email
Phone
Address
Company
Words / Lorem
Enum
Boolean

You can mix everything into your own schema.

Try it here

https://quickeasy.tools/en/tools/json-mock-generator

Takes 5 seconds to generate your first dataset.

Why I made this

I was tired of wasting time generating fake data every time I needed to test something.

So this became my internal tool.
Now it's public.

If you build APIs, frontends, or tests, this saves real time.

If you have suggestions for new field types, I'm all ears.

Camunda 7 End of Life: Why More Teams Are Choosing OrqueIO as Their Migration Alternative

Ghofrane WECHCRIA — Thu, 16 Apr 2026 13:48:23 +0000

With Camunda 7 Community Edition now officially reaching end of life, many organizations are asking the same question:

What comes next for our workflow orchestration platform?

For teams heavily invested in BPMN automation and process orchestration, the obvious path may seem to be migrating to Camunda 8.
But in reality, many companies quickly discover that moving to Camunda 8 is far from a simple upgrade.

It often requires:

Major process refactoring
New infrastructure and architecture changes
Reworking integrations and deployment pipelines
Significant investment in time, budget, and training

For organizations running large-scale orchestration environments, this can represent a major technical and business challenge.

That’s exactly why we built OrqueIO.

OrqueIO: A Modern Open-Source Alternative to Camunda 7

OrqueIO is a modern, long-term supported fork of Camunda 7, designed for organizations that want to preserve their existing workflow architecture without going through a painful migration.

Instead of rebuilding everything from scratch, OrqueIO allows teams to continue running their BPMN processes and applications on a modernized, future-proof foundation.

With OrqueIO, you get:

Full compatibility with existing Camunda 7 BPMN/DMN models and APIs
A modern stack: Java 25, Spring Boot 4, and Angular
Continuous security updates and platform improvements
Free and open-source licensing
Optional enterprise support for production environments

In short:

OrqueIO enables you to keep your Camunda 7 investment while moving forward with a modern platform.

Built for Enterprises That Need Stability and Continuity

OrqueIO was not created as an experimental fork.

It was designed by engineers with real-world enterprise experience operating Camunda-based systems in large production environments.

Our goal was simple:

Preserve compatibility. Remove technical debt. Modernize the platform.

This means organizations can benefit from:

smoother upgrades,
lower migration risk,
reduced maintenance overhead,
and long-term platform sustainability.

More Than a Fork: A Platform Evolving Beyond Camunda 7

While compatibility remains at the core of OrqueIO, our ambition goes further.

Our roadmap includes a growing set of enterprise-ready capabilities to make orchestration smarter, simpler, and more secure:

Single Sign-On (SSO) and centralized identity management
Enhanced observability and analytics dashboards
Built-in documentation and model insights directly in the UI
Continuous UI/UX modernization across all modules

Our philosophy remains unchanged:

Keep compatibility. Add value. Evolve continuously.

The Future of Camunda 7 Doesn't Have to End Here

Camunda 7 may have reached end of life, but that does not mean organizations need to abandon their architecture or rebuild everything.

For teams looking for a Camunda 7 alternative, a Camunda 7 fork, or an open-source workflow engine compatible with existing BPMN applications, OrqueIO provides a pragmatic path forward.

If your company wants to modernize without rewriting years of orchestration logic, OrqueIO may be exactly what you’re looking for.

Learn more about OrqueIO: https://www.orqueio.io
