Add semantic YMYL classification via LLM extraction by remete618 · Pull Request #16 · remete618/widemem-ai

remete618 · 2026-04-15T00:29:08Z

Summary

Two-stage YMYL pipeline that catches implied health/legal/financial content and rejects false positives from metaphorical keyword usage.

Stage 1 (fast): Strong regex patterns catch definitive matches (blood type, 401k, DNR order). No LLM call.

Stage 2 (smart): LLM classifies YMYL during fact extraction. Zero additional API calls; the classification piggybacks on the extraction call that already happens for every add().
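A minimal sketch of how the two stages might combine, assuming the extraction call already returns a category per fact. The pattern table, the category assigned to each pattern, and the function names `regex_ymyl` and `classify_ymyl` are hypothetical and are not taken from the PR diff.

```python
import re
from typing import Optional

# Hypothetical strong patterns built from the example phrases in the summary.
# The real pattern list and category labels in the repo may differ.
STRONG_PATTERNS = {
    "health": re.compile(r"\b(blood type|dnr order)\b", re.IGNORECASE),
    "financial": re.compile(r"\b401\s*\(?k\)?", re.IGNORECASE),
}


def regex_ymyl(text: str) -> Optional[str]:
    """Stage 1: return a category only for definitive matches, no LLM involved."""
    for category, pattern in STRONG_PATTERNS.items():
        if pattern.search(text):
            return category
    return None


def classify_ymyl(text: str, llm_category: Optional[str]) -> Optional[str]:
    """Stage 1 overrides; otherwise keep whatever the extraction call returned.

    `llm_category` stands in for the value the LLM attaches to a fact during
    the extraction call that already happens, so no extra API call is made here.
    """
    return regex_ymyl(text) or llm_category
```

With this shape, a definitive phrase is labelled even when the LLM returns nothing, and an implied case such as "my chest has been hurting for three days" depends entirely on the label the LLM supplied.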

What it catches that regex misses

"my chest has been hurting for three days" -> health
"I owe $40,000 and can't make payments" -> financial
"I stopped taking my pills" -> medical
"my ex is threatening to take the kids" -> legal

What it skips that regex falsely flags

"walked by the bank of the river" -> null (not financial)
"The Doctor is a great TV show" -> null (not medical)
"court of public opinion" -> null (not legal)
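The two lists above can be reproduced with a deliberately naive keyword matcher. This is only an illustration of the failure mode; the keyword list and the `keyword_flag` helper are hypothetical and are not the project's regex.

```python
import re

# Hypothetical weak keyword list, chosen to mirror the examples above.
NAIVE_KEYWORDS = re.compile(r"\b(bank|doctor|court)\b", re.IGNORECASE)


def keyword_flag(text: str) -> bool:
    """Return True when any keyword appears, regardless of meaning."""
    return NAIVE_KEYWORDS.search(text) is not None


implied = [
    "my chest has been hurting for three days",
    "I owe $40,000 and can't make payments",
    "I stopped taking my pills",
    "my ex is threatening to take the kids",
]
metaphorical = [
    "walked by the bank of the river",
    "The Doctor is a great TV show",
    "court of public opinion",
]

# Implied YMYL content contains none of the keywords, so it is missed.
missed = [text for text in implied if not keyword_flag(text)]
# Metaphorical usage contains a keyword, so it is falsely flagged.
false_positives = [text for text in metaphorical if keyword_flag(text)]
```

Here every implied sentence ends up in `missed` and every metaphorical one in `false_positives`, which is the gap the LLM classification is meant to close.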

Files changed

core/types.py - Add ymyl_category to Fact, ActionItem, Memory
extraction/prompts.py - Updated extraction prompt with YMYL examples
extraction/llm_extractor.py - Two-stage: regex override + LLM classification
conflict/batch_resolver.py - Thread ymyl_category through actions
core/pipeline.py - Store ymyl_category in metadata, use in YMYL-triggered active retrieval
core/memory.py - Reconstruct ymyl_category from metadata at search time
retrieval/temporal.py - Use stored ymyl_category for decay immunity
tests/test_ymyl_topics.py - 8 new tests
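A rough sketch of how `ymyl_category` could travel from a fact into stored metadata and back, assuming a dataclass-style model. Only the field name `ymyl_category` and the type name `Fact` come from the PR; the `content` field and the helpers `to_metadata` and `category_from_metadata` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Fact:
    content: str
    # None means the fact was not classified as YMYL.
    ymyl_category: Optional[str] = None


def to_metadata(fact: Fact) -> Dict[str, Any]:
    """Build stored metadata, omitting ymyl_category when it is null."""
    metadata: Dict[str, Any] = {"content": fact.content}
    if fact.ymyl_category is not None:
        metadata["ymyl_category"] = fact.ymyl_category
    return metadata


def category_from_metadata(metadata: Dict[str, Any]) -> Optional[str]:
    """Reconstruct the category at search time; older records lack the key."""
    return metadata.get("ymyl_category")
```

Omitting the key rather than storing a null keeps records written before this change indistinguishable from unclassified new ones, so both take the same fallback path.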

Test plan

All 163 tests pass
New: Fact/ActionItem/Memory carry ymyl_category
New: ymyl_category stored in metadata (and omitted when null)
New: LLM-classified YMYL memories get decay immunity
New: Demonstrates that regex misses implied YMYL content that the LLM would catch
Backward compatible: old memories without ymyl_category fall back to regex
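The decay-immunity and fallback behaviour might look roughly like the following. The exponential half-life decay, the `half_life_days` parameter, the fallback pattern, and the function names are all assumptions made for illustration; the PR does not show the actual scoring formula in retrieval/temporal.py.

```python
import re
from typing import Optional

# Hypothetical fallback pattern for memories stored before ymyl_category existed.
FALLBACK_PATTERN = re.compile(r"\b(blood type|401k|dnr order)\b", re.IGNORECASE)


def is_ymyl(content: str, stored_category: Optional[str]) -> bool:
    """Prefer the stored category; fall back to regex when none was stored."""
    if stored_category is not None:
        return True
    return FALLBACK_PATTERN.search(content) is not None


def temporal_score(
    content: str,
    age_days: float,
    stored_category: Optional[str] = None,
    half_life_days: float = 30.0,  # assumed value, not from the PR
) -> float:
    """Return 1.0 for YMYL memories (no decay), else an exponential decay factor."""
    if is_ymyl(content, stored_category):
        return 1.0
    return 0.5 ** (age_days / half_life_days)
```

Under these assumptions an LLM-classified memory keeps full weight at any age, an old memory containing a strong phrase is still protected through the regex path, and everything else halves in weight every `half_life_days`.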

Two-stage YMYL pipeline:
- Stage 1 (fast): Regex strong patterns for definitive matches (blood type, 401k, DNR order). No LLM call needed.
- Stage 2 (smart): LLM classifies YMYL during fact extraction. Zero additional API calls (piggybacks on existing extraction). Catches implied YMYL ("my chest hurts") and rejects metaphorical usage ("bank of the river").

Changes:
- Add ymyl_category field to Fact, ActionItem, and Memory models
- Update extraction prompt to ask LLM for ymyl_category per fact
- Thread ymyl_category through resolver -> pipeline -> metadata
- Scoring uses stored ymyl_category for decay immunity (falls back to regex for memories created before this change)
- 8 new tests for semantic YMYL classification


remete618 added 2 commits April 14, 2026 20:29

Review fix: pass ymyl_category in UPDATE path, clean up list comprehension

b50c951

remete618 merged commit 47c3dd1 into main Apr 15, 2026
3 checks passed

remete618 merged 2 commits into main from feat/semantic-ymyl
