Validate Japanese haiku and tanka against a mora pattern. Counts moras (拍), not syllables or characters — the unit haiku actually uses. Kana-only by design. Zero runtime dependencies.
git clone https://github.com/sen-ltd/haiku-score
cd haiku-score
docker build -t haiku-score .
printf 'ふるいけや\nかわずとびこむ\nみずのおと\n' | docker run --rm -i haiku-score -

The standard English explanation of haiku is "5-7-5 syllables", and it is wrong in a way that makes every beginner write bad haiku. Japanese verse doesn't count syllables; it counts moras, a unit shorter than an English syllable. The difference is load-bearing:
- きゃ is 1 mora, not 2. The small ゃ fuses into the previous kana (this is called youon, 拗音).
- とうきょう is 4 moras (と・う・きょ・う), even though English speakers often call it 2 syllables.
- ほん is 2 moras: the ん (hatsuon, 撥音) is its own beat.
- にっぽん is 4 moras: the small っ (sokuon, 促音) is a beat of silence.
- コーヒー is 4 moras: the 長音符 ー extends the previous vowel by one full beat.

haiku-score implements these rules and validates a pattern (default 5-7-5, or 5-7-5-7-7 with --tanka, or anything you like with --pattern). The core algorithm is about 50 lines of Python stdlib; the whole project is a crash course in basic Japanese phonology wearing a CLI costume.
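
The rules above fit in a few lines. The following is a minimal sketch, not the project's actual code: the name `count_moras`, the `ValueError`, and the exact punctuation set are assumptions made for illustration, and only the small ゃゅょ/ャュョ are treated as fusing.

```python
# Hypothetical sketch of the mora rules; not taken from haiku-score itself.
SMALL_YOON = frozenset("ゃゅょャュョ")    # fuse into the previous kana: +0
PUNCTUATION = frozenset("、。・「」！？")  # assumed set of ignored marks: +0


def is_kana(ch: str) -> bool:
    # Hiragana block U+3040..U+309F and katakana block U+30A0..U+30FF.
    # This range also covers っ, ん and the long-vowel mark ー (U+30FC),
    # each of which counts as one full beat.
    return "\u3040" <= ch <= "\u30ff"


def count_moras(line: str) -> int:
    total = 0
    for pos, ch in enumerate(line):
        # Punctuation is checked first because ・ sits inside the katakana block.
        if ch in PUNCTUATION or ch.isspace():
            continue
        if ch in SMALL_YOON:
            continue  # きゃ is one beat: the き already counted
        if is_kana(ch):
            total += 1
            continue
        raise ValueError(f"unsupported character {ch!r} at position {pos}")
    return total


if __name__ == "__main__":
    for word in ("きゃ", "とうきょう", "ほん", "にっぽん", "コーヒー"):
        print(word, count_moras(word))
```

Treating sokuon, hatsuon, and the long-vowel mark as ordinary one-beat kana keeps the loop flat; only the small youon needs special handling.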
# A valid haiku (Basho's 古池, in kana form).
printf 'ふるいけや\nかわずとびこむ\nみずのおと\n' | docker run --rm -i haiku-score -
# Tanka (5-7-5-7-7).
docker run --rm -i haiku-score - --tanka < poem.txt
# A custom pattern.
docker run --rm -i haiku-score - --pattern 3,5,3 < micro.txt
# Machine-readable output.
docker run --rm -i haiku-score - --format json < poem.txt
# A one-liner with Japanese punctuation as line breaks.
docker run --rm haiku-score 'ふるいけや、かわずとびこむ。みずのおと' --auto-break

haiku-score refuses kanji input:
$ printf '古池や\n' | haiku-score -
haiku-score: kanji '古' at position 0: haiku-score counts moras on kana input.
Convert kanji to kana first (e.g. with pykakasi, or your own lookup) then re-run.

This is deliberate. Kanji-to-kana is a hard problem: 日 is ひ or にち or じつ depending on context, and doing it right requires either a full morphological analyser (MeCab, Sudachi) or a large pronunciation dictionary. That's a whole different project, and once you're pulling in fugashi and a dictionary you've blown the "zero dependencies, ~50 lines of core logic" thesis. So the tool does one thing (count moras on kana input) and tells you how to convert kanji when it sees them.
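
Spotting a kanji and reporting its position needs nothing beyond the standard library. The sketch below is one possible approach, assuming detection by Unicode character name; `find_first_kanji` is a hypothetical helper, and the tool's real check may work differently (for example by codepoint range).

```python
import unicodedata


def find_first_kanji(text: str):
    """Return (position, character) of the first kanji, or None if there is none."""
    for pos, ch in enumerate(text):
        # CJK ideographs have generated names such as "CJK UNIFIED IDEOGRAPH-53E4".
        # Marks like 々 have different names and are not caught by this sketch.
        if unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH"):
            return pos, ch
    return None


if __name__ == "__main__":
    print(find_first_kanji("古池や"))
    print(find_first_kanji("ふるいけや"))
```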
If you need automatic kanji handling, pipe your text through pykakasi first:
echo "古池や" | python -c "
import sys, pykakasi
kks = pykakasi.kakasi()
for line in sys.stdin:
    print(''.join(item['hira'] for item in kks.convert(line.strip())))
" | haiku-score -

| Exit code | Meaning |
|---|---|
| 0 | Input matches the pattern. |
| 1 | Input parsed, but the mora counts don't match. |
| 2 | Bad input: kanji, empty text, wrong number of lines, bad flags. |
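
A script can branch on these exit codes. This is a rough sketch that assumes `haiku-score` is installed and on `PATH`; the helper names `describe_exit` and `check_poem` are made up for illustration.

```python
import subprocess

# Meanings taken from the exit-code table above.
EXIT_MEANINGS = {0: "match", 1: "mismatch", 2: "bad input"}


def describe_exit(code: int) -> str:
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def check_poem(kana_text: str, command=("haiku-score", "-")) -> str:
    # Feed the poem on stdin, as in `haiku-score - < poem.txt`.
    result = subprocess.run(
        list(command),
        input=kana_text,
        encoding="utf-8",
        capture_output=True,
    )
    return describe_exit(result.returncode)
```

Calling `check_poem("ふるいけや\nかわずとびこむ\nみずのおと\n")` would then return one of the three labels, depending on how the CLI exits.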
pip install .
haiku-score poem.txt
haiku-score - < poem.txt
haiku-score poem.txt --tanka --format json

Requires Python 3.10+. No runtime dependencies: just argparse, json, and unicodedata from the standard library.

docker run --rm --entrypoint pytest haiku-score -q

The test suite covers the mora rules extensively, including:
- The five vowels (あいうえお = 5).
- Youon fusion (きゃ = 1, きゅうり = 3, しゃしん = 3).
- Sokuon (にっぽん = 4).
- Hatsuon (ほん = 2, ラーメン = 4).
- Long-vowel mark (コーヒー = 4).
- Punctuation is ignored (あ、い。う = 3).
- Katakana parity (カタカナ = 4, ジャム = 2).
- Basho's 古池 haiku in kana form: 5-7-5 ✓.
- Kanji input raises MoraError with the position of the offending character.
- CLI exit codes: 0 match, 1 mismatch, 2 bad input.
- JSON output shape.
- --tanka and custom --pattern.
- --auto-break on 。 and 、.

The core loop is in src/haiku_score/mora.py. Every character is classified into one of: full-size kana (+1 mora), small youon (+0, fused with previous), long-vowel mark (+1), sokuon (+1), hatsuon (+1), punctuation (+0), or "reject with position". Classification is done purely by Unicode codepoint ranges — the hiragana block U+3040..U+309F, the katakana block U+30A0..U+30FF, the long-vowel mark U+30FC, and small-kana characters by their specific codepoints.
if classifier.is_small_yoon(ch):
    # きゃ is 1 mora, not 2. The previous full kana already contributed 1,
    # and this small ゃ fuses into it for 0 additional moras.
    previous_was_full_kana = False
    continue

The scorer is a flat pass over the per-line counts, comparing each to the pattern, and the formatters render the result for humans (with colour and East-Asian width-aware alignment) or as JSON.
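
The flat scoring pass can be pictured roughly as follows. This is a sketch under assumptions: the function names and the dictionary shape are invented for illustration and do not document the tool's actual JSON output.

```python
import json


def parse_pattern(spec: str) -> list[int]:
    # Turns a comma-separated pattern such as "3,5,3" into [3, 5, 3].
    return [int(part) for part in spec.split(",")]


def score(counts: list[int], pattern: list[int]) -> dict:
    # A wrong number of lines is treated as bad input, not as a mismatch.
    if len(counts) != len(pattern):
        raise ValueError(f"expected {len(pattern)} lines, got {len(counts)}")
    lines = [
        {"expected": want, "actual": got, "ok": want == got}
        for want, got in zip(pattern, counts)
    ]
    return {"match": all(entry["ok"] for entry in lines), "lines": lines}


if __name__ == "__main__":
    result = score([5, 7, 5], parse_pattern("5,7,5"))
    print(json.dumps(result, ensure_ascii=False, indent=2))
```

Keeping the scorer independent of the mora counter means the same comparison serves haiku, tanka, and custom patterns alike.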
MIT. See LICENSE.
