haiku-score

Validate Japanese haiku and tanka against a mora pattern. Counts moras (拍), not syllables or characters — the unit haiku actually uses. Kana-only by design. Zero runtime dependencies.

git clone https://github.com/sen-ltd/haiku-score
cd haiku-score
docker build -t haiku-score .
printf 'ふるいけや\nかわずとびこむ\nみずのおと\n' | docker run --rm -i haiku-score -

Why

The standard English explanation of haiku is "5-7-5 syllables", and it is wrong in a way that makes every beginner write bad haiku. Japanese verse doesn't count syllables; it counts moras, a unit shorter than an English syllable. The difference is load-bearing:

  • きゃ is 1 mora, not 2. The small ゃ fuses into the previous kana (this is called youon, 拗音).
  • とうきょう is 4 moras — と・う・きょ・う — even though English speakers often call it 2 syllables.
  • ほん is 2 moras: the ん (hatsuon, 撥音) is its own beat.
  • にっぽん is 4 moras: the small っ (sokuon, 促音) is a beat of silence.
  • コーヒー is 4 moras: the 長音符 ー extends the previous vowel by one full beat.

haiku-score implements these rules and validates a pattern (default 5-7-5, or 5-7-5-7-7 with --tanka, or anything you like with --pattern). The core algorithm is about 50 lines of Python stdlib; the whole project is a crash course in basic Japanese phonology wearing a CLI costume.
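The counting rules above can be sketched in a few lines of Python. This is an illustration of the rules, not the project's actual implementation (that lives in src/haiku_score/mora.py), and it assumes the input is already clean kana:

```python
# Small kana (youon and small vowels) fuse into the previous kana: +0 moras.
SMALL_KANA = set("ゃゅょぁぃぅぇぉャュョァィゥェォ")
# Japanese punctuation and whitespace contribute nothing.
PUNCTUATION = set("、。・！？ 　")

def count_moras(line: str) -> int:
    moras = 0
    for ch in line:
        if ch in PUNCTUATION or ch in SMALL_KANA:
            continue  # きゃ: the き already counted; ゃ adds nothing
        moras += 1    # full kana, っ/ッ, ん/ン, and ー each count as one beat
    return moras

assert count_moras("きゃ") == 1        # youon fuses
assert count_moras("とうきょう") == 4  # と・う・きょ・う
assert count_moras("にっぽん") == 4    # sokuon is a beat
assert count_moras("コーヒー") == 4    # ー extends the vowel by a beat
```

Note that every rule except youon fusion turns out to be "count one beat per character": the sokuon, hatsuon, and long-vowel mark all behave like ordinary kana for counting purposes.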

Quickstart

# A valid haiku (Basho's 古池, in kana form).
printf 'ふるいけや\nかわずとびこむ\nみずのおと\n' | docker run --rm -i haiku-score -

# Tanka (5-7-5-7-7).
docker run --rm -i haiku-score - --tanka < poem.txt

# A custom pattern.
docker run --rm -i haiku-score - --pattern 3,5,3 < micro.txt

# Machine-readable output.
docker run --rm -i haiku-score - --format json < poem.txt

# A one-liner with Japanese punctuation as line breaks.
docker run --rm haiku-score 'ふるいけや、かわずとびこむ。みずのおと' --auto-break

The kana-only scope choice

haiku-score refuses kanji input:

$ printf '古池や\n' | haiku-score -
haiku-score: kanji '古' at position 0: haiku-score counts moras on kana input.
Convert kanji to kana first (e.g. with pykakasi, or your own lookup) then re-run.

This is deliberate. Kanji-to-kana is a hard problem: 日 is ひ or にち or じつ depending on context, and doing it right requires either a full morphological analyser (MeCab, Sudachi) or a large pronunciation dictionary. That's a whole different project, and once you're pulling in fugashi and a dictionary you've blown the "zero dependencies, ~50 lines of core logic" thesis. So the tool does one thing — count moras on kana input — and tells you how to convert kanji when it sees them.

If you need automatic kanji handling, pipe your text through pykakasi first:

echo "古池や" | python -c "
import sys, pykakasi
kks = pykakasi.kakasi()
for line in sys.stdin:
    print(''.join(item['hira'] for item in kks.convert(line.strip())))
" | haiku-score -

Exit codes

Code  Meaning
0     Input matches the pattern.
1     Input parsed, but the mora counts don't match.
2     Bad input: kanji, empty text, wrong number of lines, bad flags.
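The policy behind the table can be sketched as a small helper. The names below are hypothetical illustrations of the mapping, not the project's actual code:

```python
EXIT_MATCH, EXIT_MISMATCH, EXIT_BAD_INPUT = 0, 1, 2

def exit_code(counts: "list[int] | None", pattern: "list[int]") -> int:
    """counts: per-line mora counts, or None if parsing failed (kanji, empty text)."""
    if counts is None or len(counts) != len(pattern):
        return EXIT_BAD_INPUT   # 2: kanji, empty text, wrong number of lines
    if counts == pattern:
        return EXIT_MATCH       # 0: matches the pattern
    return EXIT_MISMATCH        # 1: parsed, but counts don't match

assert exit_code([5, 7, 5], [5, 7, 5]) == 0
assert exit_code([5, 8, 5], [5, 7, 5]) == 1
assert exit_code(None, [5, 7, 5]) == 2
```

The 1/2 split matters for scripting: exit 1 means "well-formed kana that doesn't scan", so a retry with edits makes sense, while exit 2 means the input itself needs fixing first.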

Running without Docker

pip install .
haiku-score poem.txt
haiku-score - < poem.txt
haiku-score poem.txt --tanka --format json

Requires Python 3.10+. No runtime dependencies — just argparse, json, and unicodedata from the standard library.

Tests

docker run --rm --entrypoint pytest haiku-score -q

Extensive coverage of the mora rules specifically, including:

  • The five vowels (あいうえお = 5).
  • Youon fusion (きゃ = 1, きゅうり = 3, しゃしん = 3).
  • Sokuon (にっぽん = 4).
  • Hatsuon (ほん = 2, ラーメン = 4).
  • Long-vowel mark (コーヒー = 4).
  • Punctuation is ignored (あ、い。う = 3).
  • Katakana parity (カタカナ = 4, ジャム = 2).
  • Basho's 古池 haiku in kana form: 5-7-5 ✓.
  • Kanji input raises MoraError with the position of the offending character.
  • CLI exit codes: 0 match, 1 mismatch, 2 bad input.
  • JSON output shape.
  • --tanka and custom --pattern.
  • --auto-break on 。 and 、.

How it works

The core loop is in src/haiku_score/mora.py. Every character is classified into one of: full-size kana (+1 mora), small youon (+0, fused with previous), long-vowel mark (+1), sokuon (+1), hatsuon (+1), punctuation (+0), or "reject with position". Classification is done purely by Unicode codepoint ranges — the hiragana block U+3040..U+309F, the katakana block U+30A0..U+30FF, the long-vowel mark U+30FC, and small-kana characters by their specific codepoints.

if classifier.is_small_yoon(ch):
    # きゃ is 1 mora, not 2. The previous full kana already contributed 1,
    # and this small ゃ fuses into it for 0 additional moras.
    previous_was_full_kana = False
    continue
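The full per-character classification can be sketched as a chain of checks in the order the categories were listed. This is an illustrative reconstruction from the description above, not the real classifier in src/haiku_score/mora.py:

```python
def classify(ch: str) -> str:
    if ch in "、。・！？ 　":
        return "punct"      # +0 moras
    if ch in "ゃゅょぁぃぅぇぉャュョァィゥェォ":
        return "small"      # +0, fuses into the previous full kana
    if ch == "\u30FC":      # ー long-vowel mark
        return "choon"      # +1
    if ch in "っッ":
        return "sokuon"     # +1 beat of silence
    if ch in "んン":
        return "hatsuon"    # +1
    if 0x3040 <= ord(ch) <= 0x30FF:   # hiragana + katakana blocks
        return "kana"       # +1, ordinary full-size kana
    return "reject"         # e.g. kanji: report character and position

assert classify("ゃ") == "small"
assert classify("ー") == "choon"
assert classify("古") == "reject"
```

The specific small-kana checks have to run before the generic block-range check, since small kana fall inside the same U+3040..U+30FF range as full-size kana.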

The scorer is a flat pass over the per-line counts, comparing each to the pattern, and the formatters render the result for humans (with colour and East-Asian width-aware alignment) or as JSON.

License

MIT. See LICENSE.
