2026-06-17T04:44:38Z by Showboat 0.6.1
Reading levels, honestly — what our catalog actually says, and where it’s silent.
Parents and teachers ask a simple question of the library: what reading level is this book? Lexile, Accelerated Reader (ATOS), Guided Reading / Fountas & Pinnell, a grade band — there are many scales, and the answer is scattered, when it exists at all.
Before building anything, we measured our own
catalog. Of ~305,000 records we’d harvested, fewer than 1%
carried any reading-level field, and of the MARC 521 (“audience”)
fields that do exist, about two-thirds are movie
ratings (“Rated PG-13”), not reading levels. So MARC alone is a
sparse, noisy source. That finding shaped the whole design:
RL1, the first stage of the program, is an honest extractor. It takes only what the
catalog actually says, stores every value verbatim,
throws out the movie ratings, and, crucially, never guesses a
number it can’t read. Estimates and a unified, browsable grade
band come in later stages, clearly labelled as estimates.
This walkthrough runs the real parser and extractor
over a small frozen sample of genuine
526/521 field contents (bib ids are synthetic;
no patron data). Every command is reproducible —
showboat verify re-runs each one and diffs the output.
526 (Accelerated Reader)

MARC field 526 is the standard home for
Accelerated Reader / Reading Counts data: $a the program,
$b an interest band (LG/UG/“6-8”), $c the
ATOS book level (a grade-equivalent number), and notes
in $z. When clean, it’s the richest signal we have — but
real records are inconsistent: subfields get mis-tagged, and a
guided-reading level can hide inside a $z note.
The parser is exacting: the ATOS $c
becomes a real number only when it’s plausibly in range (0–13);
a stray decimal mis-filed in the band $b is kept verbatim
as raw_only rather than invented into a grade; and a
“Guided reading level: R” buried in $z is recovered as its
own measure.
uv run --with duckdb python docs/demos/reading-levels/_driver.py parse526
526 — Accelerated Reader / Reading Counts: structured, but messy
================================================================
bib 9001 — Accelerated Reader — clean
measure_type value status raw_value
atos 5.6 parsed '5.6'
interest_grade — raw_only 'UG'
bib 9002 — Accelerated Reader — mis-tagged subfield
measure_type value status raw_value
atos 4.9 parsed '4.9'
interest_grade — raw_only '11.0.'
bib 9003 — Reading Counts — guided-reading level buried in $z
measure_type value status raw_value
atos 7.2 parsed '7.2'
interest_grade 6-8 parsed '6-8'
guided_reading_fp — parsed 'R'
The ATOS book level ($c) becomes a real number ONLY when it is in range (0–13);
a mis-tagged decimal in the interest band ($b) is kept verbatim as raw_only — never
guessed into a grade. A guided-reading level hiding in a $z note is recovered.
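
The rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual parser: the `Measure` dataclass, the `parse_526` function, the regular expressions, and the plain-dict input (subfield code to text) are all assumptions made for the example. Only the behaviour is taken from the output above: an in-range `$c` becomes a number, an unreadable `$b` is kept as `raw_only`, and a guided-reading letter in `$z` becomes its own measure.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measure:
    measure_type: str
    value: Optional[str]  # normalized value, or None when nothing was read
    status: str           # "parsed" or "raw_only"
    raw_value: str        # verbatim catalog text


# Hypothetical patterns, chosen to reproduce the three sample records only.
ATOS_MIN, ATOS_MAX = 0.0, 13.0
BAND_RE = re.compile(r"^\s*(\d{1,2})\s*-\s*(\d{1,2})\s*$")
GR_RE = re.compile(r"guided reading level:\s*([A-Z])\b", re.IGNORECASE)


def parse_526(subfields: dict[str, str]) -> list[Measure]:
    """Turn one 526 field (subfield code -> text) into measures."""
    out: list[Measure] = []

    c = subfields.get("c")
    if c is not None:
        try:
            level = float(c.strip())
        except ValueError:
            level = None
        # Only an in-range number counts as an ATOS level.
        if level is not None and ATOS_MIN <= level <= ATOS_MAX:
            out.append(Measure("atos", str(level), "parsed", c))
        else:
            out.append(Measure("atos", None, "raw_only", c))

    b = subfields.get("b")
    if b is not None:
        m = BAND_RE.match(b)
        if m:
            band = f"{m.group(1)}-{m.group(2)}"
            out.append(Measure("interest_grade", band, "parsed", b))
        else:
            # 'UG' or a stray '11.0.' is stored verbatim, never guessed.
            out.append(Measure("interest_grade", None, "raw_only", b))

    z = subfields.get("z")
    if z is not None:
        m = GR_RE.search(z)
        if m:
            # A letter level has no numeric value; the letter is the raw value.
            out.append(Measure("guided_reading_fp", None, "parsed", m.group(1).upper()))

    return out
```

Calling `parse_526({"b": "11.0.", "c": "4.9"})` in this sketch yields a parsed `atos` measure and a `raw_only` `interest_grade`, mirroring bib 9002.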
521, where most of it is movie ratings

MARC field 521 is “target audience”,
and at our library it’s dominated by AV/MPAA ratings
(“Rated PG-13”), which are not reading levels. The first
indicator tells them apart: ind1=8 is a
rating. The parser filters those out — counted, never
landed — and keeps the genuine signal: an interest-age note
like “Ages 5-9” is parsed into a range; free text it can’t pin to a
number is preserved verbatim, never fabricated.
uv run --with duckdb python docs/demos/reading-levels/_driver.py parse521
521 — target audience: mostly movie ratings, with reading signal mixed in
=========================================================================
bib 9004 — 521 with ind1=8 — a movie rating, NOT a reading level (FILTERED — counted, never landed)
(no measures)
bib 9005 — 521 interest-age note
measure_type value status raw_value
interest_age 5-9 parsed 'Ages 5-9.'
bib 9006 — 521 free-text the parser can't pin to a number
measure_type value status raw_value
audience_note — raw_only 'For mature audiences.'
A 521 with ind1=8 is an MPAA/AV rating ('Rated PG-13'), not a reading level — it is
filtered out and counted, never landed. Real reading signal ('Ages 5-9') is parsed;
free text the parser can't pin to a number is preserved verbatim, not invented.
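
A rough sketch of the 521 logic follows, covering only the three cases in the sample above. The function name `parse_521`, the `AGES_RE` pattern, and the dict shape are hypothetical; the real parser may recognise more patterns than this one does.

```python
import re
from typing import Optional

# Hypothetical pattern: "Ages 5-9", "Age 8 to 12", and similar.
AGES_RE = re.compile(r"\bages?\s+(\d{1,2})\s*(?:-|to)\s*(\d{1,2})\b", re.IGNORECASE)


def parse_521(ind1: str, text: str) -> Optional[dict]:
    """Return one measure dict, or None when the field is a filtered rating."""
    if ind1 == "8":
        # MPAA/AV rating: the caller counts it but never lands it.
        return None

    m = AGES_RE.search(text)
    if m:
        return {
            "measure_type": "interest_age",
            "value": f"{m.group(1)}-{m.group(2)}",
            "status": "parsed",
            "raw_value": text,
        }

    # Anything else is preserved verbatim with no invented number.
    return {
        "measure_type": "audience_note",
        "value": None,
        "status": "raw_only",
        "raw_value": text,
    }
```

Returning `None` for a rating, rather than a measure with an empty value, keeps "filtered" distinct from "nothing parseable", which is what lets an extractor report a separate skipped-rating count.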
RL1 is demand-scoped: it reads levels only for the
bibs the curated reading lists actually reference
(reading_lists_gold.fct_match), not the whole 2-million-record
catalog. It lands raw measures into a new
enrichment.reading_levels table, and it’s
idempotent — a per-bib content hash with same-day
replace means re-running over unchanged MARC rewrites nothing
(“reverified”). Every landed row is is_estimated=False,
derivation='native', work_key=None: RL1
records only what the catalog says; the work-grained grade band and any
estimates come in later stages, flagged.
uv run --with duckdb python docs/demos/reading-levels/_driver.py extract
Demand-scoped extract — only the books our lists point at
=========================================================
run #1 — extract over the demand set (reading_lists_gold.fct_match):
demand 8
with_marc 7
with_measures 5
landed_bibs 5
measures 9
skipped_rating 1
reverified 0
parse_errors 0
demand bib 9099 has no landed MARC yet -> counted in demand, not with_marc.
bib 9004 was an AV rating only -> 0 measures (skipped_rating=1).
bib 9007 has no level fields -> examined, nothing to land.
run #2 — same MARC, same day:
landed_bibs 0
reverified 5
unchanged content hashes -> every bib 'reverified', nothing rewritten (idempotent).
landed rows in enrichment.reading_levels (9 total):
bib source measure_type estimated deriv raw_value
9001 marc526_ar atos False native '5.6'
9001 marc526_ar interest_grade False native 'UG'
9002 marc526_ar atos False native '4.9'
9002 marc526_ar interest_grade False native '11.0.'
9003 marc526_ar atos False native '7.2'
9003 marc526_ar guided_reading_fp False native 'R'
9003 marc526_ar interest_grade False native '6-8'
9005 marc521 interest_age False native 'Ages 5-9.'
9006 marc521 audience_note False native 'For mature audiences.'
Every landed row: is_estimated=False, derivation='native', work_key=None — RL1
stores only what the catalog actually says. Estimates and the work-grained
grade band come later (RL2–RL4), and will be flagged, not hidden.
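
The idempotence described above can be illustrated with a small content-hash sketch. Everything here is an assumption for illustration: an in-memory dict stands in for `enrichment.reading_levels`, the hash is taken over the parsed measures (the real extractor may hash the MARC content instead), and the same-day replace window is not modelled. The names `content_hash` and `land` are hypothetical.

```python
import hashlib
import json


def content_hash(measures: list) -> str:
    """Stable digest of a bib's measures (assumed hashing scheme)."""
    payload = json.dumps(measures, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def land(store: dict, bib_id: int, measures: list) -> str:
    """Write a bib's rows unless its content hash is unchanged.

    `store` maps bib_id -> {"hash": str, "rows": list} and is a stand-in
    for the real table. Returns "landed" or "reverified".
    """
    digest = content_hash(measures)
    current = store.get(bib_id)
    if current is not None and current["hash"] == digest:
        return "reverified"  # unchanged content: nothing is rewritten

    # Every landed row carries the native-only flags described in the text.
    rows = [
        dict(m, is_estimated=False, derivation="native", work_key=None)
        for m in measures
    ]
    store[bib_id] = {"hash": digest, "rows": rows}
    return "landed"
```

Running `land` twice with the same measures returns `"landed"` and then `"reverified"`, which is the pattern the two runs above report in aggregate.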
The beats above run on a curated sample. So does it actually work on our books? We ran the extractor against the live lake and looked for titles our reading lists reference that carry a level in the catalog. The catalog backfill is still in flight — of 472 referenced titles, ~50 have MARC landed so far — but among the first real hits are two Newbery Medal winners we own, with their genuine catalog levels. (The records below were captured once from the live lake and frozen, so this beat reproduces.)
uv run --with duckdb python docs/demos/reading-levels/_driver.py realmatch
A real one — from a curated list to a reading level
===================================================
Newbery Medal (1963) · 'A wrinkle in time' by "L'Engle, Madeleine."
catalog bib 2653698
measure_type value status raw_value
reading_grade 5.8 parsed 'RL: 5.8.'
audience_note — raw_only '009-012.'
Newbery Medal (2013) · 'The one and only Ivan' by 'Applegate, Katherine.'
catalog bib 2516043
measure_type value status raw_value
interest_age 8-12 parsed 'Ages 8-12.'
These are REAL records from our catalog (captured once from the live lake, then
frozen so this reproduces). Of the 472 titles our reading lists reference, ~50 have
MARC landed so far (the catalog backfill is still running) — and these two award
winners are among the first real reading-level matches it surfaces. No guessing:
'RL: 5.8' becomes grade 5.8; the rest is preserved exactly as catalogued.
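
How a note such as 'RL: 5.8.' might become a grade while '009-012.' stays raw can be sketched as below. The pattern `RL_RE`, the function `parse_reading_grade`, and the 0 to 13 plausibility range (borrowed here from the ATOS rule) are assumptions, not the project's confirmed rules.

```python
import re
from typing import Optional

# Hypothetical pattern: "RL: 5.8.", "RL 4", "rl: 6.0".
RL_RE = re.compile(r"^\s*RL:?\s*(\d{1,2}(?:\.\d)?)\s*\.?\s*$", re.IGNORECASE)


def parse_reading_grade(text: str) -> Optional[float]:
    """Return the grade from an 'RL: n.n' note, or None if it can't be read."""
    m = RL_RE.match(text)
    if not m:
        return None  # caller keeps the text verbatim as raw_only
    grade = float(m.group(1))
    # Assumed plausibility check, mirroring the ATOS range.
    return grade if 0.0 <= grade <= 13.0 else None
```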
Every command above re-runs under showboat verify. The
on-thesis invariants — the AV-rating filter, “never guess a number,” and
the native-only idempotent extract — ship green:
uv run --with duckdb pytest tests/demos/test_reading_levels.py -q 2>&1 | sed -E 's/ in [0-9.]+s//'
.... [100%]
4 passed
RL1 is the honest foundation. The rest of the program is designed, not yet built, and posed here as questions rather than claims:
The discipline is the point. RL1 ships the part we can stand behind completely — the catalog’s own words, verbatim, with the movie ratings thrown out and nothing guessed — and everything above re-runs on demand.
Rendered from 226199c on 2026-06-18 · showboat verify: reproduces. A living artifact: the version ledger is git.