Find books at the right level

2026-06-18T14:28:59Z by Showboat 0.6.1

The payoff: browse a curated reading list by grade band — and trust what you see.

A parent or teacher asks the library a simple question: what can my kid actually read? The honest answer is scattered across scales that don’t line up: Lexile, Accelerated Reader (ATOS), and Guided Reading / Fountas & Pinnell. For many books no measure is catalogued at all. The companion walkthrough “What grade is this book?” showed how we extract those measures honestly and where the catalog is silent. This walkthrough turns that scattered, multi-scale data into one grade band per book that you can actually browse, with the estimates clearly marked, never hidden.

The design decision that makes it work: AI-estimated bands are included in the browse filter, but rendered faint with an ⓘ that shows exactly how they were made. Including them keeps the browsable surface useful (it roughly doubles what you can filter); rendering them faint keeps it honest. A real catalogued grade is solid; anything derived or estimated is faint.

Every command below drives the real consumer code against a small frozen sample of real Newbery titles, their real catalog measures, and a real AI-estimate audit trail captured once from the lake. showboat verify re-runs each one and diffs the output.

1 · One band from many scales

The scales don’t convert to each other cleanly, so RL3 resolves the measures for each work into one coarse grade band, native-first. A real catalogued grade wins and shows solid; a scale we had to convert is kept but flagged as an estimate and shown faint; genuine disagreement widens the band rather than inventing a midpoint.

consumers/reading_lists_datasette/.venv/bin/python docs/demos/browse-by-level/_driver.py consensus


One band from many scales
=========================
A book can carry several measures on scales that DON'T compare directly
(a Lexile is not an ATOS is not an F&P letter). RL3 resolves them per WORK into
one coarse grade band, native-first: a real catalogued grade (ATOS/MARC) wins and
is solid; a scale we had to CONVERT (e.g. Lexile→grade) is kept but flagged as an
estimate; genuine disagreement WIDENS the band, it never invents a midpoint.

  title                  band         basis                    shown
  ---------------------- ------------ ------------------------ ---------
  Holes                  Grades 3-5   native (catalogued)      solid
                         from: AR 4.6, 660L
  A Wrinkle in Time      Grades 3-5   converted (scale→grade)  faint
                         from: 740L
  A Single Shard         Grades 6-8   AI estimate              faint
                         from: (no catalogued measure)
  26 Fairmount Avenue    Grades K-2   AI estimate              faint
                         from: (no catalogued measure)

Holes carries TWO scales — ATOS 4.6 and Lexile 660L — that both land in grades 3–5;
the native ATOS wins, so its band is solid. A Wrinkle in Time has only a Lexile, so
its band is CONVERTED — kept, but marked an estimate. The two with no catalogued
measure at all fall through to the AI tier (next beat).
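
The native-first rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not RL3's actual code: the names `resolve`, `Band`, `band_index` and `lexile_to_grade` are hypothetical, and the Lexile cut-offs are placeholders chosen only so the sample books land where the output above puts them.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

BANDS = ["K-2", "3-5", "6-8"]  # the three bands that appear in the sample output


def band_index(grade: float) -> int:
    """Index into BANDS for a numeric grade level (kindergarten = 0)."""
    if grade < 3:
        return 0
    return 1 if grade < 6 else 2


def lexile_to_grade(lexile: int) -> float:
    """ILLUSTRATIVE cut-offs only, not the project's conversion table."""
    if lexile < 500:
        return 2.0
    return 4.0 if lexile < 950 else 7.0


@dataclass
class Band:
    label: str          # e.g. "Grades 3-5"
    basis: str          # "native", "converted" or "ai"
    is_estimated: bool  # False renders solid, True renders faint


def span_label(indices: Sequence[int]) -> str:
    """One band if the measures agree; otherwise widen to cover them all."""
    lo, hi = min(indices), max(indices)
    start = BANDS[lo].split("-")[0]
    end = BANDS[hi].split("-")[1]
    return f"Grades {start}-{end}"


def resolve(native_grades: Sequence[float] = (),
            lexiles: Sequence[int] = (),
            ai_label: Optional[str] = None) -> Optional[Band]:
    """Native-first: catalogued grades, then converted scales, then the AI tier."""
    if native_grades:
        return Band(span_label([band_index(g) for g in native_grades]), "native", False)
    if lexiles:
        grades = [lexile_to_grade(x) for x in lexiles]
        return Band(span_label([band_index(g) for g in grades]), "converted", True)
    if ai_label is not None:
        return Band(ai_label, "ai", True)
    return None  # nothing catalogued and no estimate: show no band


print(resolve(native_grades=[4.6], lexiles=[660]))  # Holes: native wins
print(resolve(lexiles=[740]))                        # A Wrinkle in Time: converted
print(resolve(native_grades=[2.5, 6.1]))             # disagreement widens the band
```

In this sketch a disagreement between two native measures yields a wider label such as "Grades K-8" instead of an averaged band, which is one way to read "widens rather than invents a midpoint".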

2 · The AI estimate, honestly

For the books with no catalogued measure at all, a local model — running on our own hardware, so the description never leaves the library network — estimates a band from the book’s description. The safeguard is self-consistency: three independent reads must agree within about one grade, or we abstain and show nothing. Every read is kept on a transparency view, so an estimate is never a black box.

consumers/reading_lists_datasette/.venv/bin/python docs/demos/browse-by-level/_driver.py estimate


The AI estimate, honestly
=========================
When a book has NO catalogued measure, a local model (qwen3:32b, on our own
hardware — the description never leaves the library network) estimates a grade band
from the book's description. But only when it agrees with itself: we take 3
independent reads; they must land within ~1 grade or we ABSTAIN and show nothing.
Every read is kept for inspection on the transparency view:

  'A Single Shard' — 3 independent reads (qwen3:32b):
    read 0: grades 5–8  (medium confidence)
            “Historical context, complex vocabulary (e.g., 'celadon ware,' 'irascible temper'), and thematic depth (perseverance, cultural craftsmanship) suggest middle-grade to young adult readability with nuanced character development and setting details.”
    read 1: grades 5–8  (medium confidence)
            “The description uses compound sentences, historical context, and specific vocabulary (e.g., 'celadon ware,' 'irascible temper') suggesting mid-to-upper elementary difficulty. Thematic depth and narrative complexity align with middle-grade readers.”
    read 2: grades 5–8  (medium confidence)
            “Historical setting, technical pottery terms, and emotional depth suggest mid-grade difficulty. Protagonist's age and accessible themes balance complexity.”

  agreement spread: 0.0 grades  →  all three agree exactly  →  PUBLISH
  → reading band: Grades 6-8, marked is_estimated=true (basis 'ai').
  Had the reads disagreed by more than ~1 grade, we would have published nothing —
  "don't know" is an honest answer, and a better one than a confident guess.

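The publish-or-abstain gate can be sketched as follows. This is a hedged illustration, not the pipeline's code: the function name `consensus`, the use of range midpoints to measure spread, and the band cut-offs are all assumptions. Only the rule itself (three reads, within about one grade, otherwise publish nothing) comes from the output above.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple

Read = Tuple[float, float]  # one model read: (lowest grade, highest grade)


def midpoint(read: Read) -> float:
    return (read[0] + read[1]) / 2


def consensus(reads: Sequence[Read], max_spread: float = 1.0) -> Optional[dict]:
    """Publish a band only if the reads agree; return None to abstain."""
    mids = [midpoint(r) for r in reads]
    spread = max(mids) - min(mids)  # ASSUMPTION: spread measured on midpoints
    if spread > max_spread:
        return None  # "don't know": publish nothing
    centre = mean(mids)
    # ASSUMPTION: same coarse cut-offs as the three bands in the demo
    band = "K-2" if centre < 3 else "3-5" if centre < 6 else "6-8"
    return {
        "band": f"Grades {band}",
        "basis": "ai",
        "is_estimated": True,
        "spread": spread,
    }


# 'A Single Shard': three reads of grades 5 to 8, spread 0.0, so it publishes.
print(consensus([(5, 8), (5, 8), (5, 8)]))
# Reads that scatter across bands exceed the threshold, so the result is None.
print(consensus([(2, 3), (5, 8), (3, 5)]))
```

Under these assumptions the midpoint of a 5 to 8 read is 6.5, which falls in Grades 6-8 and matches the published band in the sample output.
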
3 · How it renders — the honesty contract, in code

This is the heart of it, and it’s the real render plugin the public site runs. A native band is a solid coloured pill; a converted or AI band is the same colour but faint and dashed, with an ⓘ linking to “How AI estimates are made.” The colour tracks the band: green for K-2, blue for 3-5, indigo for 6-8. Here is the literal HTML the plugin emits for each sample book:

consumers/reading_lists_datasette/.venv/bin/python docs/demos/browse-by-level/_driver.py render


How it renders — the real plugin
================================
The browse surface runs these exact functions. A native band is a SOLID coloured
pill; a converted or AI band is the same colour but FAINT and dashed, with an ⓘ
that links to “How AI estimates are made”. The colour grows up with the band
(K-2 green → 3-5 blue → 6-8 indigo). Below is the literal HTML the plugin emits:

  Holes                  <span class="reading-band band--3-5">Grades 3-5</span>
  A Wrinkle in Time      <span class="reading-band band--3-5 band--estimated">Grades 3-5 <a class="band-info" href="/how-ai-estimates-are-made" title="Estimated — how?">ⓘ</a></span>
  A Single Shard         <span class="reading-band band--6-8 band--estimated">Grades 6-8 <a class="band-info" href="/how-ai-estimates-are-made" title="Estimated — how?">ⓘ</a></span>
  26 Fairmount Avenue    <span class="reading-band band--k-2 band--estimated">Grades K-2 <a class="band-info" href="/how-ai-estimates-are-made" title="Estimated — how?">ⓘ</a></span>

  Raw 'as catalogued' measures render as verbatim chips (Holes):
    <span class="chips"><span class="chip chip--catalogued meas--atos">AR 4.6</span><span class="chip chip--catalogued meas--lexile">660L</span></span>

  The colour class is keyed on the band; estimates add band--estimated:
    Grades K-2   -> band--k-2
    Grades 3-5   -> band--3-5
    Grades 6-8   -> band--6-8

Solid vs faint is the whole honesty contract: a parent can never mistake an
estimate for what the catalogue actually says — and the ⓘ shows exactly how it was made.
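
A renderer that produces the same markup as the output above can be sketched like this. It is an approximation written from the emitted HTML, not the plugin's source: the function name `render_band` and the constant names are hypothetical, while the class names, link target and title text are copied from the sample output.

```python
from html import escape

INFO_HREF = "/how-ai-estimates-are-made"  # link target seen in the sample output
INFO_TITLE = "Estimated \u2014 how?"      # tooltip text seen in the sample output


def render_band(label: str, is_estimated: bool) -> str:
    """Return the pill HTML: solid for native bands, faint plus info link otherwise."""
    # "Grades K-2" -> "k-2"; the colour class is keyed on the band alone
    slug = escape(label.removeprefix("Grades ").lower(), quote=True)
    classes = f"reading-band band--{slug}"
    if not is_estimated:
        return f'<span class="{classes}">{escape(label)}</span>'
    return (
        f'<span class="{classes} band--estimated">{escape(label)} '
        f'<a class="band-info" href="{INFO_HREF}" title="{INFO_TITLE}">ⓘ</a></span>'
    )


print(render_band("Grades 3-5", is_estimated=False))  # Holes
print(render_band("Grades K-2", is_estimated=True))   # 26 Fairmount Avenue
```

Keeping the estimate marker as an extra class (`band--estimated`) rather than a separate colour means the stylesheet decides what "faint" looks like, and the band colour stays the same for solid and faint pills.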

4 · Browse it — filter a curated list by level

On chimpy-reader the grade band is a real facet: pick Grades 3-5 and a curated list filters to those titles. Because the estimates are in the facet (marked), the browsable surface stays useful instead of being cut in half. A snapshot of the live distribution:

consumers/reading_lists_datasette/.venv/bin/python docs/demos/browse-by-level/_driver.py facet


Browse it — filter a curated list by level
==========================================
On chimpy-reader the grade band is a real Datasette facet: pick “Grades 3-5” and the
curated list filters to those titles. AI-estimated bands are INCLUDED in the facet
(marked faint), so the browsable surface stays useful instead of being halved.
This snapshot of the live surface — 99 of 177 curated titles carry a band:

  Grades K-2     5  █████
  Grades 3-5    67  ███████████████████████████████████████████████████████████████████
  Grades 6-8    27  ███████████████████████████

  99 banded titles browsable by level today — roughly half AI estimates
  (faint), the rest catalogued or converted (solid). Coverage grows on its own as
  the catalog backfill and the daily enrichment timers land more measures.
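
Conceptually, a facet is a grouped count over a column, and picking a facet value is an equality filter on it. The sketch below shows that with an in-memory SQLite table holding the four sample books. The table name `works` and its columns are hypothetical and are not the schema chimpy-reader uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# HYPOTHETICAL schema, for illustration only
conn.execute("CREATE TABLE works (title TEXT, band TEXT, is_estimated INTEGER)")
conn.executemany(
    "INSERT INTO works VALUES (?, ?, ?)",
    [
        ("Holes", "Grades 3-5", 0),               # native: solid
        ("A Wrinkle in Time", "Grades 3-5", 1),   # converted: faint
        ("A Single Shard", "Grades 6-8", 1),      # AI estimate: faint
        ("26 Fairmount Avenue", "Grades K-2", 1), # AI estimate: faint
    ],
)

# The facet: how many titles fall in each band, estimates included.
facet = conn.execute(
    "SELECT band, COUNT(*) FROM works GROUP BY band ORDER BY band"
).fetchall()

# Picking "Grades 3-5": estimates stay in the result, carrying their flag.
filtered = conn.execute(
    "SELECT title, is_estimated FROM works WHERE band = ? ORDER BY title",
    ("Grades 3-5",),
).fetchall()

print(facet)
print(filtered)
```

Because `is_estimated` travels with each row, the filter never has to exclude estimates to stay honest; the renderer can mark them faint after the fact.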

Proof

Every command above re-runs under showboat verify. The invariants this walkthrough depends on (native bands solid; converted and AI bands faint, with an ⓘ, and inspectable; every band colour-coded; raw measures verbatim) are pinned by the demo’s own guard test, run against the real plugin:

PYTHONPATH=src consumers/reading_lists_datasette/.venv/bin/python -m pytest tests/demos/test_browse_by_level.py -q 2>&1 | sed -E 's/ in [0-9.]+s//'

.....                                                                    [100%]
5 passed

Where this goes next — open questions, not commitments

Browse-by-reading-level ships today. The next moves are designed or merely imagined, and posed here as questions, not claims:

A second axis: interest vs. reading level. A book can be easy to read but written for older kids (a “hi-lo” title), or the reverse. We already carry an interest band per work — but which direction of divergence is worth surfacing, and how, without confusing a parent? That’s a product question we haven’t answered yet.
From “what grade” to “for this reader.” A grade band is a property of the book. The richer question is the match — this reader, this interest, this level — which is a recommendation problem, and a policy one the moment it touches a real child’s history.
Coverage that compounds. Today about half the browsable bands are AI estimates. As the catalog backfill lands more real measures, those estimates get replaced by catalogued grades automatically — the surface gets more solid over time without anyone re-touching it.

The discipline is the point. We shipped the part we can stand behind — one honest, browsable band per book, estimates marked and inspectable — and everything above re-runs on demand.

Rendered from 226199c on 2026-06-18 · showboat verify: reproduces. A living artifact: the version ledger is git.