From a curated list to the books on our shelves

2026-06-09T02:37:48Z by Showboat 0.6.1

A reading group hands us a list, such as the Newbery Medal and Honor books. The question that matters is simple: which of these do we actually have, and where are the gaps? This walkthrough runs the real pipeline that answers it. It takes a curated list, lines each title up against the library catalog, and scores each match honestly: confident, needs a human look, or an outright gap. We follow three real books the whole way through.

Public bibliographic data only · this reads frozen snapshots, never live systems · a discovery surface with no patron data.

1 · The curated list

Here is the Newbery list as the lake holds it — every Medal and Honor title — and the three books we’ll trace: one that matches cleanly, one that’s ambiguous, and one very famous book that, surprisingly, comes up empty.

uv run python docs/demos/_driver.py list

Beat 1 — the curated Newbery list, as landed (174 entries)
──────────────────────────────────────────────────────────
by award class:
  honor    70
  medal    104

the three titles we'll follow:
  Bud, Not Buddy                 Christopher Paul Curtis  (medal)
  Crispin: The Cross of Lead     Avi                      (medal)
  The Giver                      Lois Lowry               (medal)

2 · Giving every title something to match on

Before we can match a title, we normalize its identifiers. Watch the real registry run: it cleans and cross-derives ISBNs (a 10-digit ISBN implies its 13-digit twin) and — crucially — synthesizes a title + author key, so a book that arrives with no ISBN at all still has a handle to match on.

uv run python docs/demos/_driver.py normalize

Beat 2 — one title's identifiers, normalized by the real registry
─────────────────────────────────────────────────────────────────

The Giver  (Lois Lowry)
  harvested: 9 raw id(s) -> normalized: 10 (incl. derived + title_author)
    wikidata      Q258953                    -> —
    isbn13        978-0-395-64566-6          -> 9780395645666
    isbn10        0395645662                 -> 0395645662
    isbn13        978-0-440-90079-5          -> 9780440900795
    isbn10        0440900794                 -> 0440900794
    isbn13        978-1-4420-1496-1          -> 9781442014961
    isbn10        1442014962                 -> 1442014962
    isbn13        978-2-211-02166-1          -> 9782211021661
    isbn10        2211021662                 -> 2211021662
    title_author  The Giver|Lois Lowry       -> giver|lois lowry

Crispin: The Cross of Lead  (Avi)
  harvested: 1 raw id(s) -> normalized: 2 (incl. derived + title_author)
    wikidata      Q5186028                   -> —
    title_author  Crispin: The Cross of Lead|Avi -> crispin the cross of lead|avi

ISBN-10/13 cross-derive each other; a `title_author` key is synthesized so even
Crispin — which arrives with NO ISBN — still has something to match on.
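
The two derivations in this beat can be sketched in a few lines. This is a minimal illustration, not the registry's actual code: the function names are hypothetical, and the title/author folding rules (lowercase, strip punctuation, drop a leading article) are assumptions inferred from the two keys shown in the output above. The check-digit arithmetic is the standard ISBN scheme.

```python
import re
from typing import Optional


def clean_isbn(raw: str) -> str:
    """Strip hyphens and spaces; uppercase a trailing ISBN-10 'x' check digit."""
    return re.sub(r"[\s-]", "", raw).upper()


def isbn10_to_isbn13(isbn10: str) -> str:
    """Derive the 978-prefixed ISBN-13 twin of an ISBN-10."""
    core = "978" + clean_isbn(isbn10)[:9]
    # ISBN-13 check digit: weights alternate 1, 3 across the first 12 digits.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def isbn13_to_isbn10(isbn13: str) -> Optional[str]:
    """Derive the ISBN-10 twin; only 978-prefixed ISBN-13s have one."""
    digits = clean_isbn(isbn13)
    if not digits.startswith("978"):
        return None
    core = digits[3:12]
    # ISBN-10 check digit: weights 10 down to 2, modulo 11, with 10 written as 'X'.
    total = sum(int(d) * w for d, w in zip(core, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return core + ("X" if check == 10 else str(check))


def fold(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def title_author_key(title: str, author: str) -> str:
    """Hypothetical key: folded title minus a leading article, then folded author."""
    folded_title = re.sub(r"^(the|a|an)\s+", "", fold(title))
    return f"{folded_title}|{fold(author)}"
```

Under these assumptions, `title_author_key("The Giver", "Lois Lowry")` yields `giver|lois lowry` and `isbn10_to_isbn13("0395645662")` yields `9780395645666`, matching the rows above.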

3 · The verdict

Now the match. Each title gets one of three verdicts. Bud, Not Buddy matches on a shared ISBN — confident. Crispin arrived with no ISBN, so its title/author key turned up two possible catalog records — a review, flagged for a human rather than guessed. The Giver finds nothing — a gap.

uv run python docs/demos/_driver.py match

Beat 3 — the match verdict (one row per match; gaps have none)
──────────────────────────────────────────────────────────────
  title                        status    scheme        confidence bib
  ----------------------------------------------------------------------
  Bud, Not Buddy               confident isbn13        confident  1946139
  Crispin: The Cross of Lead   review    title_author  review     1819030
  Crispin: The Cross of Lead   review    title_author  review     1912048
  The Giver                    gap       —             —          —

Bud matched by ISBN (confident). Crispin had no ISBN, so the title/author key
found TWO candidate bibs — a review, not an auto-match. The Giver: no match at all.
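
The three-way verdict can be read as a short decision cascade. The sketch below is illustrative only: `Verdict`, `match_title`, and the two index dictionaries are hypothetical names, the ISBN for Bud, Not Buddy is a placeholder (the walkthrough does not show it), and routing every title/author-only hit to review is an assumption, since the source shows only the two-candidate case.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Verdict:
    status: str                 # "confident", "review", or "gap"
    scheme: Optional[str]       # identifier scheme that produced the hit, if any
    bibs: List[int] = field(default_factory=list)


def match_title(
    isbns: List[str],
    ta_key: str,
    isbn_index: Dict[str, int],
    ta_index: Dict[str, List[int]],
) -> Verdict:
    """Score one list entry against hypothetical catalog indexes."""
    # 1. A shared ISBN is a deterministic hit: confident.
    for isbn in isbns:
        if isbn in isbn_index:
            scheme = "isbn13" if len(isbn) == 13 else "isbn10"
            return Verdict("confident", scheme, [isbn_index[isbn]])
    # 2. A title/author hit is only a candidate: queue it for a human (assumed rule).
    candidates = ta_index.get(ta_key, [])
    if candidates:
        return Verdict("review", "title_author", sorted(candidates))
    # 3. Nothing matched: name the gap rather than guess.
    return Verdict("gap", None)


# Placeholder catalog data; the bib numbers are the ones shown in the output above.
ISBN_INDEX = {"9780000000002": 1946139}  # placeholder ISBN, not the real one
TA_INDEX = {"crispin the cross of lead|avi": [1819030, 1912048]}
```

With this data, an entry carrying the placeholder ISBN comes back confident, Crispin's key returns both bibs as a review, and The Giver's ISBNs and key fall through to a gap.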

4 · Honest about the gaps

This is the part we refuse to paper over. 142 of 174 Newbery titles come up as gaps, including The Giver, one of the most widely held children’s books in any library. The reason isn’t that we don’t own it: our copies are different editions with different ISBNs, and the catalog lists the author surname-first, which breaks the title/author bridge. The output below shows exactly why, and names the two later sprints intended to close most of this gap. A useful tool tells you what it doesn’t know.

uv run python docs/demos/_driver.py gaps

Beat 4 — the honest gaps (142 of 174)
─────────────────────────────────────
status breakdown:
  confident 31
  review    1
  gap       142

The Giver is a GAP — yet it carries 8 normalized ISBNs across editions:
  9780395645666, 0395645662, 9780440900795, 0440900794, 9781442014961, 1442014962, 9782211021661, 2211021662
None match a bib in our catalog: we hold DIFFERENT editions (different ISBNs).
Its title/author key normalizes to:  'giver|lois lowry'
…but the catalog records the author as "Lowry, Lois" (surname-first) — a different
key — so even the title/author bridge misses. THAT is the gap.

Levers (later sprints): an author-order-insensitive title/author key (S3.3) and
work-clustering across editions (S3.4). We name the gap; we don't fake the match.
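
To make the author-order problem concrete, here is one possible shape for an order-insensitive key. It is a sketch under stated assumptions, not the S3.3 design: the function names are hypothetical, and sorting the author's name tokens is just one simple way to make "Lois Lowry" and "Lowry, Lois" agree.

```python
import re


def fold(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def naive_key(folded_title: str, author: str) -> str:
    """Order-sensitive key: the author is folded but kept in its given order."""
    return f"{folded_title}|{fold(author)}"


def order_insensitive_key(folded_title: str, author: str) -> str:
    """Hypothetical fix: sort the author's tokens so name order no longer matters."""
    tokens = sorted(fold(author).split())
    return f"{folded_title}|{' '.join(tokens)}"


# The list spells the author one way, the catalog the other.
list_side = naive_key("giver", "Lois Lowry")       # 'giver|lois lowry'
catalog_side = naive_key("giver", "Lowry, Lois")   # 'giver|lowry lois'
```

The two naive keys differ, which is the miss described above, while `order_insensitive_key` gives `giver|lois lowry` for both spellings. Sorting tokens is deliberately crude: it would also equate two different authors whose names are permutations of each other, so hits found this way would still suit the review queue better than an automatic match.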

5 · What a reading group browses

All of this lands in a small, browsable Datasette. Three ready-made views answer the questions a reading group actually asks: what’s a gap, what do we confidently own, and what needs a human’s eye.

uv run python docs/demos/_driver.py browse

Beat 5 — what a reading group browses (the Datasette's canned queries)
──────────────────────────────────────────────────────────────────────

• newbery_gaps  — titles with no deterministic match (142 total; first 5):
    [medal] ...And Now Miguel  — Joseph Krumgold
    [honor] A Girl Named Disaster  — Nancy Farmer
    [medal] A Visit to William Blake's Inn  — Nancy Willard
    [medal] A Wrinkle in Time  — Madeleine L'Engle
    [medal] A Year Down Yonder  — Richard Peck

• owned (confident) — titles we hold by ISBN/OCLC (31 total; first 5):
    [honor] 26 Fairmount Avenue  — Tomie dePaola
    [medal] A Gathering of Days  — Joan Blos
    [medal] A Single Shard  — Linda Sue Park
    [honor] A Snake Falls to Earth  — Darcie Little Badger
    [medal] Bud, Not Buddy  — Christopher Paul Curtis

• review_queue — fuzzy candidates to triage by hand:
    Crispin: The Cross of Lead  ~  bib: Crispin : the cross of lead  (rank 1)
    Crispin: The Cross of Lead  ~  bib: Crispin the cross of lead  (rank 2)
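
A canned query in Datasette is a named SQL statement over the underlying SQLite file. The sketch below shows what a gaps view could look like when run directly with Python's `sqlite3` module. The table name `list_match`, its columns, and the query text are all hypothetical; the walkthrough does not show the real schema.

```python
import sqlite3

# Hypothetical SQL behind a "gaps" view: every list entry whose status is 'gap'.
GAPS_SQL = """
SELECT award_class, title, author
FROM list_match
WHERE status = 'gap'
ORDER BY title
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few rows taken from the output above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE list_match (title TEXT, author TEXT, award_class TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO list_match VALUES (?, ?, ?, ?)",
        [
            ("Bud, Not Buddy", "Christopher Paul Curtis", "medal", "confident"),
            ("Crispin: The Cross of Lead", "Avi", "medal", "review"),
            ("The Giver", "Lois Lowry", "medal", "gap"),
            ("A Wrinkle in Time", "Madeleine L'Engle", "medal", "gap"),
        ],
    )
    return conn


def gaps(conn: sqlite3.Connection) -> list:
    """Run the gaps query and return (award_class, title, author) rows."""
    return conn.execute(GAPS_SQL).fetchall()
```

Calling `gaps(build_demo_db())` returns the two gap rows in title order, A Wrinkle in Time first and The Giver second.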

Proof

Every command above is reproducible (showboat verify re-runs and diffs them), and the identifier registry that powers the matching ships with its tests passing.

uv run pytest tests/lists tests/demos/test_reading_lists_demo.py -q 2>&1 | sed -E 's/ in [0-9.]+s//'
.........................................................                [100%]
57 passed

Where this could go

Today this is a discovery surface — no patron data, nothing private. But picture the next step, posed as open questions, not commitments:

None of that is built or designed. It would touch real patron data, so it stays behind the records-officer and counsel conversation, and it would ride on the same de-identification identity spine that lets us line up a person’s activity without ever storing who they are.

From a curated list to the books on our shelves — honestly scored, gaps and all.


Rendered from chimpy-lake 2ba1bd9 on 2026-06-09 · showboat verify: reproduces. A living artifact — the version ledger is git.