chimpy-lake showboats
Narrated, reproducible walkthroughs of working code from the chimpy-lake data platform. Each one shows real commands with their captured output, and each is a living artifact: always the current render of the current code.
Walkthroughs
a library data platform, pattern by pattern through the data-engineering canon
A capstone tour of the platform itself — every moving part mapped to an established data-engineering pattern and shown running over a small, no-PII fixture.
ELT · schema-on-read · data contracts · medallion · dbt · DuckDB · idempotency · SCD Type 0 · data quality · observability · control plane
matching curated reading lists against the CHPL catalog
Follow three real Newbery titles as a curated list is matched against the library catalog. Each entry comes back as confident, needs-review, or an honest gap, with the gaps named rather than hidden.
reading lists · catalog matching · identifier normalization · ISBN · title-author key · deterministic matching · honest gaps · Datasette
Index of keywords
catalog matching
Linking a list entry to the library's catalog (bib) record(s). → From a curated list to the books on our shelves
control plane
One place to see and operate every pipeline's status across the fleet. → chimpy-lake, by the book
data contracts
A declared, validated shape a data source must satisfy before it can join the lake. → chimpy-lake, by the book
data quality
Automated assertions on every load — row counts, nulls, uniqueness, schema drift. → chimpy-lake, by the book
Datasette
A tool that publishes a database as a browsable, queryable website. → From a curated list to the books on our shelves
dbt
A tool that builds SQL transformations as a tested, ordered dependency graph. → chimpy-lake, by the book
deterministic matching
Exact, rule-based matching — vs. fuzzy or probabilistic guessing. → From a curated list to the books on our shelves
DuckDB
An in-process analytical SQL database — like SQLite, but built for analytics. → chimpy-lake, by the book
ELT
Load the raw data first, transform it later — vs. ETL, which transforms before loading. → chimpy-lake, by the book
honest gaps
Naming the titles we could NOT confidently match, rather than hiding them. → From a curated list to the books on our shelves
idempotency
Running a load twice has the same effect as running it once, so retries can't corrupt the data. → chimpy-lake, by the book
identifier normalization
Cleaning and cross-deriving IDs — e.g. an ISBN-10 implies its ISBN-13 twin. → From a curated list to the books on our shelves
ISBN
The standard book identifier; every ISBN-10 converts to a 978-prefixed ISBN-13, and a 978-prefixed ISBN-13 converts back. → From a curated list to the books on our shelves
medallion
Layered refinement: raw (bronze) → cleaned & typed (silver) → query-ready (gold). → chimpy-lake, by the book
observability
The platform records every run and computes its own health from that ledger. → chimpy-lake, by the book
reading lists
Curated lists of titles (e.g. award winners) checked against what the library holds. → From a curated list to the books on our shelves
SCD Type 0
A landed snapshot is never edited in place; history is append-only. → chimpy-lake, by the book
schema-on-read
Store data exactly as it arrives; impose structure only when you query it. → chimpy-lake, by the book
title-author key
A synthesized match key for records that arrive with no ISBN to match on. → From a curated list to the books on our shelves
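The "identifier normalization" and "ISBN" entries above mention deriving an ISBN-13 from an ISBN-10. A minimal sketch of that derivation follows, using the standard published check-digit rules. The function names (`clean_isbn`, `isbn10_to_isbn13`, and so on) are hypothetical and are not taken from the chimpy-lake code. The sample ISBN is a common textbook example, not one from the walkthrough fixture.

```python
import re
from typing import Optional


def clean_isbn(raw: str) -> str:
    """Drop hyphens, spaces, and other punctuation; uppercase any X."""
    return re.sub(r"[^0-9Xx]", "", raw).upper()


def isbn10_is_valid(isbn10: str) -> bool:
    """Check the ISBN-10 checksum: weights 10..1, total divisible by 11."""
    if not re.fullmatch(r"\d{9}[\dX]", isbn10):
        return False
    total = sum(
        (10 - i) * (10 if ch == "X" else int(ch))
        for i, ch in enumerate(isbn10)
    )
    return total % 11 == 0


def isbn13_check_digit(first12: str) -> str:
    """Compute the ISBN-13 check digit: alternating weights 1 and 3."""
    total = sum(
        int(d) * (1 if i % 2 == 0 else 3)
        for i, d in enumerate(first12)
    )
    return str((10 - total % 10) % 10)


def isbn10_to_isbn13(raw: str) -> Optional[str]:
    """Return the ISBN-13 twin of an ISBN-10, or None if the input is invalid.

    Hypothetical helper: prefix 978, keep the first nine digits,
    and recompute the check digit.
    """
    isbn10 = clean_isbn(raw)
    if not isbn10_is_valid(isbn10):
        return None
    first12 = "978" + isbn10[:9]
    return first12 + isbn13_check_digit(first12)


print(isbn10_to_isbn13("0-306-40615-2"))  # → 9780306406157
print(isbn10_to_isbn13("0-306-40615-3"))  # → None (bad check digit)
```

Returning `None` for an invalid input, rather than guessing, keeps the match deterministic: a record with a bad ISBN falls through to another key or is reported as a gap.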
Talks & related
IUG 2026 conference talk — early experiments at CHPL with library-owned data infrastructure.
A printable zine on owning the library's data — the lake, in plain language.
Generated 2026-06-09 from 2 published showboats · the version ledger is git, not a version number.