chimpy-lake showboats

Narrated, reproducible walkthroughs of working code from the chimpy-lake data platform. Each one shows real commands and their captured output; each is a living artifact — always the current render of current code.

Walkthroughs

chimpy-lake, by the book (published)
a library data platform, pattern by pattern through the data-engineering canon
A capstone tour of the platform itself — every moving part mapped to an established data-engineering pattern and shown running over a small, no-PII fixture.
ELT, schema-on-read, data contracts, medallion, dbt, DuckDB, idempotency, SCD Type 0, data quality, observability, control plane
From a curated list to the books on our shelves (published)
matching curated reading lists against the CHPL catalog
Follow three real Newbery titles as a curated list is matched against the library catalog. Each lands as a confident match, a needs-review match, or an honest gap (with the gaps named, not hidden).
reading lists, catalog matching, identifier normalization, ISBN, title-author key, deterministic matching, honest gaps, Datasette
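
The matching idea in the second walkthrough can be sketched in a few lines of Python. This is a minimal illustration, not the walkthrough's actual code: the function names, the `|`-joined key format, and the rule that an ISBN hit is "confident" while a title-author hit alone is "needs-review" are all assumptions made for the example. Only the ISBN-10 to ISBN-13 arithmetic is standard.

```python
import re
import unicodedata
from typing import Optional


def isbn10_to_isbn13(isbn10: str) -> str:
    """Derive the 978-prefixed ISBN-13 twin of an ISBN-10."""
    core = "978" + isbn10[:9]  # drop the old check digit, add the prefix
    # ISBN-13 check digit: weights alternate 1, 3 across the 12 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def normalize_isbn(raw: str) -> Optional[str]:
    """Strip punctuation and return an ISBN-13, or None if unusable.

    Check digits of the input are not validated in this sketch.
    """
    chars = re.sub(r"[^0-9Xx]", "", raw or "").upper()
    if len(chars) == 10 and chars[:9].isdigit():
        return isbn10_to_isbn13(chars)
    if len(chars) == 13 and chars.isdigit():
        return chars
    return None


def title_author_key(title: str, author: str) -> str:
    """Hypothetical fallback key: accent-folded, lowercased, punctuation-free."""
    def fold(text: str) -> str:
        text = unicodedata.normalize("NFKD", text)
        text = "".join(c for c in text if not unicodedata.combining(c))
        return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return f"{fold(title)}|{fold(author)}"


def classify(entry: dict, catalog_isbns: set, catalog_keys: set) -> str:
    """Assumed rule: ISBN hit is confident, key-only hit needs review, else gap."""
    isbn = normalize_isbn(entry.get("isbn", ""))
    if isbn and isbn in catalog_isbns:
        return "confident"
    if title_author_key(entry["title"], entry["author"]) in catalog_keys:
        return "needs-review"
    return "gap"
```

Because every step is an exact lookup on a normalized value, the same list and catalog always produce the same three buckets, and anything that falls through is reported as a gap rather than guessed at.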

Index of keywords

catalog matching
Linking a list entry to the library's catalog (bib) record(s). See: From a curated list to the books on our shelves
control plane
One place to see and operate every pipeline's status across the fleet. See: chimpy-lake, by the book
data contracts
A declared, validated shape a data source must satisfy before it can join the lake. See: chimpy-lake, by the book
data quality
Automated assertions on every load: row counts, nulls, uniqueness, schema drift. See: chimpy-lake, by the book
Datasette
A tool that publishes a database as a browsable, queryable website. See: From a curated list to the books on our shelves
dbt
A tool that builds SQL transformations as a tested, ordered dependency graph. See: chimpy-lake, by the book
deterministic matching
Exact, rule-based matching, as opposed to fuzzy or probabilistic guessing. See: From a curated list to the books on our shelves
DuckDB
An in-process analytical SQL database: like SQLite, but built for analytics. See: chimpy-lake, by the book
ELT
Load the raw data first, transform it later, unlike ETL, which transforms before loading. See: chimpy-lake, by the book
honest gaps
Naming the titles we could NOT confidently match, rather than hiding them. See: From a curated list to the books on our shelves
idempotency
Running a load twice has the same effect as running it once, so retries can't corrupt the data. See: chimpy-lake, by the book
identifier normalization
Cleaning and cross-deriving IDs; e.g. an ISBN-10 implies its ISBN-13 twin. See: From a curated list to the books on our shelves
ISBN
The standard book identifier; an ISBN-10 converts to a 978-prefixed ISBN-13 and back. See: From a curated list to the books on our shelves
medallion
Layered refinement: raw (bronze) → cleaned & typed (silver) → query-ready (gold). See: chimpy-lake, by the book
observability
The platform records every run and computes its own health from that ledger. See: chimpy-lake, by the book
reading lists
Curated lists of titles (e.g. award winners) checked against what the library holds. See: From a curated list to the books on our shelves
SCD Type 0
A landed snapshot is never edited in place; history is append-only. See: chimpy-lake, by the book
schema-on-read
Store data exactly as it arrives; impose structure only when you query it. See: chimpy-lake, by the book
title-author key
A synthesized match key for records that arrive with no ISBN to match on. See: From a curated list to the books on our shelves
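
The idempotency and SCD Type 0 entries above can be made concrete with a short sketch. It uses Python's built-in `sqlite3` as a stand-in for DuckDB so that it runs with no extra install; the table name `bronze_snapshots`, the `load_snapshot` function, and the choice of a content hash as the deduplication key are assumptions for illustration, not the platform's actual design.

```python
import hashlib
import sqlite3


def load_snapshot(conn: sqlite3.Connection, source: str, payload: bytes) -> bool:
    """Land a raw payload once; repeating the same payload is a no-op.

    Returns True if a new snapshot row was written, False if it already existed.
    """
    digest = hashlib.sha256(payload).hexdigest()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bronze_snapshots ("
        " source TEXT NOT NULL,"
        " digest TEXT NOT NULL,"
        " payload BLOB NOT NULL,"
        " PRIMARY KEY (source, digest))"
    )
    # Append-only: rows are inserted, never updated. The primary key makes
    # a retried load of identical bytes collide and get ignored.
    cur = conn.execute(
        "INSERT OR IGNORE INTO bronze_snapshots (source, digest, payload)"
        " VALUES (?, ?, ?)",
        (source, digest, payload),
    )
    conn.commit()
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_snapshot(conn, "catalog", b"id,title\n1,Example\n"))  # first load
    print(load_snapshot(conn, "catalog", b"id,title\n1,Example\n"))  # retry
```

Keying on a hash of the raw bytes means a retry after a crash cannot duplicate a snapshot, while a genuinely changed file lands as a new row next to the old one.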

Talks & related

Building a Data Lake for Your Library (talk)
IUG 2026 conference talk — early experiments at CHPL with library-owned data infrastructure.
Our Data, Our Lake (zine)
A printable zine on owning the library's data — the lake, in plain language.