Ray Voelker
Cincinnati & Hamilton County Public Library
ray.voelker@chpl.org
Declarative YAML — no code changes needed
tables:
  bibs:
    fields:
      title:
        source: 245$a$b
      author:
        source: [100$a, 110$a, 111$a]
        transform: first_available
      pub_year:
        source: 008[7:11]
        type: int16
      language:
        source: 008[35:38]
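A rough sketch of how a declared `type` such as `int16` might be enforced once a value has been extracted. The function name `cast_value` and the choice to return `None` for non-numeric input are assumptions for illustration, not the project's actual behavior; MARC 008 dates can contain fill characters such as `19uu`, so some fallback is needed.

```python
from typing import Optional, Union

# Bounds of a signed 16-bit integer, which comfortably holds a 4-digit year.
INT16_MIN, INT16_MAX = -(2 ** 15), 2 ** 15 - 1


def cast_value(raw: Optional[str], type_name: Optional[str] = None) -> Union[str, int, None]:
    """Cast an extracted string to the type declared in the config (hypothetical helper)."""
    if raw is None:
        return None
    if type_name is None:
        return raw  # no declared type: keep the string as-is
    if type_name == "int16":
        text = raw.strip()
        if not text.isdigit():
            return None  # assumption: unparseable values such as '19uu' become NULL
        number = int(text)
        if not INT16_MIN <= number <= INT16_MAX:
            raise ValueError(f"{number} does not fit in int16")
        return number
    raise ValueError(f"unsupported type: {type_name}")


pub_year = cast_value("2021", "int16")
unknown_year = cast_value("19uu", "int16")
```

Here `pub_year` is the integer 2021 and `unknown_year` is `None`.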
Backup slide — for Q&A
MARC addressing grammar — point at the data you want:
| Syntax | Meaning | Example |
|---|---|---|
| `245$a$b` | Subfields, concatenated | Title + subtitle |
| `008[7:11]` | Control field slice | Publication year |
| `[100$a, 110$a, 111$a]` | First available source | Author (personal, corporate, or meeting) |
| `6XX$a` | Wildcard: all matching fields | All subject headings (650, 651, etc.) |
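The grammar in the table can be sketched as a small resolver. This is a minimal illustration, not the project's parser: the record is modeled as a plain dict (tag to list of occurrences, control fields as strings, data fields as subfield dicts), the names `resolve` and `first_available` are hypothetical, and joining subfields with a single space is an assumption.

```python
import re
from typing import Dict, List, Optional

SLICE_RE = re.compile(r"^(\d{3})\[(\d+):(\d+)\]$")           # e.g. 008[7:11]
SUBFIELD_RE = re.compile(r"^([0-9X]{3})((?:\$[a-z0-9])+)$")  # e.g. 245$a$b, 6XX$a


def resolve(record: Dict[str, list], address: str) -> List[str]:
    """Return every string matched by one address (hypothetical helper)."""
    m = SLICE_RE.match(address)
    if m:
        tag, start, end = m.group(1), int(m.group(2)), int(m.group(3))
        return [value[start:end] for value in record.get(tag, [])]
    m = SUBFIELD_RE.match(address)
    if m is None:
        raise ValueError(f"unrecognized address: {address!r}")
    codes = m.group(2)[1:].split("$")                        # "$a$b" -> ["a", "b"]
    tag_re = re.compile("^" + m.group(1).replace("X", r"\d") + "$")
    results = []
    for tag in sorted(record):
        if not tag_re.match(tag):
            continue
        for field in record[tag]:
            if not isinstance(field, dict):
                continue                                     # skip control fields
            parts = [field[c] for c in codes if c in field]
            if parts:
                results.append(" ".join(parts))              # assumed separator
    return results


def first_available(record: Dict[str, list], addresses: List[str]) -> Optional[str]:
    """Return the first value found by trying each address in order."""
    for address in addresses:
        values = resolve(record, address)
        if values:
            return values[0]
    return None


# Made-up sample record; the 008 string is padded to the standard 40 characters.
sample = {
    "008": ["210315s2021    ohu" + " " * 17 + "eng d"],
    "110": [{"a": "Ohio Historical Society"}],
    "245": [{"a": "The great flood :", "b": "Cincinnati, 1937"}],
    "650": [{"a": "Floods"}],
    "651": [{"a": "Cincinnati (Ohio)"}],
}

title = resolve(sample, "245$a$b")
pub_year = resolve(sample, "008[7:11]")
author = first_available(sample, ["100$a", "110$a", "111$a"])
subjects = resolve(sample, "6XX$a")
```

The slice bounds follow Python conventions, so `008[7:11]` covers character positions 07 through 10 of the control field.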
Transforms are plain Python functions registered by name.
Chainable: transform: [first_available, rstrip]
- sierra_id: parse .b12345678x to a numeric ID
- sierra_date: MM-DD-YY → ISO 8601
- sierra_datetime: with time component
- price_cents: $11.16 → 1116
- rstrip, strip, normalize_text
- s/pattern/replacement/: inline regex
- first_available: pick first non-empty value from multiple sources
- collect_piped: gather repeating fields into Floods|Floods (1937)|Floods (1913)
- transforms.py: TRANSFORMS dict
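A minimal sketch of what a name-based registry with chaining could look like. Only the `TRANSFORMS` dict name and the transform names come from the slide; the `register` decorator, `apply_transforms`, the regex for Sierra record numbers (trailing character treated as a check digit), and whitespace-only `rstrip` are all assumptions for illustration.

```python
import re
from datetime import datetime
from decimal import Decimal
from typing import Any, Callable, Dict, List, Union

TRANSFORMS: Dict[str, Callable[[Any], Any]] = {}


def register(name: str):
    """Decorator that stores a function in TRANSFORMS under the given name."""
    def decorator(func):
        TRANSFORMS[name] = func
        return func
    return decorator


@register("sierra_id")
def sierra_id(value: str) -> int:
    # Assumed layout: optional '.', record-type letter, digits, check character.
    match = re.fullmatch(r"\.?[a-z](\d+)[0-9x]", value)
    if match is None:
        raise ValueError(f"not a Sierra record number: {value!r}")
    return int(match.group(1))


@register("sierra_date")
def sierra_date(value: str) -> str:
    # %y uses Python's pivot: 69-99 map to 19xx, 00-68 map to 20xx.
    return datetime.strptime(value, "%m-%d-%y").date().isoformat()


@register("price_cents")
def price_cents(value: str) -> int:
    return int(Decimal(value.lstrip("$")) * 100)


@register("rstrip")
def rstrip(value: str) -> str:
    return value.rstrip()


@register("first_available")
def first_available(values: List[str]):
    return next((v for v in values if v), None)


@register("collect_piped")
def collect_piped(values: List[str]) -> str:
    return "|".join(values)


def apply_transforms(value: Any, spec: Union[str, List[str]]) -> Any:
    """Apply one transform name, or a list of names in order."""
    names = [spec] if isinstance(spec, str) else spec
    for name in names:
        if name.startswith("s/"):
            # Naive inline-regex parsing: breaks if the pattern contains '/'.
            _, pattern, replacement, _ = name.split("/")
            value = re.sub(pattern, replacement, value)
        else:
            value = TRANSFORMS[name](value)
    return value


author = apply_transforms(["", "Smith, John  "], ["first_available", "rstrip"])
```

With this shape, adding a transform means writing one decorated function; the YAML config only ever refers to it by name.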
Questions? Ideas? Other data sources you'd want in your lake?
This is early-stage work — feedback welcome!
Ray Voelker
ray.voelker@gmail.com | ray.voelker@chpl.org
github.com/rayvoelker