Collection Analysis
for $5 a Month. CHEAP

https://rayvoelker.github.io/iug2022/
https://collection-analysis.cincy.pl/

Ray Voelker
ray.voelker@gmail.com | ray.voelker@cincinnatilibrary.org

Talk Overview

  1. Collection Analysis Platform Overview (Datasette)
  2. Collection Snapshot Creation: Overview of the workflow and automation involved in extracting the collection metadata from the ILS (Sierra)
  3. The Analysis! (more Datasette + SQL + other tools)

https://datasette.io/

Collection Analysis Platform Overview

  • Datasette is described by its creator, Simon Willison, as "a tool for exploring and publishing data".
  • “Datasette is aimed at journalists, museum curators, archivists, local governments and anyone else who has data that they wish to share with the world.”

Simon's Datasette Presentations / Demos

Simon's Datasette Tutorials

Things I ❤️ about Datasette

  • Written in Python / Available via pip
  • Open source + supportive developer
  • Easy and intuitive to use
  • Well-documented
  • Useful, Large, and growing plugin library
  • Built-in API... Using SQL!
  • Flexible deployment with scale-to-zero-cost / free hosting options in mind
    ... or, CHEAP hosting options

Cincinnati & Hamilton County Public Library: Collection Analysis

https://collection-analysis.cincy.pl

Datasette Deployment / Hosting Options

  • There are lots of ways to deploy Datasette
    (Datasette even has a built in command datasette publish that can package and deploy to services like Heroku or Google Cloud)

Datasette Deployment / Hosting Options

https://docs.datasette.io/en/stable/deploying.html#deployment-fundamentals

  • Running Datasette on a full Linux server has a lot of advantages, but does have a steep learning curve
  • Luckily there are lots of help-pages, tutorials, and guides available!

DigitalOcean Droplet (Virtual Private Server)

  • CHEAP
    $5 per month for a 1 CPU / 1 GB memory / 25 GB Disk Linux Server (plenty for Datasette)
  • DigitalOcean has a great community forum with many great guides available for various System Administration tasks

Create DigitalOcean Droplet for Datasette Step by Step

https://ils-underground.github.io/python_datasette_vps.html

2. Collection Snapshot Creation

Extracting and Building the Data

Scripts For Extracting Data From ILS (Sierra)
https://github.com/cincinnatilibrary/collection-analysis

  • This was originally done to support the task of automating data extraction from the ILS for the CollectionHQ Service
  • We don't have to use CollectionHQ--Datasette can act as our data-analysis tool

3. The Analysis!

Looking at Data in Aggregate

  • How many items created between 6-12 months ago have not checked out
    • Facet by branch
Downloadable Audiobook, 
Downloadable Book, 
Downloadable Video, 
Downloadable Music, 
Web Document
					
						select
  item.location_code,
  l.location_name,
  l.branch_name,
  count(*) as count_items,
  'https://collection-analysis.cincy.pl/current_collection/'
  || 'item_view?location_code__exact='
  || item.location_code as items_link
from
  item
  join location_view as l on l.location_code = item.location_code
where branch_name = :branch_name
group by 1, 2, 3
					
run this query
  • Datasette Plugins:
    • datasette-vega -- A Datasette plugin that provides tools for generating charts using Vega
  • Datasette Plugins:
    • datasette-cluster-map -- detects tables with latitude and longitude columns and then plots them on a map using Leaflet.markercluster

Named parameters: Datasette provides a nice interface when a named parameter is placed within a SQL query

						
where
(
  select
  value
  from
  json_each(bib.indexed_subjects)
) like '%' || :subject_like || '%'
						
					

Canned Queries

  • Create something that looks more like a traditional REST API
						
# Datasette URI construction
scheme = "https://"
host = "collection-analysis.cincy.pl"
# path contains the name of the database
path = "/current_collection"
# this is either a table, view, or canned query  
canned_query_path = "/item_view"
response_format = '' # '' or '.csv' or '.json'
query_parameter = "?barcode__exact="
query_parameter_value = "A000036107985"

full_request_uri = scheme + host \
    + path + canned_query_path + response_format \
    + query_parameter + query_parameter_value
						
					

Using Datasette as a Data-source

Using Datasette as a Data-source

https://github.com/ ... /reports/colab_datasette_example.ipynb

Thanks, and PLEASE, let me know if you come up with any cool queries, ❤️ Datasette as much as I do, or want to use this at your library!

Ray Voelker
ray.voelker@gmail.com
ray.voelker@cincinnatilibrary.org