Collection Analysis
for $5 a Month. CHEAP

https://rayvoelker.github.io/oh-iug-2021/collection_analysis_for_cheap

Ray Voelker
ray.voelker@gmail.com | ray.voelker@cincinnatilibrary.org

Talk Overview

  1. Collection Analysis Platform Overview
  2. Collection Snapshot Creation: Overview of the workflow and automation involved in extracting the collection metadata from the ILS (Sierra)
  3. The Analysis!

1. Collection Analysis Platform Overview

What is Datasette?

  • Datasette is described by its creator, Simon Willison, as "a tool for exploring and publishing data".
  • “Datasette is aimed at journalists, museum curators, archivists, local governments and anyone else who has data that they wish to share with the world.”

Simon's Datasette Presentations / Demos

Things I ❤️ about Datasette

  • Written in Python / Available via pip
  • Easy and intuitive to use
  • Well-documented
  • Large, useful, and growing plugin library
  • Built-in API... Using SQL!
  • Open source + supportive developer
  • Flexible deployment with scale-to-zero-cost / free hosting options in mind
    ... or, CHEAP hosting options
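To make the "Built-in API... Using SQL!" point concrete, here is a minimal sketch of how a SQL query can be turned into a Datasette JSON request. The helper name datasette_sql_url is hypothetical, the instance URL, database name, and bib table are taken from later slides in this talk, and I'm assuming the instance allows arbitrary SQL (the Datasette default). The code only builds the URL; it doesn't make a network request.

```python
from urllib.parse import urlencode


def datasette_sql_url(base_url, database, sql, fmt="json", **params):
    """Build a URL that asks a Datasette instance to run a SQL query.

    datasette_sql_url is a hypothetical helper written for this sketch;
    it is not part of Datasette itself.
    """
    # Datasette reads the query text from the ?sql= parameter
    query = urlencode({"sql": sql, **params})
    return f"{base_url.rstrip('/')}/{database}.{fmt}?{query}"


url = datasette_sql_url(
    "https://ilsweb.cincinnatilibrary.org/collection-analysis",
    "current_collection",
    "select count(*) as bib_count from bib",
    _shape="array",  # ask for a plain JSON array of row objects
)
print(url)
```

Passing fmt="csv" instead would request the same result as CSV.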

Cincinnati & Hamilton County Public Library: Collection Analysis

https://ilsweb.cincinnatilibrary.org/collection-analysis/

Datasette Deployment / Hosting Options

  • There are lots of ways to deploy Datasette
    (Datasette even has a built-in command, datasette publish, that can package and deploy to services like Heroku or Google Cloud)


https://docs.datasette.io/en/stable/deploying.html#deployment-fundamentals

  • Running Datasette on a full Linux server has many advantages, but comes with a steep learning curve
  • Luckily, there are lots of help pages, tutorials, and guides available!

DigitalOcean Droplet (Virtual Private Server)

  • CHEAP
    $5 per month for a 1 CPU / 1 GB memory / 25 GB Disk Linux Server (plenty for Datasette)
  • DigitalOcean has a great community forum, with many guides covering common system administration tasks

Creating a DigitalOcean Droplet for Datasette, Step by Step

https://ils-underground.github.io/python_datasette_vps.html

2. Collection Snapshot Creation

Extracting and Building the Data

Scripts For Extracting Data From ILS (Sierra)
https://github.com/cincinnatilibrary/collection-analysis

  • These scripts were originally written to automate data extraction from the ILS for the CollectionHQ service
  • We don't have to use CollectionHQ: Datasette can act as our data-analysis tool
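The end product of the extraction step is a SQLite file that Datasette can serve. Below is a minimal sketch of that last step using only the Python standard library. It is not the code from the repository above: the build_snapshot function, the item table, and its columns are all assumptions for illustration, and the sample rows stand in for data that would really come from Sierra.

```python
import sqlite3


def build_snapshot(db_path, rows):
    """Load extracted item rows into a fresh SQLite table.

    Hypothetical helper: the table and column names are assumptions,
    not the schema used by the real extraction scripts.
    """
    conn = sqlite3.connect(db_path)
    with conn:  # commit on success, roll back on error
        conn.execute("drop table if exists item")
        conn.execute(
            "create table item ("
            "barcode text primary key, "
            "location_code text, "
            "checkout_total integer)"
        )
        conn.executemany("insert into item values (?, ?, ?)", rows)
    count = conn.execute("select count(*) from item").fetchone()[0]
    conn.close()
    return count


# Made-up sample rows standing in for data extracted from the ILS
sample_rows = [
    ("A000000000001", "main", 12),
    ("A000000000002", "branch_a", 3),
]
loaded = build_snapshot(":memory:", sample_rows)
print(loaded)
```

With a real file path in place of ":memory:", the resulting file could then be opened with datasette serve.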

3. The Analysis!

Looking at Data in Aggregate

  • What item locations are defined?
  • ... and how many items in each of those locations?
  • Datasette plugin: datasette-vega, which adds tools for generating charts using Vega
  • Named parameters: when a SQL query contains a named parameter such as :subject, Datasette renders a form field for it, so the value can be changed without editing the SQL

    select
      bib.*
    from
      bib
    where
      indexed_subjects like '%'
      || :subject
      || '%'
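The two ideas in this section (counting items per location, and filtering with a named parameter) can be tried locally with Python's built-in sqlite3 module, which binds :subject the same way SQLite does inside Datasette. This is only a sketch: apart from bib.indexed_subjects, every table name, column name, and row below is made up for illustration.

```python
import sqlite3

# Hypothetical miniature snapshot; only bib.indexed_subjects comes from
# the query on the slide, everything else is assumed for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table bib (
        bib_record_num integer primary key,
        title text,
        indexed_subjects text
    );
    create table item (
        item_record_num integer primary key,
        bib_record_num integer,
        location_code text
    );
    """
)
conn.executemany(
    "insert into bib values (?, ?, ?)",
    [
        (1, "Birds of Ohio", "Birds -- Ohio"),
        (2, "Cincinnati Chili", "Cooking -- Ohio -- Cincinnati"),
        (3, "Learning SQL", "SQL (Computer program language)"),
    ],
)
conn.executemany(
    "insert into item values (?, ?, ?)",
    [(10, 1, "main"), (11, 1, "branch_a"), (12, 2, "main"), (13, 3, "main")],
)

# Aggregate view: which locations exist, and how many items are in each?
location_counts = conn.execute(
    """
    select location_code, count(*) as item_count
    from item
    group by location_code
    order by item_count desc, location_code
    """
).fetchall()

# Named parameter: the value for :subject is supplied separately from the SQL
titles = [
    row[0]
    for row in conn.execute(
        """
        select bib.title
        from bib
        where indexed_subjects like '%' || :subject || '%'
        order by bib.bib_record_num
        """,
        {"subject": "Ohio"},
    )
]

print(location_counts)
print(titles)
```

Because the parameter value is bound rather than pasted into the SQL string, the same query text can be reused for any subject.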
    						
    					

Canned Queries

  • Canned queries let us create something that looks more like a traditional REST API

    # Datasette URI construction
    scheme = "https://"
    host = "ilsweb.cincinnatilibrary.org"
    # path contains the path of the Datasette instance,
    # and the name of the database
    path = "/collection-analysis/current_collection"
    # name of the canned query
    canned_query_path = "/item_lookup_by_barcode"
    response_format = ""  # "" (HTML page), ".csv", or ".json"
    query_parameter = "?barcode="
    query_parameter_value = "A000036107985"

    full_request_uri = (
        scheme + host
        + path + canned_query_path + response_format
        + query_parameter + query_parameter_value
    )
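For context, a canned query is a named SQL query defined in Datasette's metadata file, under databases, then the database name, then queries. The sketch below generates such a file with Python. The query name and database name come from the URI above, but the SQL itself is an assumption: I'm guessing at an item table with a barcode column, and the real query on this instance may look different.

```python
import json

# Metadata structure for a canned query. The SQL is a hypothetical
# stand-in; the item table and barcode column are assumptions.
metadata = {
    "databases": {
        "current_collection": {
            "queries": {
                "item_lookup_by_barcode": {
                    "title": "Item lookup by barcode",
                    "sql": "select * from item where barcode = :barcode",
                }
            }
        }
    }
}

metadata_json = json.dumps(metadata, indent=2)
print(metadata_json)
```

Saved as metadata.json and passed to Datasette with the --metadata option, this would expose the query at a path ending in /current_collection/item_lookup_by_barcode, with :barcode filled in from the ?barcode= query string. (Newer Datasette 1.0 alpha releases move this kind of setting into a separate configuration file, so check the documentation for the version you run.)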
    						
    					

Thanks! Please let me know if you come up with any cool queries, ❤️ Datasette as much as I do, or want to use this at your library!

Ray Voelker
ray.voelker@gmail.com | ray.voelker@cincinnatilibrary.org