Quick Background

  • Sierra’s Direct SQL Access feature allows us to:
    • Quickly and efficiently target and extract real-time data from the Sierra ILS.
    • Organize data in logical and useful ways

Quick Background (cont.)

  • Why save / preserve data from Sierra? (cont.)
    1. Cache Sierra data:
      • For use in an application instead of running an “expensive query” to deliver content
      • For use in an application, holding onto data that may otherwise be destroyed or transformed by the Sierra application itself

Understanding Sierra’s database views and data

  • Some data in Sierra stays more static
    (think of a “receipt”, or log of transactions)

Understanding Sierra’s database views and data (cont.)

  • Circulation transactions are created in the `circ_trans` database table and remain static
  • Rows are deleted from the table after a certain period of time (2 weeks is the default, but this can be extended by iii upon request)


-- get some info about our circ_trans dates
SELECT
NOW()::TIMESTAMP WITH TIME ZONE as now_gmt,
MAX(c.transaction_gmt)::TIMESTAMP WITH TIME ZONE AS max,
MIN(c.transaction_gmt)::TIMESTAMP WITH TIME ZONE AS min,
AGE(MIN(c.transaction_gmt)) AS earliest_transaction_age

FROM
sierra_view.circ_trans as c

Understanding Sierra’s database views and data (cont.)

  • Other data is more variable or is a direct representation that describes a particular state of a record or process in the ILS.

Understanding Sierra’s database views and data (cont.)

  • The state of the hold is defined in the Sierra database
  • Data changes depending on the state or status of the hold, and is then removed from the database when the hold is deleted, filled, or expires

-- this will select Ray Voelker's hold information from the 
-- Sierra SQL database
SELECT
h.id,
h.patron_record_id,
h.record_id,
h.status,
h.pickup_location_code

FROM
sierra_view.hold as h

WHERE
h.patron_record_id = 481038535591;

How to cache / transform / preserve data from Sierra?

  • No shortage of options!
    • pgAdmin is a popular choice for a desktop client
      www.pgadmin.org
    • "Execute query, write result to file"
      Creates a .csv file from the results

How to cache / transform / preserve data from Sierra? (cont.)

  • An overview of my method:
    1. Use Python to connect to Sierra’s database
    2. Issue SQL statement on Sierra's database to target and compile the data for extraction

How to cache / transform / preserve data from Sierra? (cont.)

  • An overview of my method (cont.):
    3. Export the resulting data to a .csv file and/or directly into a SQLite database
      • .csv files are easy to later load into an SQLite database, spreadsheet, or other data warehouse tool
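As a minimal sketch of that load step, using only the Python Standard Library (the .csv content and column names here are made up for illustration):

```python
import csv
import io
import sqlite3

# hypothetical .csv content -- in practice this would be a file
# produced by one of the Sierra export queries
csv_text = """item_record_num,checkout_total
1000001,12
1000002,3
"""

conn = sqlite3.connect(':memory:')  # use a file path to persist
cursor = conn.cursor()
cursor.execute("""CREATE TABLE IF NOT EXISTS item_data
    ( item_record_num INTEGER PRIMARY KEY,
    checkout_total INTEGER )""")

# DictReader rows map directly onto named placeholders
reader = csv.DictReader(io.StringIO(csv_text))
cursor.executemany(
    """INSERT INTO item_data (item_record_num, checkout_total)
    VALUES (:item_record_num, :checkout_total)""",
    reader)
conn.commit()

row_count = cursor.execute("SELECT COUNT(*) FROM item_data").fetchone()[0]
conn.close()
```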

How to cache / transform / preserve data from Sierra? (cont.)

SQLite Database: sqlite.org

SQLite is a C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine.

How to cache / transform / preserve data from Sierra? (cont.)

SQLite Database (cont.)

  • SQLite database engine is a great tool for caching data:
    1. Stores and organizes large amounts of data quickly and efficiently
    2. You don’t have to set up and maintain a server (data is portable; entire database is contained in a single, cross-platform file)

How to cache / transform / preserve data from Sierra? (cont.)

SQLite Database (cont.)

  3. Ability and flexibility to build SQL queries and applications that directly use the data that you’ve collected
  4. Has a useful desktop tool: sqlitebrowser.org
  5. It’s included in the Python Standard Library!

# Python sample code to create database, 
# create a table, and then insert some data
# note: sqlite3 is part of the Python Standard Library
import sqlite3

# create the database
conn = sqlite3.connect('example.db')
cursor = conn.cursor()

# create the table
cursor.execute("""CREATE TABLE IF NOT EXISTS `data` 
	( `id` INTEGER PRIMARY KEY AUTOINCREMENT,
	`input` TEXT )""")

# insert some data
cursor.execute("""INSERT INTO data (`input`) 
	VALUES ('sample text')""")

# commit and close the connection
conn.commit()
conn.close()

Examples:

  1. Collection Analysis / Circulation Data Analysis
    • intended for use with the CollectionHQ service, but can be adapted for local use
  2. Patron Savings Calculator
    • intended to display information on the Encore patron account information page about how much money has been "saved" by using the library

Examples: (cont.)

  3. Hold Shelf Delivery Matrix Report
    • intended to produce a spreadsheet report of items for hold shelf locations (ready for patron pickup) and from where they originated
  4. Mapping Geo Data from Patron Address Data and Circulation Transaction Data
    • intended to take geocoded patron address data and produce a visualization by plotting it on a map

Example 1

Collection Analysis / Circulation Data Analysis

  • github.com/plch/collection-analysis
  • This application was built for the purpose of automatically preserving and sending data from the Sierra database to the CollectionHQ service via FTP

Collection Analysis / Circulation Data Analysis (cont.)

  • Three main groups of data are targeted for export:
    1. Bibliographic Record Data:
      temp_table-bib_data.sql
      Fields exported include: bib record number, control numbers (var fields tagged 'o'), isbn, call numbers, and a few other metadata fields related to the bib record

Collection Analysis / Circulation Data Analysis (cont.)

  • Three main groups of data are targeted for export: (cont.)
    2. Item Record Data:
      temp_table-item_data.sql
      Fields exported include: item record number, bib record number, circulation information, price, and a few other metadata fields related to the item record

Collection Analysis / Circulation Data Analysis (cont.)

  • Three main groups of data are targeted for export: (cont.)
    3. Hold Data:
      temp_table-hold_data.sql
      Weekly snapshot of bib-level holds. It is organized by title (bib record number), and then each title has a list of holds with related metadata for each hold (patron number, pickup location, created date, expiration date, etc)

Collection Analysis / Circulation Data Analysis (cont.)

  • Overview of the process:
    1. Create and open .csv files to hold data output and/or open local database (if it’s useful to place data into local SQLite database)
    2. Connect to remote Sierra database, and create the temporary tables that will be used for exporting

Collection Analysis / Circulation Data Analysis (cont.)

  • Overview of the process (cont.):
    3. Generate data output from the Sierra database temporary tables, and write the output to a .csv file and/or to a local SQLite database
    4. Send data via FTP
  • The full export script can be found in the public GitHub repository for this project

Example 2

Patron Savings Calculator

  • github.com/plch/patron-savings-calculator
  • This application was designed to work with the iii discovery layer Encore's account detail page, to display how much a patron has "saved" by borrowing from the library

Patron Savings Calculator (cont.)

  • Cached “savings” information is based on the patron record number and can be output in multiple formats (based on this application's custom URL endpoints):
    • JSON
      /api/v1/patron_savings/2198439
      
      {
      	"count_titles": 118,
      	"min_date_epoch": 1534957500,
      	"patron_record_num": 2198439,
      	"total_savings": 2109.2799999999997
      }
      									
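A summary like the JSON above might be computed from the local SQLite cache roughly like this (a sketch only; the table and column names are assumptions, not the repo's actual schema, and the inserted rows are made up):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
# hypothetical local cache schema
cursor.execute("""CREATE TABLE IF NOT EXISTS savings
    ( patron_record_num INTEGER,
    bib_hash TEXT,
    price REAL,
    checkout_epoch INTEGER )""")
cursor.executemany("INSERT INTO savings VALUES (?, ?, ?, ?)", [
    (2198439, 'a1', 19.99, 1534957500),
    (2198439, 'b2', 5.00, 1534958000),
])

row = cursor.execute("""
    SELECT COUNT(DISTINCT bib_hash), MIN(checkout_epoch), SUM(price)
    FROM savings
    WHERE patron_record_num = ?""", (2198439,)).fetchone()

summary = {
    'count_titles': row[0],
    'min_date_epoch': row[1],
    'patron_record_num': 2198439,
    'total_savings': row[2],
}
conn.close()
```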

Patron Savings Calculator (cont.)

    About the `update.py` script:
    • Script starts by looking at the last entry it received from the Sierra database
    • A query is constructed to extract relevant data from the Sierra database that is more recent than that last entry in the local database
    • The local database is updated with the fresh data from the Sierra database
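The steps above can be sketched with the standard library alone (the local table and column names are assumptions, and the remote query is only constructed here, not executed):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute("""CREATE TABLE IF NOT EXISTS circ_cache
    ( transaction_gmt TEXT,  -- ISO-8601 timestamp from Sierra
    bib_hash TEXT )""")
cursor.execute(
    "INSERT INTO circ_cache VALUES ('2019-04-01 12:00:00', 'a1')")

# 1. look at the last entry received from the Sierra database
last_seen = cursor.execute(
    "SELECT MAX(transaction_gmt) FROM circ_cache").fetchone()[0]

# 2. construct a query for only rows newer than that entry; this
#    would be run against the remote (PostgreSQL) Sierra database,
#    e.g. remote_cursor.execute(remote_sql, (last_seen,))
remote_sql = """
SELECT c.transaction_gmt, c.bib_record_id
FROM sierra_view.circ_trans AS c
WHERE c.transaction_gmt > %s
ORDER BY c.transaction_gmt
"""
conn.close()
```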

Patron Savings Calculator (cont.)

    A note about privacy / protection / obfuscation of this data:
    • No title information is saved in the local database, other than a hashed bib record id
    • Hashed bib record id is stored to avoid duplicating the price information when a title is checked out multiple times, and to differentiate titles from one another in the local database
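One way to produce such a hashed id (the choice of MD5 here is an assumption for illustration; the repo may use a different digest):

```python
import hashlib

def obfuscate_bib_id(bib_record_id):
    """Return a hex digest standing in for a bib record id.

    Note: hashing a small id space is obfuscation, not strong
    anonymization, matching the intent described above."""
    return hashlib.md5(str(bib_record_id).encode('utf-8')).hexdigest()

# the same title always maps to the same hash, so repeat checkouts
# can be de-duplicated without storing the real bib record id
h1 = obfuscate_bib_id(420908165001)
h2 = obfuscate_bib_id(420908165001)
```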

Patron Savings Calculator (cont.)

    A note about obfuscation with this data (cont.):
    • No patron information, other than the patron record number, is stored in the local database
    • Only price and number of titles checked out are surfaced via the RESTful API

Example 3

Hold Shelf Delivery Matrix Report

  • github.com/plch/plch-holds-shelf
  • The purpose of this application is to create a spreadsheet that displays items delivered to a hold shelf location, and from what location they came

Hold Shelf Delivery Matrix Report (cont.)

SELECT ... FROM `sierra_view.hold` WHERE `status` =

  code  definition
  "b"   Bib hold ready for pickup
  "j"   Volume hold ready for pickup
  "i"   Item hold ready for pickup

Hold Shelf Delivery Matrix Report (cont.)

  • Source location:
    • `checkin_statistics_group_code_num`
      found in `sierra_view.item_record` table view

Hold Shelf Delivery Matrix Report (cont.)

  • Destination location:
    • `pickup_location_code`
      found in `sierra_view.hold`

Hold Shelf Delivery Matrix Report (cont.)

  • Note: the source value of `location_code` comes from
    `sierra_view.statistic_group_myuser`,
    joined on `checkin_statistics_group_code_num`, effectively giving us the source location from the stat group code number

Hold Shelf Delivery Matrix Report (cont.)

  • Do not insert duplicate rows of hold data...
    • We could build a complicated set of comparisons of the remotely selected data to our local data ...
    • Or, we could let the databases do all the work!
    • Create a hash of the entire hold row, use that value as the unique primary key in the local database table

Hold Shelf Delivery Matrix Report (cont.)

Local SQLite table creation (simplified)


--  local SQLite table creation (simplified) ... 
CREATE TABLE IF NOT EXISTS "data" (
	`hash_row` TEXT UNIQUE PRIMARY KEY
	-- more columns created below ...
);

Hold Shelf Delivery Matrix Report (cont.)

Remote Sierra database query (simplified)


-- remote Sierra database query (simplified) ...
SELECT
MD5(CAST((h.*) AS TEXT)) AS hash_row
-- more columns of data selected below
FROM
sierra_view.hold AS h
WHERE
h.status IN(
	'b', 'j', 'i'
);

Hold Shelf Delivery Matrix Report (cont.)

Remote Sierra database query (simplified) output:

Hold Shelf Delivery Matrix Report (cont.)

Inserting the retrieved data into the local SQLite table (simplified):


INSERT OR IGNORE INTO data (
	hash_row
)
VALUES ('59752190f26bda0ec17167010cc6619f'), 
       ('59752190f26bda0ec17167010cc6619f'),
       ('ce971535718d416fd09e9266797c9374');

SELECT * FROM data;
					
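Put together in Python, the de-duplication falls out of the schema (a sketch using the sample hash values above):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute("""CREATE TABLE IF NOT EXISTS "data"
    ( hash_row TEXT PRIMARY KEY )""")

rows = [  # as if returned by the remote MD5(...) query
    ('59752190f26bda0ec17167010cc6619f',),
    ('59752190f26bda0ec17167010cc6619f',),  # duplicate
    ('ce971535718d416fd09e9266797c9374',),
]
cursor.executemany(
    "INSERT OR IGNORE INTO data (hash_row) VALUES (?)", rows)
conn.commit()

# the duplicate row is silently ignored by the primary key
distinct_rows = cursor.execute("SELECT COUNT(*) FROM data").fetchone()[0]
conn.close()
```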

Hold Shelf Delivery Matrix Report (cont.)

  • Overview of the process: (this should look familiar)
    1. Connect to our local database, and create table if it doesn’t exist. Also, establish connection to the remote Sierra database
    2. Query the Sierra ‘hold’ table for rows that have status of ‘i’, ‘j’, or ‘b’; this indicates that an item hold, volume hold, or bib hold is ready for pickup

Hold Shelf Delivery Matrix Report (cont.)

  • Overview of the process: (cont.)
    3. Insert retrieved rows (or ignore duplicate rows, as explained previously) into the local SQLite database, then close all connections
    4. Set the update of local data to happen frequently (every 5 minutes via CRON is a good method for doing this)
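For reference, a crontab entry for that five-minute schedule might look like this (the script path is a placeholder):

```
# run the update script every 5 minutes
*/5 * * * * /usr/bin/python3 /path/to/update.py
```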

Hold Shelf Delivery Matrix Report (cont.)

  • Exporting / Producing the report (cont.)
    1. Import the .csv file into LibreOffice Calc (or Excel) and perform a pivot on the data:
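The same pivot can be sketched in plain Python (standard library only; the column names and rows below are assumptions based on the source / destination fields described earlier):

```python
import csv
import io
from collections import Counter

# hypothetical export from the local database
csv_text = """source_location,pickup_location
main,branch_a
main,branch_b
branch_a,branch_b
main,branch_a
"""

# count deliveries for each (source, destination) pair -- the same
# shape a spreadsheet pivot table produces
matrix = Counter()
for row in csv.DictReader(io.StringIO(csv_text)):
    matrix[(row['source_location'], row['pickup_location'])] += 1
```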

Hold Shelf Delivery Matrix Report (cont.)

  • Exporting / Producing the report (cont.)
2019-04-15_hold_shelf_matrix.xlsx

Example 4

Mapping Geo Data from Patron Address Data and Circulation Transaction Data

  • The purpose of this process is to plot patron locations and branch locations on a map based on latitude / longitude coordinates derived from mailing address data
  • This is a work in progress!

Mapping Geo Data (cont.)

  • Cached data is contained in an SQLite database
    • Circulation data
      (weekly export of the `sierra_view.circ_trans` table view)
    • Patron data
      (weekly export of relevant patron information from multiple table views)

Mapping Geo Data (cont.)

  • Geocoding Patron Street Addresses:
    • Patron address data (patron_record_id, street number, street name, city, zip) is exported to a .csv file
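The export step might look like this (standard library only; the row values are made up, and the real export pulls these fields from the patron table views):

```python
import csv
import io

# rows as they might come back from the patron address query
rows = [
    (481036, '800', 'Vine St', 'Cincinnati', '45202'),
]

# in practice: open('patron_addresses.csv', 'w', newline='')
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(
    ('patron_record_id', 'street_num', 'street_name', 'city', 'zip'))
writer.writerows(rows)
csv_output = out.getvalue()
```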

Mapping Geo Data (cont.)

  • SmartyStreets has some very user-friendly services for bulk upload / download of address data for verification and geocoding
  • SmartyStreets may be able to offer a discounted or free service tier to libraries that allows for bulk verification / geocoding, as well as on-the-fly verification and auto-correction of address inputs for things such as web forms

Mapping Geo Data (cont.)

  • SmartyStreets list service returns .csv data back with relevant address information, identified by the unique ID (`patron_record_id`) that was provided
  • We can easily load this into the local SQLite database with the CSV import feature of the GUI (e.g. DB Browser for SQLite)

Mapping Geo Data (cont.)

  • There are very good data analysis / visualization tools available for Python:
    • PyViz
      pyviz.org
    • A Conda metapackage "pyviz"
    • Makes data visualization in Python easier to use
jupyter_notebook_example.pdf
 

Thank You!
Questions?