Quick Background
- Sierra’s Direct SQL Access feature allows us to:
- Quickly and efficiently target and extract real-time data from the Sierra ILS.
- Organize data in logical and useful ways
Quick Background (cont.)
- Why save / preserve data from Sierra? (cont.)
- Cache Sierra data:
- For use in an application instead of running an “expensive query” to deliver content
- For use in an application where we need to hold onto data that may otherwise be destroyed or transformed by the Sierra application itself
Understanding Sierra’s database views and data
- Some data in Sierra stays more static
(think of a “receipt”, or log of transactions)
Understanding Sierra’s database views and data (cont.)
- Circulation transactions are written to the `circ_trans` database table and remain static
- Rows are deleted from the table after a certain period of time (2 weeks is the default, but this can be extended by iii upon request)
-- get some info about our circ_trans dates
SELECT
NOW()::TIMESTAMP WITH TIME ZONE as now_gmt,
MAX(c.transaction_gmt)::TIMESTAMP WITH TIME ZONE AS max,
MIN(c.transaction_gmt)::TIMESTAMP WITH TIME ZONE AS min,
AGE(MIN(c.transaction_gmt)) AS earliest_transaction_age
FROM
sierra_view.circ_trans as c
Understanding Sierra’s database views and data (cont.)
- Other data is more variable or is a direct representation that describes a particular state of a record or process in the ILS.
Understanding Sierra’s database views and data (cont.)
- The state of the hold is defined in the Sierra database
- Data changes depending on the state or status of the hold, and is then removed from the database when the hold is deleted, filled, or expires
-- this will select Ray Voelker's hold information from the
-- Sierra SQL database
SELECT
h.id,
h.patron_record_id,
h.record_id,
h.status,
h.pickup_location_code
FROM
sierra_view.hold as h
WHERE
h.patron_record_id = 481038535591;
How to cache / transform / preserve data from Sierra?
- No shortage of options!
- pgAdmin is a popular choice for a desktop client
www.pgadmin.org
- "Execute query, write result to file" creates a .csv file from the results
How to cache / transform / preserve data from Sierra? (cont.)
- Many programming languages provide access to PostgreSQL via their libraries (for example, psycopg2 for Python)
How to cache / transform / preserve data from Sierra? (cont.)
- An overview of my method:
- Use Python to connect to Sierra’s database
- Issue SQL statement on Sierra's database to target and compile the data for extraction
How to cache / transform / preserve data from Sierra? (cont.)
- An overview of my method (cont.):
- Export result data to either a .csv file, and/or directly into a SQLite database
- .csv files are easy to later load into an SQLite database, spreadsheet, or other data warehouse tool
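The export step above can be sketched with only the standard library. The rows below stand in for results that would come back from a query against `sierra_view`; the column names, sample values, and file names are made up for illustration:

```python
import csv
import sqlite3

# In the real script these rows would come back from a psycopg2 query
# against sierra_view; they are hard-coded here, and the column names
# are hypothetical.
columns = ['record_num', 'title']
rows = [
    (1000001, 'Example Title One'),
    (1000002, 'Example Title Two'),
]

# export the result set to a .csv file ...
with open('export.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(columns)
    writer.writerows(rows)

# ... and/or load the same rows directly into a local SQLite database
conn = sqlite3.connect('cache.db')
conn.execute('CREATE TABLE IF NOT EXISTS export (record_num INTEGER, title TEXT)')
conn.executemany('INSERT INTO export VALUES (?, ?)', rows)
conn.commit()
conn.close()
```

The `?` placeholders let sqlite3 handle quoting, so the same insert works no matter what the exported values contain.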
How to cache / transform / preserve data from Sierra? (cont.)
SQLite Database: sqlite.org
SQLite is a C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine.
How to cache / transform / preserve data from Sierra? (cont.)
SQLite Database (cont.)
- SQLite database engine is a great tool for caching data:
- Stores and organizes large amounts of data quickly and efficiently
- You don’t have to set up and maintain a server (data is portable; entire database is contained in a single, cross-platform file)
How to cache / transform / preserve data from Sierra? (cont.)
SQLite Database (cont.)
- Ability and flexibility to build SQL queries and applications that directly use the data that you’ve collected
- Has a useful desktop tool: sqlitebrowser.org
- It’s included in the Python Standard Library!
# Python sample code to create database,
# create a table, and then insert some data
# note: sqlite3 is part of the Python Standard Library
import sqlite3
# create the database
conn = sqlite3.connect('example.db')
cursor = conn.cursor()
# create the table
cursor.execute("""CREATE TABLE IF NOT EXISTS `data`
( `id` INTEGER PRIMARY KEY AUTOINCREMENT,
`input` TEXT )""")
# insert some data
cursor.execute("""INSERT INTO data (`input`)
VALUES ('sample text')""")
# commit and close the connection
conn.commit()
conn.close()
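Reading cached rows back out is just as short. This continues from the `example.db` file above, but re-creates the table first so the snippet also runs on its own:

```python
import sqlite3

conn = sqlite3.connect('example.db')
cursor = conn.cursor()
# (re-)create the table and a sample row so this snippet runs on its own
cursor.execute("""CREATE TABLE IF NOT EXISTS `data`
( `id` INTEGER PRIMARY KEY AUTOINCREMENT,
`input` TEXT )""")
cursor.execute("INSERT INTO data (`input`) VALUES ('sample text')")
conn.commit()

# read the cached rows back out
cursor.execute("SELECT id, input FROM data")
rows = cursor.fetchall()
conn.close()
print(rows)
```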
Examples:
- Collection Analysis / Circulation Data Analysis
- intended for use with the CollectionHQ service, but can be adapted for local use
- Patron Savings Calculator
- intended to display information on the Encore patron account information page about how much money has been "saved" by using the library
Examples: (cont.)
- Hold Shelf Delivery Matrix Report
- intended to produce a spreadsheet report of items for hold shelf locations (ready for patron pickup) and from where they originated
- Mapping Geo Data from Patron Address Data and Circulation Transaction Data
- intended to take geocoded patron address data and produce a visualization by plotting it on a map
Example 1
Collection Analysis / Circulation Data Analysis
- github.com/plch/collection-analysis
- This application was built for the purpose of automatically preserving and sending data from the Sierra database to the CollectionHQ service via FTP
Collection Analysis / Circulation Data Analysis (cont.)
- Three main groups of data are targeted for export:
- Bibliographic Record Data:
temp_table-bib_data.sql
Fields exported include: bib record number, control numbers (var fields tagged 'o'), isbn, call numbers, and a few other metadata fields related to the bib record
Collection Analysis / Circulation Data Analysis (cont.)
- Three main groups of data are targeted for export: (cont.)
- Item Record Data:
temp_table-item_data.sql
Fields exported include: item record number, bib record number, circulation information, price, and a few other metadata fields related to the item record
Collection Analysis / Circulation Data Analysis (cont.)
- Three main groups of data are targeted for export: (cont.)
- Hold Data:
temp_table-hold_data.sql
Weekly snapshot of bib-level holds. It is organized by title (bib record number), and then each title has a list of holds with related metadata for each hold (patron number, pickup location, created date, expiration date, etc)
Collection Analysis / Circulation Data Analysis (cont.)
- Overview of the process:
- Create and open .csv files to hold data output and/or open local database (if it’s useful to place data into local SQLite database)
- Connect to remote Sierra database, and create the temporary tables that will be used for exporting
Collection Analysis / Circulation Data Analysis (cont.)
- Overview of the process (cont.):
- Generate data output from the Sierra database temporary tables, and write the output to a .csv file and/or to a local SQLite database
- Send data via FTP
- The full export script can be found in the public GitHub repository for this project
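The process can be sketched as a small pipeline. Everything Sierra- and FTP-specific is reduced to placeholder comments here; the rows, column names, and file name are made up, not taken from the repository:

```python
import csv

def fetch_rows():
    # placeholder for connecting to Sierra with psycopg2, creating the
    # temporary tables (temp_table-bib_data.sql etc.), and selecting
    # from them; the rows below are made up
    return [('b1000001a', '9780000000001'),
            ('b1000002x', '9780000000002')]

def write_csv(rows, path):
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['bib_record_num', 'isbn'])  # hypothetical columns
        writer.writerows(rows)

def send_via_ftp(path):
    # placeholder for ftplib: FTP(host), login(), storbinary('STOR ...', ...)
    pass

write_csv(fetch_rows(), 'bib_data.csv')
send_via_ftp('bib_data.csv')
```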
Example 2
Patron Savings Calculator
- github.com/plch/patron-savings-calculator
- This application was designed to work with the account detail page of Encore (iii's discovery layer), to display how much a patron has "saved" by borrowing from the library
Patron Savings Calculator (cont.)
- Cached "savings" information is based on the patron record number and can be output in multiple formats (based on this application's custom URL endpoints):
- JSON
/api/v1/patron_savings/2198439
{
"count_titles": 118,
"min_date_epoch": 1534957500,
"patron_record_num": 2198439,
"total_savings": 2109.2799999999997
}
Patron Savings Calculator (cont.)
- About this very simple RESTful API
- Written as a Python / Flask application
Patron Savings Calculator (cont.)
About the `update.py` script:
- Script starts by looking at the last entry it received from the Sierra database
- A query is constructed to extract relevant data from the Sierra database that is more recent than that last entry in the local database
- The local database is updated with the fresh data from the Sierra database
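The update steps above can be sketched as follows. An in-memory SQLite database stands in for the local cache, a hard-coded list stands in for the psycopg2 query against Sierra, and the column names are illustrative rather than the repository's actual schema:

```python
import sqlite3

# local cache (a stand-in for the application's SQLite file)
local = sqlite3.connect(':memory:')
local.execute('CREATE TABLE checkouts (circ_id INTEGER PRIMARY KEY, price REAL)')
local.execute('INSERT INTO checkouts VALUES (100, 9.99)')

# 1. look at the last entry we already received
last_id = local.execute('SELECT MAX(circ_id) FROM checkouts').fetchone()[0] or 0

# 2. in update.py this would be a query against Sierra restricted to
#    rows newer than last_id; simulated here with a hard-coded list
remote_rows = [(100, 9.99), (101, 24.50), (102, 5.00)]
fresh = [r for r in remote_rows if r[0] > last_id]

# 3. update the local database with the fresh rows
local.executemany('INSERT INTO checkouts VALUES (?, ?)', fresh)
local.commit()
```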
Patron Savings Calculator (cont.)
A note about privacy / protection / obfuscation of this data:
- No title information is saved in the local database, other than a hashed bib record id
- Hashed bib record id is stored to avoid duplicating the price information when a title is checked out multiple times, and to differentiate titles from one another in the local database
Patron Savings Calculator (cont.)
A note about obfuscation with this data (cont.):
- No patron information, other than the patron record number, is stored in the local database
- Only price and number of titles checked out are surfaced via the RESTful API
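The hashing itself can be as simple as the sketch below; the record id is made up, and MD5 is used for illustration (note that hashing a predictable id is obfuscation, not strong anonymization):

```python
import hashlib

bib_record_id = 420908165000  # hypothetical Sierra bib record id

# store only a digest of the id, never the id itself
hashed_bib = hashlib.md5(str(bib_record_id).encode('utf-8')).hexdigest()
```

The same input always produces the same digest, which is what lets the application avoid double-counting a title checked out more than once.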
Example 3
Hold Shelf Delivery Matrix Report
- github.com/plch/plch-holds-shelf
- The purpose of this application is to create a spreadsheet that displays
items delivered to a hold shelf location, and from what location they came
Hold Shelf Delivery Matrix Report (cont.)
Hold Shelf Delivery Matrix Report (cont.)
SELECT ... FROM `sierra_view.hold` WHERE `status` IN ('b', 'j', 'i')

code | definition
"b"  | Bib hold ready for pickup
"j"  | Volume hold ready for pickup
"i"  | Item hold ready for pickup
Hold Shelf Delivery Matrix Report (cont.)
- Source location:
- `checkin_statistics_group_code_num`, found in the `sierra_view.item_record` table view
Hold Shelf Delivery Matrix Report (cont.)
- Destination location:
- `pickup_location_code`, found in `sierra_view.hold`
Hold Shelf Delivery Matrix Report (cont.)
- Note: the source's `location_code` comes from `sierra_view.statistic_group_myuser`, joined on `checkin_statistics_group_code_num`, effectively giving us the source location from the stat group code num
Hold Shelf Delivery Matrix Report (cont.)
- Do not insert duplicate rows of hold data...
- We could build a complicated set of comparisons of the remotely selected data to our local data ...
- Or, we could let the databases do all the work!
- Create a hash of the entire hold row, use that value as the unique primary key in the local database table
Hold Shelf Delivery Matrix Report (cont.)
Local SQLite table creation (simplified)
-- local SQLite table creation (simplified) ...
CREATE TABLE IF NOT EXISTS "data" (
`hash_row` TEXT UNIQUE PRIMARY KEY
-- more columns created below ...
);
Hold Shelf Delivery Matrix Report (cont.)
Remote Sierra database query (simplified)
-- remote Sierra database query (simplified) ...
SELECT
MD5(CAST((h.*) AS TEXT)) AS hash_row
-- more columns of data selected below
FROM
sierra_view.hold AS h
WHERE
h.status IN(
'b', 'j', 'i'
);
Hold Shelf Delivery Matrix Report (cont.)
Local SQLite table inserting the retrieved data (simplified):
INSERT OR IGNORE INTO data (
hash_row
)
VALUES ('59752190f26bda0ec17167010cc6619f'),
('59752190f26bda0ec17167010cc6619f'),
('ce971535718d416fd09e9266797c9374');
SELECT * FROM data;
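The same dedup pattern works from Python, with the row hash computed locally rather than by Postgres's MD5(); the hold rows here are made up, and the first two are identical duplicates:

```python
import hashlib
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE IF NOT EXISTS "data" '
             '(hash_row TEXT PRIMARY KEY, status TEXT, pickup TEXT)')

# made-up hold rows: (status, pickup location)
rows = [('b', 'north'), ('b', 'north'), ('i', 'east')]
for row in rows:
    # hash the whole row, like MD5(CAST((h.*) AS TEXT)) does server-side
    h = hashlib.md5(repr(row).encode('utf-8')).hexdigest()
    conn.execute('INSERT OR IGNORE INTO data VALUES (?, ?, ?)', (h,) + row)
conn.commit()

count = conn.execute('SELECT COUNT(*) FROM data').fetchone()[0]
print(count)  # → 2: the duplicate row collapsed onto the same primary key
```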
Hold Shelf Delivery Matrix Report (cont.)
- Overview of the process: (this should look familiar)
- Connect to our local database, and create table if it doesn’t exist. Also, establish connection to the remote Sierra database
- Query the Sierra ‘hold’ table for rows that have a status of ‘i’, ‘j’, or ‘b’; this indicates an item hold, volume hold, or bib hold ready for pickup
Hold Shelf Delivery Matrix Report (cont.)
- Overview of the process: (cont.)
- Insert retrieved rows (or ignore duplicate rows as explained previously) to the local SQLite database, then close all connections
- Set the update of local data to happen frequently (every 5 minutes via cron is a good method for doing this)
Hold Shelf Delivery Matrix Report (cont.)
- Exporting / Producing the report
- Import the .csv file into LibreOffice Calc (or Excel) and perform a pivot on the data:
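If you want the source-by-destination matrix without a spreadsheet, the same pivot can be done in plain Python; the (source, destination) pairs below are made up:

```python
from collections import Counter

# made-up (source location, pickup location) pairs, as they would
# appear in the exported .csv
deliveries = [
    ('main', 'north'), ('main', 'north'), ('main', 'east'),
    ('west', 'north'),
]

# count items per (source -> destination) cell of the matrix
matrix = Counter(deliveries)
print(matrix[('main', 'north')])  # → 2
```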
Example 4
Mapping Geo Data from Patron Address Data and Circulation Transaction Data
- The purpose of this process is to plot patron locations and branch locations on a map based on latitude / longitude coordinates derived from mailing address data
- This is a work in progress!
Mapping Geo Data (cont.)
- Cached data is contained in an SQLite database
- Circulation data (weekly export of the `sierra_view.circ_trans` table view)
- Patron data (weekly export of relevant patron information from multiple table views)
Mapping Geo Data (cont.)
- Geocoding Patron Street Addresses:
- Patron address data (patron_record_id, street number, street name, city, zip) are exported to a .csv file
Mapping Geo Data (cont.)
- Possible Geocoding Services:
Mapping Geo Data (cont.)
- SmartyStreets has some very user-friendly services for bulk upload / download of address data for verification and geocoding
- SmartyStreets may be able to offer a discounted / free service to libraries that allows for bulk verification / geocoding, as well as on-the-fly verification and auto-corrected address inputs for things such as web input forms
Mapping Geo Data (cont.)
- SmartyStreets list service returns .csv data back with relevant address information, identified by the unique ID (`patron_record_id`) that was provided
- We can easily load this into the local SQLite database with the CSV import feature of a GUI tool such as DB Browser for SQLite
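The same import can also be scripted rather than done through a GUI; this sketch assumes the returned .csv has `patron_record_id`, `latitude`, and `longitude` columns (assumed names, not necessarily the service's actual headers), with sample values made up:

```python
import csv
import io
import sqlite3

# stand-in for the .csv file returned by the geocoding list service
returned_csv = """patron_record_id,latitude,longitude
481038535591,39.1031,-84.5120
481038535592,39.1400,-84.5060
"""

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE geo '
             '(patron_record_id INTEGER, latitude REAL, longitude REAL)')

# DictReader rows are dicts, so named placeholders map columns by header
reader = csv.DictReader(io.StringIO(returned_csv))
conn.executemany(
    'INSERT INTO geo VALUES (:patron_record_id, :latitude, :longitude)',
    reader,
)
conn.commit()
```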
Mapping Geo Data (cont.)
- There are very good data analysis / visualization tools available for Python:
- PyViz
pyviz.org
- A Conda metapackage, "pyviz"
- Makes data visualization in Python easier