HOWTO SQL: SIERRA - PART 1

Ray Voelker
ray.voelker@cincinnatilibrary.org

Links for this presentation:

Meet Jeremy and Ray

...and Rufus and Audrey

rufus audrey

Overview of Relational Databases

  • Data in these types of databases are stored in collections, or tables
  • Tables have a number of rows and columns
  • Each row is (often) represented with a unique key.

Overview of Relational Databases: Keys

Keys (typically called ID’s in the Sierra Database) come in two varieties, and they define the relationship between tables.

  • Primary Key
  • Foreign Key

Overview of Database Entity-Relationship Model

(ERM View)

  • Defines the types of relationships that can exist between entities (tables)
    • One-to-One
    • One-to-Many (most common relationship)
    • Many-to-Many

Database Entity-Relationship Model

One-to-One

  • A Country can have one (and only one) Capital City
  • A Capital City can have one (and only one) Country

Database Entity-Relationship Model

One-to-One

Database Entity-Relationship Model

One-to-Many

  • A Mother may have many Children
  • A Child has only one (biological) Mother

Database Entity-Relationship Model

One-to-Many

Database Entity-Relationship Model

Many-to-Many

  • Authors can write several Books
  • Books can be written by several Authors

Database Entity-Relationship Model

Many-to-Many

Relational Databases: Relationships (JOINS)

  • Sets of data can be derived from Relational Operators from traditional math sets.
  • We'll cover two of the more common JOIN operations
    • JOIN (or INNER JOIN)
    • LEFT JOIN (or LEFT OUTER JOIN)

Relational Databases: Relationships (JOINS) cont.

JOIN (or INNER JOIN)

  • Most common type of join that there is
  • Given two sets, A (left) and B (right), performing this join will return a set containing all elements of A that also belong to B.

Relational Databases: Relationships (JOINS) cont.

LEFT JOIN (or LEFT OUTER JOIN)

  • Given two sets, A (left) and B (right) performing this join will return a set containing all elements of table A, as well as the elements of A that also belong to B

SQL Overview

  • Structured Query Language
  • Standardized language that allows a user to interface with a relational database

SQL Overview cont.

  • SQL statements are groups of clauses or other statements that define the operation on the database
  • Some of the more common statements include the following... (SQL is picky about the order in which these statements appear in so they're presented in the order that they can appear in the statement.)

SQL Overview cont.

  • SELECT
    • Retrieves data from tables
    • Most commonly used statement
  • UPDATE and SET
    • Modifies a set of existing table rows
  • DELETE
    • Remove set of existing rows from the table

SQL Overview cont.

  • CREATE (TEMPORARY TABLE)
    • Typically used to create a table in the database
    • Can be used to create TEMPORARY table for use in subsequent queries
      (exists for the duration of a database session, and is dropped when the connection or session ends)

SQL Overview cont.

  • JOIN / LEFT JOIN / etc
    • Performing a join will combine the data with that of another table
  • FROM
    • Indicates which table to retrieve data from

SQL Overview cont.

  • WHERE / WHERE IN
    • Include or exclude data based on comparisons
  • GROUP BY
    • Reduces sets into common values
  • HAVING
    • Allows for filtering of the GROUP BY statement

SQL Overview cont.

  • ORDER BY
    • Sort the set
      in ascending order (ASC)
      or descending order (DESC)
  • LIMIT / OFFSET
    • Returns specific numbers of rows from given starting point
    • Usually used in conjunction with the ORDER BY clause to ensure the set is always presented in the same order

SQL SELECT Statements

  • FINALLY SOME EXAMPLES!
  • Let us say we wanted a list ...
    • Bib Record info
    • 10 Oldest (earliest dates created)
  • The following query would give us a very basic view of those records:
SELECT
r.id, r.record_type_code, 
r.record_num, r.creation_date_gmt, 
r.deletion_date_gmt, r.num_revisions

FROM
sierra_view.record_metadata AS r

WHERE
r.record_type_code = 'b'

ORDER BY
r.creation_date_gmt ASC
					

Filter by specific bib numbers ...

SELECT
r.id, r.record_type_code, 
r.record_num, r.creation_date_gmt, 
r.deletion_date_gmt, r.num_revisions

FROM
record_metadata AS r

WHERE
r.record_type_code = 'b'
AND r.record_num IN ( 
1000001, 1000032, 1000041, 1000045, 1000056, 
1000139, 1000303, 1000345, 1000410, 1000426
)

ORDER BY
r.creation_date_gmt ASC
						

SQL SELECT Statement cont.

  • This is ok, but maybe we also wanted "Title" from the bib record
  • We'll use a JOIN!
  • Looks like table "bib_record_property" has what we need

JOIN table bib_record_property ...


SELECT
r.id, r.record_type_code, 
r.record_num, r.creation_date_gmt, 
r.deletion_date_gmt, r.num_revisions,
p.best_title

FROM
record_metadata AS r

JOIN bib_record_property AS p 
  ON p.bib_record_id = r.id

WHERE
r.record_type_code = 'b'
AND r.record_num IN ( 
  1000001, 1000032, 1000041, 1000045, 1000056, 
  1000139, 1000303, 1000345, 1000410, 1000426
)

ORDER BY
r.creation_date_gmt ASC
					

SQL SELECT Statement cont.

  • Table, "bib_record_property", has no foreign key for the deleted record and therefore won’t be joined in our results
  • We can fix this!
  • ... with a LEFT JOIN (or LEFT OUTER JOIN)!

JOIN LEFT OUTER JOIN table bib_record_property ...


SELECT
r.id, r.record_type_code, 
r.record_num, r.creation_date_gmt, 
r.deletion_date_gmt, r.num_revisions,
p.best_title

FROM
record_metadata AS r

--         JOIN bib_record_property as p
LEFT OUTER JOIN bib_record_property as p
  ON p.bib_record_id = r.id

WHERE
r.record_type_code = 'b'
AND r.record_num IN ( 
	1000001, 1000032, 1000041, 1000045, 1000056, 
	1000139, 1000303, 1000345, 1000410, 1000426
)

ORDER BY
r.creation_date_gmt ASC

SQL GROUP BY / HAVING and Aggregate Functions

  • GROUP BY
    • Reduces sets into common values, or groups results of one or more column together
    • Often used along with aggregate functions
      • COUNT()
      • SUM()
      • MAX()

SQL GROUP BY / HAVING and Aggregate Functions cont.

  • How many Titles are there in our database that have
    • publish_year of 1977?

SQL GROUP BY / HAVING and Aggregate Functions cont.


SELECT
p.publish_year,
count(*) as count

FROM
bib_record_property as p

WHERE
p.publish_year = 1977

GROUP BY
p.publish_year
				

SQL GROUP BY / HAVING and Aggregate Functions cont.

  • When grouped by publish_year How many publish_year values are there having a count of exactly 200 titles?

SQL GROUP BY / HAVING and Aggregate Functions cont.


SELECT
p.publish_year,
count(*) as count

FROM
bib_record_property as p

GROUP BY
p.publish_year

HAVING
count(*) = 200
				

SQL GROUP BY / HAVING and Aggregate Functions cont.

  • What are the top 5 most popular titles (by item checkout_total + renewal_total) for items currently in the location "Blue Ash Documentaries"?

SQL GROUP BY / HAVING and Aggregate Functions cont.


SELECT
p.best_title,
r.record_num AS bib_record_num, 
sum( i.checkout_total + i.renewal_total) AS total_circ
	
FROM
item_record AS i

JOIN
bib_record_item_record_link as l
  ON l.item_record_id = i.record_id

JOIN
bib_record_property AS p
  ON p.bib_record_id = l.bib_record_id
	
JOIN
record_metadata AS r
  ON r.id = l.bib_record_id

WHERE
-- baadc : 'Blue Ash Documentaries'
i.location_code = 'baadc'

GROUP BY
best_title,
bib_record_num

ORDER BY
total_circ DESC

LIMIT 5
				

Subquery

  • Subqueries are simply nested queries
  • Can occur in the SELECT, FROM, or WHERE clauses
  • Useful to use to use as a filter in the WHERE clause, or limiting to a single value in the SELECT clause (unexpected one-to-many situations)

Subquery cont.

  • Are there multiple barcodes associated with single item records?
    • Start by getting a set of item record ID values where the item_record_id appears multiple times when joining to the barcode varfield value

Subquery cont.


SELECT
ir.record_id

FROM
item_record as ir

JOIN varfield AS vf 
  ON vf.record_id = ir.record_id
  AND vf.varfield_type_code = 'b'

GROUP BY
ir.record_id

HAVING
count(*) > 1
				

Subquery cont.


SELECT
i.record_id as item_record_id,
r.record_num as item_record_num,
i.location_code,
v.field_content,
v.occ_num

FROM
item_record as i

JOIN
record_metadata as r
  ON r.id = i.record_id
	
JOIN varfield AS v 
  ON v.record_id = i.record_id
  AND v.varfield_type_code = 'b'

WHERE
i.record_id IN (
  SELECT
  ir.record_id
  FROM
  item_record as ir
  JOIN varfield AS vf 
  ON vf.record_id = ir.record_id
  AND vf.varfield_type_code = 'b'
  GROUP BY
  ir.record_id
  HAVING
  count(*) > 1
)

ORDER BY
i.record_id,
v.occ_num
				

Subquery cont.

			
SELECT
i.record_id,
i.item_status_code,
i.location_code,
v.field_content
	
FROM
item_record AS i

JOIN
varfield AS v 
  ON v.record_id = i.record_id
  AND v.varfield_type_code = 'b'

-- say we were targeting these two item records.
-- we may only expect two rows in our results ...
WHERE
i.record_id IN (
	450977833432, 450978609205
)
				

Subquery cont.

	
SELECT
i.record_id,
i.item_status_code,
i.location_code,
( 
	SELECT
	v.field_content
	FROM
	varfield as v
	WHERE
	v.record_id = i.record_id
	AND v.varfield_type_code = 'b'
	ORDER BY
	v.occ_num
	LIMIT 1
) AS barcode
	
FROM
item_record AS i

WHERE
i.record_id IN (
	450977833432, 450978609205
)
				

CASE Statement

  • Returns a value when a condition is met
  • Helpful for
    • formatting one value into another value
    • creating values to aggregate on

CASE Statement cont.

  • Produce a list of item record numbers and give a nicer name for the 'item_status_code' value

CASE Statement cont.


SELECT
r.record_num AS item_record_num, i.item_status_code,
CASE
WHEN item_status_code = '-' THEN 'AVAILABLE'
WHEN item_status_code = 'o' THEN 'LIB USE ONLY'
ELSE item_status_code
END as our_item_status

FROM
item_record as i

JOIN
record_metadata as r
  ON r.id = i.record_id

ORDER BY
r.creation_date_gmt ASC

LIMIT 10
				

CASE Statement cont.

  • From the location "Main Children's Library" (1cj), produce a count of items based on their last circulation dates
    • '1-recent checkouts' = 2020-01-01 to Present
    • '2-kinda-recent checkouts' = 2018-01-01 to 2020-01-01
    • '3-long-ago checkouts' = OLDER THAN '2018-01-01'
    • '4-NO CHECKOUTS!' = no last checkout value

CASE Statement cont.


SELECT

r.record_num as item_record_num,
i.last_checkout_gmt,
CASE
WHEN i.last_checkout_gmt >= '2020-01-01' THEN '1-recent checkouts'
WHEN i.last_checkout_gmt >= '2018-01-01' THEN '2-kinda-recent checkouts'
WHEN i.last_checkout_gmt <  '2018-01-01' THEN '3-long-ago checkouts'
WHEN i.last_checkout_gmt is null THEN '4-NO CHECKOUTS!'
END as 'last_checkout_groups'

FROM
item_record as i

JOIN
record_metadata as r
ON
  r.id = i.record_id
	
WHERE
i.location_code = '1cj'

LIMIT 10

				

CASE Statement cont.


SELECT
CASE
WHEN i.last_checkout_gmt >= '2020-01-01' THEN '1-recent checkouts'
WHEN i.last_checkout_gmt >= '2018-01-01' THEN '2-kinda-recent checkouts'
WHEN i.last_checkout_gmt <  '2018-01-01' THEN '3-long-ago checkouts'
WHEN i.last_checkout_gmt is null THEN '4-NO CHECKOUTS!'
END as 'last_checkout_groups',
count(*) as count_items

FROM
item_record as i
	
WHERE
i.location_code = '1cj'

GROUP BY
last_checkout_groups

ORDER BY
last_checkout_groups
				

THANK YOU!

Other Useful Resources