PLCH

2020-01-09 CincyPy | Ray Voelker

ray.voelker@gmail.com

https://rayvoelker.github.io/

The Python Standard Library >> Data Persistence >> SQLite3


Documentation

https://docs.python.org/3/library/sqlite3.html

SQLite is a C library that provides a lightweight disk-based database that doesn’t require a separate server process and allows accessing the database using a nonstandard variant of the SQL query language. Some applications can use SQLite for internal data storage. It’s also possible to prototype an application using SQLite and then port the code to a larger database such as PostgreSQL or Oracle.

...

To use the module, you must first create a Connection object that represents the database. Here the data will be stored in the example.db file:

import sqlite3
conn = sqlite3.connect('example.db')

You can also supply the special name :memory: to create a database in RAM.

conn = sqlite3.connect(":memory:")

# or, to enable shared cache, connect to the database like this...
conn = sqlite3.connect("file::memory:?cache=shared")
# https://sqlite.org/sharedcache.html
c = conn.cursor()

# Create table
c.execute('''CREATE TABLE stocks
             (date text, trans text, symbol text, qty real, price real)''')

# Insert a row of data
c.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")

# Save (commit) the changes
conn.commit()

# We can also close the connection if we are done with it.
# Just be sure any changes have been committed or they will be lost.
conn.close()

Usually your SQL operations will need to use values from Python variables. You shouldn’t assemble your query using Python’s string operations because doing so is insecure; it makes your program vulnerable to an SQL injection attack (see https://xkcd.com/327/ for humorous example of what can go wrong).

Instead, use the DB-API’s parameter substitution. Put ? as a placeholder wherever you want to use a value, and then provide a tuple of values as the second argument to the cursor’s execute() method. (Other database modules may use a different placeholder, such as %s or :1.) For example:

# Never do this -- insecure!
symbol = 'RHAT'
c.execute("SELECT * FROM stocks WHERE symbol = '%s'" % symbol)

# Do this instead
t = ('RHAT',)
c.execute('SELECT * FROM stocks WHERE symbol=?', t)
print(c.fetchone())

# Larger example that inserts many records at a time
purchases = [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
             ('2006-04-05', 'BUY', 'MSFT', 1000, 72.00),
             ('2006-04-06', 'SELL', 'IBM', 500, 53.00),
            ]
c.executemany('INSERT INTO stocks VALUES (?,?,?,?,?)', purchases)

Example 1:

Log file analysis:

  • Managing a service called EZproxy

    ...web proxy server used by libraries to give access from outside the library's computer network to restricted-access websites that authenticate users by IP address. This allows library patrons at home or elsewhere to log in through their library's EZproxy server and gain access to resources to which their library subscribes, such as bibliographic databases. wikipedia)

  • EZproxy produces apache-style logs... I often have to find out if there's an maybe a user who is "abusing" the service... aggregate useage by user, over a given period of time... perhaps even by resource

Objective: Convert the (apache style) ezproxy log file into .csv files for easier analysis. Current EZproxy config for log output: (https://help.oclc.org/Library_Management/EZproxy/Configure_resources/LogFormat)

LogFormat %h %{ezproxy-session}i %u %t "%r" %s %b

EZproxy Log Directive Description
%h The IP address of the Host accessing EZproxy
%{ezproxy-session}i EZproxy identifier for the user's current session
%u Username used to log into EZproxy if Option LogUser appears in config.txt; session identifier if Option LogSession appears in config.txt; - otherwise. To log both username and session, add only Option LogUser to config.txt, then use %u for the username and %{ezproxy-session}i for the session identifier.
%t Date/time of request; may also appear as %{format}t to specify a strftime time format.
%r Complete request (e.g. GET http://www.somedb.com HTTP/1.0)
%s HTTP numeric status of request
%b Number of bytes transferred

Example line:

66.42.221.177 xiPT7OYWGCpRcAS 12345678 [20/May/2019:00:08:24 -0400] "GET https://research.cincinnatilibrary.org:443 HTTP/1.1" 302 0
Description Value
IP addresss "66.42.221.177"
Users session id "xiPT7OYWGCpRcAS"
Username used to log into EZproxy "12345678"
Date/time of request "2019-05-20T00:08:24-04:00"
Complete request "GET https://research.cincinnatilibrary.org:443 HTTP/1.1"
HTTP numeric status 302
Number of bytes transferred 0
In [1]:
import re
import sqlite3
import csv
from dateutil import parser

# data file to analyze 
data_file = 'ezproxy.log.20190805.000001'

ezproxy_regex = re.compile(
	r"^([^\s]{1,})\s{1}"		# first capture group:	%h IP address
	r"([^\s]{1,})\s{1}"			# second capture group:	%{ezproxy-session}i users current session
	r"([^\s]{1,})\s{1}"			# third capture group:	%u 	Username used to log into EZproxy
	r"(?:\[)(.*?)(?:\])\s{1}"	# fourth capture group:	%t 	Date/time of request
	r"(?:\")(.*?)(?:\")\s{1}"	# fifth capture group:	%r 	Complete request
	r"([^\s]{1,})\s{1}"			# sixth capture group:	%s 	HTTP numeric status
	r"([^\s]{1,})"				# seventh capture group:%b 	Number of bytes transferred
)

# db = sqlite3.connect('ezproxy.db')
db = sqlite3.connect("file::memory:?cache=shared")

sql = """
 -- note: SQLite does not have a storage class set aside for storing dates and/or times.
 -- https://www.sqlite.org/datatype3.html
CREATE TABLE IF NOT EXISTS
`data` ( 
    `ip` TEXT,
    `ezproxy_session` TEXT,
    `patron_username` TEXT,
    `iso_timestamp` TEXT,
    `completed_request` TEXT,
    `http_status` INTEGER,
    `num_bytes` INTEGER
);

CREATE INDEX IF NOT EXISTS
`idx_patron_search` ON `data`(
	`ezproxy_session`,
	`patron_username`,
	`iso_timestamp`
);

CREATE TABLE IF NOT EXISTS
`patron_nicknames` (
    `patron_username` TEXT,
    `nickname` TEXT
);

CREATE INDEX IF NOT EXISTS
`idx_patron_nickname` ON `patron_nicknames`(
	`patron_username`
);

"""

db.executescript(sql)

# insert sql command 
sql = """
INSERT INTO data(
    ip,
    ezproxy_session,
    patron_username,
    iso_timestamp,
    completed_request,
    http_status,
    num_bytes
)
VALUES(?, ?, ?, ?, ?, ?, ?)
"""

cursor = db.cursor()

# text file contains the exproxy-formatted log data as defined above
count = 0
with open(data_file, 'rt') as data_input:
    for line in data_input:
        # add a counter and output per 10,000 lines processed
        count += 1
        if (count % 10000 == 1):
            print('.', end='', flush=True)
        x = ezproxy_regex.search(line)
        if (x != None):
            x_groups = list(x.groups())
            # convert the time string to ISO format (replace the first ':' with a ' ')
            x_groups[3] = parser.parse(x_groups[3].replace(':', ' ', 1)).isoformat()
            cursor.execute(sql, x_groups)

print("\nProcessed lines: {}".format(count))
            
cursor.close()
db.commit()

# if we want to store the logfile in the converted .csv then uncomment the following:
#
# # csv file for output
# 	with open('temp.csv', 'w') as csvfile:
# 		csv_writer = csv.writer(csvfile, quoting=csv.QUOTE_ALL, dialect='excel')
# 		# write the csv header
# 		csv_writer.writerow(('ip', 'ezproxy_session', 'patron_username', 'iso_timestamp', 'completed_request', 'http_status', 'num_bytes'))
		
# 		for line in data_input:
# 			x = ezproxy_regex.search(line)
# 			if (x != None):
# 				x_groups = list(x.groups())
# 				# convert the time string to ISO format (replace the first ':' with a ' ')
# 				x_groups[3] = parser.parse(x_groups[3].replace(':', ' ', 1)).isoformat()
# 				csv_writer.writerow(x_groups)
............................................................................
Processed lines: 769456
In [2]:
# this is just to obfuscate the usernames for display later
from random import sample, randint
import nouns
import adjectives

def gen_nickname():
    name = ''.join(sample(adjectives.ADJECTIVES, 1)) + '-' + \
        ''.join(sample(nouns.ANIMALS, 1)).lower() + \
        '-' + str(randint(1,100))
    return name
    
cursor = db.cursor()
sql = """
SELECT
DISTINCT data.patron_username
FROM
data
"""
results = cursor.execute(sql)

sql = """
INSERT INTO patron_nicknames(`patron_username`, `nickname`) VALUES (?, ?)
"""
for result in results:
#     insert_values = (result[0], gen_nickname() )
#     print(insert_values)
    db.execute(sql, (result[0], gen_nickname()))

cursor.close()
db.commit()

Let's Aggregate!

To start looking at the data and get a sense of what's going on, the following queries on the table are useful for determining usage during time periods / by user.

In [3]:
sql = """
SELECT
COUNT(*) AS count,
MIN(data.iso_timestamp) AS min_date,
MAX(data.iso_timestamp) AS max_date,
COUNT( DISTINCT data.patron_username) as count_users,
COUNT( DISTINCT data.ezproxy_session ) as count_sessions,
SUM(num_bytes) / 1048576 AS sum_mebibyte

FROM
data
"""

cursor = db.cursor()
cursor.execute(sql)
print(cursor.fetchone())
cursor.close()
(769456, '2019-07-29T00:00:03-04:00', '2019-08-04T23:59:28-04:00', 741, 1691, 1311984)
In [4]:
# ... or, if we want to get fancy, we can do this to access the columns by name!
# https://docs.python.org/3.7/library/sqlite3.html#accessing-columns-by-name-instead-of-by-index
cursor = db.cursor()
cursor.execute(sql)
cursor.row_factory = sqlite3.Row

result = cursor.fetchone()

print('count rows:\t\t\t{}\n\
min_date:\t\t\t{}\n\
max_date:\t\t\t{}\n\
count distinct users:\t\t{}\n\
count distinct sessions:\t{} ({:.2f} sessions per user)\n\
sum mebibytes transferred:\t{} ({:.2f} Terabytes)'\
      .format(result['count'], \
              result['min_date'], \
              result['max_date'], \
              result['count_users'], \
              result['count_sessions'],\
              result['count_sessions']/result['count_users'],\
              result['sum_mebibyte'], result['sum_mebibyte']/953674))
count rows:			769456
min_date:			2019-07-29T00:00:03-04:00
max_date:			2019-08-04T23:59:28-04:00
count distinct users:		741
count distinct sessions:	1691 (2.28 sessions per user)
sum mebibytes transferred:	1311984 (1.38 Terabytes)

... BTW, a cool trick to print a float and limit to 2 digits: use the Python string format() method:

https://docs.python.org/3.7/library/string.html#formatspec

from math import pi
print(pi)
# 3.141592653589793
print('{:.2f}'.format(pi))
# 3.14
In [5]:
# -- find sum of data downloaded by date, ezproxy_session, and patron_username
# -- order the results by date, and amount downloaded per aggregated group
# -- filter out sessions with fewer than 10 mebibytes total
sql = """
SELECT
date(iso_timestamp) as date,
ezproxy_session,
-- patron_username,
nickname,
(SUM(num_bytes) / 1048576) AS sum_mebibyte

FROM
data

JOIN
patron_nicknames
ON
  patron_nicknames.patron_username = data.patron_username

GROUP BY
date,
ezproxy_session,
-- patron_username
nickname

HAVING
sum_mebibyte >= 10

ORDER BY
date,
sum_mebibyte DESC
"""

results = db.execute(sql)
for result in results:
    print(result)
('2019-07-29', 'ub1rikHQpJGhbuI', 'irritable-python-83', 287763)
('2019-07-29', 'ihpfD2saD0R0Fys', 'anxious-dolphin-79', 354)
('2019-07-29', 'ZoBTn48LQyeXVv6', 'irritable-python-83', 183)
('2019-07-29', '96trHuqgkkE6ZIp', 'huge-aardvark-30', 130)
('2019-07-29', '4leEMztqqtEHyyQ', 'harebrained-antelope-43', 126)
('2019-07-29', 're7VTXNUx1IztsZ', 'ludicrous-newt-61', 102)
('2019-07-29', 'mM24tMx7QgaaGcO', 'huge-aardvark-30', 97)
('2019-07-29', '2gAi3GF4rIbQttP', 'clumsy-lobster-88', 93)
('2019-07-29', 'dMKQi9TgcZJzZ7S', 'loose-penguin-81', 81)
('2019-07-29', 'EkPddWL7jDLkAG8', 'clear-mollusk-96', 72)
('2019-07-29', '-', 'rotund-tapir-94', 55)
('2019-07-29', 'nLF6A6IGOhvJEDJ', 'jealous-lemur-97', 44)
('2019-07-29', 'AmfuR0Ll17rvmeY', 'perfect-wallaby-31', 40)
('2019-07-29', 'egho3oWl9iR6f7B', 'brave-sandpiper-45', 38)
('2019-07-29', 'L9hsUBFCjYG5ZqL', 'cheeky-yak-69', 37)
('2019-07-29', 'EEET6ehSPt5u3Qb', 'hurt-turtle-12', 35)
('2019-07-29', 'kmWFPYxtXzekWYZ', 'uptight-starling-47', 33)
('2019-07-29', 'liZJm0I0Krjeh19', 'impressionable-porcupine-69', 26)
('2019-07-29', 'pY9HTalYsAZ28ZP', 'aloof-anteater-31', 26)
('2019-07-29', '07FcWmiwFGgwdk5', 'beefy-catfish-73', 24)
('2019-07-29', '8XaXyoxN7BnaUzs', 'glorious-lobster-14', 22)
('2019-07-29', 'TkBjwBapdiIplKv', 'abrupt-alpaca-45', 22)
('2019-07-29', 'pIIiiSQK2SZ2s7U', 'slimy-dragonfly-43', 21)
('2019-07-29', 'p6huJ6vZep6uv5j', 'gigantic-mongoose-64', 20)
('2019-07-29', 'A734WvGClP4HRdW', 'silly-gnu-68', 19)
('2019-07-29', 'UDruXBOs46OKDGi', 'loose-penguin-81', 19)
('2019-07-29', 'eKTYsMhUKHMEeRX', 'smug-seal-11', 18)
('2019-07-29', 'VDQDMUEpd2pMFkJ', 'fluttering-loris-26', 16)
('2019-07-29', '21GnaiWwcqhC7w5', 'ludicrous-newt-61', 15)
('2019-07-29', '3YJaeNkkgY4ci0A', 'sarcastic-gnu-23', 15)
('2019-07-29', 'axCtd3yxRYmLsRq', 'responsive-pelican-71', 15)
('2019-07-29', 'jmBjdtIpIOqfTSf', 'gleaming-goldfish-21', 15)
('2019-07-29', 'H08F1I3wMFNsEXc', 'melancholy-porcupine-22', 14)
('2019-07-29', 'vR5ufnV78iquFjq', 'giddy-salmon-59', 13)
('2019-07-29', '8CMrb2sSR1aaNf7', 'mysterious-leopard-41', 12)
('2019-07-29', 'zKFhF5loBMMmuG8', 'silky-hummingbird-76', 12)
('2019-07-29', 'mnySa4IBO7bKM8p', 'ratty-ram-65', 11)
('2019-07-29', '7FzGhzBOqEFJiGd', 'petty-mouse-1', 10)
('2019-07-29', 'YjA6uGFVVIY1xOU', 'cumbersome-flamingo-36', 10)
('2019-07-30', 'ub1rikHQpJGhbuI', 'irritable-python-83', 381953)
('2019-07-30', 'C1wMrmlqh1KbhtZ', 'anxious-dolphin-79', 1039)
('2019-07-30', '0Sp7WKrDqngF93W', 'clear-mollusk-96', 224)
('2019-07-30', '4leEMztqqtEHyyQ', 'harebrained-antelope-43', 221)
('2019-07-30', 'nlTcQmga5rkY0FY', 'puny-jellyfish-2', 221)
('2019-07-30', '8bVdgnSunFHqwEU', 'bland-seahorse-42', 182)
('2019-07-30', 'MAMbZ0i4QatYazp', 'clumsy-wrasse-4', 133)
('2019-07-30', '5vOpLaApiULGWy1', 'bland-seahorse-42', 131)
('2019-07-30', 'ZjIfuk2rhoqYJMf', 'ratty-coelacanth-3', 106)
('2019-07-30', 'KEx0oqER7Zr3gRC', 'puny-jellyfish-2', 103)
('2019-07-30', 'LXta0lMPIlfUXuN', 'steep-catfish-1', 100)
('2019-07-30', 'qYHpvQmoP6JXy0P', 'clumsy-wrasse-4', 71)
('2019-07-30', '-', 'rotund-tapir-94', 65)
('2019-07-30', 'dOsZXeYmEoYWsNW', 'silky-hummingbird-76', 65)
('2019-07-30', '72UUlvqfdCbsLtr', 'stunning-platypus-15', 64)
('2019-07-30', 'm3PolX0HtnDRkXS', 'ratty-coelacanth-3', 62)
('2019-07-30', 'OMxBTZvWa9ByMmL', 'victorious-mandrill-33', 61)
('2019-07-30', '9hDDOATjWFsaU0x', 'steep-catfish-1', 59)
('2019-07-30', 'ofPnia8KacC5KYJ', 'irate-eagle-93', 59)
('2019-07-30', '09QOFhsV156Ix4B', 'impressionable-porcupine-69', 55)
('2019-07-30', '4hkRihV0VRkzvp7', 'irate-eagle-93', 49)
('2019-07-30', 'KPBL3Xb0eiAeb3k', 'ratty-coelacanth-3', 43)
('2019-07-30', 'POq7KzSZ7e324nT', 'slimy-mallard-75', 42)
('2019-07-30', 'QTfOqxMuOhEj5ex', 'gentle-orca-8', 33)
('2019-07-30', 'yeEVasVA2cypVNV', 'huge-aardvark-30', 29)
('2019-07-30', 'PhM7lGY8bGlgODj', 'deceitful-aardvark-32', 26)
('2019-07-30', 'NAbrikNY9JXoEva', 'sweet-raven-51', 24)
('2019-07-30', '4h5d1GhX4Nd16uu', 'ratty-coelacanth-3', 23)
('2019-07-30', 'ryoRRgKSI5nf5Uv', 'gentle-orca-8', 23)
('2019-07-30', 'DqWquVwWLw5YicX', 'lethal-porcupine-46', 20)
('2019-07-30', '1gubzIB3AvxiSc2', 'ratty-coelacanth-3', 19)
('2019-07-30', 'SJuY1Xni8CnKlds', 'bewildered-songbird-23', 19)
('2019-07-30', '0fabCJqOCslWRWh', 'gaudy-narwhal-33', 17)
('2019-07-30', '8IoIGz87WdZEGj9', 'outrageous-finch-64', 17)
('2019-07-30', 'UVsjSlSYZSVrPP6', 'gentle-orca-8', 17)
('2019-07-30', 'ikt0OHyONIHJLDP', 'sticky-sparrow-53', 17)
('2019-07-30', 'mOCO8yazjmk2bet', 'mistaken-kangaroo-99', 17)
('2019-07-30', 'I4fKkzivedWZxOL', 'selfish-dolphin-93', 15)
('2019-07-30', 'U8djgpAKLuqvCDD', 'virtuous-cassowary-71', 15)
('2019-07-30', 'b6dhIokTwEA9VCt', 'ratty-ram-65', 15)
('2019-07-30', 'OFu7GgEnKdNo0zQ', 'costly-zebra-86', 14)
('2019-07-30', 'XldirNv2Ub4c43U', 'gleaming-goldfish-21', 14)
('2019-07-30', '5qno31n7wVZWJwO', 'gorgeous-duck-46', 13)
('2019-07-30', 'D5rI9qoNrLOVSaJ', 'ghastly-tapir-84', 13)
('2019-07-30', 'b73yVKieZnnMc68', 'despicable-chameleon-89', 13)
('2019-07-30', 'vdJInwl4Gn93LFV', 'cumbersome-flamingo-36', 13)
('2019-07-30', 'APm4QSRF937ApNI', 'hurt-turtle-12', 12)
('2019-07-30', 'ptXZ2moG15WgY5m', 'odd-finch-10', 12)
('2019-07-30', 'uvHZYIcuF52K7ox', 'slimy-dragonfly-43', 12)
('2019-07-30', '7uONyOmqNqZUisE', 'immense-giraffe-70', 11)
('2019-07-30', 'NFgL0V3mPFhLn4p', 'succulent-ram-43', 11)
('2019-07-30', '3w3Klgvarsjk1Mw', 'responsive-wolf-53', 10)
('2019-07-30', 'yU209QMn1KmyVSo', 'dilapidated-ibex-47', 10)
('2019-07-31', 'ub1rikHQpJGhbuI', 'irritable-python-83', 373839)
('2019-07-31', 'QekDtDu8QFpN4mx', 'irritable-sandpiper-8', 374)
('2019-07-31', 'OL9CInusNIgdyme', 'clumsy-wrasse-4', 180)
('2019-07-31', 'bmq6UDw51jAZpxm', 'gritty-opossum-52', 171)
('2019-07-31', 'Z18LvMdfIKLOME6', 'clumsy-wrasse-4', 95)
('2019-07-31', 'tuVYkqRm6HA2a5f', 'slimy-jackal-14', 94)
('2019-07-31', 'Fgii83AjZt2U5Us', 'impressionable-porcupine-69', 92)
('2019-07-31', 'GMvRHfw7E0zA5q7', 'soggy-dalmatian-90', 91)
('2019-07-31', 'lzBHJ0RVKc2s4k2', 'harebrained-antelope-43', 83)
('2019-07-31', 'Z3jB4871xh5TLUe', 'huge-aardvark-30', 71)
('2019-07-31', 'x5o5uWqW7NHkLWw', 'hurt-turtle-12', 69)
('2019-07-31', '-', 'rotund-tapir-94', 67)
('2019-07-31', 'l6uqRxP5jezJLIg', 'small-kingfisher-4', 53)
('2019-07-31', 'vSiRWVZWdn8fdyg', 'upset-spider-75', 51)
('2019-07-31', 'j9D6OQgY9oGggHl', 'small-kingfisher-4', 49)
('2019-07-31', 'uDZSPRORIn2w8F5', 'irritable-panda-97', 46)
('2019-07-31', 'an5AC6nhzyVH8nd', 'healthy-wren-44', 33)
('2019-07-31', 'TDcvlSrD6YHvU1i', 'irritable-python-83', 32)
('2019-07-31', 'MFnaWnvyniN2kLy', 'bewildered-songbird-23', 29)
('2019-07-31', 'ie9Vyv5ZYYxO7J1', 'responsive-wolf-53', 28)
('2019-07-31', 'cx6HON9yj5pfSs0', 'small-kingfisher-4', 24)
('2019-07-31', '5gLHw1QlIAmXGhd', 'melancholy-giraffe-23', 21)
('2019-07-31', 'A6I5nK7iyWyRfiu', 'gorgeous-duck-46', 21)
('2019-07-31', 'PrUqLb41Cua9P4T', 'irritable-sandpiper-8', 21)
('2019-07-31', 'vVk99tUHJO8TyNc', 'ratty-ram-65', 21)
('2019-07-31', 'Rbz3ApUHUHGfrpi', 'ludicrous-rook-15', 19)
('2019-07-31', 'lw6Ju0G3GLZZ1W9', 'deceitful-aardvark-32', 19)
('2019-07-31', '9gvYfcsdTfakeGX', 'lethal-osprey-49', 17)
('2019-07-31', 'weXJaw17UaIHX8q', 'slimy-dragonfly-43', 15)
('2019-07-31', 'aaQ4CF6AjwiVX9P', 'emaciated-finch-77', 14)
('2019-07-31', 'ChepiUr50O3vX5q', 'arrogant-heron-69', 13)
('2019-07-31', 'O8m2V0bCk5wDxOI', 'responsive-wolf-53', 13)
('2019-07-31', 'jbcA57bDBXV7EPL', 'zany-lion-46', 13)
('2019-07-31', 'vVGkUqUqL8dCIAF', 'hurt-duck-70', 13)
('2019-07-31', '8xTjHmMPRvObSJ4', 'shaky-roadrunner-93', 12)
('2019-07-31', '4ZJZFacvKcl0tpt', 'defiant-triceratops-15', 11)
('2019-07-31', 'Dr5em1CHoVQ6PqB', 'grumpy-cobra-19', 11)
('2019-07-31', 'RQNvkmtTIloFKJq', 'crooked-nightingale-24', 11)
('2019-07-31', 'fFScIZVWRttzk7B', 'mammoth-rhinoceros-15', 11)
('2019-07-31', 'UNuwcs1cDkzryuq', 'despicable-osprey-88', 10)
('2019-07-31', 'UpN12oPoMIf7aAl', 'giddy-salmon-59', 10)
('2019-07-31', 'krExN4ewMz0c06u', 'convincing-newt-89', 10)
('2019-08-01', 'ub1rikHQpJGhbuI', 'irritable-python-83', 204458)
('2019-08-01', '3l89JaBDN4lgc05', 'irritable-python-83', 27457)
('2019-08-01', 'Az4aLrXDbInUEQE', 'irritable-python-83', 13733)
('2019-08-01', 'WVX4vJl2QMz51bI', 'irritable-python-83', 2427)
('2019-08-01', 'mFPtAlDg8J41DEG', 'arrogant-platypus-75', 204)
('2019-08-01', 'we8HS4PihohIeio', 'maniacal-walrus-57', 180)
('2019-08-01', 'xR9Wdct7K55HotO', 'weary-nautilus-28', 160)
('2019-08-01', '5mWgXk97GonRyoS', 'impressionable-porcupine-69', 155)
('2019-08-01', '87SNwDzFdRzHs0h', 'gritty-opossum-52', 142)
('2019-08-01', 'RipFMxKFJFJ5gB9', 'healthy-sheep-72', 123)
('2019-08-01', '-', 'rotund-tapir-94', 98)
('2019-08-01', 'Ak02877QC5p60jO', 'small-kingfisher-4', 79)
('2019-08-01', '4rBjpu8twN7yatu', 'huge-aardvark-30', 71)
('2019-08-01', 'lzBHJ0RVKc2s4k2', 'harebrained-antelope-43', 61)
('2019-08-01', 'oyvLlblFCTkhsGV', 'round-crab-93', 60)
('2019-08-01', 'MRV0JPc5mGJYpeA', 'healthy-sheep-72', 57)
('2019-08-01', 'gOWMseVosqZgDcX', 'defiant-angelfish-2', 52)
('2019-08-01', 'hb6M9tZ2c2RmexZ', 'hurt-turtle-12', 47)
('2019-08-01', 'b0IdhTBoFBf47KC', 'plain-gull-17', 36)
('2019-08-01', 'sMwWL9pLRKKudkt', 'arrogant-platypus-75', 35)
('2019-08-01', 'dT4VDaxXjwnsaAi', 'bewildered-songbird-23', 32)
('2019-08-01', 'MFnaWnvyniN2kLy', 'bewildered-songbird-23', 31)
('2019-08-01', 'KPBOzVKUf3QLzIN', 'responsive-wolf-53', 30)
('2019-08-01', 'yCf4bkFe5Vt1bXr', 'hurt-turtle-12', 30)
('2019-08-01', '4kJBOmYLLyMMoHK', 'lethal-porcupine-46', 28)
('2019-08-01', 'A3uppd0mazmD43b', 'gigantic-mongoose-64', 27)
('2019-08-01', 'SpEzy91CRLH6JMx', 'scary-duck-96', 27)
('2019-08-01', 'h7zGXD8i2AMrUzL', 'clumsy-wrasse-4', 26)
('2019-08-01', 'FVbN3P9JNNwn9MI', 'jumpy-falcon-100', 24)
('2019-08-01', 'qgL9VRTe2mwRGdM', 'helpless-goose-8', 24)
('2019-08-01', '2j6tBlz7khdJvKc', 'steep-catfish-1', 21)
('2019-08-01', 'BnugC88kk4PCspM', 'ratty-ram-65', 17)
('2019-08-01', 'bP22VLo2QLZ1iZu', 'sore-starfish-49', 15)
('2019-08-01', 'TroEc3ufAFhvqYW', 'friendly-cat-38', 14)
('2019-08-01', 'ZPfqG5seZSqJWun', 'attractive-aardvark-33', 14)
('2019-08-01', 'oDoiZTkv1cd7Jtj', 'frightened-hatchetfish-99', 14)
('2019-08-01', 't0oQK2rMW1i6eFQ', 'arrogant-heron-69', 14)
('2019-08-01', 'iP3PNYYB7OtMxU6', 'drained-angelfish-63', 13)
('2019-08-01', '3WEuC2RjoiDQGkX', 'distinct-newt-87', 12)
('2019-08-01', 'FPdxsC7Nye2NHBN', 'outrageous-finch-64', 12)
('2019-08-01', 'OLKa01xkUmK8YWY', 'dangerous-cobra-88', 12)
('2019-08-01', 'pH97sUhcpgX6JQ5', 'sore-dragonfish-56', 12)
('2019-08-01', 'qoL5J6Jg0aptGCL', 'ratty-ram-65', 12)
('2019-08-01', 'zG8DM0TNkQVHaaG', 'gleaming-bat-80', 12)
('2019-08-01', 'Z032QJhT5gKggWP', 'ideal-raccoon-69', 11)
('2019-08-01', 'l55d2cIbQHUIOnb', 'shaky-roadrunner-93', 10)
('2019-08-01', 'xf1X5BkA0ntsJb7', 'arrogant-heron-69', 10)
('2019-08-02', 'iA0rRKgGWN7YnMP', 'gentle-otter-66', 330)
('2019-08-02', 'r2I4ugjPZq49EkZ', 'itchy-yak-78', 238)
('2019-08-02', 'mFPtAlDg8J41DEG', 'arrogant-platypus-75', 227)
('2019-08-02', 'p21JZGpnI8L7pF0', 'fuzzy-snowy owl-92', 224)
('2019-08-02', 'rc83erxP7S9TJJS', 'gritty-opossum-52', 169)
('2019-08-02', 'E3X2RNl3gIB7anu', 'beefy-gorilla-41', 155)
('2019-08-02', 'X1zA0OmYUEq0iRq', 'maniacal-walrus-57', 122)
('2019-08-02', 'zSxsEs709kfeYsa', 'envious-bulldog-91', 113)
('2019-08-02', 'dEh0qO5gwUf7Xn7', 'ecstatic-deer-41', 93)
('2019-08-02', 'CxYigYXkNyeAGIH', 'agitated-python-29', 92)
('2019-08-02', 'HjKdOpc1bPWxCf6', 'healthy-koi-26', 91)
('2019-08-02', 'TEw1ADy46Rdho6v', 'gorgeous-mouse-10', 91)
('2019-08-02', 'VqBOhK7lLvDeagf', 'immense-angelfish-32', 85)
('2019-08-02', 'K0j5i9c3x6wSEJH', 'healthy-sheep-72', 77)
('2019-08-02', 'b6NjMwOdU57sSPM', 'bewildered-songbird-23', 74)
('2019-08-02', '-', 'rotund-tapir-94', 69)
('2019-08-02', '6x51bxN1Ymci7Yr', 'impressionable-porcupine-69', 62)
('2019-08-02', 'UBKB0M1fbzxef7M', 'gentle-otter-66', 61)
('2019-08-02', '3ALnHvZ1fgeNiqe', 'round-triceratops-99', 58)
('2019-08-02', 'PnL2Nbb6UkXW5bU', 'huge-aardvark-30', 48)
('2019-08-02', 'v2IzKNfY5gC74gl', 'stunning-owl-73', 43)
('2019-08-02', 'dHEGDjUDFqIe25R', 'envious-bulldog-91', 42)
('2019-08-02', 'CIInzVNJOgfnnKb', 'teeny-caribou-66', 41)
('2019-08-02', 'qWgRu0W3hyRZPqs', 'puny-jellyfish-2', 41)
('2019-08-02', 'PGEF4imv2vXv1bJ', 'hurt-turtle-12', 36)
('2019-08-02', '4OMhFLZgn61JFry', 'elegant-marlin-49', 31)
('2019-08-02', 'TT48MsQrzGyjiEt', 'terrible-osprey-14', 31)
('2019-08-02', 'XUYpSEi8L9ZlhoD', 'hurt-turtle-12', 29)
('2019-08-02', 'YABH7MH1YJTcJIq', 'beefy-gorilla-41', 28)
('2019-08-02', 'SSjtzpsgO7SZknn', 'lethal-osprey-49', 25)
('2019-08-02', 'YrW937NjpfvdNeu', 'responsive-wolf-53', 25)
('2019-08-02', 'jhN8pqTBjpL4rA6', 'homely-newt-23', 25)
('2019-08-02', '5bNWcmDJ7f1Fm8H', 'sticky-herring-14', 23)
('2019-08-02', '2juzh2kKJRfYbr2', 'clumsy-wrasse-4', 21)
('2019-08-02', 'vSgzDHUY3iQ8fvp', 'cumbersome-flamingo-36', 21)
('2019-08-02', '5K9iP5CMnyauJpc', 'bewildered-songbird-23', 17)
('2019-08-02', 'SSHIJ2k3tvLj529', 'ratty-butterfly-26', 16)
('2019-08-02', 'PniLslOGNtFAYUs', 'bewildered-songbird-23', 15)
('2019-08-02', '8MWOF4rsSDZiN5K', 'troubled-centipede-1', 14)
('2019-08-02', 'AxcVAyrcI86WqDl', 'apprehensive-kangaroo-19', 14)
('2019-08-02', 'AJuAKCPB37Fl9Zh', 'burly-lark-28', 13)
('2019-08-02', 'AdURN6UUIUhMeiu', 'gleaming-goldfish-21', 13)
('2019-08-02', 'yp3KRclwIDy9PTf', 'gorgeous-mouse-10', 13)
('2019-08-02', 'zTD6JJGqmZ87Y1j', 'round-triceratops-99', 13)
('2019-08-02', '3SQEhb0W7uITqph', 'ratty-ram-65', 11)
('2019-08-02', 'C1gbBqybbT8QUww', 'grieving-panda-22', 11)
('2019-08-02', 'DTm8SFAV5rdOS1K', 'loose-cassowary-2', 11)
('2019-08-02', 'G7GGpPo7CMt1zMA', 'gleaming-bat-10', 11)
('2019-08-03', 'kNtkjIrxcUQLgEN', 'bright-finch-61', 294)
('2019-08-03', '9bB7Og9zz8V4DMQ', 'harebrained-antelope-43', 193)
('2019-08-03', 'rf7Ja6emuIf5CGu', 'huge-aardvark-30', 118)
('2019-08-03', 'jRPh9mSawSR6nh1', 'despicable-chameleon-89', 110)
('2019-08-03', 'muxHT1EsyjRyeP5', 'puny-jellyfish-2', 89)
('2019-08-03', '0RtBXhyD3nBK97k', 'grumpy-anteater-78', 78)
('2019-08-03', '9ZOTH5pMg76poJw', 'lethal-porcupine-46', 74)
('2019-08-03', 'LQHl17q0cMAEwRQ', 'puny-jellyfish-2', 73)
('2019-08-03', 'q3NXEe2O45Kf27P', 'cumbersome-flamingo-36', 64)
('2019-08-03', 'JSbmbZZLhGjpEBn', 'poised-sandpiper-49', 63)
('2019-08-03', 'mEL93v8el0TwJ7b', 'hurt-turtle-12', 47)
('2019-08-03', 'Xm3Es6uENDsGbil', 'diminutive-dove-13', 45)
('2019-08-03', 'kgmdcWxNJeINgOQ', 'puny-jellyfish-2', 44)
('2019-08-03', '0PvDGXxXIwHfi46', 'drained-angelfish-63', 42)
('2019-08-03', '-', 'rotund-tapir-94', 41)
('2019-08-03', 'G7t7gxDjceeUCiQ', 'beefy-gorilla-41', 37)
('2019-08-03', 'hb6wurjFdef7h9b', 'grumpy-anteater-78', 33)
('2019-08-03', 'J3Karbjds6ffyif', 'agitated-mandrill-64', 30)
('2019-08-03', 'GjrH5I0NDyxWYY6', 'sarcastic-gnu-23', 29)
('2019-08-03', 'qXM86oLuq7JkVnW', 'troubled-centipede-1', 29)
('2019-08-03', '5bNWcmDJ7f1Fm8H', 'sticky-herring-14', 27)
('2019-08-03', 'D6D2QKNa6K8lrTE', 'grumpy-anteater-78', 27)
('2019-08-03', 'PIYv2zvpVR2v5w5', 'beefy-gorilla-41', 27)
('2019-08-03', 'tQP5vBstda2RT4C', 'bewildered-songbird-23', 27)
('2019-08-03', 'Ea9OpjcN4nUPFZW', 'perplexed-stork-27', 25)
('2019-08-03', 'EuU02BJKrvIxdiF', 'lethal-porcupine-46', 25)
('2019-08-03', 'vOVBrT1eC0QkRqb', 'jittery-penguin-73', 25)
('2019-08-03', 'w65vW9ir7FGLYmx', 'bewildered-songbird-23', 25)
('2019-08-03', 'Dr7g7PoOUaEswIt', 'strange-hummingbird-81', 19)
('2019-08-03', '0aheLLYmuYZVdeY', 'puny-jellyfish-2', 16)
('2019-08-03', '3dJdUSpy7hRQeXk', 'lethal-porcupine-46', 16)
('2019-08-03', 'BTVROfUUz4GlSpj', 'yummy-mastiff-29', 15)
('2019-08-03', 'B9PH5o0LDrakQDZ', 'helpful-newt-59', 12)
('2019-08-03', 'SwMvP47ePyEYYqR', 'slippery-mongoose-60', 12)
('2019-08-03', 'TP9M5tSqIFAe902', 'outrageous-finch-64', 12)
('2019-08-03', 'cum5GWzMEY5O4a7', 'succulent-seal-46', 12)
('2019-08-03', 'ikvBFH8LQbJ2KTG', 'slippery-mongoose-60', 12)
('2019-08-03', 'CH92OGNR8llziLW', 'courageous-coelacanth-58', 11)
('2019-08-03', 'UbaEvE1pau6OVRi', 'soggy-dalmatian-90', 11)
('2019-08-03', 'eRvVZ5gDpqOI5QE', 'sarcastic-gnu-23', 11)
('2019-08-03', 'igKhQ2kfmvQuGJf', 'spotless-grouse-20', 11)
('2019-08-03', 'KiaeSxUZNQedhKG', 'bewildered-songbird-23', 10)
('2019-08-03', 'T8cb9I3AYles3Bd', 'helpful-newt-59', 10)
('2019-08-03', 'VnhGCTmu8eUkxVK', 'responsive-wolf-53', 10)
('2019-08-03', 'dro6mF8z8dqbJHq', 'blushing-salmon-88', 10)
('2019-08-04', '5oRUYUZJet1Cf40', 'corny-rhinoceros-73', 469)
('2019-08-04', 'b6WX7yMWG59ofBp', 'annoyed-nightingale-5', 217)
('2019-08-04', 'kWNlqXEbuQ10KWS', 'huge-aardvark-30', 142)
('2019-08-04', 'i6xPVZBQJ5VSVEh', 'uptight-starling-47', 126)
('2019-08-04', 'Wr336NmM5yDcbPm', 'contemplative-jellyfish-91', 111)
('2019-08-04', 'JjHq4kDXDMKmWze', 'huge-aardvark-30', 105)
('2019-08-04', 'yp7eAq3iPKs8mci', 'harebrained-antelope-43', 96)
('2019-08-04', 'K1qbDwVyE7PK1u9', 'defiant-angelfish-2', 80)
('2019-08-04', 'XuXnkpu6cs2SFil', 'bewildered-songbird-23', 64)
('2019-08-04', 'Xm3Es6uENDsGbil', 'diminutive-dove-13', 53)
('2019-08-04', 'EDsxJJJgOGAzk9K', 'sarcastic-gnu-23', 50)
('2019-08-04', 'PHD99Eosz0b18YQ', 'outrageous-horse-15', 40)
('2019-08-04', '5x1i4oftogUEqgv', 'broad-swan-71', 39)
('2019-08-04', '-', 'rotund-tapir-94', 37)
('2019-08-04', '48M4al6n3v7stwa', 'harebrained-antelope-43', 37)
('2019-08-04', 'hl3b9KNwZonnZbm', 'huge-aardvark-30', 35)
('2019-08-04', '382bpCwkFK9PfWU', 'poised-sandpiper-49', 33)
('2019-08-04', 'C8txvUKrOqEpmZH', 'beefy-gorilla-41', 29)
('2019-08-04', 'IdiYHu3Wkfnbt8r', 'bewildered-songbird-23', 29)
('2019-08-04', 'cum5GWzMEY5O4a7', 'succulent-seal-46', 27)
('2019-08-04', 'McXqhRebH0MehlM', 'wicked-antelope-94', 26)
('2019-08-04', 'xXsQUG3Yk4AwNYQ', 'ashamed-wombat-5', 26)
('2019-08-04', 'SGY5ZckViNPBIlg', 'huge-aardvark-30', 23)
('2019-08-04', 'bDrFlY9QmiNjDHR', 'tender-gnu-47', 23)
('2019-08-04', 'wl7yY4bf8pRDt8y', 'condescending-chipmunk-26', 23)
('2019-08-04', '1KIlv044zGGbh6i', 'bewildered-magpie-94', 22)
('2019-08-04', 'AUTrJWtJNdSv0Gh', 'responsive-wolf-53', 20)
('2019-08-04', 'RE0uv9gX7cnblq5', 'irritable-panda-97', 18)
('2019-08-04', 'SWZvcSMB4OlSjK8', 'responsive-dragonfly-47', 17)
('2019-08-04', '4dyOvMLFxUtoyOh', 'bewildered-songbird-23', 16)
('2019-08-04', 'sa5fYm3xMq73egp', 'lethal-porcupine-46', 16)
('2019-08-04', '441b2xiVpETWX9T', 'bewildered-songbird-23', 14)
('2019-08-04', 'eqCFoVj9LMA3h4y', 'lethal-porcupine-46', 11)
('2019-08-05', 'bT4lbzuxXDqahdO', 'stunning-platypus-15', 80)
('2019-08-05', '5x1i4oftogUEqgv', 'broad-swan-71', 64)
('2019-08-05', 'RnrFm2YLiZUcSbY', 'harebrained-heron-25', 64)
('2019-08-05', '32WD4TRRPzvMObx', 'irritable-panda-97', 49)
('2019-08-05', '8jV09Nk5VSuuoTw', 'irritable-panda-97', 24)
('2019-08-05', 'sJ0V7TK4j2R871S', 'gleaming-goldfish-21', 21)
('2019-08-05', '2Un7vZlnWohEA60', 'bewildered-songbird-23', 18)
('2019-08-05', 'kRduF25932Y9kZR', 'proud-butterfly-65', 18)
In [6]:
# -- target a specific resource name and specific date range per user
# -- (top 10 users by amount of data transferred)
sql = """
SELECT
nickname,
(SUM(data.num_bytes) / 1048576) AS sum_mebibyte

FROM
data

JOIN
patron_nicknames
ON
  patron_nicknames.patron_username = data.patron_username

WHERE
data.completed_request LIKE '%proquest.com%'

GROUP BY
nickname

ORDER BY
sum_mebibyte DESC

LIMIT 10
"""

results = db.execute(sql)
for result in results:
    print(result)

# we'll store the first in that list as the `target` to use in the query below...
cursor = db.cursor()
cursor.execute(sql)
target = cursor.fetchone()[0]
cursor.close()
('irritable-python-83', 1291849)
('anxious-dolphin-79', 1393)
('harebrained-antelope-43', 820)
('bewildered-songbird-23', 472)
('arrogant-platypus-75', 467)
('gentle-otter-66', 392)
('puny-jellyfish-2', 388)
('gritty-opossum-52', 327)
('clear-mollusk-96', 306)
('maniacal-walrus-57', 302)
In [7]:
# how about we target the username over a 2 day span
sql = """
SELECT
-- strftime ('%Y-%m-%d %H', data.iso_timestamp) AS hour,
strftime ('%d %H', data.iso_timestamp) AS hour,
(SUM(data.num_bytes) / 1048576) AS sum_mebibyte

FROM
data

JOIN
patron_nicknames
ON
  patron_nicknames.patron_username = data.patron_username

----
-- CHANGE THIS TO THE USER MATCHING THE TOP RESULT ABOVE
----
WHERE
nickname = ?
AND date(data.iso_timestamp) >= '2019-07-29'
AND date(data.iso_timestamp) <= '2019-07-30'

GROUP BY
hour

ORDER BY
hour
"""
print(target + '\n---')
results = db.execute(sql, [(target)])
for result in results:
    print(result)
irritable-python-83
---
('29 04', 13361)
('29 05', 13679)
('29 06', 13067)
('29 07', 13891)
('29 08', 14832)
('29 09', 14689)
('29 10', 14057)
('29 11', 14508)
('29 12', 14422)
('29 13', 14461)
('29 14', 14411)
('29 15', 14084)
('29 16', 14108)
('29 17', 14832)
('29 18', 14727)
('29 19', 14779)
('29 20', 14884)
('29 21', 14794)
('29 22', 15096)
('29 23', 15254)
('30 00', 15632)
('30 01', 15469)
('30 02', 16205)
('30 03', 15664)
('30 04', 16047)
('30 05', 15923)
('30 06', 16120)
('30 07', 16260)
('30 08', 16063)
('30 09', 16234)
('30 10', 15614)
('30 11', 16110)
('30 12', 15822)
('30 13', 15925)
('30 14', 15798)
('30 15', 15792)
('30 16', 16030)
('30 17', 15973)
('30 18', 16081)
('30 19', 15990)
('30 20', 15848)
('30 21', 15754)
('30 22', 15868)
('30 23', 15719)

SQLite3 + Pandas??? ...

thumbsup

In [8]:
# let's read the results of this query into a pandas dataframe...
import pandas as pd
import numpy as np

# we can use the SQL query we previously had. 
# Note: we set the `params` argument to to pass the paramaters to the SQL query ...
df = pd.read_sql_query(sql, db, params=[(target)])
df.describe
Out[8]:
<bound method NDFrame.describe of      hour  sum_mebibyte
0   29 04         13361
1   29 05         13679
2   29 06         13067
3   29 07         13891
4   29 08         14832
5   29 09         14689
6   29 10         14057
7   29 11         14508
8   29 12         14422
9   29 13         14461
10  29 14         14411
11  29 15         14084
12  29 16         14108
13  29 17         14832
14  29 18         14727
15  29 19         14779
16  29 20         14884
17  29 21         14794
18  29 22         15096
19  29 23         15254
20  30 00         15632
21  30 01         15469
22  30 02         16205
23  30 03         15664
24  30 04         16047
25  30 05         15923
26  30 06         16120
27  30 07         16260
28  30 08         16063
29  30 09         16234
30  30 10         15614
31  30 11         16110
32  30 12         15822
33  30 13         15925
34  30 14         15798
35  30 15         15792
36  30 16         16030
37  30 17         15973
38  30 18         16081
39  30 19         15990
40  30 20         15848
41  30 21         15754
42  30 22         15868
43  30 23         15719>
In [9]:
# if we found it useful, we could also do other dataframe like operations on the data
mean = df.loc[:,"sum_mebibyte"].mean()
median = df.loc[:,"sum_mebibyte"].median()
minimum = df.loc[:,"sum_mebibyte"].min()
maximum = df.loc[:,"sum_mebibyte"].max()


print('mean: {}\nmedian: {}\nminimum: {}\nmaximum: {}'.format(mean, median, minimum, maximum))
mean: 15224.477272727272
median: 15623.0
minimum: 13067
maximum: 16260
In [10]:
# Also, let's see what that looks like in a plot!

%matplotlib inline

import pandas as pd
import matplotlib
df = pd.read_sql_query(sql, db, params=[(target)])

df.plot(x='hour', y='sum_mebibyte', kind='line')
Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f923bb121d0>

Example 2:

Advent of Code 2019, Day 3:

In [11]:
# puzzle input:
input_1 = 'R1004,U518,R309,D991,R436,D360,L322,U627,R94,D636,L846,D385,R563,U220,L312,D605,L612,D843,R848,U193,L671,D852,L129,D680,L946,D261,L804,D482,R196,U960,L234,U577,R206,D973,R407,D400,R44,D103,R463,U907,L972,U628,L962,U856,L564,D25,L425,U332,R931,U837,R556,U435,R88,U860,L982,D393,R793,D86,R647,D337,R514,D361,L777,U640,R833,D674,L817,D260,R382,U168,R161,U449,L670,U814,L42,U461,R570,U855,L111,U734,L699,U602,R628,D79,L982,D494,L616,D484,R259,U429,L917,D321,R429,U854,R735,D373,L508,D59,L207,D192,L120,D943,R648,U245,L670,D571,L46,D195,L989,U589,L34,D177,L682,U468,L783,D143,L940,U412,R875,D604,R867,D951,L82,U851,L550,D21,L425,D81,L659,D231,R92,D232,R27,D269,L351,D369,R622,U737,R531,U693,R295,U217,R249,U994,R635,U267,L863,U690,L398,U576,R982,U252,L649,U321,L814,U516,R827,U74,L80,U624,L802,D620,L544,U249,R983,U424,R564,D217,R151,U8,L813,D311,R203,U478,R999,U495,R957,U641,R40,U431,L830,U67,L31,U532,R345,U878,L996,D223,L76,D264,R823,U27,L776,U936,L614,U421,L398,U168,L90,U525,R640,U95,L761,U938,R296,D463,L349,D709,R428,U818,L376,D444,L748,D527,L755,U750,R175,U495,R587,D767,L332,U665,L84,D747,L183,D969,R37,D514,R949,U985,R548,U939,L170,U415,R857,D480,R836,D363,R763,D997,R721,D140,R699,U673,L724,U375,R55,U758,R634,D590,L608,U674,R809,U308,L681,D957,R30,D913,L633,D939,L474,D567,R290,D615,L646,D478,L822,D471,L952,D937,R306,U380,R695,U788,R555,D64,R769,D785,R115,U474,R232,U353,R534,D268,L434,U790,L777,D223,L168,U21,L411,D524,R862,D43,L979,U65,R771,U872,L983,U765,R162'
input_2 = 'L998,U952,R204,U266,R353,U227,L209,D718,L28,D989,R535,U517,L934,D711,R878,U268,L895,D766,L423,U543,L636,D808,L176,U493,R22,D222,R956,U347,R953,U468,R657,D907,R464,U875,L162,U225,L410,U704,R76,D985,L711,U176,R496,D720,L395,U907,R223,D144,R292,D523,R514,D942,R838,U551,L487,D518,L159,D880,R53,D519,L173,D449,R525,U645,L65,D568,R327,U667,R790,U131,R402,U869,R287,D411,R576,D265,R639,D783,R629,U107,L571,D247,L61,D548,L916,D397,R715,U138,R399,D159,L523,U2,R794,U699,R854,U731,L234,D135,L98,U702,L179,D364,R123,D900,L548,U880,R560,D648,L701,D928,R256,D970,L396,U201,L47,U156,R723,D759,R663,D306,L436,U508,R371,D494,L147,U131,R946,D207,L516,U514,R992,D592,L356,D869,L299,U10,R744,D13,L52,U749,R400,D146,L193,U720,L226,U973,R971,U691,R657,D604,L984,U652,L378,D811,L325,D714,R131,D428,R418,U750,L706,D855,L947,U557,L985,D688,L615,D114,R202,D746,R987,U353,R268,U14,R709,U595,R982,U332,R84,D620,L75,D885,L269,D544,L137,U124,R361,U502,L290,D710,L108,D254,R278,U47,R74,U293,R237,U83,L80,U661,R550,U886,L201,D527,L351,U668,R366,D384,L937,D768,L906,D388,L604,U515,R632,D486,L404,D980,L652,U404,L224,U957,L197,D496,R690,U407,L448,U953,R391,U446,L964,U372,R351,D786,L187,D643,L911,D557,R254,D135,L150,U833,R876,U114,R688,D654,L991,U717,R649,U464,R551,U886,L780,U293,L656,U681,L532,U184,L903,D42,L417,D917,L8,U910,L600,D872,L632,D221,R980,U438,R183,D973,L321,D652,L540,D163,R796,U404,L507,D495,R707,U322,R16,U59,L421,D255,L463,U462,L524,D703,L702,D904,L597,D385,L374,U411,L702,U804,R706,D56,L288'
In [12]:
print(
    len(input_1.split(',')),
    len(input_2.split(','))
)
301 301
In [13]:
import matplotlib.pyplot as plt

line_one_path = input_1.split(",")
line_two_path = input_2.split(",")

class Wire:    
    def __init__(self):
        self.x_points = [0]
        self.y_points = [0]
    
    def PlotPoints(self, command: list, origin=False):
        y_plane = 0
        x_plane = 0
        for cm in command:
            direction = cm[0]
            distance = int(cm[1:])
            if direction == "R":
                x_plane += distance
            elif direction == "U":
                y_plane += distance
            elif direction == "L":
                x_plane += -distance
            elif direction == "D":
                y_plane += -distance
            else:
                raise Exception("Bad command given.")
            self.y_points.append(y_plane)
            self.x_points.append(x_plane)
        if origin:
            plt.plot(self.x_points, self.y_points, "bo")
        else:
            plt.plot(self.x_points, self.y_points)


wire = [Wire(), Wire()]
central_port = Wire()
wire[0].PlotPoints(line_one_path)  # converts list commands to plotted points
wire[1].PlotPoints(line_two_path)  # converts list commands to plotted points
central_port.PlotPoints([], True)

# creates graph window
plt.xlabel('x - axis')
plt.ylabel('y - axis')
plt.title('--- Crossed Wires ---')
plt.rcParams['figure.dpi'] = 300
plt.show()
In [14]:
import sqlite3
import re

# db = sqlite3.connect('data.db')
db = sqlite3.connect("file::memory:?cache=shared")

sql = """
BEGIN TRANSACTION;
DROP TABLE IF EXISTS `cords`;
CREATE TABLE IF NOT EXISTS `cords` (
	`id`	INTEGER PRIMARY KEY AUTOINCREMENT,
	`wire`	TEXT,
	`x_cord`	INTEGER,
	`y_cord`	INTEGER,
    `running_distance` INTEGER
);
DROP INDEX IF EXISTS `idx_cords`;
CREATE INDEX IF NOT EXISTS `idx_cords` ON `cords` (
	`wire`,
	`x_cord`,
	`y_cord`
);
DROP VIEW IF EXISTS `wire_one`;
DROP VIEW IF EXISTS `wire_two`;
CREATE VIEW `wire_one` AS select * from cords WHERE cords.wire = 'wire_one';
CREATE VIEW `wire_two` AS select * from cords WHERE cords.wire = 'wire_two';
COMMIT;
"""

db.executescript(sql)

def fill_wire_cords(input_directions, cord_name=None):
    """
    will expand the directions into sets of coordinates in the 
    database table `cords` -- keeping track of the running distance 
    of the wire (cord_name)
    """
    
    cursor = db.cursor()
    sql = """INSERT INTO cords (wire, x_cord, y_cord, running_distance) VALUES (?,?,?,?)"""
    current_x = 0
    current_y = 0
    running_distance = 0
    
    for value in input_directions.split(','):
        # right        
        if re.match('^R{1}', value):
            distance = int(re.split('[A-Z]{1}([0-9]{1,}$)', value)[1])
            for i in range(current_x+1, current_x+distance+1, 1):
                running_distance += 1
                cursor.execute(sql, (cord_name, i, current_y, running_distance))
            current_x += distance
        # left
        if re.match('^L{1}', value):
            distance = int(re.split('[A-Z]{1}([0-9]{1,}$)', value)[1])
            for i in range(current_x-1, current_x-distance-1, -1):
                running_distance += 1
                cursor.execute(sql, (cord_name, i, current_y, running_distance))
            current_x -= distance
        # up
        if re.match('^U{1}', value):
            distance = int(re.split('[A-Z]{1}([0-9]{1,}$)', value)[1])
            for i in range(current_y+1, current_y+distance+1, 1):
                running_distance += 1
                cursor.execute(sql, (cord_name, current_x, i, running_distance))
            current_y += distance
        # down
        if re.match('^D{1}', value):
            distance = int(re.split('[A-Z]{1}([0-9]{1,}$)', value)[1])
            for i in range(current_y-1, current_y-distance-1, -1):
                running_distance += 1
                cursor.execute(sql, (cord_name, current_x, i, running_distance))
            current_y -= distance
        
        db.commit()

    cursor.close()


# Use the function to insert the wire paths
fill_wire_cords(input_1, 'wire_one')
fill_wire_cords(input_2, 'wire_two')

cursor = db.cursor()

result = cursor.execute("""
SELECT
abs(wire_one.x_cord) + abs(wire_one.y_cord) as mh_dist,
wire_one.running_distance + wire_two.running_distance as total_dist

FROM
wire_one

JOIN
wire_two
ON
  wire_two.x_cord = wire_one.x_cord
  AND wire_two.y_cord = wire_one.y_cord

ORDER BY
total_dist,
mh_dist
""")

for mh_dist, total_dist in result:
    print('total_dist: {}\tmh_distance: {}'.format(total_dist, mh_dist))
total_dist: 20286	mh_distance: 561
total_dist: 21966	mh_distance: 227
total_dist: 24816	mh_distance: 1565
total_dist: 24816	mh_distance: 1715
total_dist: 36366	mh_distance: 2393
total_dist: 37040	mh_distance: 2224
total_dist: 37522	mh_distance: 2004
total_dist: 38518	mh_distance: 2360
total_dist: 39050	mh_distance: 2450
total_dist: 40276	mh_distance: 2065
total_dist: 40276	mh_distance: 3094
total_dist: 40276	mh_distance: 3517
total_dist: 40666	mh_distance: 861
total_dist: 40928	mh_distance: 888
total_dist: 41324	mh_distance: 3246
total_dist: 41890	mh_distance: 3346
total_dist: 42432	mh_distance: 3100
total_dist: 42770	mh_distance: 1260
total_dist: 42786	mh_distance: 1556
total_dist: 44218	mh_distance: 1082
total_dist: 44332	mh_distance: 1917
total_dist: 45072	mh_distance: 814
total_dist: 47346	mh_distance: 1951
total_dist: 48238	mh_distance: 2397
total_dist: 48952	mh_distance: 1594
total_dist: 49124	mh_distance: 1283
total_dist: 49124	mh_distance: 2174
total_dist: 50706	mh_distance: 2017
total_dist: 50990	mh_distance: 1981
total_dist: 52070	mh_distance: 1227
total_dist: 52070	mh_distance: 2208
total_dist: 52070	mh_distance: 2391
total_dist: 54188	mh_distance: 1501