Practical Guide for Library Sys Admins · From the IUG 2026 Sys Admin Forum · Wednesday, April 15
A comprehensive guide to putting Sierra’s web OPAC behind Cloudflare: what works, what breaks, and what to watch out for. Compiled from the IUG 2026 Sys Admin Forum discussion and follow-up research.
AI bot scraping became a serious problem for libraries starting in late 2024. The scale is unprecedented.
These bots don’t identify themselves as GPTBot, Googlebot, or Bingbot. They ignore robots.txt, route traffic through residential proxies, and crawl at rates that overwhelm library infrastructure that was never designed for that load.
Bottom line: Any library running a public-facing OPAC or catalog is a target. Sierra WebPAC is no exception.
Cloudflare acts as a reverse proxy. All HTTP/HTTPS traffic to your OPAC domain flows through Cloudflare’s network before reaching your Sierra server. This gives Cloudflare the ability to:
Cloudflare’s standard proxy only handles HTTP/HTTPS traffic on specific ports. It does not protect:
Cloudflare-supported HTTP ports: 80, 8080, 8880, 2052, 2082, 2086, 2095
Cloudflare-supported HTTPS ports: 443, 2053, 2083, 2087, 2096, 8443
Note: Sierra’s WebPAC staging server runs on port 2082, which happens to be a Cloudflare-supported HTTP port. The live WebPAC typically runs on port 80/443.
For non-HTTP protocols on arbitrary ports, Cloudflare Spectrum (Enterprise only) can proxy TCP/UDP traffic. This is the only way to protect Z39.50 or SIP2 through Cloudflare — and it requires an Enterprise plan.
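As a quick way to audit which of your services the standard proxy can cover, the port lists above can be turned into a small lookup. This is a minimal sketch: the service inventory is hypothetical, and the Z39.50 and SIP2 port numbers are assumptions to replace with the ports your own server uses.

```python
# Port lists copied from the guide above.
CLOUDFLARE_HTTP_PORTS = {80, 8080, 8880, 2052, 2082, 2086, 2095}
CLOUDFLARE_HTTPS_PORTS = {443, 2053, 2083, 2087, 2096, 8443}


def proxy_support(port: int) -> str:
    """Return how Cloudflare's standard proxy treats a given origin port."""
    if port in CLOUDFLARE_HTTP_PORTS:
        return "http"
    if port in CLOUDFLARE_HTTPS_PORTS:
        return "https"
    return "not proxied"


# Hypothetical inventory: the Z39.50 and SIP2 ports are assumptions,
# so substitute the ports your own Sierra server actually listens on.
services = {"WebPAC": 443, "WebPAC staging": 2082, "Z39.50": 210, "SIP2": 6001}
for name, port in services.items():
    print(f"{name:15} port {port:<5} -> {proxy_support(port)}")
```

Anything reported as "not proxied" needs a DNS-only record or Spectrum.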
| Deployment | Cloudflare Setup |
|---|---|
| Self-hosted / on-premise | Full control — point DNS to Cloudflare, configure as needed |
| III cloud-hosted | May need to coordinate with III — you may not control DNS or the web server directly |
| Vega Discover (SaaS) | Likely already behind III’s own CDN/WAF — limited customization |
Critical: Only proxy the web OPAC record. Leave other records as DNS-only (gray cloud) for:
Set encryption mode to Full (Strict). This encrypts traffic both:
Recommendation: Use a Cloudflare Origin CA cert if you’re committed to keeping all traffic through Cloudflare. Use Let’s Encrypt if you want flexibility to bypass Cloudflare temporarily for troubleshooting.
Never use “Flexible” SSL mode — this leaves the Cloudflare-to-origin connection unencrypted, which is a security risk, especially for patron login traffic.
When Sierra sits behind Cloudflare, all requests appear to come from Cloudflare’s IP addresses. You need to restore the real visitor IP for:
Cloudflare sends the real IP in the CF-Connecting-IP header and also appends to X-Forwarded-For. Configure your web server (Apache/Nginx in front of Sierra) to trust these headers from Cloudflare’s IP ranges.
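The important detail is that the header is only trustworthy when the connection itself comes from Cloudflare; otherwise anyone can forge it. The sketch below shows that check in Python. The ranges are placeholder documentation networks, not Cloudflare's real ranges, and the function name is hypothetical. In practice the same logic usually lives in the web server configuration rather than in application code.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation networks), NOT Cloudflare's real
# ranges. Load the current list that Cloudflare publishes instead.
TRUSTED_PROXY_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def real_client_ip(remote_addr: str, headers: dict) -> str:
    """Return the visitor IP, trusting CF-Connecting-IP only from known proxies."""
    peer = ipaddress.ip_address(remote_addr)
    from_trusted_proxy = any(peer in net for net in TRUSTED_PROXY_RANGES)
    claimed = headers.get("CF-Connecting-IP")
    if from_trusted_proxy and claimed:
        try:
            # Normalize and validate the header value before using it.
            return str(ipaddress.ip_address(claimed.strip()))
        except ValueError:
            pass  # malformed header: fall through to the socket address
    return remote_addr


# Request relayed by a trusted proxy: the header wins.
print(real_client_ip("192.0.2.10", {"CF-Connecting-IP": "203.0.113.7"}))
# Direct request with a forged header: the socket address wins.
print(real_client_ip("203.0.113.99", {"CF-Connecting-IP": "10.0.0.1"}))
```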
| Path Pattern | Function | Sensitivity |
|---|---|---|
| / | Main menu, resets session | Public |
| /search/... | Catalog search (by index) | Public, high traffic |
| /patroninfo/... | Patron account (My Account) | Authenticated (protect) |
| /record/... | Individual bib/item records | Public |
| /xrecord/... | XML record export | Public but abusable |
| /iii/sierra-api/... | REST API (v5/v6) | Authenticated (protect) |
| /screens/... | WebPAC template files | Static assets |
Action: Managed Challenge
(http.request.uri.path contains "/patroninfo" and not cf.bot_management.verified_bot)
Action: Block
(http.request.uri.path contains "/iii/sierra-api" and not ip.src in {YOUR_TRUSTED_IPS})
Action: Block
(http.request.uri.path contains "/search" and (http.user_agent contains "python" or http.user_agent contains "curl" or http.user_agent contains "wget" or http.user_agent contains "scrapy") and not cf.bot_management.verified_bot)
Action: Managed Challenge
(http.request.uri.path contains "/xrecord" and not ip.src in {OCLC_IPS DISCOVERY_IPS})
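Before deploying rules like these, it can help to write down what you expect them to do for a handful of sample requests. The following is a rough Python model of the four rules above, evaluated in the order listed. It is not how Cloudflare evaluates expressions internally; the networks are placeholders and the function name is hypothetical.

```python
import ipaddress

# Placeholder networks; replace with your real trusted and partner ranges.
TRUSTED_IPS = [ipaddress.ip_network("192.0.2.0/24")]
PARTNER_IPS = [ipaddress.ip_network("198.51.100.0/24")]
SCRIPT_AGENTS = ("python", "curl", "wget", "scrapy")


def _in(ip: str, networks) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


def expected_action(path: str, user_agent: str, ip: str, verified_bot: bool = False) -> str:
    """Model the four rules above; the first matching rule decides the action."""
    if "/patroninfo" in path and not verified_bot:
        return "managed_challenge"
    if "/iii/sierra-api" in path and not _in(ip, TRUSTED_IPS):
        return "block"
    # Substring match is case-sensitive, like the `contains` operator.
    if "/search" in path and any(a in user_agent for a in SCRIPT_AGENTS) and not verified_bot:
        return "block"
    if "/xrecord" in path and not _in(ip, PARTNER_IPS):
        return "managed_challenge"
    return "allow"


print(expected_action("/search/X?dune", "python-requests/2.31", "203.0.113.5"))
print(expected_action("/iii/sierra-api/v6/bibs", "ILS-client", "192.0.2.4"))
```

Note that `contains` is case-sensitive in Cloudflare's rules language, so a user agent such as `Python-urllib` would not match the lowercase `python`. Wrapping the field in `lower()` is the usual way around that.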
| Feature | Free | Pro (~$20–25/mo) | Business (~$200–250/mo) | Enterprise |
|---|---|---|---|---|
| Bot Fight Mode | Basic | — | — | — |
| Super Bot Fight Mode | — | Yes | Yes | — |
| Bot Management (full) | — | — | — | Yes |
| Verified bot allowlist | — | Yes | Yes | Yes |
| Bot score analytics | — | Yes | Yes | Yes |
| AI Scrapers one-click block | Yes | Yes | Yes | Yes |
Navigate to Security → Bots and enable the “AI Scrapers and Crawlers” toggle. This blocks known AI crawlers (GPTBot, CCBot, etc.) and is updated by Cloudflare as new bot signatures are identified. It is available on all plans, including Free.
As of July 2025, Cloudflare blocks AI crawlers by default for new zones.
Cloudflare maintains a verified bots directory of known good bots (Googlebot, Bingbot, etc.) verified via reverse DNS. The concern for libraries is that library-specific bots are generally NOT on this list.
| Service | Bot Behavior | On Cloudflare Verified List? | Mitigation |
|---|---|---|---|
| Googlebot | Crawls OPAC for search indexing | Yes | Auto-allowed |
| Bingbot | Same | Yes | Auto-allowed |
| OCLC WorldCat harvesting | Harvests MARC records | Unlikely | Allowlist by IP |
| EBSCO EDS connector | Queries OPAC for discovery | No | Allowlist by IP |
| Ex Libris Primo/Summon | Queries OPAC for discovery | No | Allowlist by IP |
| EZproxy | Proxies patron requests | No | Allowlist by IP |
| Link resolvers (SFX, 360 Link) | Checks availability | No | Allowlist by IP |
| Google Scholar | Crawls for academic citations | Check verified list | Usually verified |
OCLC explicitly documents this: “You can use Cloudflare with EZproxy. Make sure you list your on-campus IP addresses, EZproxy Server IP address, and EZproxy name with Cloudflare.” If you don’t allowlist your EZproxy server IP, Cloudflare will challenge EZproxy traffic and potentially block patron access to the catalog from off-campus.
Create a WAF rule:
(ip.src in {EZPROXY_IP ON_CAMPUS_RANGES OCLC_IPS DISCOVERY_LAYER_IPS})
Action: Skip (all remaining rules)
Place this rule first in your rule order so trusted traffic bypasses all challenges.
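A typo in an allowlist can either lock out a vendor or open a hole, so it may be worth generating the expression from a validated list rather than typing it by hand. A minimal sketch, assuming you keep your trusted addresses in a plain list; the function name and sample addresses are hypothetical:

```python
import ipaddress


def build_allowlist_expression(entries):
    """Validate IPs/CIDRs and return an `ip.src in {...}` expression string."""
    normalized = []
    for entry in entries:
        # ip_network accepts single addresses as well as CIDR ranges;
        # strict=True rejects ranges with host bits set, such as 10.0.0.1/24.
        net = ipaddress.ip_network(entry.strip(), strict=True)
        is_single = net.prefixlen == net.max_prefixlen
        normalized.append(str(net.network_address) if is_single else str(net))
    return "(ip.src in {" + " ".join(normalized) + "})"


# Placeholder addresses standing in for EZproxy, campus, and vendor ranges.
print(build_allowlist_expression(["192.0.2.10", "198.51.100.0/24"]))
# prints (ip.src in {192.0.2.10 198.51.100.0/24})
```

An invalid entry raises `ValueError` before anything is pasted into the dashboard.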
Sierra serves a mix of public catalog pages and authenticated patron content. The caching strategy must be careful.
| Content Type | Cache? | Notes |
|---|---|---|
| Static assets (CSS, JS, images) | Yes | Long TTL (1 day+) |
| /screens/... template files | Yes | WebPAC templates |
| Catalog search results /search/... | Maybe | Short TTL (5 min) if desired, but dynamic content; test carefully |
| Individual bib records /record/... | Maybe | Short TTL, but patron-specific elements may appear |
| /patroninfo/... | NEVER | Authenticated patron data |
| /iii/sierra-api/... | NEVER | API responses with patron PII |
| MARC downloads | No | Dynamic, binary content |
Match: URI path contains /patroninfo OR URI path contains /sierra-api
Setting: Bypass Cache
Match: Cookie contains III_SESSION (or your Sierra session cookie name)
Setting: Bypass Cache
Note: “Bypass Cache on Cookie” requires a Business plan or a Cloudflare Worker on lower plans.
Match: URI path contains /screens/ OR file extension in {css js png jpg gif ico svg woff woff2}
Setting: Cache Everything, Edge TTL 1 day, Browser TTL 4 hours
By default, Cloudflare only caches static file extensions (images, CSS, JS, fonts). It does not cache HTML pages unless you explicitly tell it to. This is actually a safe default for Sierra — it means patron pages won’t be accidentally cached.
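The three cache rules above interact, and the order matters: bypass conditions must win over the static-asset rule. The sketch below models that precedence in Python so you can check sample URLs against your intent. It assumes the path is passed without a query string, and `III_SESSION` is the placeholder cookie name used above, not a confirmed Sierra cookie name.

```python
from pathlib import PurePosixPath

STATIC_EXTENSIONS = {"css", "js", "png", "jpg", "gif", "ico", "svg", "woff", "woff2"}
SESSION_COOKIE = "III_SESSION"  # placeholder name taken from the rule above


def cache_decision(path: str, cookie_header: str = "") -> str:
    """Return 'bypass', 'cache', or 'default' for a request path and Cookie header."""
    # Patron and API paths are never cached, whatever else matches.
    if "/patroninfo" in path or "/sierra-api" in path:
        return "bypass"
    # Any request carrying a Sierra session cookie also bypasses the cache.
    if SESSION_COOKIE in cookie_header:
        return "bypass"
    extension = PurePosixPath(path).suffix.lstrip(".").lower()
    if "/screens/" in path or extension in STATIC_EXTENSIONS:
        return "cache"
    # Everything else keeps Cloudflare's default behavior (HTML is not cached).
    return "default"


for sample in ["/screens/styles.css", "/patroninfo~S1/1234/top", "/search/X"]:
    print(sample, "->", cache_decision(sample))
```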
Rate limiting rules are available on all plans (IP-based). Advanced grouping by cookie/header/ASN requires Business+. Here are sensible defaults for a library OPAC.
(http.request.uri.path contains "/patroninfo" and http.request.method eq "POST")
Characteristics: IP · Period: 1 minute · Requests: 5 · Action: Managed Challenge · Duration: 15 minutes
Mirrors Cloudflare’s built-in “Protect My Login” pattern: 5 attempts per minute, then challenge for 15 minutes.
(http.request.uri.path contains "/search")
Characteristics: IP · Period: 1 minute · Requests: 30 · Action: Managed Challenge · Duration: 10 minutes
A human doing catalog searches will rarely exceed 30 per minute. A scraper will hit this quickly.
(http.request.uri.path contains "/iii/sierra-api")
Characteristics: IP · Period: 1 minute · Requests: 60 · Action: Block · Duration: 10 minutes
(http.request.uri.path ne "/")
Characteristics: IP · Period: 10 seconds · Requests: 50 · Action: Managed Challenge · Duration: 10 minutes
Any single IP making 50+ requests in 10 seconds is almost certainly not a human.
Important: Rate limiting counters may have a delay of a few seconds. Don’t rely on rate limiting for precise request counts — it’s a backstop, not a metering system.
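To get a feel for how these thresholds behave, here is a small fixed-window counter in Python. It is an approximation for reasoning about the numbers above, not a description of how Cloudflare counts requests internally; the class name is hypothetical.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Count requests per key in fixed windows; mitigate for `duration` once exceeded."""

    def __init__(self, requests: int, period: float, duration: float):
        self.requests, self.period, self.duration = requests, period, duration
        self.counts = defaultdict(int)  # (key, window index) -> request count
        self.blocked_until = {}         # key -> time the mitigation ends

    def allow(self, key: str, now: float) -> bool:
        if now < self.blocked_until.get(key, 0.0):
            return False
        window = int(now // self.period)
        self.counts[(key, window)] += 1
        if self.counts[(key, window)] > self.requests:
            self.blocked_until[key] = now + self.duration
            return False
        return True


# The search rule above: 30 requests per minute, 10-minute mitigation.
search_limit = FixedWindowLimiter(requests=30, period=60, duration=600)

# A scraper sending two requests per second from one IP (timestamps in seconds).
results = [search_limit.allow("203.0.113.5", now=i * 0.5) for i in range(40)]
print(results.count(True), "allowed,", results.count(False), "mitigated")
# prints: 30 allowed, 10 mitigated
```

A patron running a search every few seconds never reaches the limit, while the scraper is cut off after 15 seconds.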
Cloudflare is migrating from legacy Page Rules to the newer Rules products (Cache Rules, Configuration Rules, Transform Rules, Origin Rules, Redirect Rules). Use the new system if available.
Match: URI path contains /patroninfo
Setting: Security Level = High
Raises the security level on authenticated pages, so that more suspicious visitors are challenged before reaching them.
Match: scheme eq "http"
Action: Redirect to HTTPS (301)
All OPAC traffic should be HTTPS, especially patron login.
Present a library-branded error page instead of Cloudflare’s generic challenge page. This reduces patron confusion when they encounter a bot challenge.
Match: URI path contains /iii/sierra-api
Settings: Disable Performance, Disable Apps, Disable Minification
API responses should not be modified by Cloudflare’s optimization features.
Sierra WebPAC uses cookies for session management, falling back to IP-based sessions if cookies aren’t available. Behind Cloudflare:
Cloudflare adds its own cookies (__cflb for load balancing, __cf_bm for bot management, cf_clearance for challenge bypass). These should not conflict with Sierra’s session cookies but increase cookie header size.
Cloudflare’s proxy cannot handle Z39.50. It’s not HTTP. Options:
Same situation as Z39.50 — SIP2 is a raw TCP protocol. Self-checkout machines, automated materials handling, and other SIP2 clients must connect to a DNS-only record or directly to the server IP.
The Sierra API (v5/v6) runs over HTTPS, so it can go through Cloudflare. However:
X-Forwarded-For issues: if your API implementation uses client IP for anything, ensure you’re reading CF-Connecting-IP
MARC downloads from the OPAC (.mrc binary files) should work through Cloudflare, but:
Cloudflare’s Managed Challenges and JS Challenges require a browser environment to solve. Any service that accesses your OPAC without a full browser will fail:
You must allowlist these services by IP before enabling aggressive bot protection.
Sierra’s staging WebPAC runs on port 2082. This is a Cloudflare-supported HTTP port, so it could be proxied. However, you probably want to keep staging access restricted — either leave it DNS-only or add a WAF rule blocking external access to port 2082.
If you’re running Encore or Vega Discover in addition to WebPAC:
| Feature | Free | Pro (~$20–25/mo) | Business (~$200–250/mo) | Enterprise |
|---|---|---|---|---|
| DDoS protection | Unmetered | Unmetered | Unmetered | Unmetered |
| SSL/TLS (Universal) | Yes | Yes | Yes | Yes |
| AI Scraper blocking (1-click) | Yes | Yes | Yes | Yes |
| Bot Fight Mode | Basic | Super Bot Fight Mode | Super Bot Fight Mode | Full Bot Management |
| WAF custom rules | 5 | 20 | 100 | 1000 |
| WAF managed rules (free ruleset) | Yes | Yes | Yes | Yes |
| OWASP Core Ruleset | No | Yes | Yes | Yes |
| Rate limiting (IP-based) | Yes | Yes | Yes | Yes |
| Rate limiting (advanced grouping) | No | No | Yes | Yes |
| Bypass cache on cookie | No | No | Yes | Yes |
| Custom error pages | No | No | Yes | Yes |
| Spectrum (non-HTTP proxy) | No | No | No | Yes |
| Bot score analytics | No | Yes | Yes | Yes |
Free tier gives you DDoS protection, basic bot fighting, AI scraper blocking, 5 WAF rules, and rate limiting. This is already a massive improvement over no protection.
Pro (~$20–25/mo) adds OWASP rules, 20 WAF rules, Super Bot Fight Mode with verified bot allowlisting, and bot analytics. Best value for most Sierra installations.
Business (~$200–250/mo) adds bypass-cache-on-cookie (important for patron sessions), 100 WAF rules, advanced rate limiting, and custom error pages.
Enterprise if you need Spectrum for Z39.50/SIP2 protection or full Bot Management with bot score granularity.
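The recommendations above amount to "pick the lowest plan that covers every feature you need." A minimal sketch of that lookup, with feature minimums and custom-rule limits transcribed from the table above; the feature keys and function name are hypothetical labels chosen for illustration.

```python
PLAN_ORDER = ["Free", "Pro", "Business", "Enterprise"]

# Lowest plan offering each feature, transcribed from the table above.
MINIMUM_PLAN = {
    "ai_scraper_blocking": "Free",
    "ip_rate_limiting": "Free",
    "owasp_core_ruleset": "Pro",
    "bot_score_analytics": "Pro",
    "advanced_rate_limiting": "Business",
    "bypass_cache_on_cookie": "Business",
    "custom_error_pages": "Business",
    "spectrum": "Enterprise",
    "full_bot_management": "Enterprise",
}
CUSTOM_RULE_LIMITS = {"Free": 5, "Pro": 20, "Business": 100, "Enterprise": 1000}


def minimum_plan(required_features, custom_rules: int = 0) -> str:
    """Return the lowest plan covering the features and the WAF custom rule count."""
    needed = max((PLAN_ORDER.index(MINIMUM_PLAN[f]) for f in required_features), default=0)
    for plan in PLAN_ORDER[needed:]:
        if custom_rules <= CUSTOM_RULE_LIMITS[plan]:
            return plan
    raise ValueError("more custom rules than any plan in the table allows")


print(minimum_plan(["ai_scraper_blocking", "owasp_core_ruleset"], custom_rules=8))  # Pro
print(minimum_plan(["bypass_cache_on_cookie"]))                                     # Business
```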
Cloudflare’s Project Galileo provides Business and Enterprise-tier features for free to qualifying organizations facing cyber threats. Participants get Bot Management, AI Crawl Control, and Zero Trust security products at no cost. It’s designed for journalism, human rights, and civil society groups. Public libraries may qualify depending on circumstances — worth applying if your library has been targeted by attacks.
This was discussed at the IUG 2026 Sys Admin Forum. Jeff reported his library uses F5 with fail2ban and has “had good luck.” Here’s how the approaches compare.
| Aspect | Cloudflare | F5 + fail2ban |
|---|---|---|
| Cost | Free tier available; Pro ~$20–25/mo | F5 hardware: $10K–$100K+; fail2ban: free |
| Setup complexity | DNS change + dashboard config | Network appliance + Linux server + custom filters |
| DDoS protection | Absorbs at edge (Cloudflare network) | Limited to your bandwidth/hardware |
| Bot intelligence | Global threat data, ML models, verified bot list | Pattern matching on your logs only |
| AI scraper blocking | One-click, continuously updated signatures | Manual rules, you maintain signatures |
| Rate limiting | Built-in, configurable per path | Custom fail2ban jails per log pattern |
| WAF rules | Managed rulesets + custom rules | F5 ASM (separate license) or manual |
| Handles distributed bots | Yes (global anycast network) | Poorly (each IP seen briefly, jail never triggers) |
| Non-HTTP protocols | No (unless Enterprise Spectrum) | Yes (F5 handles any TCP/UDP) |
| Latency | Adds ~1–5ms (edge PoP nearby) | Depends on network topology |
| Maintenance | Cloudflare updates rules/signatures | You maintain fail2ban filters and F5 configs |
| fail2ban + Cloudflare | Can combine: fail2ban triggers Cloudflare API to block IPs at edge | N/A |
The AI bot problem is distributed — bots use thousands of residential proxy IPs, each making only a few requests. fail2ban’s strength is banning IPs that show repeated bad behavior, but if each IP only makes 5 requests before rotating, the jail threshold is never reached.
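The arithmetic is easy to check with a toy simulation. The numbers below are hypothetical, chosen only to illustrate the point: the same total load is either caught completely or missed completely by a per-IP threshold, depending on how it is spread across addresses. `maxretry` is fail2ban's name for the per-IP threshold.

```python
from collections import Counter


def simulate(ip_count: int, requests_per_ip: int, maxretry: int):
    """Return (total requests reaching the server, number of IPs banned)."""
    hits = Counter()
    for ip_index in range(ip_count):
        hits[f"ip-{ip_index}"] += requests_per_ip
    # An IP is banned once its request count reaches the threshold.
    banned = [ip for ip, count in hits.items() if count >= maxretry]
    return sum(hits.values()), len(banned)


# 5,000 rotating proxy IPs making 5 requests each, threshold of 10.
print(simulate(ip_count=5000, requests_per_ip=5, maxretry=10))   # (25000, 0)
# The same load from a single IP is caught immediately.
print(simulate(ip_count=1, requests_per_ip=25000, maxretry=10))  # (25000, 1)
```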
Cloudflare’s ML-based bot detection looks at behavioral signals beyond IP: TLS fingerprint, HTTP/2 settings, mouse movement patterns, JavaScript execution behavior. This catches distributed bots that fail2ban misses.
Yes. fail2ban can call the Cloudflare API to push blocks to the Cloudflare edge. This gives you defense-in-depth:
The Cloudflare API for fail2ban is documented but has had compatibility issues (check the Cloudflare Community thread for current status).
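For orientation, the kind of call a fail2ban action would make looks roughly like the sketch below. It builds, but does not send, a request to create a zone-level IP Access Rule. The endpoint path and payload shape reflect the Cloudflare v4 API as commonly documented, but treat them as assumptions and confirm against the current API reference before use. The zone ID, token, and function name are placeholders.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_block_request(zone_id: str, api_token: str, ip: str, note: str) -> urllib.request.Request:
    """Build (but do not send) a request that creates a zone-level IP Access Rule."""
    payload = {
        "mode": "block",
        "configuration": {"target": "ip", "value": ip},
        "notes": note,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/zones/{zone_id}/firewall/access_rules/rules",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder zone ID and token; nothing is sent over the network here.
request = build_block_request("ZONE_ID", "API_TOKEN", "203.0.113.5", "fail2ban: sierra-search jail")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```

Scope the API token to the single zone and to firewall edits only, since it will sit on the server that fail2ban runs on.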
Anubis is an open-source reverse proxy that presents a proof-of-work JavaScript challenge before allowing access. It’s being adopted by libraries as a Cloudflare alternative or complement.
Anubis sits in front of your web server. First-time visitors get a small JS challenge (SHA-256 proof of work, ~2 seconds in a browser). Bots running with minimal compute resources can’t solve it economically at scale.
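The asymmetry that makes proof of work useful is that solving takes many hashes while checking takes one. The sketch below shows a generic SHA-256 proof of work in Python. It illustrates the idea only and is not Anubis's actual challenge format; the challenge string, difficulty scheme, and function names are assumptions.

```python
import hashlib
from itertools import count


def solve(challenge: str, difficulty: int) -> int:
    """Find a nonce whose SHA-256 digest starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Check a submitted nonce with a single hash, which is cheap for the server."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


# Difficulty 4 needs about 16**4 = 65,536 attempts on average.
nonce = solve("example-challenge", difficulty=4)
print(nonce, verify("example-challenge", nonce, difficulty=4))
```

Each extra hex digit of difficulty multiplies the expected work by 16, which is how the cost can be tuned so that one visitor barely notices it while a crawler fetching millions of pages pays for it on every new session.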
Anubis is a good complement to Cloudflare for specific high-traffic paths, or a standalone option when Cloudflare isn’t feasible. It’s not a full replacement for Cloudflare’s broader security suite.
/patroninfo (login) and /search paths
Consider adding Cloudflare in front of the F5 (Cloudflare → F5 → Sierra):
Then iterate on WAF rules, rate limiting, and caching as time allows.
CF-Connecting-IP header restoration