I've been writing a lot of web scraping code in CL, and I've settled on a simple policy for organizing it.
https://codeberg.org/ggxx/scraper
A generic function called process dispatches on subclasses of abstract-page. Each page instance also records where the page is located (an HTTP URL, a file URI, or an already-parsed DOM tree), and the loading step is polymorphic over that source too.
;; scrape all images from document fetched via HTTP
(defparameter mi (page 'all-images "https://metacpan.org/"))
(process mi)
;; scrape all images from document read from filesystem
(process (page 'all-images "file:///tmp/index.html"))
;; it can also use dom trees
(defparameter dom (load-resource-as-dom (uri "file:///tmp/index.html")))
(process (page 'all-images dom))
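For anyone who wants to see the shape of the idea without reading the repo, here is a rough sketch of the two levels of dispatch in plain CLOS. It is my own guess at a minimal layout, not the code from the repo: the source slot, load-source, and count-occurrences are hypothetical names, a string is treated as already-loaded document text instead of being parsed as a URI, and counting "<img" substrings stands in for real DOM traversal.

```lisp
;; Hypothetical sketch: these definitions are assumptions, not the repo's code.
(defclass abstract-page ()
  ((source :initarg :source :reader page-source)))

(defclass all-images (abstract-page) ())

(defun page (class source)
  "Make a page of CLASS whose document comes from SOURCE."
  (make-instance class :source source))

;; First level of dispatch: how to obtain the document text.
(defgeneric load-source (source))

(defmethod load-source ((source string))
  ;; Assumption: a string is the document text itself, not a URI.
  source)

(defmethod load-source ((source pathname))
  ;; Read the whole file into a string.
  (with-open-file (in source)
    (let ((text (make-string (file-length in))))
      (subseq text 0 (read-sequence text in)))))

(defun count-occurrences (needle haystack)
  "Count non-overlapping, case-insensitive occurrences of NEEDLE in HAYSTACK."
  (loop with start = 0
        for pos = (search needle haystack :start2 start :test #'char-equal)
        while pos
        count t
        do (setf start (+ pos (length needle)))))

;; Second level of dispatch: what to do with a given kind of page.
(defgeneric process (page))

(defmethod process ((page all-images))
  ;; Stand-in for real scraping: just count the opening img tags.
  (count-occurrences "<img" (load-source (page-source page))))
```

With this layout, adding a new kind of scrape means one new subclass plus one process method, and adding a new kind of source means one new load-source method. Neither touches the other.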