    ;; add scraper metadata
    (defmacro defscraper [name & decls]
      ;; like defn-, but tags the resulting var with :scraper
      (list* 'defn-
             (with-meta name (assoc (meta name) :scraper true))
             decls))

    ;; compile a list of defined scrapers
    (defn- *collect-scrapers* []
      (filter (fn [func] (get (meta (val func)) :scraper false))
              (ns-interns 'com.wombat.web.scrapers)))

    ;; run all defined scrapers
    (defn *run-all-scrapers* []
      (let [scrapers (*collect-scrapers*)
            threads  (doall (for [[name scraper] scrapers]
                              (future (store-site (scraper)))))]
        (doseq [t threads] (deref t))))

Then I could just use defscraper instead of defn and voilà, any function defined using defscraper would be run in parallel by (*run-all-scrapers*).
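To make the tagging concrete, here is a minimal self-contained sketch of how a scraper defined this way gets picked up. Everything specific in it is an assumption made for illustration: the `scraper-demo` namespace, the `example-site` and `other-site` scrapers, and the `helper` function are hypothetical, and the macro is written so that the tagged name is passed to `defn-` exactly once.

```clojure
(ns scraper-demo)

;; Same idea as defscraper above: expand to defn- and tag the var with :scraper.
(defmacro defscraper [name & decls]
  (list* 'defn- (with-meta name (assoc (meta name) :scraper true)) decls))

;; Hypothetical scrapers; real ones would fetch and parse a page.
(defscraper example-site [] {:site "example" :items 3})
(defscraper other-site [] {:site "other" :items 5})

;; An ordinary private function, which must not be picked up.
(defn- helper [] :not-a-scraper)

;; Keep only the interned vars whose metadata carries :scraper.
;; ns-interns includes private vars, which is why defn- still works here.
(defn collect-scrapers []
  (filter (fn [[_ v]] (:scraper (meta v)))
          (ns-interns 'scraper-demo)))

;; Names of the collected scrapers, and the result of calling each one.
(def scraper-names (set (map key (collect-scrapers))))
(def results (set (map (fn [[_ v]] (v)) (collect-scrapers))))
```

Only the two functions defined with `defscraper` end up in `scraper-names`; `helper` and `collect-scrapers` carry no `:scraper` metadata and are filtered out.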
But after a while, several other issues came up. The scrapers file was getting long, and I needed to define other functions to work with scrapers, such as functions that store a scraper's data in a database or return information about the web site. So I split the scrapers file and put each scraper into its own file and its own namespace. At first, I wanted to just refer all the scraper namespaces into the main scrapers namespace, but then I had an idea: what if, instead of polluting the main namespace with all the scraper functions, I kept them in their individual namespaces and found them by a standard name? So I deleted the defscraper macro, changed all scraper function definitions to defn, and named each of them scraper. Then I changed *collect-scrapers* and *run-all-scrapers* to look like this.
    ;; compile a list of defined scrapers
    (defn- *collect-scrapers* []
      (map #(get (ns-publics %1) 'scraper)
           (filter #(contains? (ns-publics %1) 'scraper)
                   (all-ns))))

    ;; run all defined scrapers
    (defn *run-all-scrapers* []
      (let [scrapers (*collect-scrapers*)
            threads  (doall (for [scraper scrapers]
                              (future (store-site (scraper)))))]
        (doseq [t threads] (deref t))))

And that is that.
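One thing worth keeping in mind with this approach is that `all-ns` only returns namespaces that have already been loaded, so each scraper namespace has to be required somewhere before the collection step runs. The sketch below shows the lookup end to end under some assumptions: the namespaces `demo.scrapers.site-a`, `demo.scrapers.site-b` and `demo.runner` are hypothetical, the scrapers return canned data, and all three namespaces are declared in one file purely to keep the example self-contained (in a real project each would live in its own file).

```clojure
;; Two hypothetical scraper namespaces, each exposing a public `scraper`.
(ns demo.scrapers.site-a)
(defn scraper [] {:site "a"})

(ns demo.scrapers.site-b)
(defn scraper [] {:site "b"})
(defn helper [] :ignored) ; other public functions are not collected

(ns demo.runner)

;; Look up the public var named `scraper` in every loaded namespace;
;; keep drops the namespaces that do not define one.
(defn collect-scrapers []
  (keep (fn [n] (get (ns-publics n) 'scraper)) (all-ns)))

;; Start each scraper in its own future, then wait for all of them.
(defn run-all []
  (let [tasks (doall (map (fn [s] (future (s))) (collect-scrapers)))]
    (set (map deref tasks))))

(def results (run-all))
```

Because the lookup goes through `ns-publics`, the scraper functions have to be public, which is why they are defined with `defn` rather than `defn-`.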