by cjonas 637 days ago
Did you mean "retrieval is comparatively inexpensive"? I think I'm on the same page but this threw me off.

I read it as retrieval being the requests to the scraped site. I can parse a few thousand HTML pages in minutes, but fetching them in the first place takes hours.
Exactly what I intended. Scraping is slow (and the result may be an irreplaceable snapshot in time). Parsing is fast and repeatable, so it should be done in a separate process, working from a stored copy.
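
For anyone who wants to see the split concretely, here is a rough sketch of the fetch-then-parse separation described above, using only the Python standard library. The names `snapshot`, `parse_store`, `TitleParser`, and `http_fetch` are made up for illustration, and extracting the `<title>` is just a stand-in for whatever parsing you actually need.

```python
import hashlib
from html.parser import HTMLParser
from pathlib import Path
from typing import Callable, Iterable, List
from urllib.request import urlopen


def http_fetch(url: str) -> bytes:
    """Slow stage: one network request per page."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def snapshot(urls: Iterable[str], store: Path,
             fetch: Callable[[str], bytes] = http_fetch) -> int:
    """Save raw HTML to disk, skipping pages that are already stored.

    Returns the number of pages actually fetched on this run.
    """
    store.mkdir(parents=True, exist_ok=True)
    fetched = 0
    for url in urls:
        # Hash the URL to get a filesystem-safe, stable file name.
        path = store / (hashlib.sha256(url.encode()).hexdigest() + ".html")
        if path.exists():
            continue  # already have this snapshot; do not hit the site again
        path.write_bytes(fetch(url))
        fetched += 1
    return fetched


class TitleParser(HTMLParser):
    """Collects the text inside <title>; a placeholder for real extraction."""

    def __init__(self) -> None:
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def parse_store(store: Path) -> List[str]:
    """Fast stage: reads only from disk, so it can be rerun at will."""
    titles = []
    for path in sorted(store.glob("*.html")):
        parser = TitleParser()
        parser.feed(path.read_bytes().decode("utf-8", errors="replace"))
        titles.append(parser.title.strip())
    return titles
```

Run `snapshot(urls, Path("raw"))` once (the hours-long part), then iterate on `parse_store(Path("raw"))` as often as the parsing logic changes. Nothing in the second stage touches the network, so a parser bug never costs a re-scrape.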