Hacker News
by aasasd 2179 days ago
Gaps might be because scraping requires fiddling with the code every time the site changes, even invisibly. Otherwise the data just stops coming in.

Can't answer conclusively. The big dip and the fall-off at the end were probably errors in my scraping. The site went down briefly at a couple of points, and the date format changed slightly. I thought I had handled both correctly, but maybe not. And at a certain point of combing back over gaps, I just decided to be done.
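
For what it's worth, here is a minimal sketch of one way to keep a date-format change from silently dropping rows. It is not the code I actually ran. The format strings in `KNOWN_FORMATS` and the helper names `parse_date` and `split_rows` are assumptions for illustration; the site's real formats aren't recorded here. The idea is to try each known format and set aside anything that doesn't match, so a format change shows up as a pile of unparsed rows instead of a dip in the chart.

```python
from datetime import datetime

# Hypothetical formats: placeholders, not the site's actual date formats.
KNOWN_FORMATS = (
    "%m/%d/%Y %I:%M:%S %p",
    "%Y-%m-%d %H:%M:%S",
    "%m/%d/%Y",
)


def parse_date(text):
    """Return a datetime, or None if no known format matches."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # try the next known format
    return None


def split_rows(rows):
    """Separate rows with parseable dates from rows that need a second look.

    rows is an iterable of (item_id, raw_date_string) pairs.
    """
    parsed, unparsed = [], []
    for item_id, raw_date in rows:
        when = parse_date(raw_date)
        if when is None:
            unparsed.append((item_id, raw_date))
        else:
            parsed.append((item_id, when))
    return parsed, unparsed
```

If `unparsed` suddenly starts filling up partway through a run, that usually means the format changed and a new entry needs adding to `KNOWN_FORMATS`.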

I get the strong impression that sales volumes did increase from 2014 onward, but sales in 2014, particularly early in that range, probably appear lower than they actually were. IDs in that range sometimes returned normal Goodwill item pages and sometimes returned 404-type pages. Maybe they migrated systems or something around that time?
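
If I were to go back over the gaps again, something like the sketch below would help tell "the page really was a 404" apart from "my scraper failed or never tried that ID". This is hypothetical: `find_gaps` and the status labels `"ok"`, `"not_found"`, `"error"`, and `"never_fetched"` are names made up for illustration, and it assumes each fetch outcome was recorded per ID.

```python
def find_gaps(results, first_id, last_id):
    """Return (start, end, reason) runs of IDs that did not yield an item page.

    results maps item ID -> "ok", "not_found", or "error" (hypothetical labels).
    IDs missing from results are reported as "never_fetched".
    """
    gaps = []
    run_start = None
    run_reason = None  # None means we are not currently inside a gap
    for item_id in range(first_id, last_id + 1):
        status = results.get(item_id, "never_fetched")
        reason = None if status == "ok" else status
        if reason != run_reason:
            # The current run ended; record it if it was a gap.
            if run_reason is not None:
                gaps.append((run_start, item_id - 1, run_reason))
            run_start, run_reason = item_id, reason
    # Close a gap that runs through the end of the range.
    if run_reason is not None:
        gaps.append((run_start, last_id, run_reason))
    return gaps
```

Runs tagged `"error"` or `"never_fetched"` are worth re-scraping; long runs of `"not_found"` are more likely something on the site's side, like the possible migration mentioned above.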