by ssl-3
1 day ago
There was a time when a person could walk through a few department stores every week (or even every day) just to take note of some prices along the way, and ultimately tabulate them to try to identify and snatch up the best deal once it happens. And if everyone did this, it'd be a real problem. The stores would be clogged up by geeks writing notes in little books with Parker Jotters and just basically wasting space and taking up air conditioning while they sleuth out the best way to put the screws to the company for a few measly dollars. That'd be awful. But not many people ever did that in stores, and not many individual people are doing that today with the web. It's really not a problem. (And if a website in 2026 can't stand the burn of several thousand personal scrapers that are operated by people who actually want to buy stuff from it, then maybe that system simply sucks and needs to be rethought.)

This is how it started! I noticed certain things during my weekly shop that I did a double-take on and thought "wasn't that $cheaper last week!?". Took me ~45 min to figure out that the retailer actually has a really nice GraphQL endpoint that powers the "view your previous receipts" function on their website. Of course they don't document this or make it available to 3rd parties... so scrape it is!
I wrote a bot to dump every receipt into a SQLite DB, and I fire it up ~weekly to pull down any receipts that it doesn't already have locally.
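For anyone curious, the "only pull what I don't have" part is roughly the following. This is a minimal sketch and not my actual bot: the table layout, the `id`/`date` receipt fields and the function names are all made up for illustration, and the step that fetches receipts from the retailer is left out entirely.

```python
import json
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the receipts DB. The schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS receipts ("
        "receipt_id TEXT PRIMARY KEY, purchased_at TEXT, payload TEXT)"
    )
    return conn


def missing_ids(conn, remote_ids):
    """Return the remote receipt IDs that are not stored locally yet."""
    have = {row[0] for row in conn.execute("SELECT receipt_id FROM receipts")}
    return [rid for rid in remote_ids if rid not in have]


def store_receipts(conn, receipts):
    """Insert receipts, skipping known IDs; return how many rows were added."""
    count_sql = "SELECT COUNT(*) FROM receipts"
    before = conn.execute(count_sql).fetchone()[0]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            # The primary key plus OR IGNORE makes re-runs harmless.
            "INSERT OR IGNORE INTO receipts VALUES (?, ?, ?)",
            [(r["id"], r["date"], json.dumps(r)) for r in receipts],
        )
    return conn.execute(count_sql).fetchone()[0] - before
```

The weekly run then boils down to: list the receipt IDs the site knows about, pass them through `missing_ids`, fetch only those, and hand the results to `store_receipts`.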
Turns out, not _everything_ has gotten more expensive @ my local grocery store over the past few years... just most things have :/.
> But not many people ever did that in stores,
There's a cottage industry of firms out there that get gig workers to pop in to $randomStore and take a picture of $randomItem on the shelf w/ the price tag in the photo. The firms sell this info to stores that want to know how a competitor might be pricing certain items or placing them on the more valuable shelf spots.
> and not many individual people are doing that today with the web. It's really not a problem.
That's my point! I scrape a few hundred pages per day across _many_ domains. My bots respect 429s and they have some other backoff/random-jitter strategies baked in to _not_ be the reason anti-scrape proliferates.
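The 429 handling is nothing fancy. Here's a rough sketch of the idea using only the standard library; the function names, the 1s base, the 60s cap and the retry count are illustrative assumptions and not what my bots literally run. It only understands `Retry-After` given in seconds and falls back to jittered exponential backoff otherwise.

```python
import random
import time
import urllib.error
import urllib.request


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before the next try (attempt counts from 0)."""
    if retry_after is not None:
        try:
            # Honor the server's hint when it is a plain number of seconds.
            return min(cap, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch
    # "Full jitter": pick a random point below the exponential ceiling.
    return random.uniform(0, min(cap, base * 2 ** attempt))


def polite_get(url, max_attempts=5, opener=urllib.request.urlopen,
               sleep=time.sleep):
    """GET a URL, backing off on HTTP 429 instead of hammering the site."""
    for attempt in range(max_attempts):
        try:
            with opener(url, timeout=30) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Anything other than 429, or running out of tries, is re-raised.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, err.headers.get("Retry-After")))
```

The jitter is there so that a bunch of scrapers that all got throttled at the same moment don't all come back at the same moment too.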