by baby_souffle 18 hours ago

> There was a time when a person could walk through a few department stores every week (or even every day) just to take note of some prices along the way, and ultimately tabulate them to try to identify and snatch up the best deal once it happens.

This is how it started! I noticed certain things during my weekly shop that I did a double-take on and thought "wasn't that $cheaper last week!?". Took me ~45 min to figure out that the retailer actually has a really nice GraphQL endpoint that powers the "view your previous receipts" function on their website. Of course they don't document it or make it available to 3rd parties... so scrape it is!

I wrote a bot to dump every receipt into a SQLite DB, and I fire it up roughly weekly to pull down any receipts it doesn't already have locally.
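For anyone curious what the "only pull what's missing" part can look like, here is a minimal sketch. The table layout and the function names (`open_db`, `missing_ids`, `save_receipt`) are made up for illustration, and the retailer's actual endpoint is deliberately left out, so treat this as one plausible shape rather than the bot itself.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the receipt store. The schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS receipts (
               receipt_id   TEXT PRIMARY KEY,
               purchased_at TEXT NOT NULL,
               raw_json     TEXT NOT NULL
           )"""
    )
    return conn


def missing_ids(conn, remote_ids):
    """Return the remote receipt IDs that are not stored locally yet."""
    have = {row[0] for row in conn.execute("SELECT receipt_id FROM receipts")}
    return [rid for rid in remote_ids if rid not in have]


def save_receipt(conn, receipt_id, purchased_at, raw_json):
    """Store one receipt; re-saving an existing ID is a no-op."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO receipts VALUES (?, ?, ?)",
            (receipt_id, purchased_at, raw_json),
        )
```

Keying on the receipt ID means a weekly run can be interrupted and restarted without creating duplicates.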

Turns out, not _everything_ has gotten more expensive @ my local grocery store over the past few years... just most things have :/.
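A rough sketch of how that comparison could be pulled out of the DB, assuming the receipts have been flattened into a hypothetical `line_items` table with `item`, `unit_price` (in cents) and an ISO-formatted `purchased_at`. None of those names come from the real project.

```python
import sqlite3


def price_changes(conn):
    """Map each item to (first_seen_price, latest_price), both in cents."""
    first, last = {}, {}
    rows = conn.execute(
        "SELECT item, unit_price FROM line_items ORDER BY purchased_at"
    )
    for item, price in rows:
        first.setdefault(item, price)  # keep the earliest price only
        last[item] = price             # overwritten until the latest row
    return {item: (first[item], last[item]) for item in first}


def got_cheaper(changes):
    """Items whose latest price is below the first recorded price."""
    return sorted(item for item, (old, new) in changes.items() if new < old)
```

ISO date strings sort chronologically as plain text, which is why the `ORDER BY` works without any date parsing.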

> But not many people ever did that in stores,

There's a cottage industry of firms out there that get gig workers to pop in to $randomStore and take a picture of $randomItem on the shelf w/ the price tag in the photo. The firms sell this info to stores that want to know how a competitor is pricing certain items or placing them on the more valuable shelf spots.

> and not many individual people are doing that today with the web. It's really not a problem.

That's my point! I scrape a few hundred pages per day across _many_ domains. My bots respect 429s and they have some other backoff/random-jitter strategies baked in to _not_ be the reason anti-scrape proliferates.
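Roughly what "respect 429s plus backoff/random-jitter" can look like, as a sketch rather than the actual bot code. `fetch` is a hypothetical callable returning `(status, headers, body)`, so any HTTP client can be plugged in. The sketch assumes `Retry-After` arrives as a whole number of seconds; the header can also be an HTTP date, which is ignored here.

```python
import random
import time


def backoff_delay(attempt, retry_after=None, base=1.0, cap=300.0, rng=random.random):
    """Exponential backoff with full jitter, never shorter than Retry-After."""
    jittered = rng() * min(cap, base * (2 ** attempt))
    if retry_after is not None:
        return max(jittered, retry_after)
    return jittered


def polite_get(fetch, url, max_attempts=5, sleep=time.sleep):
    """Call fetch(url), waiting and retrying whenever the server answers 429."""
    for attempt in range(max_attempts):
        status, headers, body = fetch(url)
        if status != 429:
            return status, body
        raw = headers.get("Retry-After", "")
        # Only the integer-seconds form is handled; anything else falls
        # back to plain jittered backoff.
        retry_after = float(raw) if raw.isdigit() else None
        sleep(backoff_delay(attempt, retry_after))
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts: {url}")
```

The jitter keeps a fleet of bots from retrying in lockstep, and the cap keeps a long streak of 429s from turning into an hours-long sleep.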

1 comment

ssl-3 14 hours ago

That's awesome!

Please accept all possible encouragement. This is exactly the kind of personal project that a world wide web of network-connected computers is supposed to enable.

(I have no idea how it is that so many of us here have come to lose the plot.)
