Hacker News new | ask | show | jobs
by amluto 218 days ago
It would be amusing to run this on a regular desktop computer or even a moderately nice laptop (with a fan - give it a chance!) and see how it does. 650GB will stream in quite quickly from any decent NVMe device, and those 8-16 cores might well be considerably faster than whatever cores the cloud machines are giving you.

S3 is an amazingly engineered product, operates at truly impressive scale, is quite reasonably priced if you think of it as warm-to-very-cold storage with excellent durability properties, and has performance that barely holds a candle to any decent modern local storage device.

3 comments

Absolutely. I recently reworked a bunch of tests and found my desktop to outcompete our (larger, custom) Github Action runner by roughly 5x. And I expect this delta to increase a lot as you lean on the local I/O harder.

It really is shocking how much you're paying given how little you get. I certainly don't want to run a data center and handle all the scaling and complexity of such an endeavour. But wow, the tax you pay to have someone manage all that is staggering.

Everyone wants a data lake when what they have a is a data pond.
I think you meant puddle.

cue Peppa Pig laughter sounds

Totally true. I have a trusty old (like 2016 era) X99 setup that I use for 1.2TB of time series data hosted in a timescaledb PostGIS database. I can fetch all the data I need quickly to crunch on another local machine, and max out my aging network gear to experiment with different model training scenarios. It cost me ~$500 to build the machine, and it stays off when I'm not using it.

Much easier obviously dealing with a dataset that doesn't change, but doing the same in the cloud would just be throwing money away.