by lmeyerov 2162 days ago
Sounds familiar =\

- get devs on GPU laptops

- for always-on instances, where doable, switch to an 8am-6pm uptime policy and reserved instances. Call AWS for a discount.

- use g4dn with spot pricing. Check per workload though: this assumes single rather than double precision.

- consider whether you can switch to fully on-demand, if not already, and go hybrid via GCP's attachable GPUs

- make $ more visible to devs. Often individuals just don't get it, and it's too easy to be sloppy.

More is probably doable, but it gets increasingly situation dependent.
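
To make the 8am-6pm idea concrete, here is a minimal Python sketch of the check a scheduler could run before starting or stopping an instance. The weekday-only rule, the use of local time, and the function names are assumptions for illustration, not something stated above; the actual start/stop calls to a cloud API are left out.

```python
from datetime import datetime, time

# Assumptions (not from the thread): weekdays only, local time, 08:00-18:00.
ON_START = time(8, 0)
ON_END = time(18, 0)
WORKDAYS = {0, 1, 2, 3, 4}  # Monday=0 ... Friday=4


def should_be_running(now: datetime) -> bool:
    """Return True if an instance under the 8am-6pm policy should be up at `now`."""
    return now.weekday() in WORKDAYS and ON_START <= now.time() < ON_END


def weekly_on_fraction() -> float:
    """Fraction of a 168-hour week the instance is on under this policy."""
    hours_per_day = ON_END.hour - ON_START.hour
    return hours_per_day * len(WORKDAYS) / (7 * 24)
```

Under these assumptions the instance is up for 50 of 168 hours a week, so roughly 30% of the always-on hours are billed.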


ALSO: For all the discussion of on-prem, for ML in particular, consider running training on a dedicated local hw box and running only inference in the cloud (which can be CPU)
I’ve recently been mulling over the idea of investing $2-3k in building a machine to do exactly that (and using it as a normal day-to-day dev machine when it’s not training), because the economics of it appear to be surprisingly great.

Have you (or anyone else here) had experience doing this? Did it end up being a worthwhile approach? (Even for a while)

It depends on how long it is on.

If you only train for a short while, you may do better by setting up a cloud training workflow that only has the server on while training. If it is on a lot, then a private box makes more sense (ex: lambdalabs, at home/office/colo). Then set it up as a shared box for the team.
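
A rough way to put numbers on "how long it is on" is a break-even calculation. The Python sketch below is only an illustration: the function names are made up, the hourly cloud rate is a placeholder you would replace with your own pricing, and it ignores things like resale value, maintenance time, and hardware aging.

```python
def break_even_hours(box_cost: float, cloud_rate_per_hour: float,
                     box_running_cost_per_hour: float = 0.0) -> float:
    """Hours of GPU time after which buying a box beats renting in the cloud."""
    saving_per_hour = cloud_rate_per_hour - box_running_cost_per_hour
    if saving_per_hour <= 0:
        raise ValueError("the box never pays for itself at these rates")
    return box_cost / saving_per_hour


def cheaper_option(hours_on: float, box_cost: float, cloud_rate_per_hour: float,
                   box_running_cost_per_hour: float = 0.0) -> str:
    """Compare total cost of each option for a given number of GPU hours."""
    cloud_total = hours_on * cloud_rate_per_hour
    box_total = box_cost + hours_on * box_running_cost_per_hour
    return "cloud" if cloud_total <= box_total else "box"


# Placeholder inputs: a $2,500 box (within the $2-3k range mentioned above)
# against a hypothetical $1.00/hour cloud GPU.
hours = break_even_hours(box_cost=2500, cloud_rate_per_hour=1.00)
```

With those placeholder inputs the box pays for itself after 2,500 GPU hours; below that, the cloud workflow that is only on while training comes out ahead.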

A lot of time ends up being dev, not actual training, and folks end up keeping dev cloud GPUs on accidentally. We still use cloud GPUs for this, but have primary dev on local GPU laptops. For that, we started with System76 laptops for everyone (Ubuntu, Nvidia), but those had major issues (weight, battery draw...). I then got a lightweight Asus ZenBook for myself, but that was too lightweight all around. Next time I will go for something more in between or explore ThinkPad options.

And yep, as a small team, this mix dropped our cloud opex spend by like 90%, and it was pretty fast to offset the capex bump.