by gravypod 18 days ago
From my understanding, it's a little bit of both. GPUs and fans have a failure rate. Standards like PCIe and software stacks also keep changing.

LLM inference is mainly memory bandwidth constrained so I think it's highly likely that a company will create silicon with just an insane number of memory chips and less compute. These ASICs will probably do the same thing the crypto ASICs did.
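
To make the bandwidth point concrete: in single-stream decoding, each generated token has to read roughly all of the model weights from memory once, so memory bandwidth divided by model size gives a rough upper bound on tokens per second. A minimal sketch, where the bandwidth and model numbers are made up for illustration and are not the specs of any real chip:

```python
def max_decode_tokens_per_sec(mem_bandwidth_gb_s: float,
                              params_billion: float,
                              bytes_per_param: float) -> float:
    """Rough upper bound on batch-1 decode speed when memory bandwidth is the limit.

    Assumes every token requires streaming all weights from memory once.
    Ignores KV-cache traffic, batching, and compute time.
    """
    model_gb = params_billion * bytes_per_param  # 1e9 params * bytes = GB
    return mem_bandwidth_gb_s / model_gb


# Hypothetical numbers: a 70B-parameter model stored at 2 bytes per parameter.
baseline = max_decode_tokens_per_sec(1000, 70, 2)  # 1000 GB/s of bandwidth
doubled = max_decode_tokens_per_sec(2000, 70, 2)   # same model, 2x bandwidth

print(f"baseline bound: {baseline:.1f} tok/s")
print(f"doubled bound:  {doubled:.1f} tok/s")
```

Under this simple model, doubling bandwidth doubles the bound while adding compute changes nothing, which is why a memory-heavy, compute-light chip looks attractive.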

Look back one decade: no one uses a GTX 950 for anything today.

1 comments

You'd be surprised, people are somehow buying Tesla P40s and M40s on eBay for almost $300 and $180 respectively (M40 being the same gen as GTX 950). Google Colab still offers T4s and it's taken them years to add modern GPUs. Hope they're powering them with renewables at least.

And people in general are holding on to their old machines for very long periods of time now, especially CPUs. I've had to support first-gen Intel i7s at work! That's pre-AVX.

Just a note, the P40 came out at $5700 in 2016 dollars. In 2026 dollars that is $8000 (wow!). If you bought 100k of them today, assuming a 1% failure rate per year, your $800M investment could be traded in for about $30M a decade later (going by the roughly $300 eBay price).

I think it is reasonable to assume a similar depreciation in GPUs.

Meaning that to come out ahead you'd need to have made more than (800M - 30M) * (1 + income tax rate) + (power + maintenance).
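
For anyone who wants to play with the numbers, here is the back-of-envelope above as a quick Python sketch. The 10-year holding period and the $300 resale price are my assumptions (based on the 2016 launch date and the eBay figure in this thread), and the tax rate, power, and maintenance values are placeholders, not real data.

```python
def surviving_units(units: int, annual_failure_rate: float, years: int) -> float:
    """Expected number of working units after `years` at a constant yearly failure rate."""
    return units * (1 - annual_failure_rate) ** years


def required_revenue(purchase_cost: float, resale_value: float,
                     income_tax_rate: float, power: float, maintenance: float) -> float:
    """Break-even revenue using the formula from the comment above."""
    return (purchase_cost - resale_value) * (1 + income_tax_rate) + (power + maintenance)


UNITS = 100_000
PRICE_NEW = 8_000     # USD per card, inflation-adjusted launch price from the thread
PRICE_USED = 300      # USD per card, assumed resale price
YEARS = 10            # assumed holding period
FAILURE_RATE = 0.01   # 1% per year, from the thread

purchase_cost = UNITS * PRICE_NEW
alive = surviving_units(UNITS, FAILURE_RATE, YEARS)
resale_value = alive * PRICE_USED

# Placeholder inputs: the tax rate is arbitrary, power and maintenance are left at zero.
needed = required_revenue(purchase_cost, resale_value,
                          income_tax_rate=0.21, power=0.0, maintenance=0.0)

print(f"purchase cost:    ${purchase_cost / 1e6:,.0f}M")
print(f"surviving units:  {alive:,.0f}")
print(f"resale value:     ${resale_value / 1e6:,.1f}M")
print(f"required revenue: ${needed / 1e6:,.0f}M")
```

With a 1% yearly failure rate about 90% of the cards survive the decade, so the resale value lands around $27M, a bit under the $30M you get if you ignore failures.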

Some say the margins on inference are already there for new GPUs, but they are tight margins.