Hacker News new | ask | show | jobs
by lumost 381 days ago
We don’t know what the marginal cost of inference is yet however. So far, users are demonstrating that they are willing to pay more for LLMs than traditional web experiences.

At the same time, cards have gotten >8x more efficient over the last 3 years, inference engines >10x more efficient and the raw models are at least treading water if not becoming more efficient. It’s likely that we’ll lose another 10-100x off the cost of inference in the next 2 years.