by illiac786 24 days ago
I believe AI-for-everything will become financially unsustainable for many, and I’m genuinely curious to see how people deal with it. When to use it? When is it wasteful?

My big hypothesis is that tokens are going to get much more expensive. Either that or OpenAI/Anthropic are going bankrupt. I’m almost excited to find out, I have to admit.

Your remark just reminded me of this, I went a bit off topic, I admit.


Have you tried DeepSeek V4 Flash? It's very competent and extremely cheap.

I think Gemma 4 is also a good example of a capable small model.

I mention these not only because they're cheap but because they can run on consumer devices. The "every year a bigger and more capable SOTA model" trend is mirrored by the "every year a smaller and more capable open source model" trend.

256GB is what DeepSeek V4 Flash with Q4 requires, I believe. That is still very far from “running locally on your device”. And it’s getting further away every day, looking at how electronics market prices are surging.
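For anyone who wants to sanity-check that kind of number, here is a rough back-of-the-envelope sketch. It assumes memory is dominated by the weights and that Q4 means exactly 4 bits per weight; real Q4 formats store somewhat more because of scale metadata, and the KV cache and activations come on top. The function names and the 2-billion-parameter example are made up for illustration, not taken from any model card.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def max_params_for_memory(memory_gb: float, bits_per_weight: float) -> float:
    """Largest parameter count whose weights alone fit in memory_gb."""
    return memory_gb * 1e9 * 8 / bits_per_weight


# The 256 GB figure quoted above, taken at face value with 4-bit weights.
# Result is in billions of parameters; this is an upper bound, not a spec.
print(max_params_for_memory(256, 4) / 1e9)

# A hypothetical 2-billion-parameter model at 8 and at 4 bits per weight.
print(weight_memory_gb(2e9, 8))
print(weight_memory_gb(2e9, 4))
```

Under those assumptions, 256GB at 4 bits per weight corresponds to roughly half a trillion parameters, which gives a sense of how far that is from typical consumer RAM.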

I need to find stats on the average RAM of personal devices, but I expect it to be so low that we are light years away from running one of today’s frontier models locally on a smartphone. Let’s stop dreaming (and I really would love to have it).

I do agree local models are progressing and I am to this day in awe at what a 50GB file can do – it still feels like black magic to me.

Also granted, something like Gemma 2 2B seems to have similar performance to ChatGPT 3.5 and only requires 2GB of RAM. But I think the RAM/performance ratio curve over time is logarithmic and not linear: it’s moving slower and slower.