by giwook 30 days ago
I think this is the key issue with running locally hosted models.

Yes, technically you can run them on 12 GB of VRAM.

But should you?

Realistically, 64 GB seems to be the current threshold for getting meaningful work done while also maintaining a large enough context window.
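A rough back-of-the-envelope sketch of why the context window matters here: memory is roughly quantized weights plus the KV cache, and the KV cache grows linearly with context length. The function name and the model shape used in the example call (32B parameters, 64 layers, 8 KV heads, head dimension 128, 4-bit weights, 16-bit cache) are hypothetical values picked for illustration, not measurements of any particular model, and the estimate ignores activations and runtime overhead.

```python
def estimate_memory_gb(
    n_params_billion: float,   # model size in billions of parameters (assumed)
    bits_per_weight: float,    # quantization level, e.g. 4 for 4-bit weights
    n_layers: int,             # transformer layers (assumed model shape)
    n_kv_heads: int,           # key/value heads (assumed model shape)
    head_dim: int,             # dimension per head (assumed model shape)
    context_tokens: int,       # context window you want to keep in memory
    kv_bytes_per_value: int = 2,  # 16-bit KV cache entries
) -> float:
    """Very rough memory estimate in GB: weights + KV cache only."""
    weight_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    # One key and one value vector per layer, per KV head, per token.
    kv_cache_bytes = (
        2 * n_layers * n_kv_heads * head_dim * context_tokens * kv_bytes_per_value
    )
    return (weight_bytes + kv_cache_bytes) / 1e9


# Hypothetical 32B model at 4-bit with a 32k-token context.
needed = estimate_memory_gb(32, 4, 64, 8, 128, 32_768)
print(f"{needed:.1f} GB needed, fits in 12 GB: {needed <= 12}")
```

Under these assumed numbers the weights alone already exceed 12 GB, and the cache adds several more GB on top, so a small card forces either a smaller model or a shorter context.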

1 comment

This will drop further as intelligence density increases.
It should, which is why I said it is the current threshold.