Hacker News
by swyx 909 days ago
What's the intuition for 2/3 of RAM?
2 comments

Because there’s always some overhead during inference, and you don’t want to fill all your available RAM: you risk swapping to disk, which will slow everything to a crawl.
So why is the overhead a 1/3 ratio instead of a constant amount? Just testing the scaling assumption.
You need some leftover RAM for holding the context.
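One way to poke at the scaling question is to put rough numbers on it. The sketch below is illustrative only: the function names, the 4 GiB fixed overhead, and the model shape are all made-up assumptions, not anything measured in this thread. It compares the 2/3 heuristic with a constant-overhead budget, and estimates context memory with the usual key/value cache size formula, which grows with context length and model shape rather than staying fixed.

```python
# Back-of-the-envelope sketch; every number here is an illustrative assumption.

GIB = 1024 ** 3


def model_budget_ratio(total_ram_bytes, ratio=2 / 3):
    """Largest model size allowed by the ratio rule of thumb."""
    return total_ram_bytes * ratio


def model_budget_constant(total_ram_bytes, overhead_bytes):
    """Largest model size if the overhead were a fixed amount instead."""
    return max(total_ram_bytes - overhead_bytes, 0)


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_value=2):
    """Memory needed to hold the context: keys plus values for every layer.

    The leading 2 accounts for storing both K and V; bytes_per_value=2
    assumes 16-bit values.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value


if __name__ == "__main__":
    fixed_overhead = 4 * GIB  # hypothetical constant overhead
    for ram_gib in (16, 64, 192):
        total = ram_gib * GIB
        by_ratio = model_budget_ratio(total) / GIB
        by_constant = model_budget_constant(total, fixed_overhead) / GIB
        print(f"{ram_gib} GiB RAM: ratio rule {by_ratio:.1f} GiB, "
              f"constant rule {by_constant:.1f} GiB")

    # Hypothetical model shape: 32 layers, 32 KV heads, head_dim 128.
    for context_len in (4096, 32768):
        cache = kv_cache_bytes(32, 32, 128, context_len) / GIB
        print(f"context {context_len}: about {cache:.1f} GiB for the KV cache")
```

Under these assumptions the two rules diverge quickly on larger machines, and the context memory depends on context length and model shape rather than on how much RAM is installed, so the 1/3 figure reads best as a conservative rule of thumb rather than a derived constant.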