by
swyx
909 days ago
What's the intuition for 2/3 of RAM?
2 comments
M4v3R
909 days ago
Because there’s always some overhead during inference. You also don’t want to fill all your available RAM, because you risk swapping to disk, which will slow everything to a crawl.
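To make the rule of thumb concrete, here is a minimal sketch of the arithmetic. The function name `max_model_bytes` and the RAM sizes are made up for illustration; the only thing taken from the thread is the 2/3 fraction itself.

```python
GIB = 1024 ** 3

def max_model_bytes(total_ram_bytes: int) -> int:
    """Largest model size allowed by the 2/3-of-RAM rule of thumb.

    Integer arithmetic avoids floating-point rounding on large byte counts.
    """
    return total_ram_bytes * 2 // 3

# Illustrative machine sizes (assumptions, not from the thread).
for ram_gib in (8, 16, 32, 64):
    total = ram_gib * GIB
    budget = max_model_bytes(total)
    headroom = total - budget  # left for overhead, the OS, and other processes
    print(f"{ram_gib} GiB RAM: model up to ~{budget / GIB:.1f} GiB, "
          f"~{headroom / GIB:.1f} GiB headroom")
```

Note that under this rule the headroom grows with the machine, which is exactly the scaling assumption questioned in the reply.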
swyx
908 days ago
So why is the overhead a 1/3 ratio instead of a constant amount? Just testing the scaling assumption.
avereveard
909 days ago
You need some memory left over for holding the context.
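A rough sketch of how much memory the context can take, reading "the context" as the attention key/value cache (my interpretation, not stated in the comment). The formula is the usual one for a decoder-only transformer; the model dimensions below are illustrative assumptions roughly in the range of a 7B-parameter model, and `kv_cache_bytes` is a hypothetical helper name.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_value: int) -> int:
    """Approximate KV cache size: one key and one value vector
    per layer, per KV head, per token in the context."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value

GIB = 1024 ** 3

# Assumed dimensions: 32 layers, 32 KV heads of size 128, 16-bit cache values.
for context_len in (2048, 4096, 8192):
    size = kv_cache_bytes(32, 32, 128, context_len, 2)
    print(f"context {context_len}: ~{size / GIB:.1f} GiB of KV cache")
```

The cache grows linearly with context length and with model depth and width, so the leftover memory needed is not a fixed amount. Whether it tracks model size closely enough to justify a flat 1/3 is a separate question that this sketch does not settle.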