Hacker News new | ask | show | jobs
by M4v3R 917 days ago
Because there’s always some overhead during inference plus you don’t want to fill all your available RAM because you risk swapping to disk which will make everything slow to a crawl.
1 comments

so why is the overhead a 1/3 ratio instead of a constant amount? just testing the scaling assumption