by SEGyges 603 days ago
it's not necessarily 16x if you also, e.g., decrease model width by a factor of 4 or so, but yeah, naively the RAM and FLOPs scale as n^2
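A rough back-of-the-envelope sketch of that arithmetic, under some assumptions that are not stated in the comment: n is taken to be the sequence (context) length, the 16x is taken to come from a 4x longer context, and only the two big matmuls of naive self-attention (QK^T and scores times V) are counted. The function names and the example sizes are made up for illustration.

```python
def attention_flops(n: int, d: int) -> int:
    """Approximate FLOPs for the two attention matmuls in one layer.

    Assumption: QK^T costs about 2*n*n*d and (scores @ V) costs about
    2*n*n*d. Projections, softmax, and the MLP are ignored.
    """
    return 4 * n * n * d


def score_matrix_entries(n: int, heads: int = 1) -> int:
    """Entries in the materialized n x n score matrices (naive attention, no tiling)."""
    return heads * n * n


# Hypothetical sizes, chosen only to make the ratios easy to read.
base = attention_flops(n=1024, d=1024)
longer = attention_flops(n=4096, d=1024)        # 4x context, same width
longer_narrow = attention_flops(n=4096, d=256)  # 4x context, width cut by 4

print(longer / base)         # 16.0
print(longer_narrow / base)  # 4.0
```

In this simplified count the score-matrix memory depends on n and the number of heads rather than on width directly, so cutting width only helps RAM there if the head count drops with it.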