by downvotetruth 1202 days ago
Follow-up: https://github.com/facebookresearch/llama/issues/79#issuecom... claims 65B was able to fit in 128 GB by unsharding and merging the weights into a single file instead of the multiple .pth files, with a maximum swap file usage of 172 GB, and it appears to stream to the GPU.
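
For anyone wondering what "unsharding" might look like, here is a rough sketch, not the code from the linked issue. It assumes each shard holds a slice of every split weight and that you know which axis each weight was split on; the names `wq`, `wo`, `norm` and the `split_axis` map are made up for illustration. Tensors are plain nested lists so it runs without dependencies; with real checkpoints, something like `torch.cat` would take the place of `concat`.

```python
# Hypothetical sketch: merge per-shard weight dicts into one dict.
# Weight names and split axes below are illustrative assumptions.

def concat(parts, axis):
    """Concatenate 2-D nested lists along axis 0 (rows) or axis 1 (columns)."""
    if axis == 0:
        return [row for part in parts for row in part]
    # axis 1: join the i-th row of every part into one longer row
    return [sum(rows, []) for rows in zip(*parts)]

def unshard(shards, split_axis):
    """Merge a list of per-shard dicts into a single dict.

    split_axis maps a weight name to the axis it was split on.
    Names missing from it are assumed replicated and taken from shard 0.
    """
    merged = {}
    for name in shards[0]:
        axis = split_axis.get(name)
        if axis is None:
            merged[name] = shards[0][name]
        else:
            merged[name] = concat([s[name] for s in shards], axis)
    return merged

# Two toy shards standing in for the multiple .pth files.
shards = [
    {"wq": [[1, 2]], "wo": [[5], [6]], "norm": [[9, 9]]},
    {"wq": [[3, 4]], "wo": [[7], [8]], "norm": [[9, 9]]},
]
merged = unshard(shards, {"wq": 0, "wo": 1})
```

The merged dict could then be written out as a single file, which is the step the linked comment credits for the lower memory footprint.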