Hacker News
by monobot12 499 days ago
If you don't mind a speed of 1 token per second, you can run the largest R1 model on a 2021 iMac, as I just did.
2 comments

Largest R1, as in the 671B? How do you accomplish that feat?
Just do it? llama.cpp doesn't load the entire thing into RAM. It mmaps the model file, and the kernel pages the weights in from disk as they're touched and evicts them again under memory pressure.
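For anyone who hasn't played with mmap, here's a rough Python sketch of the idea. It is not llama.cpp's actual code: `read_slice` and `make_demo_file` are made-up names, and the demo file is a stand-in for a real model file. The point is just that mapping a file is cheap and only the pages you actually touch get read from disk.

```python
import mmap
import os
import tempfile


def make_demo_file(size=8 * 1024 * 1024):
    """Create a mostly-empty stand-in for a large model file (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.seek(size - 1)   # jump to the last byte without writing the middle
        f.write(b"\x01")   # the file is now `size` bytes long
    return path


def read_slice(path, offset, length):
    """Map the file read-only and copy out one slice.

    Mapping does not read the whole file; the kernel faults in only the
    pages that back the bytes we index.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


if __name__ == "__main__":
    path = make_demo_file()
    try:
        print(read_slice(path, 0, 4))                      # first bytes of the file
        print(read_slice(path, os.path.getsize(path) - 1, 1))  # last byte
    finally:
        os.remove(path)
```

Same trick, much bigger file: when the model doesn't fit in RAM, the kernel keeps re-reading pages from disk on every pass, which is where the roughly 1 token per second comes from.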
Are we talking about a 2020 Intel 27" iMac or a 2021 M1?