Llama.cpp is a pretty extreme CPU RAM bus saturator, but I dunno how close it is (and it's kind of irrelevant, because why wouldn't you use a Metal backend?).
If you want to run larger models, then CPU inference is your only choice.
Also, not many implementations can even use it.