AFAIK current models can run even with 64GB, but I would assume that we will very likely have bigger models very soon so I guess the answer is as much as you can afford
The next question is m1 or m2, and the impact of the various number of gpu units between pro, max, ultra skews. I'm really tempted to buy a "refurbished m1 studio" with 128gb because I think the ram is the key. Have not seen any benchmarks with diff # of gpus/aka diff skews.
it runs slightly slower on the GPU than under llama.cpp but uses much less power doing so
I would guess the slowness is due to immaturity of the PyTorch MPS backend, the asitop graphs show it doing a bunch of cpu along with the gpu, so it might be inefficiently falling back to cpu for some ops and swapping layers back and forth (I have no idea, just guessing)