by dcchambers 594 days ago
A top-spec M4 Max w/ 128GB of unified memory is a beast for local LLM inference. Since past Ultra chips have been two Max dies fused together, it means we could see an M4 Ultra with 256GB of memory!

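I think theoretically you could run inference on Llama 3.1 405B (4-bit) on a Mac Studio, which is kinda nuts. 405B parameters at 4 bits each is roughly 203GB of weights, so it would fit in 256GB with some room left over for the KV cache.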
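Back-of-envelope math for that claim, as a rough sketch. The function name and constants are mine, the 256GB figure is the speculated config and not a real product, and this counts weights only: it ignores KV cache, activations, quantization metadata (scales etc.), and whatever share of unified memory macOS lets the GPU use.

```python
def weight_footprint_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


LLAMA_405B_PARAMS = 405e9  # nominal parameter count taken from the model name
MEMORY_GB = 256            # speculated M4 Ultra config (an assumption)

for bits in (16, 8, 4):
    gb = weight_footprint_gb(LLAMA_405B_PARAMS, bits)
    verdict = "fits" if gb < MEMORY_GB else "does not fit"
    print(f"{bits:>2}-bit: {gb:6.1f} GB of weights -> {verdict} in {MEMORY_GB} GB")
```

Only the 4-bit case comes in under 256GB, and the headroom is what you'd have to share between the KV cache and the OS, so context length would be the thing to watch.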