Hacker News
by atwrk, 107 days ago
Local LLM inference is all about memory bandwidth, and an M4 Pro has only about the same memory bandwidth as a Strix Halo or DGX Spark. That's why the older Ultras, with their much higher memory bandwidth, are popular with the local LLM crowd.
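
To put rough numbers on that, here is a back-of-the-envelope sketch of the usual reasoning: when generating one token at a time, a dense model has to stream roughly all of its weights from memory for each token, so peak bandwidth divided by model size gives a ceiling on tokens per second. The bandwidth figures below are approximate published peak specs that I'm assuming, not numbers from the comment, and the 40 GB model size is a hypothetical stand-in for something like a 70B model at about 4-bit quantization. Real throughput will be lower than this bound.

```python
def decode_tokens_per_second_upper_bound(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough ceiling on single-stream decode speed for a dense model.

    Assumes every generated token requires reading all weights once,
    and ignores KV cache traffic, compute limits, and other overhead.
    """
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


# Approximate peak memory bandwidth in GB/s (assumed vendor figures, not measured).
APPROX_BANDWIDTH_GB_S = {
    "M4 Pro": 273.0,
    "Strix Halo": 256.0,
    "DGX Spark": 273.0,
    "M2 Ultra": 800.0,
}

# Hypothetical model footprint in GB (e.g. ~70B parameters at ~4-bit quantization).
MODEL_SIZE_GB = 40.0

for name, bandwidth in APPROX_BANDWIDTH_GB_S.items():
    bound = decode_tokens_per_second_upper_bound(bandwidth, MODEL_SIZE_GB)
    print(f"{name}: at most ~{bound:.1f} tokens/s")
```

Under these assumptions the first three land within a token or so per second of each other, while the older Ultra's ceiling is roughly three times higher, which is the gap the comment is pointing at.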