Hacker News
by DrPhish 498 days ago
“you can get a dual EPYC server with 768GB RAM - CPU inference only at around 6-8 tokens/sec.”

This is what I run at home. I built it just over a year ago and have run every single model that has been released.