by SlavikCA 504 days ago
A 2x CPU system may be slower for LLMs than a 1x CPU system.

That is because in a 2x CPU system, the model weights may have to be read across NUMA nodes, and remote access gets only 10% - 30% of the local memory bandwidth.
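
A rough back-of-the-envelope sketch in Python of why that hurts. It assumes token generation is purely memory-bandwidth-bound and that local and remote reads happen one after another, so the effective bandwidth is a weighted harmonic mean. The function name `effective_bandwidth`, the normalized local bandwidth of 1.0, and the 50% remote share are illustrative assumptions, not measurements.

```python
def effective_bandwidth(local_bw, remote_ratio, remote_fraction):
    """Effective read bandwidth when part of the data sits on the other NUMA node.

    local_bw        -- bandwidth to local memory (any unit, here normalized)
    remote_ratio    -- remote bandwidth as a fraction of local (e.g. 0.10 to 0.30)
    remote_fraction -- share of the model weights that live on the remote node
    """
    remote_bw = local_bw * remote_ratio
    # Time to read one unit of data: the local part plus the remote part.
    time_per_unit = (1 - remote_fraction) / local_bw + remote_fraction / remote_bw
    return 1 / time_per_unit


if __name__ == "__main__":
    local_bw = 1.0  # normalized: 1.0 means "as fast as a single-socket system"
    for ratio in (0.10, 0.30):
        bw = effective_bandwidth(local_bw, ratio, remote_fraction=0.5)
        print(f"remote at {ratio:.0%} of local -> effective {bw:.2f}x local")
```

Under this simplified model, with half the weights on the remote node, the effective bandwidth falls well below that of a single socket. A NUMA-aware setup that keeps each thread's reads local would avoid most of the penalty.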

1 comment

It would be interesting to see the performance if someone built a single-socket version. I have a parts list here if anyone wants to try it:

https://news.ycombinator.com/item?id=42868360