by Ambix
1143 days ago
Yeah, it really is that bad on desktops. With my LLaMA AVX implementation on 32-bit floats [0], there's no performance gain beyond 2 threads, so the remaining 14 threads are of no use: there isn't enough memory bandwidth to keep them fed with work :)

[0] https://github.com/gotzmann/llama.go