by rini17
923 days ago
AVX might be going in the right direction, even if AVX-512 was a stretch too far. I was impressed by the llama.cpp performance boost when AVX1 support was added. There's no intrinsic reason why multiplying matrices requires massive parallelism; in principle it could be done on a few cores with good management of ALUs, memory bandwidth, and caches.
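To make the "good management of caches" part concrete, here is a rough sketch of loop tiling (cache blocking) for a matrix multiply. It is not how llama.cpp does it; the function names and the tile size of 4 are made up for illustration, and plain Python will not show any speedup. It only shows the loop structure that lets a small block of each matrix stay in cache while it is reused.

```python
def matmul_naive(a, b):
    """Reference triple loop: c[i][j] = sum over p of a[i][p] * b[p][j]."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            c[i][j] = s
    return c


def matmul_tiled(a, b, tile=4):
    """Same product, computed block by block.

    `tile` is an illustrative value; a real kernel would pick it so the
    working blocks of a, b and c fit in a given cache level.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Outer loops walk over tile origins.
    for i0 in range(0, n, tile):
        for p0 in range(0, k, tile):
            for j0 in range(0, m, tile):
                # Inner loops stay inside one tile, so the same small
                # blocks of a, b and c are reused before moving on.
                for i in range(i0, min(i0 + tile, n)):
                    row_c = c[i]
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip = a[i][p]
                        row_b = b[p]
                        # Contiguous walk along a row of b and c: the part
                        # a compiler or hand-written AVX code would vectorize.
                        for j in range(j0, min(j0 + tile, m)):
                            row_c[j] += a_ip * row_b[j]
    return c
```

In a compiled language the innermost `j` loop is where SIMD instructions would go, and the tile sizes are what you would tune against cache sizes and memory bandwidth.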