| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by rini17 923 days ago
	AVX might be going the right direction, even if the AVX512 was stretch too far. I was impressed by llama.cpp performance boost when AVX1 support was added. There's no intrinsic reason why multiplying matrices requires massive parallelism, in principle it could be done on few cores plus good management of ALUs/memory bandwidth/caches.