| HN Mirror



	by diggan 321 days ago
	> And flash attention doesn't work on 5090 yet, right?

	Flash attention works with GPT-OSS + llama.cpp (tested on 1d72c8418) on another Blackwell card (RTX Pro 6000), so I think it should work on the 5090 as well. It's the same architecture, after all.