| HN Mirror



	by JackYoustra 694 days ago
	Probably if you rely on any esoteric flags that PyTorch supports. FlashAttention-2, for example, was supported in PyTorch well before llama.cpp, so if FlashAttention-3 follows the same path, it'll probably make more sense to use this when targeting NVIDIA GPUs.


sunshinesfbay 693 days ago

It appears that FlashAttention-3 already exists for PyTorch, going by this joint blog post from Nvidia, Together.ai and Princeton about enabling it: https://pytorch.org/blog/flashattention-3/


JackYoustra 693 days ago

Right - my point about "follows the same path" is mostly about how long llama.cpp takes to adopt it.
