Hacker News new | ask | show | jobs
by JackYoustra 694 days ago
Probably if you have any esoteric flags that pytorch supports. Flash attention 2, for example, was supported way earlier on pt than llama.cpp, so if flash attention 3 follows the same path it'll probably make more sense to use this when targeting nvidia gpus.
1 comments

It would appear that Flash-3 is already something that exists for PyTorch based on this joint blog between Nvidia, Together.ai and Princeton about enabling Flash-3 for PyTorch: https://pytorch.org/blog/flashattention-3/
Right - my point about "follows the same path" mostly revolves around llama.cpp's latency in adopting it.