|
|
|
|
|
by JackYoustra
694 days ago
|
|
Probably if you have any esoteric flags that pytorch supports. Flash attention 2, for example, was supported way earlier on pt than llama.cpp, so if flash attention 3 follows the same path it'll probably make more sense to use this when targeting nvidia gpus. |
|