Hacker News
by
haellsigh
53 days ago
FYI, I believe `--flash-attn on` doesn't do anything; you should use `--flash-attn 1` instead. I'm getting ~150 t/s on an RTX 3080 10GB as well with f16 cache type.
1 comment
freakynit
53 days ago
Thanks.. updated my local docs :)