by haellsigh 53 days ago
FYI, I believe `--flash-attn on` doesn't do anything; you should use `--flash-attn 1` instead. I'm getting ~150 t/s on an RTX 3080 10GB as well, with the f16 cache type.
1 comment

Thanks, updated my local docs :)