Hacker News
by coltonv 313 days ago
Yes, but if I set it above ~16K on my 32 GB laptop it just OOMs. Am I doing something wrong?
1 comment

Try enabling flash attention and offloading all layers to the GPU.
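
For a rough sense of why memory climbs so fast with context length, here is a back-of-envelope KV cache estimate. It is only a sketch: the model shape below (32 layers, 8 KV heads, head dimension 128, fp16 cache) is hypothetical, not taken from any particular model, and the function name `kv_cache_bytes` is made up for illustration. It also ignores the weights and compute buffers, which come on top.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Approximate KV cache size in bytes.

    Each layer stores one K and one V tensor (hence the factor of 2),
    each holding n_kv_heads * head_dim values per token of context.
    bytes_per_elem=2 assumes an fp16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


# Hypothetical model shape; substitute the numbers for your own model.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 8, 128

for n_ctx in (4096, 16384, 65536, 131072):
    gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, n_ctx) / 1024**3
    print(f"ctx={n_ctx:>6}: ~{gib:.1f} GiB KV cache")
```

The cache grows linearly with context, so under these assumptions every 4x increase in context size costs 4x the cache memory. A model without grouped-query attention (more KV heads) or with more layers would scale the same way from a higher starting point.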