Hacker News
by
GaggiX
50 days ago
At 4-bit quantization it should already fit quite nicely.
Aurornis
50 days ago
Unfortunately not with a reasonable context length.
regularfry
50 days ago
I've got 139k context with the UD-Q4_K_XL quant on a 4090, with the K and V caches at q8_0 (ctk/ctv). Could probably squeeze out a little more, but that's enough for me for the moment.
corysama
50 days ago
Hey, buddy! Can I bum a command line arg list off ya?
GaggiX
50 days ago
The model uses Gated DeltaNet and Gated Attention, so the memory usage of the KV cache is very low, even at BF16 precision.
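For anyone who wants to sanity-check that, here is a rough back-of-the-envelope sketch of the arithmetic. The layer count, the one-in-four ratio of full-attention layers, the KV head count and the head dimension are all made-up placeholders, not this model's real config; plug in the values from the actual model card. The idea is that only the full-attention layers keep a KV cache that grows with context, while the linear-attention layers keep a fixed-size state that this sketch ignores.

```python
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, context_len, bytes_per_elem):
    """Size of the KV cache: K and V each hold n_kv_heads * head_dim
    values per token, per full-attention layer."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


BF16 = 2.0          # bytes per element
Q8_0 = 34 / 32      # q8_0 block: 32 int8 values + one fp16 scale = 34 bytes

# ASSUMPTIONS: hypothetical architecture numbers for illustration only.
TOTAL_LAYERS = 48
FULL_ATTN_EVERY = 4     # assume 1 full-attention layer per 4 layers
N_KV_HEADS = 4
HEAD_DIM = 128
CONTEXT = 139_000       # context length mentioned upthread

attn_layers = TOTAL_LAYERS // FULL_ATTN_EVERY

hybrid_bf16 = kv_cache_bytes(attn_layers, N_KV_HEADS, HEAD_DIM, CONTEXT, BF16)
hybrid_q8 = kv_cache_bytes(attn_layers, N_KV_HEADS, HEAD_DIM, CONTEXT, Q8_0)
# Same model shape, but pretending every layer used full attention.
dense_bf16 = kv_cache_bytes(TOTAL_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT, BF16)

GIB = 1024 ** 3
print(f"hybrid, BF16 cache: {hybrid_bf16 / GIB:.2f} GiB")
print(f"hybrid, q8_0 cache: {hybrid_q8 / GIB:.2f} GiB")
print(f"all-attention, BF16 cache: {dense_bf16 / GIB:.2f} GiB")
```

With these placeholder numbers the hybrid layout needs a quarter of the cache an all-attention model of the same shape would, and going from BF16 to q8_0 roughly halves it again. Whether that fits next to the weights on a 24 GB card depends on the real config and the quant you pick.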
kkzz99
50 days ago
It really depends on what you think a reasonable context length is, but I can get 50k-60k on a 4090.