by regularfry, 110 days ago
I've got Unsloth's Q4_K_XL quant of the 35B model running in llama.cpp on an i9 / 64 GB / RTX 4090 machine, doing double-digit tokens per second with a 90k+ token context window available. The model's completely in VRAM.
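A rough way to sanity-check a "fits entirely in VRAM" setup like this is to add the quantized weight size to the KV cache size for the chosen context length. What follows is a minimal sketch under stated assumptions. The 4.5 bits-per-weight figure and the attention parameters (layer count, KV heads, head dimension) are hypothetical placeholders, not the real architecture of the model above. The estimate also ignores llama.cpp's compute buffers and any non-attention state that hybrid models carry.

```python
def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GiB (params * bits / 8 bytes)."""
    return n_params * bits_per_weight / 8 / 2**30


def kv_cache_gib(n_ctx: int, n_attn_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_elem: float = 2) -> float:
    """KV cache size in GiB for standard full-attention layers.

    The leading factor of 2 counts both the K and the V tensor.
    bytes_per_elem=2 models an fp16 cache; use a smaller value to
    approximate a quantized cache.
    """
    return 2 * n_attn_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem / 2**30


if __name__ == "__main__":
    # HYPOTHETICAL numbers for illustration only; substitute the real
    # model's config and the quant's actual file size.
    w = weights_gib(35e9, 4.5)
    kv = kv_cache_gib(n_ctx=90_000, n_attn_layers=10, n_kv_heads=2, head_dim=256)
    print(f"weights ~{w:.1f} GiB, KV cache ~{kv:.1f} GiB, total ~{w + kv:.1f} GiB")
```

Each of the KV terms scales linearly with context length, so the number of full-attention layers and KV heads largely decides whether a 90k context can still fit on a 24 GB card alongside the weights.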