by
elorant
742 days ago
Is this at 4-bit quantization? And how many tokens per second is the output?
hehdhdjehehegwv
742 days ago
I’m mostly running non-interactive tasks, but in chat mode the A6000 runs Llama 3 70B about as fast as any of the commercial offerings. I read quickly, and it generates text faster than I can read.