by elorant 742 days ago
Is this at 4-bit quantization? And how many tokens per second is the output?

I’m doing non-interactive tasks, but the A6000 running llama3 70b in chat mode is as usable as any of the commercial offerings in terms of speed. I read quickly, and it generates text faster than I can read it.