Hacker News
by ls612 33 days ago
10 tok/s is about the borderline for a good interactive experience. I did the math and generation is mostly bottlenecked by memory bandwidth, so in the future I can expect to run a similarly sized model on my 4090 once it gets retired from gaming service and get ~25 tok/s, which will be very usable.
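For anyone who wants to redo this kind of estimate, here is a rough sketch of the usual bandwidth-bound rule of thumb: if every generated token has to stream the active weights from memory once, tok/s is at most bandwidth divided by bytes read per token. The function names, the `efficiency` factor, and the example numbers are my own placeholders, not measurements of any particular GPU or model, so treat the output as a ceiling rather than a prediction.

```python
def estimate_tokens_per_second(bandwidth_gb_s: float,
                               active_weights_gb: float,
                               efficiency: float = 1.0) -> float:
    """Rough upper bound on decode speed for a memory-bandwidth-bound model.

    Assumes each generated token streams `active_weights_gb` of weights
    from memory exactly once. `efficiency` (0 < e <= 1) is a hypothetical
    fudge factor for how much of the peak bandwidth is actually achieved.
    """
    if bandwidth_gb_s <= 0 or active_weights_gb <= 0:
        raise ValueError("bandwidth and weight size must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return efficiency * bandwidth_gb_s / active_weights_gb


def required_bandwidth_gb_s(target_tok_s: float,
                            active_weights_gb: float,
                            efficiency: float = 1.0) -> float:
    """Invert the estimate: peak bandwidth needed to hit a target tok/s."""
    if target_tok_s <= 0 or active_weights_gb <= 0:
        raise ValueError("target and weight size must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return target_tok_s * active_weights_gb / efficiency


if __name__ == "__main__":
    # Placeholder numbers for illustration only (not real hardware specs).
    weights_gb = 40.0
    for bandwidth in (400.0, 1000.0):
        tok_s = estimate_tokens_per_second(bandwidth, weights_gb)
        print(f"{bandwidth:.0f} GB/s -> {tok_s:.1f} tok/s")
```

The estimate scales linearly with bandwidth, so under these assumptions 2.5x the bandwidth gives 2.5x the tok/s for the same model. It ignores compute, KV cache reads, and any traffic over PCIe if part of the model is offloaded, all of which push the real number lower.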