by dylan522p 1223 days ago
Your math is completely wrong dude.

It's 2000 ms per token, not for the whole query.
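Here is a rough sketch of why that matters. It assumes a hypothetical output length and ignores prompt processing. Total generation time is the per-token latency multiplied by the number of tokens generated, so a per-token figure and a per-query figure differ by a factor of the response length.

```python
def query_latency_ms(per_token_ms: float, output_tokens: int) -> float:
    """Generation time when the per-token latency is paid once per output token.

    Prompt (prefill) processing is ignored in this simplified model.
    """
    return per_token_ms * output_tokens


# Hypothetical 100-token response at 2000 ms per token.
total_ms = query_latency_ms(2000.0, 100)
print(f"{total_ms:.0f} ms ({total_ms / 1000:.0f} s)")  # 200000 ms (200 s)
```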

Hardware utilization rates and MFU (model FLOPs utilization) are not the same thing; you forgot the latter.
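For reference, MFU is the ratio of the FLOPs the model actually needs per second at the observed throughput to the hardware's peak FLOPs per second. The minimal sketch below assumes the common approximation of about 2 FLOPs per parameter per generated token for a decoder forward pass, with attention FLOPs ignored. The parameter count, throughput, and peak figure are hypothetical.

```python
def model_flops_utilization(tokens_per_s: float, n_params: float, peak_flops_per_s: float) -> float:
    """MFU: achieved model FLOPs/s divided by the hardware's peak FLOPs/s."""
    # ~2 FLOPs (one multiply, one add) per parameter per token in a forward pass.
    achieved_flops_per_s = 2 * n_params * tokens_per_s
    return achieved_flops_per_s / peak_flops_per_s


# Hypothetical: 1B-parameter model, 100 tokens/s, 1 PFLOP/s peak.
mfu = model_flops_utilization(tokens_per_s=100, n_params=1e9, peak_flops_per_s=1e15)
print(f"MFU = {mfu:.2%}")  # MFU = 0.02%
```

A device can report that it is busy most of the time while doing far fewer useful FLOPs than its peak. That is why a hardware utilization number can't stand in for MFU.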

You are also pretending it's perfectly parallelized on 1 GPU. I use the throughput of an 8x GPU box.
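Here is a hedged sketch of the box-level view. It computes throughput for the whole 8x GPU server using a scaling-efficiency factor instead of assuming perfect parallelization, then divides the box's hourly price by that throughput. The price, single-GPU throughput, and efficiency values are hypothetical placeholders, not measured numbers.

```python
def box_tokens_per_s(single_gpu_tokens_per_s: float, n_gpus: int, scaling_efficiency: float) -> float:
    """Aggregate throughput; scaling_efficiency < 1.0 models imperfect parallelization."""
    return single_gpu_tokens_per_s * n_gpus * scaling_efficiency


def cost_per_1k_tokens(box_cost_per_hour: float, tokens_per_s: float) -> float:
    """Dollars per 1,000 generated tokens for the whole box."""
    tokens_per_hour = tokens_per_s * 3600
    return box_cost_per_hour / tokens_per_hour * 1000


# Hypothetical: 8 GPUs, 10 tokens/s each in isolation, $20/hour for the box.
ideal = cost_per_1k_tokens(20.0, box_tokens_per_s(10.0, 8, 1.0))      # perfect scaling
realistic = cost_per_1k_tokens(20.0, box_tokens_per_s(10.0, 8, 0.7))  # 70% scaling
print(f"ideal ${ideal:.3f} vs realistic ${realistic:.3f} per 1k tokens")
```

Assuming perfect scaling always gives the lower cost, so it understates what serving actually costs.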