Hacker News new | ask | show | jobs
by calaphos 414 days ago
There's still a throughput/latency tradeoff curve, at least for any sort of interactive models.

One of the reasons why inference providers sell batch discounts.