by akie 2 hours ago
Try doing it at scale for a whole office. Not trivial.
2 comments

There are plenty of US-based hosters racing to optimize and drive efficiencies.

There's a literal race on Twitter, with hosters posting about increasing token throughput and driving down costs on these Chinese open-source models.

You could probably make do with a couple of instances. People rarely use AI 24/7, so right now you can oversubscribe and still get acceptable latency and a high utilization rate.
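
To make the oversubscription argument concrete, here's a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function names, the 200-person office, the 10% of users active at peak, the 16 concurrent request slots per instance, and the 1.5x headroom factor are all made up, not measurements from any real deployment.

```python
import math


def instances_needed(users, active_fraction, slots_per_instance, headroom=1.0):
    """Estimate how many instances cover peak concurrent demand.

    users: total people with access (hypothetical office size)
    active_fraction: share of users sending requests at the same time (0..1]
    slots_per_instance: concurrent requests one instance serves acceptably
    headroom: safety multiplier on the estimated peak
    """
    if not 0 < active_fraction <= 1:
        raise ValueError("active_fraction must be in (0, 1]")
    if slots_per_instance <= 0:
        raise ValueError("slots_per_instance must be positive")
    peak_concurrent = users * active_fraction * headroom
    return max(1, math.ceil(peak_concurrent / slots_per_instance))


def oversubscription_ratio(users, instances, slots_per_instance):
    """Users assigned per concurrent slot; above 1 means oversubscribed."""
    return users / (instances * slots_per_instance)


if __name__ == "__main__":
    # Assumed numbers, purely illustrative.
    n = instances_needed(users=200, active_fraction=0.1,
                         slots_per_instance=16, headroom=1.5)
    ratio = oversubscription_ratio(users=200, instances=n,
                                   slots_per_instance=16)
    print(f"instances: {n}, oversubscription: {ratio:.2f}x")
```

The point is just that capacity scales with peak concurrency rather than headcount. If your real active fraction or per-instance concurrency is different, the answer moves accordingly, so measure before trusting it.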