| HN Mirror



	by akie 2 hours ago
	Try doing it at scale for a whole office. Not trivial.

2 comments

arjunchint 1 hour ago

There are plenty of US-based hosters racing to optimize and drive efficiencies.

It's a literal race on Twitter: people posting to show off higher token throughput and lower costs for these Chinese open-source models.


ReptileMan 1 hour ago

You could probably make do with a couple of instances. People rarely use AI 24/7, so right now you can oversubscribe and still get acceptable latency and a high utilization rate.
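A rough way to put numbers on the oversubscription argument is to treat each person as independently active some fraction of the time and ask how often concurrent demand exceeds what the instances can serve. This is a minimal sketch, assuming independent users (a binomial model) and a fixed number of concurrent streams per instance. The office size, activity rate, per-instance capacity, and overload target below are hypothetical, not figures from the thread.

```python
from math import comb


def prob_demand_exceeds(users: int, p_active: float, capacity: int) -> float:
    """P(more than `capacity` users are active at once).

    Assumes each user is active independently with probability p_active
    (binomial model); real usage is burstier, e.g. everyone after standup.
    """
    return sum(
        comb(users, k) * p_active**k * (1 - p_active) ** (users - k)
        for k in range(capacity + 1, users + 1)
    )


def instances_needed(users: int, p_active: float,
                     streams_per_instance: int, max_overload_prob: float) -> int:
    """Smallest instance count whose overload probability is within the target."""
    instances = 1
    while prob_demand_exceeds(users, p_active,
                              instances * streams_per_instance) > max_overload_prob:
        instances += 1
    return instances


# Hypothetical office: 50 people, each generating 10% of the time,
# 8 concurrent streams per instance, tolerating a 1% chance of overload.
n = instances_needed(50, 0.10, 8, 0.01)
print(n)
```

Under these made-up numbers, average demand is only 5 concurrent users, which is why a small pool can cover a much larger headcount. The independence assumption is the weak point: correlated usage peaks push the required capacity up.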
