by forrest2 730 days ago

A single synchronous request is not a good way to understand cost here unless your workload really is one tiny request at a time. ChatGPT handles many requests in parallel, and this article's 4-GPU setup can certainly handle more too.

It is miraculous that the cost comparison isn't worse given how adversarial this test is.

Larger requests, concurrent requests, and request queueing will drastically reduce the cost per request here, because the same GPU-hours get spread over far more tokens.
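
To put rough numbers on that, here is a back-of-the-envelope sketch. The hourly price, per-request token rate, and the `scaling_efficiency` knob are all made-up assumptions for illustration, not figures from the article; the point is only that cost per token falls roughly in proportion to how many requests the hardware serves at once.

```python
def cost_per_1k_tokens(
    hourly_cost_usd: float,
    tokens_per_sec_per_request: float,
    concurrent_requests: int,
    scaling_efficiency: float = 1.0,
) -> float:
    """Estimate USD per 1,000 generated tokens for a self-hosted setup.

    scaling_efficiency is a hypothetical fudge factor (0..1) for how much
    per-request speed degrades as concurrency rises; 1.0 means no slowdown.
    """
    # Aggregate throughput across all in-flight requests.
    tokens_per_sec = (
        tokens_per_sec_per_request * concurrent_requests * scaling_efficiency
    )
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_cost_usd / tokens_per_hour * 1000


# Hypothetical numbers: a $4/hour box generating 20 tokens/s per request.
single = cost_per_1k_tokens(4.0, 20.0, concurrent_requests=1)
batched = cost_per_1k_tokens(4.0, 20.0, concurrent_requests=16, scaling_efficiency=0.8)

print(f"1 request at a time: ${single:.4f} per 1k tokens")
print(f"16 concurrent:       ${batched:.4f} per 1k tokens")
print(f"reduction:           {single / batched:.1f}x")
```

Even with a 20% slowdown per request under load, the assumed 16-way concurrency cuts cost per token by about an order of magnitude, which is why a one-request-at-a-time benchmark is close to the worst case.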