	by hu3 60 days ago
	About slowdowns... I have a theory that if they sneak a few sleep(1) calls into the processing of medium-to-complex prompts, they can serve more clients. But I think "context switching" between two different prompts might be too expensive on GPUs to be worth it for LLM providers. Who knows.
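
	To make the theory concrete, here is a toy CPU-side sketch in Python, assuming a single-threaded asyncio worker. It says nothing about how any LLM provider actually schedules GPU work, and the names handle_prompt, serve, and run_demo are made up for illustration. It only shows the basic mechanism: a pause inside one request hands the worker to another client, so the two requests interleave instead of running back to back.

```python
import asyncio


async def handle_prompt(name, steps, pause, log):
    """Hypothetical handler: does `steps` chunks of work, pausing after each."""
    for step in range(steps):
        log.append((name, step))    # stand-in for one chunk of generation
        await asyncio.sleep(pause)  # the sneaked-in sleep; yields to other clients


async def serve(pause):
    """Run two made-up client requests on one event loop and record the order."""
    log = []
    await asyncio.gather(
        handle_prompt("A", 3, pause, log),
        handle_prompt("B", 3, pause, log),
    )
    return log


def run_demo(pause=0.01):
    """Return the list of (client, step) pairs in the order they were processed."""
    return asyncio.run(serve(pause))


if __name__ == "__main__":
    for client, step in run_demo():
        print(client, step)
```

	Each request takes longer wall-clock time than it would alone, which is the slowdown part of the theory. Whether the equivalent switch is cheap enough on a GPU is exactly the open question in the comment, and this sketch does not model that cost.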