


	by YetAnotherNick 809 days ago
	It's not much faster for input, but it is something like 10x faster for output (Mixtral vs. GPT-3.5). This could enable completely new modes of interaction with LLMs, e.g. agents. In most cases, overall response time is dominated by output, since output is ~100x slower per token than input.
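
A rough back-of-the-envelope sketch of that last point, in Python. The ~100x per-token ratio and the 10x output speedup come from the comment above; the token counts and the 0.1 ms per input token figure are made-up placeholders, not measurements of any real model or provider.

```python
def response_time(input_tokens, output_tokens, input_s_per_token,
                  output_slowdown=100.0, output_speedup=1.0):
    """Return (input_seconds, output_seconds) for one request.

    output_slowdown: how many times slower an output token is than an
                     input token (~100x, per the comment above).
    output_speedup:  hypothetical factor by which output generation is
                     accelerated (1.0 means no acceleration).
    """
    input_s = input_tokens * input_s_per_token
    output_s = output_tokens * input_s_per_token * output_slowdown / output_speedup
    return input_s, output_s


# Placeholder workload: 1000 prompt tokens, 200 generated tokens,
# 0.1 ms per input token (assumed, not measured).
base_in, base_out = response_time(1000, 200, 0.0001)
fast_in, fast_out = response_time(1000, 200, 0.0001, output_speedup=10.0)

share = base_out / (base_in + base_out)
overall = (base_in + base_out) / (fast_in + fast_out)

print(f"output share of response time: {share:.0%}")
print(f"overall speedup from 10x faster output: {overall:.1f}x")
```

With these placeholder numbers, output accounts for roughly 95% of the response time even though the prompt is five times longer than the reply, so a 10x output speedup with unchanged input speed still makes the whole request about 7x faster.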