by YetAnotherNick 809 days ago
It's not much faster for input, but it is something like 10x faster for output (Mixtral vs. GPT-3.5). This could enable completely new modes of interaction with LLMs, e.g. agents.

In most cases, overall response time is dominated by output, since generating a token is ~100x slower than processing an input token.
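A rough back-of-the-envelope sketch of that arithmetic, in Python. The per-token time and the token counts are made-up numbers for illustration, not measurements of any real model; only the ~100x and ~10x ratios come from the comment above.

```python
def response_time(n_in, n_out, t_in, t_out):
    """Return (input_seconds, output_seconds, total_seconds) for one request."""
    input_s = n_in * t_in
    output_s = n_out * t_out
    return input_s, output_s, input_s + output_s


# Hypothetical figures, chosen only to illustrate the ratios.
T_IN = 0.0002        # assumed seconds per input token
T_OUT = 100 * T_IN   # output ~100x slower per token than input
N_IN, N_OUT = 1000, 200  # assumed prompt and completion lengths

base = response_time(N_IN, N_OUT, T_IN, T_OUT)
fast = response_time(N_IN, N_OUT, T_IN, T_OUT / 10)  # 10x faster output only

output_share = base[1] / base[2]  # fraction of total time spent on output
speedup = base[2] / fast[2]       # end-to-end speedup from faster output

print(f"baseline total: {base[2]:.2f}s, output share: {output_share:.0%}")
print(f"with 10x faster output: {fast[2]:.2f}s, speedup: {speedup:.1f}x")
```

With these assumed numbers, output accounts for about 95% of the total time, so speeding up output alone by 10x gives roughly a 7x end-to-end speedup even though input speed is unchanged. Longer prompts or shorter completions would shrink that gain.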