by lmeyerov
937 days ago
Most LLM use shouldn't be 'raw' but part of a smart & iterative pipeline. Ex:

* reading: If you want it to do inference over a lot of context, you'll need to do multiple inferences. If each inference is faster, you can 'read' more in the same time on the same hardware

* thinking: a lot of analytical approaches essentially use writing as both memory & thinking. Imagine iterative summarization, or automatically and iteratively refining code until it's right

For louie.ai sessions, that's meant a fascinating trade-off when doing the above:

* We can use smarter models like gpt-4 to do fewer iterations...

* ... or a faster but dumber model to get more iterations in the same amount of time

It's not at all obvious which one wins. For example, the humaneval leaderboard has gpt-4 being beaten on code by gpt-3.5 when the latter is run by a LATS agent: https://paperswithcode.com/sota/code-generation-on-humaneval . This highlights that the agent framework is what's really responsible for final result quality, so how many iterations the model can run in the same time window matters.
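To make the trade-off concrete, here's a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration: the model names, latencies, and per-attempt success rates are made up, not measurements, and it treats each iteration as an independent attempt, which real refinement loops are not (later iterations build on earlier ones).

```python
def iterations_in_budget(budget_s: float, latency_s: float) -> int:
    """How many sequential model calls fit in a fixed wall-clock budget."""
    return int(budget_s // latency_s)


def chance_of_success(per_try: float, tries: int) -> float:
    """Probability that at least one of `tries` independent attempts succeeds."""
    return 1.0 - (1.0 - per_try) ** tries


# Hypothetical numbers, chosen only to illustrate the trade-off.
BUDGET_S = 60.0
MODELS = {
    "smart_slow": {"latency_s": 20.0, "per_try": 0.60},
    "fast_dumb": {"latency_s": 4.0, "per_try": 0.25},
}


def compare(budget_s: float = BUDGET_S, models: dict = MODELS) -> dict:
    """Map each model name to (iterations that fit, chance of at least one success)."""
    results = {}
    for name, m in models.items():
        n = iterations_in_budget(budget_s, m["latency_s"])
        results[name] = (n, chance_of_success(m["per_try"], n))
    return results


if __name__ == "__main__":
    for name, (n, p) in compare().items():
        print(f"{name}: {n} iterations, P(at least one success) = {p:.3f}")
```

With these made-up inputs the faster model comes out ahead, but nudge the latency or the per-attempt rate and the answer flips, which is the point: you have to measure it for your own pipeline.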