by ben_w
119 days ago
Because they don't yet know how to "just stop emitting so much hot air" without also removing the models' ability to do anything like "thinking" (or whatever you want to call the transcript mode). That's hard because knowing which tokens are hot air is the hard problem itself. They basically only started doing this because someone noticed you got better performance from the early models by straight-up writing "think step by step" in your prompt.

It would actually take more work to condense that long response into a terse one, particularly if the condensing were user-specific, like "based on what you know about me from our interactions, reduce your response to the 200 words most relevant to my immediate needs, and wait for me to ask for more details if I require them."