by motoboi
81 days ago
Things like this make me sad because they make it obvious that most people don’t understand the first thing about how LLMs work. The “answer before reasoning” rule is good evidence of it. It misses the most fundamental concept of transformers: they are autoregressive, so every token is generated from the tokens before it, and an answer written first cannot draw on reasoning that comes after it. Also, reinforcement learning is what makes the model behave in the way you are trying to avoid. So the model’s default output is actually what performs best in the kind of software engineering task you are trying to achieve. I can’t be certain, but I’m fairly confident that response length is a target the model houses optimize for. So the model is trained to achieve high scores on the benchmarks (and the training dataset) while keeping length, sycophancy, and security problems to a minimum without sacrificing capability. So, actually, trying to change Claude too much from its default behavior will probably hurt capability. Change it too much and you start veering into the dreaded “out of distribution” territory, and you soon discover why top researchers talk so much about not-AGI-yet.
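To make the autoregressive point concrete, here is a toy sketch in Python. It is not how any real model is implemented: `generate`, `scripted`, and `context_when_emitted` are hypothetical names invented for this illustration, and the "model" is just a fixed script. The only thing it shows is which tokens are visible at the moment the answer is emitted.

```python
from typing import Callable, List


def generate(next_token: Callable[[List[str]], str],
             prompt: List[str], steps: int) -> List[str]:
    """Toy autoregressive loop: each new token depends only on the prefix."""
    tokens = list(prompt)
    for _ in range(steps):
        tokens.append(next_token(tokens))
    return tokens[len(prompt):]


def scripted(script: List[str], prompt_len: int) -> Callable[[List[str]], str]:
    """Hypothetical stand-in for a model: replays a fixed script of tokens."""
    def next_token(prefix: List[str]) -> str:
        return script[len(prefix) - prompt_len]
    return next_token


def context_when_emitted(prompt: List[str], output: List[str],
                         token: str) -> List[str]:
    """Everything the generator could condition on when `token` was produced."""
    position = output.index(token)
    return list(prompt) + output[:position]


prompt = ["question"]
answer_first = generate(
    scripted(["ANSWER", "step1", "step2"], len(prompt)), prompt, 3)
reasoning_first = generate(
    scripted(["step1", "step2", "ANSWER"], len(prompt)), prompt, 3)

# Answer first: only the prompt is visible when the answer is written.
print(context_when_emitted(prompt, answer_first, "ANSWER"))
# Reasoning first: the answer is conditioned on the reasoning tokens too.
print(context_when_emitted(prompt, reasoning_first, "ANSWER"))
```

With the answer first, the context is just the prompt; with the reasoning first, the context includes every reasoning step. Any "reasoning" emitted after the answer can only justify it after the fact.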
|
For complex tasks this is not a useful prompt.