Remember the sycophant bug? Maybe making the user feel good is part of what makes it feel smart or like a good experience. Is the reward function being smart? Is it maximizing interaction? Does it conflict with being accurate?
I ran the prompt as-is on one of the main repos that I work on and the sycophancy was cloying.
It praised so many things that I would just consider table stakes, and it made simple tweaks or features sound like massive projects.
I’m sure it could be improved by tweaking the prompt, and some of what it picked out impressed me (specifically things not in the commit messages), but I found it unusable in its current form.