by startupsfail
158 days ago
There are still blatant failure modes where models lapse into outright sycophancy rather than expressing enthusiasm, etc. I'd guess that, in practice, a benchmark (like this vibesbench) that helps catch blatant, unhelpful sycophancy failures could be useful.