by puma_xxx
893 days ago
This criticism seems out of touch. They are presenting a real-world use case where retention and engagement are clearly the metrics of interest. It's not clear what "human evaluations" would even mean in this context. Kudos for not falling into the benchmark / human eval trap, and for just testing your theories directly at scale in a deployment setting.