by puma_xxx 893 days ago

This criticism seems out of touch.

They are presenting a real-world use case where retention and engagement are clearly the metrics of interest. It's not clear what "human evaluations" would even mean in this context.

Kudos for not falling into the benchmark / human eval trap, and for just testing your theories directly at scale in a deployment setting.