by alextp 1521 days ago
Choose the top N according to the proxy objective and then use the real objective to choose the best out of those N candidates.
1 comment

That was my initial understanding, which left me confused.

But they're taking the top N according to the model, then picking the best of those according to the proxy objective, not the actual one. With reasonable probability, this avoids the winner's curse you get from just taking the model's top-ranked candidate.

They then compare that pick to the candidate with the highest-scoring actual preference.
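
To make the procedure concrete, here's a rough sketch of how I read it. The function names (`top_n_indices`, `rerank_pick`) and the toy scores are made up for illustration and aren't from the paper; the only assumption is that every candidate has a model score, a proxy score, and an actual preference score.

```python
from typing import List, Sequence


def top_n_indices(scores: Sequence[float], n: int) -> List[int]:
    """Return the indices of the n highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n]


def rerank_pick(model_scores: Sequence[float],
                proxy_scores: Sequence[float],
                n: int) -> int:
    """Shortlist the top n by model score, then pick the best by proxy score."""
    shortlist = top_n_indices(model_scores, n)
    return max(shortlist, key=lambda i: proxy_scores[i])


# Toy data (hypothetical): one entry per candidate.
model_scores = [0.9, 0.8, 0.7, 0.2]
proxy_scores = [0.1, 0.6, 0.5, 0.9]
actual_scores = [0.2, 0.7, 0.4, 0.8]

picked = rerank_pick(model_scores, proxy_scores, n=3)
model_top = top_n_indices(model_scores, 1)[0]
best_actual = max(range(len(actual_scores)), key=lambda i: actual_scores[i])

print("reranked pick:", picked, "actual score:", actual_scores[picked])
print("model's top pick:", model_top, "actual score:", actual_scores[model_top])
print("best by actual preference:", best_actual, "actual score:", actual_scores[best_actual])
```

In this toy case the model's top pick is overrated by the model, and the proxy rerank lands on a different candidate. The actual scores are only used at the end for the comparison, never for selection.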