Testing is easy with any half-decent development setup. You should have train/eval datasets and monitor metrics on them during training; this is ML 101. Then run live A/B experiments for launch candidates.
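As a rough illustration of monitoring train/eval metrics during training, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: the toy data generator, the function names (`make_toy_data`, `log_loss`, `train_with_monitoring`), and the choice of a tiny logistic model trained by gradient descent. It is not how any production ranking system is trained.

```python
import math
import random


def make_toy_data(n, seed):
    """Hypothetical toy dataset: label is 1 when x1 + 0.5 * x2 > 0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        label = 1 if x1 + 0.5 * x2 > 0 else 0
        data.append(([1.0, x1, x2], label))  # leading 1.0 is the bias term
    return data


def predict(weights, x):
    """Logistic prediction for one feature vector."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, data):
    """Mean logistic loss over (features, label) pairs."""
    total = 0.0
    for x, y in data:
        p = min(max(predict(weights, x), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def train_with_monitoring(train, held_out, epochs=20, lr=0.5):
    """Full-batch gradient descent, recording both losses after each epoch."""
    weights = [0.0] * len(train[0][0])
    history = []
    for epoch in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in train:
            err = predict(weights, x) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
        weights = [w - lr * g / len(train) for w, g in zip(weights, grad)]
        # The held-out set is only ever evaluated, never trained on.
        history.append((epoch, log_loss(weights, train), log_loss(weights, held_out)))
    return weights, history


train_set = make_toy_data(200, seed=1)
eval_set = make_toy_data(100, seed=2)
_, history = train_with_monitoring(train_set, eval_set)
for epoch, train_loss, eval_loss in history[::5]:
    print(f"epoch {epoch:2d}  train={train_loss:.3f}  eval={eval_loss:.3f}")
```

The thing to watch is the gap between the two columns: train loss that keeps falling while eval loss stalls or rises is the usual sign of overfitting, and it should show up here long before a model gets anywhere near an A/B experiment.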

Maintenance sure is hell: lots of sweat and tears. Just glancing over search/formula/webcommon/select_ranking_models.cpp makes me cringe; they must have many dozens, if not hundreds, of different models in prod by now, each of them needing maintenance and lots of training data. Working on new ranking factors, I suspect, must also be highly frustrating: you throw stuff at the wall^W CatBoost black box and see if it sticks, and if it doesn't, you have little idea why and little control over it. IMHO Google's approach (white-boxish, interpretable top-level ranking formulas) is far superior and more maintainable at scale.