This thread reminds me of a competition I once joined where we were supposed to fine-tune an LLM to fill in trivia answers, and we were expressly disallowed from training on the validation set.

However, we were allowed to pick any base model in a given repo. All of the teams that “won” did so for the same reason: they had all picked the same base model (whereas a majority of teams picked the given default), presumably the one that had at some point been trained on the most favorable data for this particular challenge.

It was quite silly. Had everyone used the same base model, we’d have had a more interesting problem (more about NLP and alignment than about picking the ‘best’ model).