| HN Mirror



	by bbor 1109 days ago
	It would be a bit of a scandal, and IMO too much hassle to sneak in. These models are trained on massive amounts of text - specifically anticipating which metrics people will care about and generating synthetic data just for them seems extra. But not an expert or OP!

2 comments

stu2b50 1109 days ago

I don't think it's a scandal, it's a natural thing that happens when iterating on models. OP doesn't mean they literally train on those tests, but that as a meta-consequence of using those tests as benchmarks, you will adjust the model and hyperparameters in ways that perform better on those tests.

For a particular model you try to minimize this by keeping separate validation and test sets, but on a meta-meta level, it's easy to see it happening.
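A minimal sketch of that meta-level effect, assuming a toy setup that is not from any real model or benchmark: every "candidate" below is a coin-flip predictor with a true accuracy of 0.5, and `simulate`, `n_candidates`, and `n_examples` are hypothetical names chosen for illustration. Picking whichever candidate scores best on the shared benchmark still yields a benchmark score above chance, while the same candidate scores near 0.5 on fresh data.

```python
import random


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def simulate(n_candidates=200, n_examples=500, seed=0):
    """Select the best of many chance-level candidates by benchmark score.

    Returns (benchmark accuracy of the selected candidate,
             accuracy of that same candidate on fresh data).
    """
    rng = random.Random(seed)
    benchmark = [rng.randint(0, 1) for _ in range(n_examples)]
    fresh = [rng.randint(0, 1) for _ in range(n_examples)]

    best_bench, best_fresh = -1.0, 0.0
    for _ in range(n_candidates):
        # A "candidate" stands in for one tweak of model or hyperparameters.
        # It guesses at random, so any edge on the benchmark is pure luck.
        bench_preds = [rng.randint(0, 1) for _ in range(n_examples)]
        fresh_preds = [rng.randint(0, 1) for _ in range(n_examples)]
        bench_acc = accuracy(bench_preds, benchmark)
        if bench_acc > best_bench:
            best_bench = bench_acc
            best_fresh = accuracy(fresh_preds, fresh)
    return best_bench, best_fresh


if __name__ == "__main__":
    bench, fresh = simulate()
    print(f"selected candidate on benchmark: {bench:.3f}")
    print(f"same candidate on fresh data:    {fresh:.3f}")
```

Nobody trained on the benchmark here; the inflation comes only from repeatedly choosing whatever scored best on it.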


jasonfarnon 1109 days ago

You don't see an engineer at an extremely PR-conscious company at least checking how their model performs on popular benchmarks before rolling it out? And if its performance is lackluster, do you really see them doing nothing about it? It probably doesn't make a huge difference anyway. I know those old vision models were overfitted to the standard image library benchmarks, but they were still very impressive.


fbdab103 1109 days ago

Famously, some of the image models were so overtrained they could still yield impressive results if the colors were removed.


lumost 1109 days ago

This wasn't so much overtraining as the models learning something different from what we expected. If you look at a pixel-by-pixel representation of an image, textures tend to be more significant and unique patterns than shapes. There are some funny studies from the mid 2010s exploring this.
