HN Mirror



	by codemog 106 days ago
	Interesting. I see papers where researchers fine-tune models in the 7B to 12B parameter range and beat, or at least stay competitive with, frontier models. I wish I knew how this was possible, or had more intuition about such things. If anyone has paper recommendations, I’d appreciate it.


They're using a revolutionary new method called "training on the test set".

So, curve fitting the training data? Then we should expect out-of-sample accuracy to be crap?

Yeah, that's usually what tends to happen with those tiny models that are amazing in benchmarks.
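
For intuition, here is a toy sketch of that failure mode. It is not taken from any paper. Everything in it is made up for illustration: the data is random noise with random labels, and the "model" is a 1-nearest-neighbor memorizer standing in for a fine-tuned network. The benchmark items are deliberately leaked into the training data, so the benchmark score measures memorization and the score on fresh data shows what was actually learned.

```python
import random

def make_data(n, rng):
    # Random features with random labels: there is no real pattern to learn.
    return [([rng.random() for _ in range(5)], rng.randint(0, 1)) for _ in range(n)]

def predict_1nn(train, x):
    # Pure memorizer: return the label of the closest training example.
    nearest = min(train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
    return nearest[1]

def accuracy(train, eval_set):
    hits = sum(predict_1nn(train, x) == y for x, y in eval_set)
    return hits / len(eval_set)

rng = random.Random(0)
benchmark = make_data(200, rng)  # hypothetical public "test set"
fresh = make_data(200, rng)      # out-of-sample data the model never sees

# Contamination: the benchmark items end up in the training data.
contaminated_train = make_data(200, rng) + benchmark

benchmark_acc = accuracy(contaminated_train, benchmark)
fresh_acc = accuracy(contaminated_train, fresh)

print(f"benchmark accuracy: {benchmark_acc:.2f}")
print(f"fresh-data accuracy: {fresh_acc:.2f}")
```

The benchmark number comes out perfect because every benchmark item has an exact copy in the training data, while the fresh-data number lands near a coin flip. Real contamination is usually partial and much harder to spot, so treat this as the extreme case rather than a picture of any particular model.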