| HN Mirror



	by Javantea_ 411 days ago
	I'm surprised no one in the comments has mentioned overfitting. Perhaps this is too obvious but I think of it as a very clear bug in a model if it asserts something to be true because it has heard it once. I realize that training a model is not easy, but this is something that should've been caught before it was released. Either QA is sleeping on the job or they have intentionally released a model with serious flaws in its design/training. I also understand the intense pressure to release early and often, but this type of thing isn't a warning.

3 comments

numpad0 411 days ago

It's apparently known among LLM researchers that the best epoch count for LLM training is one. They go through the entire dataset once, and that makes the best LLMs.

They know. An LLM is a novel compression format for text (holographic memory or whatever). The question is whether the rest of the world accepts this technology as it is or not.


jeroenhd 411 days ago

Overfitting makes for more human-like output (because it's repeating words written by a human). Out of all possible failure states of a model, overfitting is probably what you want out of an LLM, as long as it's not overfitted enough to lose lawsuits.


fennecfoxy 411 days ago

I disagree. I'd define overfitting for LLMs as creating unreasonably strong connections to individual sequences used for training, whereas what's required is a good mix of that and connections between chunks of those sequences.


Tepix 411 days ago

I think part of the problem is that the book is in the training set multiple times.

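To make the duplication point concrete: one pass over the dataset only means one pass per document if each document appears once. Below is a rough sketch, assuming duplicates are exact matches after case and whitespace normalization (real pipelines would likely need fuzzier matching). The names `fingerprint`, `copies_per_document`, and `deduplicate` are made up for illustration, and the corpus is a toy.

```python
import hashlib
from collections import Counter


def fingerprint(text):
    """Hash a document after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def copies_per_document(documents):
    """Count how often each distinct document occurs in the corpus.

    With a single training pass, a document that occurs k times is
    effectively seen k times.
    """
    return Counter(fingerprint(doc) for doc in documents)


def deduplicate(documents):
    """Keep only the first occurrence of each distinct document."""
    seen = set()
    unique = []
    for doc in documents:
        key = fingerprint(doc)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


# Toy corpus: the same "book" shows up three times with minor formatting noise.
corpus = [
    "Full text of some book.",
    "full text of   some book.",
    "An unrelated document.",
    "Full text of some book.",
]

counts = copies_per_document(corpus)
max_copies = max(counts.values())
unique_docs = deduplicate(corpus)
```

Here the duplicated "book" is seen three times in a nominal single epoch, while the unrelated document is seen once; deduplicating first brings every document back to one exposure.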