

	by unixpickle 1269 days ago
	It's definitely not just a regularizer in my case, because the gap appears before even a single epoch has finished. The gap also appears for two very different model architectures. One explanation is that price labels are super noisy. If there is enough noise in the primary labels, you could imagine that adding the more predictable target variables helps reduce gradient noise and speeds up training. That's my current hypothesis, but I'm very open to others. If I had more time, I'd run more experiments on this.
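To make the gradient-noise hypothesis concrete, here is a toy sketch, not the setup from the experiments above. It assumes a single shared weight `w`, a noisy primary target, and a cleaner auxiliary target that depends on the same weight. The noise levels and the names `simulate` and `gradient_snr` are hypothetical. It compares the signal-to-noise ratio of per-example gradients with and without the auxiliary loss term.

```python
import random
import statistics


def gradient_snr(grads):
    """Return |mean| / sample standard deviation of per-example gradients."""
    return abs(statistics.fmean(grads)) / statistics.stdev(grads)


def simulate(n=10_000, w_true=1.0, primary_noise=5.0, aux_noise=0.5, seed=0):
    """Compare gradient SNR for a primary-only loss vs. primary + auxiliary.

    Assumption (illustrative only): both targets are w_true * x plus
    Gaussian noise, and the auxiliary target is much less noisy.
    """
    rng = random.Random(seed)
    w = 0.0  # shared weight at initialization
    primary_grads, combined_grads = [], []
    for _ in range(n):
        x = 1.0  # fixed input keeps the toy example one-dimensional
        y_primary = w_true * x + rng.gauss(0.0, primary_noise)
        y_aux = w_true * x + rng.gauss(0.0, aux_noise)
        # Gradient of the squared error (y - w * x)^2 with respect to w.
        g_primary = -2.0 * x * (y_primary - w * x)
        g_aux = -2.0 * x * (y_aux - w * x)
        primary_grads.append(g_primary)
        combined_grads.append(g_primary + g_aux)
    return gradient_snr(primary_grads), gradient_snr(combined_grads)


snr_primary, snr_combined = simulate()
print(f"primary only: {snr_primary:.3f}, with auxiliary target: {snr_combined:.3f}")
```

In this toy setting the auxiliary term adds signal to the gradient while adding little noise, so the combined gradient points in the right direction more reliably. Whether real price-prediction models behave this way depends on how much the targets actually share, which the sketch simply assumes.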


version_five 1268 days ago

That's very interesting. Do the train and val set losses both show that behavior? I did a very similar experiment earlier this year. In my case it was a classifier where images could be categorized in different ways, and my takeaway was that making it predict more classes improved performance. I'll have to go back and look at the loss curves during training to see if the improvement is immediate, as in your case.


unixpickle 1268 days ago

Before one epoch, both the train and eval curves look pretty much identical. Quite curious.
