|
|
|
|
|
by unixpickle
1269 days ago
|
|
It's definitely not just a regularizer in my case, because the gap appears even before a single epoch. The gap does also appear for two very different model architectures. One explanation is that price labels are super noisy. If there is enough noise in the primary labels, you could imagine that adding in the more predictable target variables could help reduce gradient noise and speed up training. That's my current hypothesis, but I'm very open to others. If I had more time I'd try to do more experiments on this. |
|