by chaos_emergent
316 days ago
> This is likely because LLMs are typically trained for only a single epoch over massive datasets, which is in contrast to the multi-hundred-epoch training regimes for which dropout was first introduced.

Wait, is this true? That seems like a wild statement to make, relatively unsubstantiated?