by huevosabio
990 days ago
Actually, it seems they did try the suggestion: they basically trained a model with an all-zeros sink token. The verdict seems to be that other initial tokens still end up being used as sinks, so it is better to have a dedicated, learnable sink token.
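Here "all zeros" is taken to mean a prepended token whose key and value vectors are both zero. That reading is an assumption. Under it, the sink's raw score is q·0 = 0, so it adds exactly exp(0) = 1 to the softmax denominator and contributes nothing to the output. That is the same as the "softmax off by one" variant. The pure-Python sketch below checks that equivalence on made-up vectors. All function names and numbers are illustrative and do not come from the paper's code.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, values):
    # Combine value vectors using the given attention weights.
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def softmax(scores):
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_one(scores):
    # "Softmax off by one": exp(s_i) / (1 + sum_j exp(s_j)).
    # Shift by max(scores, 0) so neither the exps nor the implicit 1 can overflow.
    shift = max(max(scores), 0.0)
    exps = [math.exp(s - shift) for s in scores]
    total = math.exp(-shift) + sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values, norm=softmax):
    # Single-query scaled dot-product attention.
    scale = math.sqrt(len(query))
    scores = [dot(query, k) / scale for k in keys]
    return weighted_sum(norm(scores), values)


def attend_with_zero_sink(query, keys, values):
    # Hypothetical sink: prepend a token whose key and value are all zeros.
    zero_key = [0.0] * len(keys[0])
    zero_value = [0.0] * len(values[0])
    return attend(query, [zero_key] + keys, [zero_value] + values)


# Made-up example vectors (3-dim keys/queries, 2-dim values).
query = [0.5, -1.0, 2.0]
keys = [[1.0, 0.0, 0.5], [-0.3, 0.8, 0.1], [0.2, 0.2, -1.0]]
values = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]

with_sink = attend_with_zero_sink(query, keys, values)
off_by_one = attend(query, keys, values, norm=softmax_one)
print(with_sink)
print(off_by_one)
```

The weights on the real tokens can now sum to less than 1. This lets a head park its leftover attention on the zero token instead of on a real one. The result above is that models still recruit other initial tokens as sinks anyway.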