by huevosabio 990 days ago
Actually, it seems like they did try the suggestion out, basically by training a model with a dedicated sink token whose key and value vectors are all zeros.
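Here is a minimal Python sketch of why an all-zero sink token is essentially the same idea as adding 1 to the softmax denominator. It assumes "all zeros" means both the key and the value of the sink token are zero. The function names (`attend`, `attend_with_zero_sink`, `softmax_off_by_one`) are hypothetical and not taken from any model's codebase. The zero key gives the sink a logit of 0, which contributes exp(0) = 1 to the denominator. The zero value adds nothing to the output, so both versions should produce the same attention output.

```python
import math
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(logits: Vector) -> Vector:
    # Standard softmax (max-subtracted for numerical stability); weights sum to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_off_by_one(logits: Vector) -> Vector:
    # Off-by-one softmax: the extra 1 in the denominator lets all weights shrink toward 0.
    exps = [math.exp(x) for x in logits]
    total = 1.0 + sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector],
           normalize: Callable[[Vector], Vector]) -> Vector:
    # Single-query scaled dot-product attention with a pluggable normalizer.
    scale = 1.0 / math.sqrt(len(query))
    weights = normalize([scale * dot(query, k) for k in keys])
    return [sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))]


def attend_with_zero_sink(query: Vector, keys: List[Vector],
                          values: List[Vector]) -> Vector:
    # Hypothetical sink token prepended to the sequence, with all-zero key and value.
    # Its logit is 0, so it adds exp(0) = 1 to the softmax denominator,
    # and its zero value vector contributes nothing to the weighted sum.
    zero_key = [0.0] * len(query)
    zero_value = [0.0] * len(values[0])
    return attend(query, [zero_key] + keys, [zero_value] + values, softmax)


if __name__ == "__main__":
    q = [0.5, -1.0, 2.0]
    K = [[1.0, 0.0, 0.5], [-0.3, 0.8, 1.2]]
    V = [[1.0, 2.0], [3.0, -1.0]]
    with_sink = attend_with_zero_sink(q, K, V)
    off_by_one = attend(q, K, V, softmax_off_by_one)
    print(with_sink)
    print(off_by_one)
    assert all(math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-12)
               for x, y in zip(with_sink, off_by_one))
```

The sink only gives the model a place to dump attention mass. Whether training actually routes excess attention there, rather than to other early tokens, is a separate question the sketch does not address.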

The verdict seems to be that with the all-zero sink the model still ends up using other initial tokens as sinks, so it is better to have a dedicated, learnable sink token.