by pizza 517 days ago
Seems like we should just gradually anneal the tokens toward finer-grained, single-character tokens over the course of training, then.
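To make the idea concrete, here is a rough sketch of one way such a schedule could look. Everything in it is an assumption for illustration: the linear schedule, the function names `char_split_prob` and `anneal_tokens`, and the choice to split tokens at random rather than by some learned criterion. It is not taken from any existing tokenizer or training setup.

```python
import random

def char_split_prob(step, total_steps):
    """Hypothetical linear schedule: 0.0 at step 0, 1.0 at the final step."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return min(max(step / total_steps, 0.0), 1.0)

def anneal_tokens(tokens, step, total_steps, rng=random):
    """Split each multi-character token into single characters with a
    probability that grows over training (assumed scheme, not a known method)."""
    p = char_split_prob(step, total_steps)
    out = []
    for tok in tokens:
        if len(tok) > 1 and rng.random() < p:
            out.extend(tok)   # break the token into its individual characters
        else:
            out.append(tok)   # keep the coarse token as-is
    return out

tokens = ["hel", "lo", " ", "wor", "ld"]
print(anneal_tokens(tokens, step=0, total_steps=100))    # coarse tokens, unchanged
print(anneal_tokens(tokens, step=100, total_steps=100))  # fully character-level
```

Early in training the model would see mostly coarse tokens, and by the end it would see only characters. The text itself never changes, only how it is segmented.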
1 comment

I believe that's similar to the idea behind https://github.com/facebookresearch/blt