| HN Mirror



	by pizza 565 days ago
	Seems like we should just gradually anneal the tokenization toward finer-grained, single-character tokens over the course of training, then.
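
One way to read that idea, as a rough sketch only: start with a BPE-style tokenizer and shrink the number of merges it may apply as training progresses, so that by the end every token is a single character. Everything here is an assumption made for illustration: the linear schedule, the function names `merge_budget` and `tokenize`, and the tiny hypothetical `MERGES` table. It does not address how a model's embedding table would cope with a changing vocabulary.

```python
def merge_budget(step, total_steps, max_merges):
    """Linearly decay the number of allowed merges from max_merges to 0.

    The linear shape is an assumption; any decreasing schedule would do.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return round(max_merges * (1.0 - frac))


def tokenize(text, merges, budget):
    """Start from single characters and apply only the first `budget` merges."""
    tokens = list(text)
    for left, right in merges[:budget]:
        out, i = [], 0
        while i < len(tokens):
            # Fuse an adjacent (left, right) pair into one coarser token.
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                out.append(left + right)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens


# Hypothetical merge table, ordered as a BPE trainer would emit it.
MERGES = [("t", "h"), ("th", "e"), ("i", "n"), ("in", "g")]

if __name__ == "__main__":
    total_steps = 100
    for step in (0, 50, 100):
        budget = merge_budget(step, total_steps, len(MERGES))
        print(step, budget, tokenize("the thing", MERGES, budget))
```

Early in training the full merge table applies and the text comes out in coarse chunks; at the final step the budget is zero and the same text is split into individual characters.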

1 comment

I believe that's similar to the idea behind https://github.com/facebookresearch/blt