by float-trip
924 days ago
That's what I ended up doing (`[Author] username [Title] post title...`).

> Adding new tokens needs a ton of data to train what the token means.

But how much? 300M tokens is fine for a simple version of ChatML with ~4 new tokens. Not for 15, at least in my case. How does this relationship scale?

Just trying to offer one datapoint for what doesn't work, with the hedge that I might have just had a bug.
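
For anyone curious, the plain-text marker format looks roughly like this. It is only a sketch: `format_fields` is a name made up for illustration, and only the `Author` and `Title` fields come from the comment above. The fields after the title are elided there and are left out here.

```python
def format_fields(fields: list[tuple[str, str]]) -> str:
    """Serialize (name, value) pairs using plain-text markers like "[Author]".

    The markers are ordinary text, so the existing tokenizer splits them
    into tokens it already knows; no new vocabulary entries are added.
    """
    return " ".join(f"[{name}] {value}" for name, value in fields)


# Hypothetical example values, mirroring the format quoted in the comment.
example = format_fields([("Author", "username"), ("Title", "post title")])
print(example)
```

The point of doing it this way is that nothing has to be learned from scratch for the markers themselves, which sidesteps the question of how much data new tokens need.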